cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.split_entry_by_paragraph_or_sentence

cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.split_entry_by_paragraph_or_sentence(entry: str, entry_size: int, chunk_size: int) → Iterator[str]

Split a large entry into chunks, ending each chunk at a sentence or paragraph boundary.

Parameters
  • entry (str) – The text entry.

  • entry_size (int) – Size of the input entry.

  • chunk_size (int) – The desired chunk size.

Returns

Yields chunks of the text.

Return type

Iterator[str]
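
Example

The following is a minimal sketch of how a boundary-aware splitter with this signature might behave. It is not the Model Zoo implementation. The function name split_entry_sketch, the boundary regular expression, and the choice to measure entry_size and chunk_size in characters are all assumptions made for illustration; the real function may use different units and rules.

```python
import re
from typing import Iterator

# Hypothetical boundary rule: a blank line (paragraph end), or whitespace
# that follows sentence-ending punctuation.
_BOUNDARY = re.compile(r"\n\s*\n|(?<=[.!?])\s+")


def split_entry_sketch(entry: str, entry_size: int, chunk_size: int) -> Iterator[str]:
    """Illustrative stand-in for split_entry_by_paragraph_or_sentence."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Assumption: sizes are character counts. A small entry is yielded whole.
    if entry_size <= chunk_size:
        yield entry
        return

    start = 0
    while start < len(entry):
        end = min(start + chunk_size, len(entry))
        if end < len(entry):
            # Prefer the last boundary that falls inside the current window.
            cut = None
            for match in _BOUNDARY.finditer(entry, start, end):
                cut = match.end()
            if cut is not None:
                end = cut
            # If no boundary exists in the window, fall back to a hard cut.
        yield entry[start:end]
        start = end


text = (
    "First sentence. Second sentence here.\n\n"
    "New paragraph starts. It ends!"
)
chunks = list(split_entry_sketch(text, len(text), 30))
for chunk in chunks:
    print(repr(chunk))
```

Because the sketch only moves the cut point and never drops characters, concatenating the yielded chunks reproduces the original entry, and no chunk exceeds chunk_size.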