cerebras.modelzoo.data_preparation.data_preprocessing.data_reader.split_entry_by_paragraph_or_sentence

cerebras.modelzoo.data_preparation.data_preprocessing.data_reader.split_entry_by_paragraph_or_sentence(entry, entry_size, chunk_size)

Split a large entry into chunks, ending each chunk at a sentence or paragraph boundary.

Parameters
  • entry (str) – The text entry.

  • entry_size (int) – Size of the input entry.

  • chunk_size (int) – The desired chunk size.

Returns

Yields chunks of the text.

Return type

Iterator[str]
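The splitting behavior described above can be illustrated with a minimal sketch. This is not the Model Zoo implementation: `split_sketch` is a hypothetical stand-in, the delimiters (a blank line for a paragraph end, a period followed by a space for a sentence end) are assumptions, and sizes are measured in characters. The `entry_size` parameter is left out because this page does not document its unit.

```python
from typing import Iterator


def split_sketch(entry: str, chunk_size: int) -> Iterator[str]:
    """Hypothetical stand-in; NOT the Model Zoo implementation.

    Yields pieces of `entry` that are at most `chunk_size` characters long,
    preferring to cut after a paragraph end, then after a sentence end.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    start = 0
    while start < len(entry):
        end = min(start + chunk_size, len(entry))
        cut = end  # fallback: hard cut at the size limit
        if end < len(entry):
            # Assumed delimiters: blank line = paragraph end, ". " = sentence end.
            para = entry.rfind("\n\n", start, end)
            sent = entry.rfind(". ", start, end)
            if para != -1:
                cut = para + 2  # keep the delimiter with the preceding chunk
            elif sent != -1:
                cut = sent + 2
        yield entry[start:cut]
        start = cut


text = "First sentence. Second sentence.\n\nNew paragraph here. Tail"
for chunk in split_sketch(text, 30):
    print(repr(chunk))
```

Because the sketch is a generator, chunks are produced lazily, which matches the documented `Iterator[str]` return type. Concatenating the yielded chunks reproduces the original text, since each delimiter stays attached to the chunk that precedes it.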