cerebras.modelzoo.data_preparation.data_preprocessing.data_reader.split_entry_by_paragraph_or_sentence
- cerebras.modelzoo.data_preparation.data_preprocessing.data_reader.split_entry_by_paragraph_or_sentence(entry, entry_size, chunk_size)
Split a large entry into chunks, ending each chunk at a sentence or paragraph boundary.
- Parameters
entry (str) – The text entry.
entry_size (int) – Size of the input entry.
chunk_size (int) – The desired chunk size.
- Returns
Yields chunks of the text.
- Return type
Iterator[str]
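The chunking idea can be illustrated with a minimal sketch. This is not the Model Zoo implementation: the function name `split_entry_sketch` is hypothetical, sizes are assumed to be measured in characters, and the boundary rules (a blank line ends a paragraph; `.`, `!`, or `?` followed by whitespace ends a sentence) are assumptions chosen for illustration.

```python
import re
from typing import Iterator


def split_entry_sketch(entry: str, entry_size: int, chunk_size: int) -> Iterator[str]:
    """Hypothetical stand-in for illustration only.

    Assumes entry_size and chunk_size are character counts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    start = 0
    while start < entry_size:
        # Tentative hard cut at chunk_size characters.
        end = min(start + chunk_size, entry_size)
        if end < entry_size:
            window = entry[start:end]
            # Prefer the last paragraph end (assumed: a blank line).
            cut = window.rfind("\n\n")
            if cut != -1:
                cut += 2  # keep the blank line with the chunk it closes
            else:
                # Otherwise fall back to the last sentence end (assumed rule).
                matches = list(re.finditer(r"[.!?]\s", window))
                cut = matches[-1].end() if matches else -1
            if cut > 0:
                end = start + cut
            # If no boundary was found, the hard cut at chunk_size is kept.
        yield entry[start:end]
        start = end


text = "First sentence. Second sentence.\n\nNew paragraph here. More text."
for chunk in split_entry_sketch(text, len(text), 40):
    print(repr(chunk))
```

Because the sketch is a generator, chunks are produced lazily, which matches the documented `Iterator[str]` return type. Joining the yielded chunks reproduces the first `entry_size` characters of the input, and no chunk exceeds `chunk_size`.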