cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.split_entry_by_paragraph_or_sentence
- cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.split_entry_by_paragraph_or_sentence(entry: str, entry_size: int, chunk_size: int) → Iterator[str]
Split a large entry into chunks, breaking at sentence or paragraph ends.
- Parameters
entry (str) – The text entry.
entry_size (int) – Size of the input entry.
chunk_size (int) – The desired chunk size.
- Returns
Yields chunks of the text.
- Return type
Iterator[str]
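The general idea of boundary-aware chunking can be illustrated with a minimal sketch. This is not the library's implementation: the function name `split_sketch`, the boundary regex, and the choice to measure sizes in characters are all assumptions made for illustration, and the sketch uses `len(entry)` in place of the documented `entry_size` parameter. It cuts at the last paragraph or sentence end that fits within `chunk_size`, and falls back to a hard cut when no boundary is found.

```python
import re
from typing import Iterator

# Hypothetical boundary pattern (an assumption, not taken from the library):
# a blank line (paragraph end), or whitespace that follows ., ! or ?.
_BOUNDARY = re.compile(r"\n\s*\n|(?<=[.!?])\s+")


def split_sketch(entry: str, chunk_size: int) -> Iterator[str]:
    """Yield pieces of `entry`, each at most `chunk_size` characters long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    start = 0
    total = len(entry)
    while total - start > chunk_size:
        window_end = start + chunk_size
        cut = None
        # Keep the end of the last boundary found inside the current window.
        for match in _BOUNDARY.finditer(entry, start, window_end):
            cut = match.end()
        if cut is None:
            # No sentence or paragraph end fits, so cut at the size limit.
            cut = window_end
        yield entry[start:cut]
        start = cut

    # Whatever remains already fits in a single chunk.
    if start < total:
        yield entry[start:]
```

Because the function is a generator, chunks are produced lazily, for example `for chunk in split_sketch(text, 20): ...`. Under these assumptions, concatenating the yielded chunks reproduces the original text, since boundary whitespace stays attached to the chunk that precedes it.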