modelzoo.transformers.data_processing.scripts.chunk_preprocessing.data_reader.split_entry_by_paragraph_or_sentence
- modelzoo.transformers.data_processing.scripts.chunk_preprocessing.data_reader.split_entry_by_paragraph_or_sentence(entry: str, entry_size: int, chunk_size: int) → Iterator[str]
Split a large entry into chunks, ending each chunk at a paragraph or sentence end.
- Parameters
entry (str) – The text entry.
entry_size (int) – Size of the input entry.
chunk_size (int) – The desired chunk size.
- Returns
Yields chunks of the text.
- Return type
Iterator[str]
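The splitting behaviour described above can be illustrated with a minimal sketch. This is not the modelzoo implementation: the function name `split_entry_sketch` is hypothetical, and the sketch assumes that `entry_size` and `chunk_size` are character counts, that paragraphs end with a blank line, and that sentences end with `.`, `!`, or `?` followed by whitespace. The real function may measure size differently (for example in bytes or tokens) and may use other boundary rules.

```python
import re
from typing import Iterator


def split_entry_sketch(entry: str, entry_size: int, chunk_size: int) -> Iterator[str]:
    """Hypothetical sketch: yield chunks of at most chunk_size characters.

    Assumption: sizes are character counts, not bytes or tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    start = 0
    while start < entry_size:
        end = min(start + chunk_size, entry_size)
        if end < entry_size:
            window = entry[start:end]
            # Prefer to cut just after the last paragraph break in the window.
            cut = window.rfind("\n\n")
            if cut > 0:
                end = start + cut + 2
            else:
                # Otherwise cut just after the last sentence end in the window.
                matches = list(re.finditer(r"[.!?]\s", window))
                if matches:
                    end = start + matches[-1].end()
                # With no boundary at all, fall back to a hard cut at chunk_size.
        yield entry[start:end]
        start = end


text = "One. Two.\n\nThree. Four."
for chunk in split_entry_sketch(text, len(text), 12):
    print(repr(chunk))
```

Because the sketch is a generator, chunks are produced lazily, which matches the `Iterator[str]` return type documented above. Joining the chunks reproduces the original entry, since no characters are dropped at the cut points.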