cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.set_doc_idx

cerebras.modelzoo.data_preparation.nlp.chunk_data_processing.data_reader.set_doc_idx(df, file_idx, start_doc_idx, end_doc_idx) → None

Sets metadata for the given dataframe: its file index and its starting and ending document indices.

Parameters
  • df – The dataframe to set metadata for

  • file_idx – The file index of the current dataframe

  • start_doc_idx – The starting doc index of the current dataframe

  • end_doc_idx – The ending doc index of the current dataframe
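
The parameters above can be illustrated with a minimal sketch. It does not reproduce the library's implementation: `set_doc_idx_sketch` is a hypothetical stand-in that assumes the three values are stored as plain attributes on the object, and `SimpleNamespace` stands in for the dataframe type, which this page does not specify. Whether `end_doc_idx` is inclusive is also not stated here, so the values are only illustrative.

```python
from types import SimpleNamespace


def set_doc_idx_sketch(df, file_idx, start_doc_idx, end_doc_idx):
    """Hypothetical stand-in: attach file and document-range metadata to df."""
    # Assumption: the metadata lives in plain attributes on the object.
    df.file_idx = file_idx
    df.start_doc_idx = start_doc_idx
    df.end_doc_idx = end_doc_idx


# A placeholder object standing in for one chunk's dataframe.
chunk = SimpleNamespace()

# Record that this chunk came from file 0 and covers documents 100 through 199.
set_doc_idx_sketch(chunk, file_idx=0, start_doc_idx=100, end_doc_idx=199)

print(chunk.file_idx, chunk.start_doc_idx, chunk.end_doc_idx)  # prints: 0 100 199
```

Like the documented function, the sketch returns `None`; the caller reads the metadata back from the object it passed in.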