Dataloaders for TensorFlow
Cerebras recommends using TFRecords to create TensorFlow dataloaders with optimal performance. TFRecords is preferred primarily for its storage efficiency and because it can be read very quickly with parallel I/O operations, which the CS systems take advantage of.
To use TFRecords, you must convert your raw data offline into the TFRecords format.
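For illustration, below is a minimal sketch of such an offline conversion step using standard TensorFlow APIs. The feature names `input_ids` and `label` are placeholders for your own schema, not a format mandated by Cerebras:

```python
import tensorflow as tf

def serialize_example(input_ids, label):
    # Wrap raw values in a tf.train.Example so they can be stored as a TFRecord.
    feature = {
        "input_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write each raw sample into a TFRecord file, one serialized Example per record.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    for input_ids, label in [([101, 2023, 102], 1), ([101, 2003, 102], 0)]:
        writer.write(serialize_example(input_ids, label))
```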
Note
Alternatively, you can create TensorFlow dataloaders that ingest other native formats, such as NumPy arrays. However, they may run at sub-optimal performance.
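As a sketch of that alternative, an input function can be built directly from NumPy arrays with `tf.data.Dataset.from_tensor_slices`. The `numpy_input_fn` name, the shapes, and the `batch_size` params key here are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

def numpy_input_fn(params):
    # Illustrative in-memory data; a real dataloader would load .npy files instead.
    features = np.random.rand(1000, 784).astype(np.float32)
    labels = np.random.randint(0, 10, size=(1000,)).astype(np.int32)

    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    dataset = dataset.shuffle(1000).batch(params["batch_size"], drop_remainder=True)
    return dataset.repeat().prefetch(tf.data.experimental.AUTOTUNE)
```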
Cerebras Model Zoo Dataloaders
Cerebras Model Zoo dataloaders extend `TfRecordsProcessor`, which creates datasets from pre-compiled TFRecords using the `map_fn` provided in the child class (a generic sketch of such a `map_fn` follows the list below). Some examples are:
- `BertTfRecordsProcessor` - static-based dataloader that creates the dataset for BERT from pre-compiled TFRecords.
- `BertMlmOnlyTfRecordsStaticMaskProcessor` - creates the dataset from pre-compiled TFRecords for the MLM task only; these TFRecords do not contain the `segment_ids` and `next_sentence_labels` features.
- `BertMlmOnlyTfRecordsDynamicMaskProcessor` - reads TFRecords containing sequences of tokens and adds the MLM features on the fly; the resulting dataset is for the MLM task only; these TFRecords do not contain `segment_ids` and `next_sentence_labels`.
- `GptTfRecordsProcessor` - creates the dataset from pre-compiled TFRecords.
- `T5DynamicDataProcessor` - dataset generator for the T5 model; it also performs on-the-fly processing of data from text.
- `DAGM2007Dataset` - creates the dataset for the DAGM 2007 dataset, which consists of `PNG` images and labels in `CSV`; it also performs on-the-fly augmentations.
- `SeverstalTFRecordsDataset` - creates the dataset for the Severstal dataset, which consists of `PNG` images and labels in `CSV`; it also performs on-the-fly augmentations.
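The exact `TfRecordsProcessor` interface is defined in the Cerebras Model Zoo; purely as an illustration of the kind of `map_fn` a child class supplies, the sketch below parses one serialized record into a features dictionary with plain TensorFlow ops. The feature names and sequence length are assumptions, not the Model Zoo schema:

```python
import tensorflow as tf

def map_fn(raw_record):
    # Parse one serialized tf.train.Example into dense tensors.
    # The feature spec is illustrative; match it to your TFRecords schema.
    feature_spec = {
        "input_ids": tf.io.FixedLenFeature([128], tf.int64),
        "input_mask": tf.io.FixedLenFeature([128], tf.int64),
        "labels": tf.io.FixedLenFeature([1], tf.int64),
    }
    example = tf.io.parse_single_example(raw_record, feature_spec)
    labels = example.pop("labels")
    return example, labels
```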
Create a custom dataloader with TensorFlow
To create your own dataloader, keep these tips in mind:
1. Coherence between the output of the dataloader and the input of the neural network model. For example, if you are using one of the models from Cerebras Model Zoo, the README file of each model explains the input format required to run and train that model. For instance, if you are using GPT-2, you must ensure that your input function produces a features dictionary (see the sketch after this list).
2. Cerebras supported file types. You can create your own dataset by extending one of the native dataset types. Currently, the Cerebras ecosystem only supports files of types `TSV`, `CSV`, `TXT`, and `PNG`. Other file types have not been tested.
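Following tip 1, here is a minimal sketch of an input function that yields a features dictionary of the kind a GPT-2-style model consumes. The feature names (`input_ids`, `labels`) and the params keys (`data_dir`, `batch_size`, `max_sequence_length`) are assumptions; check the model's README for the exact schema:

```python
import tensorflow as tf

def input_fn(params):
    # Stream pre-compiled TFRecords and emit a (features, labels) pair per batch.
    feature_spec = {
        "input_ids": tf.io.FixedLenFeature([params["max_sequence_length"]], tf.int64),
        "labels": tf.io.FixedLenFeature([params["max_sequence_length"]], tf.int64),
    }

    def parse(raw_record):
        example = tf.io.parse_single_example(raw_record, feature_spec)
        labels = example.pop("labels")
        return example, labels

    files = tf.io.gfile.glob(params["data_dir"] + "/*.tfrecord")
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return dataset.batch(params["batch_size"], drop_remainder=True).repeat()
```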
Performance Analyzer
An additional tool that Cerebras provides for TensorFlow is a performance analyzer called `perf_input_fn`. This function takes three arguments: `input_fn`, `params`, and `time_stop`. You can use it to estimate the number of steps (and samples) per second for a given input function `input_fn` and `params`.
To do this, create a Python file `trial.py` as shown below, where the test runs for 30 seconds, as indicated by `time_stop=30`:
```python
from cerebras.tf.tf_helper import perf_input_fn

from data import input_fn      # the input function to benchmark
from utils import get_params   # loads the params dictionary from a YAML file

params = get_params(params_file)  # params_file is the path to your params YAML

# Measure steps (and samples) per second of input_fn for 30 seconds.
perf_input_fn(input_fn, params, time_stop=30)
```
Then run `csrun_cpu python trial.py`. For example, running the `train_input_fn` of the FC-MNIST model with its default parameters in `configs/params.yaml` produces the following output:
```
total steps: 19221, time: 30.00224627985225, perf: 644.3517513735975 steps/sec/worker
Without counting first step, total steps: 19331, time: 25.694870948791504, perf: 752.32913364330508 steps/sec/worker
total number of inputs: 2
Shapes: {features: (100, 784), labels: (100,), }
(19332, 30.002246379852295, 4.307375431060791)
```
In this output:
- `total steps` is the number of training steps executed during the measurement.
- `time` is the amount of time used to measure the performance of the `input_fn`; it is set by the `time_stop` argument.
- `perf` is the estimated number of training steps per second per worker.
- The same statistics are also reported without counting the first step, which is slower due to loading the model and its activations; this gives a more accurate estimate of the average performance of the `input_fn`.
- In `Shapes`, for `features`, 100 is the batch size and 784 is the number of features per example; for `labels`, because there are 100 examples in the batch, there are 100 labels.
- The three numbers in the last line are: `total steps`, `time` (which is the same as `time_stop`), and the time taken for the first step.
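The printed tuple suggests that `perf_input_fn` also returns these three values; assuming it does, you could consume them programmatically, for example:

```python
# Assumption: perf_input_fn returns (total_steps, time, first_step_time),
# matching the tuple printed on the last line of the output above.
total_steps, total_time, first_step_time = perf_input_fn(input_fn, params, time_stop=30)

# Average steps/sec excluding the slow first step.
print((total_steps - 1) / (total_time - first_step_time))
```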