Optimize Input Function#

To train your TensorFlow model on the Cerebras CS system, your input function must satisfy a few requirements that ensure optimal training performance on the CS system.

To help you optimize the input function for these requirements, the Cerebras compiler analyzes your input function during the compile stage. The compiler then generates a detailed log identifying any missing optimization calls in your input pipeline and recommending parameter values to enhance training performance on the CS system.
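For illustration, the sketch below shows the kind of input function the analyzer flags: a map() applied before batch(), and a prefetch() that is not the last call in the pipeline. This is a minimal, hypothetical example using in-memory tf.data with an assumed params["batch_size"] key; it is not taken from the Cerebras tooling.

```python
import tensorflow as tf

def input_fn(params):
    # Hypothetical input_fn containing two patterns the analyzer warns about.
    dataset = tf.data.Dataset.range(8)
    # map() called before batch(): the analyzer recommends batching first so
    # the map function operates on whole batches.
    dataset = dataset.map(lambda x: x + 1)
    dataset = dataset.prefetch(2)
    # batch() called after prefetch(): the analyzer recommends ending the
    # pipeline with prefetch instead.
    dataset = dataset.batch(params["batch_size"])
    return dataset
```

Calling input_fn({"batch_size": 4}) on this pipeline works, but compiling it would trigger warnings like those shown in the example output below.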

Automatic execution of the analyzer#

By default, when you compile or run on the CS system, the compiler automatically runs the analyzer and includes its output in the compile log.

Important

The analyzer runs automatically with either the validate_only or the compile_only option. You do not need to run your network on the CS system to generate this input function analysis report.

The analyzer writes its recommendations to the output log of the compiler run, usually displayed on stdout.

Example analyzer output on stdout#

The following shows an example of the analyzer output, displayed as a part of the compiler output log on the terminal (or stdout).

Hint

In the compiler log output on stdout, each analyzer output statement will begin with the text string [input_fn].

WARNING:root:[input_fn] - interleave(): in ParallelInterleaveDatasetV3, cycle_length is not being set to CS_AUTOTUNE. Currently, it is set to 64. If determinism is not required, Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value.e.g. dataset = dataset.interleave(map_func, cycle_length=cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE)
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but ShuffleDataset used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:[input_fn] - interleave(): in ParallelInterleaveDatasetV3_1, cycle_length is not being set to CS_AUTOTUNE. Currently, it is set to 64. If determinism is not required, Using CS_AUTOTUNE is likely to improve performance unless you are deliberately using a fine-tuned value.e.g. dataset = dataset.interleave(map_func, cycle_length=cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE)
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but RepeatDataset used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:Tensorflow recommends that most dataset input pipelines end with a call to prefetch, but BatchDatasetV2 used in input_fn after prefetch(). Unless this is a careful design choice, consider calling prefetch last
WARNING:root:Map is called prior to Batch. Consider reverting the order and performing the map function in a batched fashion to increase the performance of the input function
WARNING:root:[input_fn] - flat_map(): use map() instead of flat_map() to improve performance and parallelize reads. If you are not calling flat_map directly, check if you are using: from_generator, TextLineDataset, TFRecordDataset, or FixedLenthRecordDataset. If so, set num_parallel_reads to > 1 or cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE, and map() will be used automatically.
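A pipeline revised along the lines of these recommendations might look like the following sketch: batch before map, parallelize the map, and call prefetch last. Because cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE is only importable inside the Cerebras environment, this example substitutes tf.data.AUTOTUNE as a stand-in; on a CS system you would pass CS_AUTOTUNE (to num_parallel_calls, and to cycle_length if you use interleave), as the warnings above suggest. The in-memory dataset and params["batch_size"] key are assumptions for illustration.

```python
import tensorflow as tf

# Stand-in for cerebras.tf.tools.analyze_input_fn.CS_AUTOTUNE, which is not
# available outside the Cerebras environment.
AUTOTUNE = tf.data.AUTOTUNE

def input_fn(params):
    dataset = tf.data.Dataset.range(8)
    # Batch first, then map, so the map function runs on whole batches.
    dataset = dataset.batch(params["batch_size"], drop_remainder=True)
    dataset = dataset.map(lambda x: x * 2, num_parallel_calls=AUTOTUNE)
    # End the pipeline with prefetch, as the analyzer recommends.
    dataset = dataset.prefetch(AUTOTUNE)
    return dataset
```

With params = {"batch_size": 4}, this yields two batches of four elements each, and the reordered pipeline avoids the map-before-batch and prefetch-placement warnings shown above.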

See Input Function Report for a detailed description of the input function analyzer report.