Best Practices

Sharding the dataset

When sharding data in TensorFlow, follow the guidelines below:

  • Be sure to shard before using any randomizing operator (such as shuffle).

  • Do not shuffle the list of files before sharding; if each worker shuffled the file list independently, the resulting shards could overlap or miss files.

  • Generally, use the shard operator as early as possible in the dataset pipeline. For example, when reading from a list of TFRecord files, shard before converting the dataset to input samples, so that each worker reads only its own subset of the files instead of every file. The following is an example of an efficient sharding strategy within a complete pipeline:

    import tensorflow as tf

    # CS_AUTOTUNE (the Cerebras counterpart of tf.data.experimental.AUTOTUNE) and
    # the remaining names (pattern, num_workers, worker_index, shuffle_buffer_size,
    # blk_len, size, parser_fn) are assumed to be defined elsewhere.
    d = tf.data.Dataset.list_files(pattern, shuffle=False)  # stable file order
    d = d.shard(num_workers, worker_index)                  # shard before shuffling
    d = d.shuffle(shuffle_buffer_size)
    d = d.interleave(tf.data.TFRecordDataset, cycle_length=CS_AUTOTUNE,
                     block_length=blk_len, num_parallel_calls=CS_AUTOTUNE)
    d = d.map(parser_fn, num_parallel_calls=CS_AUTOTUNE)
    d = d.batch(batch_size=size, drop_remainder=True)
    d = d.repeat()
    d = d.prefetch(buffer_size=CS_AUTOTUNE)
    

See also the TensorFlow documentation.

Shuffling the dataset

When shuffling data in TensorFlow, follow the guidelines below:

  • For a perfect shuffle, choose a shuffle_buffer size greater than the size of the dataset.

    • In a multi-worker scenario with sharding, each worker sees only a 1/num_shards subset of the dataset. In theory, the shuffle_buffer in this case therefore only needs to be greater than (1/num_shards) * dataset_size.

  • While large buffer sizes help shuffle the data more thoroughly, they can take a lot of memory and significant time to fill. A practical value for shuffle_buffer is 10 * batch_size.

  • Introduce randomness at several points in the data loading pipeline: shuffle the samples when writing them into multiple files, split the dataset across workers by sharding, interleave and map with parallel calls, and shuffle with a reasonably sized shuffle_buffer (see the sketch below).
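
For example, here is a minimal sketch of the buffer-sizing heuristic above; the batch size and the stand-in dataset are illustrative:

    import tensorflow as tf

    batch_size = 256                       # illustrative batch size
    shuffle_buffer_size = 10 * batch_size  # practical heuristic from above

    # Stand-in dataset; in practice this is the sharded, parsed dataset.
    d = tf.data.Dataset.range(100000)
    d = d.shuffle(buffer_size=shuffle_buffer_size)
    d = d.batch(batch_size, drop_remainder=True)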

Changes in TensorFlow 1.14+

TensorFlow 1.14 includes significant changes in preparation for the transition to TensorFlow 2.0. These changes include:

  • Keras layers are the recommended way to build your model (see the sketch after this list).

  • Mixed precision is now a first-class feature through the Keras mixed precision policy.

  • Significant portions of tf.contrib have been removed in favor of the officially integrated alternatives.
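
For example, here is a minimal model built from Keras layers; the input shape and layer sizes are illustrative:

    import tensorflow as tf

    # Build the model from tf.keras.layers instead of the deprecated tf.layers.
    inputs = tf.keras.Input(shape=(784,), name="features")
    x = tf.keras.layers.Dense(128, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(10, name="logits")(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)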

Updating to TensorFlow 1.14+

To update your code for use with TensorFlow 1.14+ and avoid using deprecated features, you should:

  • Use tf.keras.layers instead of tf.layers.

  • Use a Keras mixed precision policy to run your model in mixed precision (see the sketch after this list).

  • Update your model and input functions to remove the deprecation warnings seen during execution. Each warning indicates the replacement to make, which usually means switching a function to its alternative in tf.compat.v1. See an example warning below:

WARNING:tensorflow:From onlinenorm_test.py:67: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
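
The fix for this particular warning is mechanical; for example (the log message is illustrative):

    import tensorflow as tf

    # Before (deprecated in TF 1.14):
    #   tf.logging.info("starting input pipeline")
    # After:
    tf.compat.v1.logging.info("starting input pipeline")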

Tip

Run the model first with --mode validate_only while removing the deprecation warnings. This skips the later compilation stages and speeds up iteration.

  • Avoid using tf.contrib. It has been deprecated in its entirety and does not exist in TensorFlow 2.x. Searching for the exact function you currently use should turn up a suitable replacement.

  • Finally, make sure to follow the documentation for TensorFlow 1.15.
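
As a rough sketch of the mixed precision item above: TensorFlow 1.14 ships an experimental Keras mixed precision API under tf.keras.mixed_precision.experimental. The policy name 'infer_float32_vars' and passing the policy as a layer dtype are assumptions about that 1.14-era API; confirm against the documentation for your release.

    import tensorflow as tf

    # Assumed TF 1.14-era API: compute follows the (float16) input dtype while
    # variables are kept in float32.
    policy = tf.keras.mixed_precision.experimental.Policy('infer_float32_vars')

    inputs = tf.keras.Input(shape=(784,), dtype=tf.float16)
    x = tf.keras.layers.Dense(128, activation="relu", dtype=policy)(inputs)
    outputs = tf.keras.layers.Dense(10, dtype=policy)(x)
    model = tf.keras.Model(inputs=inputs, outputs=outputs)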