Best Practices#
Shuffling dataset#
When sharding data in TensorFlow, follow the guidelines below for data shuffling:
Choose a shuffle_buffer greater than or equal to the size of the dataset. In a multi-worker scenario with sharding, each worker has access to a 1/num_shards subset of the dataset, so in theory the shuffle_buffer should be at least (1/num_shards) * dataset_size.
While large buffer sizes shuffle the data more thoroughly, they can consume a lot of memory and take significant time to fill. A practical value for shuffle_buffer is 10 * batch_size.
Introduce randomness throughout the data loading pipeline: shuffle the data when writing it to multiple files, shard the dataset across multiple workers, use interleave and map with parallel calls, and shuffle with a reasonably sized shuffle_buffer.
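The sizing arithmetic in the guidelines above can be sketched in plain Python. This is only an illustration: per_shard_size and choose_shuffle_buffer are hypothetical helper names, not TensorFlow APIs, and capping the buffer at the shard size is an assumption made here (a buffer that already holds the whole shard gains nothing from being larger).

```python
import math


def per_shard_size(dataset_size, num_shards):
    """Upper bound on the number of examples one worker sees after sharding."""
    return math.ceil(dataset_size / num_shards)


def choose_shuffle_buffer(dataset_size, num_shards, batch_size, multiplier=10):
    """Pick a shuffle buffer size (hypothetical helper, not a TensorFlow API).

    full:      a buffer that holds the worker's entire shard.
    practical: the multiplier * batch_size rule of thumb from the text.
    The smaller of the two is returned (an assumption of this sketch).
    """
    full = per_shard_size(dataset_size, num_shards)
    practical = multiplier * batch_size
    return min(full, practical)


if __name__ == "__main__":
    # Large dataset: the 10 * batch_size rule of thumb wins.
    print(choose_shuffle_buffer(1_000_000, num_shards=8, batch_size=256))
    # Small dataset: the whole shard fits in the buffer.
    print(choose_shuffle_buffer(1_000, num_shards=4, batch_size=64))
```

The resulting value would typically be passed as the buffer_size argument of tf.data.Dataset.shuffle.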
Changes in TensorFlow 1.14+#
TensorFlow 1.14 includes significant changes in preparation for the transition to TensorFlow 2.0. These changes include:
Keras layers are the recommended way to build your model.
Mixed precision is now a first-class feature through the Keras mixed precision policy.
Significant portions of tf.contrib have been removed in favor of the officially integrated alternatives.
Update to TensorFlow 1.14+#
To update your code for use with TensorFlow 1.14+ and avoid using deprecated features, you should:
Use tf.keras.layers instead of tf.layers.
Use the Keras mixed precision policy to run your model in mixed precision.
Make changes to your model and input functions to remove the deprecation warnings seen during execution. Each warning indicates the replacement you should make, which usually means switching to an alternative from tf.compat.v1. See the example warning below:
WARNING:tensorflow:From onlinenorm_test.py:67: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
Tip
While removing the deprecation warnings, first run the model with --mode validate_only. This skips the later compilation stages and speeds up iteration.
Avoid using tf.contrib. It has been entirely deprecated and does not exist in TensorFlow 2.x. Searching for the exact function you are currently using should allow you to find a suitable replacement.
Finally, make sure to follow the documentation for TensorFlow 1.15.
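Parts of the cleanup described in the steps above can be scripted. The sketch below assumes the deprecation warnings follow the single-line format of the example warning and that tf.contrib is referenced by its full dotted name in your source. The names collect_renames and find_contrib_uses are hypothetical helpers written for this illustration, not part of TensorFlow.

```python
import re

# Matches the "name is deprecated" warning format shown in the example
# warning. Real logs may differ, so treat this pattern as an assumption.
_RENAME_WARNING = re.compile(
    r"WARNING:tensorflow:From\s+\S+:\s+"
    r"The name\s+(?P<old>\S+)\s+is deprecated\.\s+"
    r"Please use\s+(?P<new>\S+)\s+instead\."
)

# Matches fully qualified tf.contrib references such as tf.contrib.rnn.LSTMCell.
_CONTRIB_USE = re.compile(r"\btf\.contrib(?:\.\w+)+")


def collect_renames(log_text):
    """Return {deprecated_name: suggested_replacement} found in log_text."""
    return {m.group("old"): m.group("new")
            for m in _RENAME_WARNING.finditer(log_text)}


def find_contrib_uses(source_text):
    """Return the sorted, de-duplicated tf.contrib.* names in source_text."""
    return sorted(set(_CONTRIB_USE.findall(source_text)))


if __name__ == "__main__":
    log = (
        "WARNING:tensorflow:From onlinenorm_test.py:67: The name "
        "tf.logging.info is deprecated. Please use "
        "tf.compat.v1.logging.info instead.\n"
    )
    for old, new in collect_renames(log).items():
        print(f"replace {old} with {new}")
```

Running collect_renames over the captured output of a validate_only run gives a checklist of renames to apply, while find_contrib_uses lists the tf.contrib functions that still need a replacement to be looked up by hand.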