Using a custom hook called CerebrasEarlyStoppingHook, you can terminate neural network training early based on custom logic. This hook is similar to the Keras EarlyStopping class. The CerebrasEarlyStoppingHook can be used in TensorFlow either on the CS system or on a CPU.
Early stopping with CerebrasEarlyStoppingHook is currently supported only on the data accessible to the model_fn. This means that if you are running training, CerebrasEarlyStoppingHook will compute the stopping condition based only on the data provided for the training run. If you are running evaluation, CerebrasEarlyStoppingHook will compute the stopping condition based only on the validation data.
See the following TensorFlow example.

    def acc_early_stop(logits, labels):
        train_acc = tf.compat.v1.metrics.accuracy(
            tf.argmax(labels, 1),
            tf.argmax(logits, 1)
        )
        # Return True if training accuracy is greater than 90%.
        return tf.math.greater(train_acc, tf.constant(0.9))

    def loss_early_stop(loss, threshold):
        # Return True if training loss is lower than threshold.
        return tf.math.less(loss, tf.constant(threshold))

    def model_fn(features, labels, mode, params):
        ...
        # Specify the model.
        ...
        training_hooks = [
            # Check acc_early_stop every 1000th iteration and stop training if True.
            CerebrasEarlyStoppingHook(acc_early_stop, [logits, labels], every_n_iter=1000),
            # Check loss_early_stop every 500th iteration and stop training if True.
            CerebrasEarlyStoppingHook(loss_early_stop, [loss, 0.01], every_n_iter=500)
        ]
        ...
        spec = CSEstimatorSpec(
            ...
            training_hooks=training_hooks
            ...
        )
        return spec
In the above example, the function acc_early_stop returns True if the training accuracy is greater than 90%, and the function loss_early_stop returns True if the training loss is lower than the threshold. The first CerebrasEarlyStoppingHook in the training_hooks list evaluates the acc_early_stop function once every 1000 iterations. If the acc_early_stop function evaluates to True, then training is stopped. If the training accuracy is not greater than 90%, then the acc_early_stop function is evaluated again at the next 1000th iteration.
Similarly, the second CerebrasEarlyStoppingHook evaluates the loss_early_stop function every 500th iteration and stops the training if it evaluates to True.
A function like loss_early_stop must return a rank-0 Boolean tensor. There are no other restrictions on the computation that occurs inside such a function. This function runs on the host.
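The scheduling behavior described above can be sketched in plain Python. This is a hypothetical minimal re-implementation for illustration only, not the Cerebras hook itself: each hook pairs a condition function with an every_n_iter interval, and training stops at the first iteration where a hook is due and its condition returns True.

```python
# Hypothetical sketch of CerebrasEarlyStoppingHook scheduling semantics.
# This is NOT the Cerebras implementation; it only illustrates the
# every_n_iter check-and-stop behavior in plain Python.

def run_with_early_stopping(max_steps, hooks):
    """Run up to max_steps iterations. Each hook is a
    (condition_fn, every_n_iter) pair; condition_fn is checked only on
    steps that are multiples of every_n_iter. Return the step at which
    training stopped, or max_steps if no hook ever fired."""
    for step in range(1, max_steps + 1):
        for condition_fn, every_n_iter in hooks:
            if step % every_n_iter == 0 and condition_fn(step):
                return step  # a condition evaluated to True: stop training
    return max_steps

# Toy conditions standing in for acc_early_stop / loss_early_stop:
# pretend accuracy passes 90% at step 2500 and loss never drops below
# its threshold.
acc_reaches_target = lambda step: step >= 2500
loss_reaches_target = lambda step: False

stopped_at = run_with_early_stopping(
    max_steps=10000,
    hooks=[(acc_reaches_target, 1000), (loss_reaches_target, 500)],
)
print(stopped_at)  # prints 3000: the first multiple of 1000 at or after 2500
```

Note that the accuracy hook does not fire at step 2500 itself, because it is checked only on multiples of 1000; the stop occurs at the next scheduled check, step 3000, mirroring how acc_early_stop above is re-evaluated "at the next 1000th iteration".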