Early Stopping

With the custom hook CerebrasEarlyStoppingHook, you can stop neural network training early based on a condition that you define. This hook is similar to the Keras EarlyStopping class. CerebrasEarlyStoppingHook can be used in TensorFlow on either the CS system or a CPU.

Important

Early stopping with CerebrasEarlyStoppingHook is currently supported only on the data accessible by the model_fn. During a training run, CerebrasEarlyStoppingHook computes the stopping condition only from the data provided for training. During an evaluation run, it computes the stopping condition only from the validation data.

Example

See the following TensorFlow example.

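import tensorflow as tf

def acc_early_stop(logits, labels):
    # tf.compat.v1.metrics.accuracy returns (accuracy, update_op);
    # index 0 is the accuracy value.
    train_acc = tf.compat.v1.metrics.accuracy(
        tf.argmax(labels, 1), tf.argmax(logits, 1)
    )
    # Return True if training accuracy is greater than 90%.
    return tf.math.greater(train_acc[0], tf.constant(0.9))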

def loss_early_stop(loss, threshold):
    # Return True if training loss is lower than threshold.
    return tf.math.less(loss, tf.constant(threshold))

def model_fn(features, labels, mode, params):
    ...
    # Specify the model.
    ...
    training_hooks = [
        # Check acc_early_stop every 1000th iteration and stop training if True.
        CerebrasEarlyStoppingHook(acc_early_stop, [logits, labels], every_n_iter=1000),
        # Check loss_early_stop every 500th iteration and stop training if True.
        CerebrasEarlyStoppingHook(loss_early_stop, [loss, 0.01], every_n_iter=500)
    ]
    ...
    spec = CSEstimatorSpec(
        ...
        training_hooks=training_hooks,
        ...
    )
    return spec

In the above example, the function acc_early_stop returns True if the training accuracy is greater than 90%, and the function loss_early_stop returns True if the training loss is lower than the threshold argument.

The first CerebrasEarlyStoppingHook in the training_hooks list evaluates the acc_early_stop function once every 1000 iterations. If the acc_early_stop function evaluates to True, training stops. If the training accuracy is not greater than 90%, training continues and acc_early_stop is evaluated again 1000 iterations later.

Similarly, the second CerebrasEarlyStoppingHook evaluates the loss_early_stop function once every 500 iterations and stops training if it returns True.
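One consequence of the periodic check described above is that training can continue for some time after the condition first becomes true, because the condition is only looked at on the scheduled iterations. The following plain-Python sketch illustrates that scheduling. It is not the CerebrasEarlyStoppingHook implementation: the PeriodicStopCheck class, the run function, and the toy 1/step loss are all hypothetical, and the assumption that checks fire when the step count is a multiple of every_n_iter is made only for illustration.

```python
class PeriodicStopCheck:
    """Hypothetical stand-in for a hook that tests a condition every N steps."""

    def __init__(self, stop_fn, every_n_iter):
        self.stop_fn = stop_fn
        self.every_n_iter = every_n_iter

    def should_stop(self, step, *args):
        # Skip the condition entirely on iterations that are not scheduled.
        if step % self.every_n_iter != 0:
            return False
        return bool(self.stop_fn(*args))


def run(checks, max_steps):
    """Toy loop with loss = 1/step; returns the step at which it stopped."""
    for step in range(1, max_steps + 1):
        loss = 1.0 / step  # placeholder for a real training step
        if any(check.should_stop(step, loss) for check in checks):
            return step
    return max_steps


# Stop when the loss drops below 0.01, checked every 500 iterations.
loss_check = PeriodicStopCheck(lambda loss: loss < 0.01, every_n_iter=500)
stopped_at = run([loss_check], max_steps=5000)
print(stopped_at)  # 500
```

In this toy run the loss first falls below 0.01 at step 101, but the loop only stops at step 500, the first scheduled check. A smaller every_n_iter reacts sooner at the cost of evaluating the condition more often.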

Note

A function like acc_early_stop or loss_early_stop must return a rank-0 (scalar) Boolean tensor. There are no other restrictions on the computation that occurs inside such a function. This function runs on the host.
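To make the rank-0 requirement concrete, the sketch below contrasts a scalar comparison with an element-wise one. NumPy is used here as a stand-in for TensorFlow so that the snippet runs without a TensorFlow installation, and is_scalar_bool is a hypothetical helper, not part of any Cerebras API. In TensorFlow the analogous reduction would be tf.math.reduce_all.

```python
import numpy as np


def is_scalar_bool(value):
    """Return True if value is a rank-0 Boolean (hypothetical helper)."""
    arr = np.asarray(value)
    return arr.ndim == 0 and arr.dtype == np.bool_


batch_loss = np.float32(0.005)                                # rank 0
per_example_loss = np.array([0.005, 0.02], dtype=np.float32)  # rank 1

ok = np.less(batch_loss, 0.01)         # scalar comparison: rank-0 Boolean
bad = np.less(per_example_loss, 0.01)  # element-wise: rank-1 Boolean
fixed = np.all(bad)                    # reduced back to a rank-0 Boolean

print(is_scalar_bool(ok), is_scalar_bool(bad), is_scalar_bool(fixed))
```

If a stopping function compares a per-example quantity, reducing the result to a single value before returning it keeps the output at rank 0.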