common.input package#

Submodules#

common.input.analyze_bucketing module#

Utilities for generating buckets and estimating VTS speedups.

This script can do three things:

  1. Give an overview of the average sequence length of a dataset and its
     potential for throughput increase.

  2. Analyze a bucketing scheme supplied by the user for estimated throughput
     increase and data distribution within buckets.

  3. Generate a new set of bucket boundaries for a user's dataset such that
     approximately the same fraction of the data falls in each bucket.

All throughput estimates are approximate and assume that the processing time (delta-t) for a batch is linear in the length of the longest sample in that batch. This is not true in general; as a result, the outputs of analyze and generate are generally underestimates.
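Under this linear-in-max-length assumption, the cost of a bucketing scheme can be illustrated with a short sketch. This is not the module's actual `bucketed_cost` implementation; the function and variable names here are illustrative, and each bucket is charged as if every sample in it were padded to the bucket's upper boundary:

```python
import numpy as np

def cost_sketch(hist, boundaries):
    """Estimated total cost under the assumption that per-batch time is
    linear in the longest sample: each sample pays the maximum length
    of its bucket (illustrative sketch, not the module's code)."""
    edges = [0] + list(boundaries) + [len(hist)]
    cost = 0
    for lo, hi in zip(edges[:-1], edges[1:]):
        counts = hist[lo:hi]
        if counts.sum() > 0:
            # hi - 1 is the longest length this bucket can contain
            cost += int(counts.sum()) * (hi - 1)
    return cost

# 10 samples of length 1 and 10 of length 4
hist = np.array([0, 10, 0, 0, 10])
unbucketed = cost_sketch(hist, [])    # everything pads to length 4
bucketed = cost_sketch(hist, [2])     # short samples pad only to length 1
```

Comparing `unbucketed` with `bucketed` gives a rough sense of the potential throughput gain from splitting short and long samples into separate buckets.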

The data provided by the user is assumed to be an .npy file containing a histogram of the frequencies of each sequence length. For example, data[100] should be the number of samples with length exactly 100.
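A histogram in this format can be built from raw sequence lengths with `numpy.bincount` and saved with `numpy.save`; the variable names and file path below are illustrative:

```python
import os
import tempfile
import numpy as np

# Illustrative: the length of each sample in the dataset
sequence_lengths = [3, 5, 5, 100, 100, 100]

# data[L] = number of samples whose length is exactly L
data = np.bincount(sequence_lengths)

# Save in the .npy format the script expects
path = os.path.join(tempfile.mkdtemp(), "length_histogram.npy")
np.save(path, data)
```

Here `data[100]` is 3 and `data[5]` is 2, matching the convention described above.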

common.input.analyze_bucketing.bucket_data(data, buckets)#
common.input.analyze_bucketing.bucketed_cost(data, buckets)#
common.input.analyze_bucketing.find_even_buckets(raw_data, num_buckets)#
common.input.analyze_bucketing.main(args)#
common.input.analyze_bucketing.parse_args()#
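Based on the description above (approximately the same fraction of data in each bucket), the idea behind find_even_buckets can be sketched via the cumulative distribution of the histogram. This is an illustrative reimplementation, not the module's actual code, and the exact boundary convention is an assumption:

```python
import numpy as np

def even_buckets_sketch(hist, num_buckets):
    """Pick bucket boundaries so each bucket holds roughly the same
    fraction of samples (illustrative sketch, not the module's code)."""
    # Cumulative fraction of samples at or below each length
    cdf = np.cumsum(hist) / hist.sum()
    # Interior boundaries at the 1/n, 2/n, ... quantiles of the lengths
    targets = np.arange(1, num_buckets) / num_buckets
    boundaries = np.searchsorted(cdf, targets, side="left")
    return boundaries.tolist()

# 25 samples each of lengths 1-4: boundaries land at the quartiles
hist = np.array([0, 25, 25, 25, 25])
even_buckets_sketch(hist, 2)  # one interior boundary near the median
```

A quantile-based split like this is a common way to balance bucket occupancy; the real function may differ in how it handles ties and edge cases.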

common.input.utils module#

common.input.utils.check_and_create_output_dirs(output_dir, filetype)#
common.input.utils.save_params(params, model_dir, fname='params.yaml')#

Writes a dictionary to a file in model_dir.

Parameters
  • params (dict) – Dictionary to write to a file in model_dir.

  • model_dir (string) – Directory to write to.

  • fname (string) – Name of the file in model_dir to save to.
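A minimal sketch of what save_params does, assuming the default params.yaml name implies YAML serialization (via the third-party PyYAML package) and that the directory is created if missing; this is an illustrative equivalent, not the module's actual code:

```python
import os
import tempfile
import yaml  # PyYAML; assumed dependency for writing params.yaml

def save_params_sketch(params, model_dir, fname="params.yaml"):
    """Illustrative equivalent of save_params: dump a params dict
    as YAML into model_dir (not the module's actual code)."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, fname)
    with open(path, "w") as f:
        yaml.dump(params, f, default_flow_style=False)
    return path

# Illustrative usage with hypothetical parameter values
params = {"learning_rate": 0.001, "epochs": 10}
saved_path = save_params_sketch(params, tempfile.mkdtemp())
```

Round-tripping the file with `yaml.safe_load` recovers the original dictionary.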

Module contents#