common.input package#
Submodules#
common.input.analyze_bucketing module#
Utilities for generating buckets and estimating VTS speedups.
This script can do three things:
- Give an overview of the average sequence length of a dataset and its
potential for throughput increase.
- Analyze a bucketing scheme supplied by the user to estimate the throughput
increase and the data distribution within buckets.
- Generate a new set of bucket boundaries for a user's dataset such that
approximately the same fraction of the data falls in each bucket.
All throughput estimates are approximate and assume that the time taken for a batch is linear in the length of the longest sample in that batch. This is not true in general; as a result, the outputs of analyze and generate are generally underestimates.
The data provided by the user is assumed to be an npy file containing a histogram of the frequencies of each sequence length. For example, data[100] should be the number of samples with length exactly 100.
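A minimal sketch of preparing the histogram file this script expects. The list of lengths and the output filename are illustrative, not part of the module's API.

```python
import numpy as np

# Sequence lengths of the dataset (illustrative values).
lengths = [3, 7, 7, 100, 100, 100]

# hist[L] = number of samples with length exactly L.
hist = np.bincount(lengths)

# Save in the npy format the script consumes.
np.save("length_histogram.npy", hist)
```

Here hist[100] is 3, since three samples have length exactly 100.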
- common.input.analyze_bucketing.bucket_data(data, buckets)#
- common.input.analyze_bucketing.bucketed_cost(data, buckets)#
- common.input.analyze_bucketing.find_even_buckets(raw_data, num_buckets)#
- common.input.analyze_bucketing.main(args)#
- common.input.analyze_bucketing.parse_args()#
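Hedged sketches of the two core computations described above: the linear cost model (batch time proportional to the longest sample) and the equal-fraction bucket generator. The function bodies below are illustrative and may differ from the real bucket_data, bucketed_cost, and find_even_buckets implementations.

```python
import numpy as np

def bucketed_cost_sketch(hist, buckets):
    """Approximate cost under the linear model: every sample is padded
    to the upper bound of its bucket."""
    edges = list(buckets) + [len(hist) - 1]
    cost, lo = 0, 0
    for hi in edges:
        # Samples with length in [lo, hi] each pay length hi.
        cost += hist[lo:hi + 1].sum() * hi
        lo = hi + 1
    return cost

def find_even_buckets_sketch(hist, num_buckets):
    """Boundaries such that roughly 1/num_buckets of the samples fall
    into each bucket, read off the cumulative distribution."""
    cdf = np.cumsum(hist) / hist.sum()
    return [int(np.searchsorted(cdf, i / num_buckets))
            for i in range(1, num_buckets)]
```

For a uniform length distribution over 1..100, the generator places boundaries at the quartiles, and the cost sketch shows why fewer, wider buckets waste more padding.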
common.input.utils module#
- common.input.utils.check_and_create_output_dirs(output_dir, filetype)#
- common.input.utils.save_params(params, model_dir, fname='params.yaml')#
Writes a dictionary to a file in model_dir.
- Parameters
params (dict) – Dictionary to write to a file in model_dir.
model_dir (string) – Directory to write to.
fname (string) – Name of the file in model_dir to save to.
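A minimal sketch of what a save_params-style helper does, assuming the params dict is YAML-serializable and that PyYAML is available (the default filename params.yaml suggests YAML output); the real implementation may differ.

```python
import os
import yaml

def save_params(params, model_dir, fname="params.yaml"):
    """Write the params dict as YAML to model_dir/fname."""
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, fname)
    with open(path, "w") as f:
        yaml.dump(params, f, default_flow_style=False)
    return path
```

A round trip through yaml.safe_load recovers the original dictionary, which makes the saved file convenient for both humans and tooling.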