model.denoising package
Subpackages
- model.denoising.data package
- model.denoising.fused_kernels package
- model.denoising.model package
- Submodules
- model.denoising.model.distributed module
- model.denoising.model.enums module
- model.denoising.model.fused_bias_gelu module
- model.denoising.model.fused_layer_norm module
- model.denoising.model.fused_softmax module
- model.denoising.model.module module
- model.denoising.model.transformer module
- model.denoising.model.utils module
- model.denoising.model.waveform_model module
- model.denoising.model.waveformer_model module
- Module contents
- model.denoising.mpu package
- Submodules
- model.denoising.mpu.data module
- model.denoising.mpu.initialize module
- destroy_model_parallel()
- get_data_parallel_group()
- get_data_parallel_rank()
- get_data_parallel_world_size()
- get_embedding_group()
- get_model_parallel_group()
- get_pipeline_model_parallel_first_rank()
- get_pipeline_model_parallel_group()
- get_pipeline_model_parallel_last_rank()
- get_pipeline_model_parallel_next_rank()
- get_pipeline_model_parallel_prev_rank()
- get_pipeline_model_parallel_rank()
- get_pipeline_model_parallel_world_size()
- get_tensor_model_parallel_group()
- get_tensor_model_parallel_rank()
- get_tensor_model_parallel_src_rank()
- get_tensor_model_parallel_world_size()
- get_virtual_pipeline_model_parallel_rank()
- get_virtual_pipeline_model_parallel_world_size()
- initialize_model_parallel()
- is_pipeline_first_stage()
- is_pipeline_last_stage()
- is_unitialized()
- model_parallel_is_initialized()
- set_pipeline_model_parallel_rank()
- set_pipeline_model_parallel_world_size()
- set_tensor_model_parallel_rank()
- set_tensor_model_parallel_world_size()
- set_virtual_pipeline_model_parallel_rank()
- model.denoising.mpu.layers module
- model.denoising.mpu.mappings module
- model.denoising.mpu.random module
- model.denoising.mpu.utils module
- Module contents
- model.denoising.optimizer package
- Submodules
- model.denoising.optimizer.clip_grads module
- model.denoising.optimizer.grad_scaler module
- model.denoising.optimizer.optimizer module
- FP32Optimizer
- Float16OptimizerWithFloat16Params
- MegatronOptimizer
- MegatronOptimizer.clip_grad_norm()
- MegatronOptimizer.count_zeros()
- MegatronOptimizer.get_loss_scale()
- MegatronOptimizer.get_parameters()
- MegatronOptimizer.load_state_dict()
- MegatronOptimizer.param_groups
- MegatronOptimizer.reload_model_params()
- MegatronOptimizer.scale_loss()
- MegatronOptimizer.state
- MegatronOptimizer.state_dict()
- MegatronOptimizer.step()
- MegatronOptimizer.zero_grad()
- Module contents
Submodules
model.denoising.arguments module
Megatron arguments.
- model.denoising.arguments.parse_args(extra_args_provider=None, defaults={}, ignore_unknown_args=False)
Parse all arguments.
model.denoising.checkpointing module
Input/output checkpointing.
- model.denoising.checkpointing.check_checkpoint_args(checkpoint_args)
Ensure that the fixed arguments for a model are the same in the input arguments and in the arguments retrieved from the checkpoint.
- model.denoising.checkpointing.ensure_directory_exists(filename)
Build filename’s path if it does not already exist.
- model.denoising.checkpointing.fix_query_key_value_ordering(model, checkpoint_version)
Fix up query/key/value matrix ordering if the checkpoint version is smaller than 2.0.
- model.denoising.checkpointing.get_checkpoint_name(checkpoints_path, iteration, release=False)
A unified checkpoint name.
- model.denoising.checkpointing.get_checkpoint_tracker_filename(checkpoints_path)
The tracker file records the latest checkpoint during training to restart from.
- model.denoising.checkpointing.get_checkpoint_version()
- model.denoising.checkpointing.load_biencoder_checkpoint(model, only_query_model=False, only_context_model=False, custom_load_path=None)
Selectively load retrieval models for indexing/retrieving from saved checkpoints.
- model.denoising.checkpointing.load_checkpoint(model, optimizer, lr_scheduler, load_arg='load', strict=True)
Load a model checkpoint and return the iteration.
strict (bool): whether to strictly enforce that the keys in state_dict of the checkpoint match the names of parameters and buffers in model.
- model.denoising.checkpointing.save_checkpoint(iteration, model, optimizer, lr_scheduler)
Save a model checkpoint.
- model.denoising.checkpointing.set_checkpoint_version(value)
model.denoising.global_vars module
Megatron global variables.
- class model.denoising.global_vars.Timers
Bases: object
Group of timers.
- log(names, normalizer=1.0, reset=True)
Log a group of timers.
- write(names, writer, iteration, normalizer=1.0, reset=False)
Write timers to a tensorboard writer.
- model.denoising.global_vars.get_adlr_autoresume()
ADLR autoresume object. It can be None so no need to check if it is initialized.
- model.denoising.global_vars.get_args()
Return arguments.
- model.denoising.global_vars.get_current_global_batch_size()
- model.denoising.global_vars.get_num_microbatches()
- model.denoising.global_vars.get_tensorboard_writer()
Return tensorboard writer. It can be None so no need to check if it is initialized.
- model.denoising.global_vars.get_timers()
Return timers.
- model.denoising.global_vars.get_tokenizer()
Return tokenizer.
- model.denoising.global_vars.set_global_variables(extra_args_provider=None, args_defaults={}, ignore_unknown_args=False)
Set args, tokenizer, tensorboard-writer, adlr-autoresume, and timers.
- model.denoising.global_vars.update_num_microbatches(consumed_samples, consistency_check=True)
model.denoising.initialize module
Megatron initialization.
- model.denoising.initialize.initialize_megatron(extra_args_provider=None, args_defaults={}, ignore_unknown_args=False, allow_no_cuda=False)
Set global variables, initialize distributed, and set autoresume and random seeds. allow_no_cuda should not be set unless Megatron is being used for CPU-only data processing; in general, leave this argument unset unless you know what you are doing. Returns a function that finalizes distributed environment initialization (optionally, only when args.lazy_mpu_init == True).
- model.denoising.initialize.write_args_to_tensorboard()
Write arguments to tensorboard.
model.denoising.learning_rates module
Learning rate decay functions.
- class model.denoising.learning_rates.AnnealingLR(optimizer, max_lr, min_lr, warmup_steps, decay_steps, decay_style, use_checkpoint_lr_scheduler=True, override_lr_scheduler=False)
Bases: object
Anneals the learning rate.
- get_lr()
Learning rate decay functions from: https://openreview.net/pdf?id=BJYwwY9ll pg. 4
- load_state_dict(sd)
- state_dict()
- step(increment)
Set the learning rate for all parameter groups.
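As a rough illustration of what an annealing schedule with these constructor parameters might compute, the sketch below applies a linear warmup followed by a linear or cosine decay from max_lr to min_lr. The helper name annealed_lr and the exact formulas are assumptions made for this example; the class's actual behavior may differ.

```python
import math


def annealed_lr(step, max_lr, min_lr, warmup_steps, decay_steps, decay_style="cosine"):
    """Hypothetical helper (not the package's code): learning rate at `step`."""
    # Linear warmup from 0 up to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    # Past the decay horizon, stay at the floor value.
    if step > decay_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1].
    ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if decay_style == "linear":
        coeff = 1.0 - ratio
    elif decay_style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * ratio) + 1.0)
    else:
        raise ValueError(f"unsupported decay style: {decay_style}")
    return min_lr + coeff * (max_lr - min_lr)
```

With warmup_steps=10 and decay_steps=110, for instance, the rate rises linearly for the first 10 steps and then falls toward min_lr over the next 100.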
model.denoising.memory module
- class model.denoising.memory.MemoryBuffer(name, numel, dtype, track_usage)
Bases: object
Contiguous memory buffer. Allocate a contiguous memory of type dtype and size numel. It is used to reduce memory fragmentation.
- Usage: After the allocation, the _start index is set to the first index of the memory. A memory chunk starting from the _start index can be allocated for an input tensor, with the elements of the tensor being copied. The buffer can be reused by resetting the _start index.
- add(tensor)
Allocate a chunk of memory from the buffer to tensor and copy the values.
- get_data()
Return the data currently in use.
- is_in_use()
Whether the current buffer holds on to any memory.
- numel_in_use()
Return number of elements in use.
- print_average_usage()
Print memory usage average over time. We would like this value to be as high as possible.
- reset()
Reset the buffer start index to the beginning of the buffer.
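The usage pattern described for MemoryBuffer (allocate once, hand out chunks from a moving _start index, reset to reuse) can be sketched in pure Python. The class below is a hypothetical stand-in built on the standard-library array type rather than tensors; its name and details are assumptions for illustration, not the package's implementation.

```python
from array import array


class SketchBuffer:
    """Hypothetical stand-in for a contiguous buffer with a moving start index."""

    def __init__(self, numel, typecode="d"):
        self.numel = numel
        self.data = array(typecode, [0.0] * numel)  # single up-front allocation
        self._start = 0                             # index of the first free element

    def add(self, values):
        """Copy `values` into the next free chunk; return its (start, end) range."""
        end = self._start + len(values)
        if end > self.numel:
            raise MemoryError("not enough free space in the buffer")
        for offset, value in enumerate(values):
            self.data[self._start + offset] = value
        span = (self._start, end)
        self._start = end
        return span

    def get_data(self):
        """Return a copy of the elements currently in use."""
        return self.data[:self._start]

    def numel_in_use(self):
        return self._start

    def is_in_use(self):
        return self._start > 0

    def reset(self):
        """Make the whole buffer available again without reallocating."""
        self._start = 0
```

Because reset() only moves the index back, repeated add/reset cycles reuse the same allocation, which is the property that helps against fragmentation.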
- class model.denoising.memory.RingMemBuffer(name, num_buffers, numel, dtype, track_usage)
Bases: object
A ring of memory buffers.
- get_next_buffer()
- model.denoising.memory.allocate_mem_buff(name, numel, dtype, track_usage)
Allocate a memory buffer.
- model.denoising.memory.get_mem_buff(name)
Get the memory buffer.
model.denoising.microbatches module
Megatron number of micro-batches calculators.
- class model.denoising.microbatches.ConstantNumMicroBatches(global_batch_size, micro_batch_size, data_parallel_size)
Bases: NumMicroBatchesCalculator
- update(consumed_samples, consistency_check)
- class model.denoising.microbatches.NumMicroBatchesCalculator
Bases: ABC
- get()
- get_current_global_batch_size()
- abstract update(consumed_samples, consistency_check)
- class model.denoising.microbatches.RampupBatchsizeNumMicroBatches(start_batch_size, batch_size_increment, ramup_samples, global_batch_size, micro_batch_size, data_parallel_size)
Bases: NumMicroBatchesCalculator
- update(consumed_samples, consistency_check)
- model.denoising.microbatches.build_num_microbatches_calculator(args)
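The constructor parameters of ConstantNumMicroBatches suggest the usual relationship between the three batch-size quantities: each training step splits the global batch over the data-parallel ranks and then over micro-batches. The sketch below assumes that relationship; the function name is hypothetical and the real calculator may perform additional checks.

```python
def constant_num_microbatches(global_batch_size, micro_batch_size, data_parallel_size):
    """Hypothetical helper: micro-batches each data-parallel rank runs per step."""
    # Samples consumed when every data-parallel rank processes one micro-batch.
    samples_per_pass = micro_batch_size * data_parallel_size
    if global_batch_size % samples_per_pass != 0:
        raise ValueError(
            "global batch size must be divisible by "
            "micro batch size times data parallel size"
        )
    return global_batch_size // samples_per_pass
```

For example, a global batch of 512 with micro-batches of 4 on 8 data-parallel ranks gives 16 micro-batches per step under this assumption.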
model.denoising.p2p_communication module
- model.denoising.p2p_communication.recv_backward(timers=None)
Receive tensor from next rank in pipeline (backward receive).
- model.denoising.p2p_communication.recv_forward(tensor_shape=None, override_scatter_gather_tensors_in_pipeline=False, dtype_=None, timers=None)
Receive tensor from previous rank in pipeline (forward receive).
- model.denoising.p2p_communication.send_backward(input_tensor_grad, timers=None)
Send tensor to previous rank in pipeline (backward send).
- model.denoising.p2p_communication.send_backward_recv_backward(input_tensor_grad, recv_next, timers=None)
Batched recv from next rank and send to previous rank in pipeline.
- model.denoising.p2p_communication.send_backward_recv_forward(input_tensor_grad, timers=None)
Batched send and recv with previous rank in pipeline.
- model.denoising.p2p_communication.send_forward(output_tensor, timers=None, override_scatter_gather_tensors_in_pipeline=False, dtype_=None)
Send tensor to next rank in pipeline (forward send).
- model.denoising.p2p_communication.send_forward_backward_recv_forward_backward(output_tensor, input_tensor_grad, recv_prev, recv_next, timers=None)
Batched send and recv with previous and next ranks in pipeline.
- model.denoising.p2p_communication.send_forward_recv_backward(output_tensor, timers=None)
Batched send and recv with next rank in pipeline.
- model.denoising.p2p_communication.send_forward_recv_forward(output_tensor, recv_prev, timers=None)
Batched recv from previous rank and send to next rank in pipeline.
model.denoising.package_info module
model.denoising.schedules module
- model.denoising.schedules.backward_step(optimizer, input_tensor, output_tensor, output_tensor_grad)
Backward step through passed-in output tensor.
If last stage, output_tensor_grad is None, otherwise gradient of loss with respect to stage’s output tensor.
Returns gradient of loss with respect to input tensor (None if first stage).
- model.denoising.schedules.dummy_handler()
- model.denoising.schedules.forward_backward_no_pipelining(forward_step_func, data_iterator, model, optimizer, timers, forward_only, test_only)
Run forward and backward passes with no pipeline parallelism (no inter-stage communication).
Returns dictionary with losses.
- model.denoising.schedules.forward_backward_pipelining_with_interleaving(forward_step_func, data_iterator, model, optimizer, timers, forward_only)
Run interleaved 1F1B schedule (model split into model chunks), with communication between pipeline stages as needed.
Returns dictionary with losses if the last stage, empty dict otherwise.
- model.denoising.schedules.forward_backward_pipelining_without_interleaving(forward_step_func, data_iterator, model, optimizer, timers, forward_only)
Run non-interleaved 1F1B schedule, with communication between pipeline stages.
Returns dictionary with losses if the last stage, empty dict otherwise.
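To make the non-interleaved 1F1B ("one forward, one backward") ordering concrete, the sketch below lists the operations a single pipeline stage would run under the common formulation: a warmup of forward passes, a steady state that alternates one forward with one backward, and a cooldown of the remaining backward passes. The helper name and the warmup formula are assumptions for illustration; the function documented above also handles inter-stage communication, which is omitted here.

```python
def one_f_one_b_order(stage, num_stages, num_microbatches):
    """Hypothetical helper: ordered ("F", i) / ("B", i) ops for one pipeline stage."""
    # Earlier stages run more forwards before their first backward arrives.
    warmup = min(num_stages - stage - 1, num_microbatches)
    steady = num_microbatches - warmup

    ops = [("F", i) for i in range(warmup)]          # warmup: forwards only
    for i in range(steady):                          # steady state: 1 forward, 1 backward
        ops.append(("F", warmup + i))
        ops.append(("B", i))
    ops += [("B", steady + i) for i in range(warmup)]  # cooldown: remaining backwards
    return ops
```

Under this sketch the last stage has no warmup and strictly alternates forward and backward, while the first stage holds the most in-flight micro-batches.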
- model.denoising.schedules.forward_step(forward_step_func, data_iterator, model, input_tensor, losses_reduced)
Forward step for passed-in model.
If first stage, input tensor is obtained from data_iterator, otherwise passed-in input_tensor is used.
Returns output tensor.
- model.denoising.schedules.forward_step_wrapper(forward_step_func, data_iterator, model, input_tensor, losses_reduced, test_only)
Forward step for passed-in model.
If first stage, input tensor is obtained from data_iterator, otherwise passed-in input_tensor is used.
Returns output tensor.
- model.denoising.schedules.get_forward_backward_func()
model.denoising.training module
Pretrain utilities.
- model.denoising.training.build_train_valid_test_data_iterators(build_train_valid_test_datasets_provider)
XXX
- model.denoising.training.cyclic_iter(iter)
- model.denoising.training.evaluate(forward_step_func, data_iterator, model, verbose=False)
Evaluation.
- model.denoising.training.evaluate_and_print_results(prefix, forward_step_func, data_iterator, model, iteration, verbose=False)
Helper function to evaluate and dump results on screen.
- model.denoising.training.get_learning_rate_scheduler(optimizer)
Build the learning rate scheduler.
- model.denoising.training.get_model(model_provider_func)
Build the model.
- model.denoising.training.pretrain(train_valid_test_dataset_provider, model_provider, forward_step_func, extra_args_provider=None, args_defaults={})
Main training program.
- This function runs the following steps in the order provided:
initialize Megatron.
set up the model, optimizer, and lr schedule using the model_provider.
call train_valid_test_dataset_provider to get the train/valid/test datasets.
train the model using the forward_step_func.
- Parameters:
train_valid_test_dataset_provider – a function that takes the size of train/valid/test dataset and returns train, valid, test datasets.
model_provider – a function that returns a vanilla version of the model. By vanilla we mean a simple model on cpu with no fp16 or ddp.
forward_step_func – a function that takes a data iterator and model, and returns a scalar loss together with a dictionary whose key: value pairs are the quantities to monitor during training, for example lm-loss: value. This function is also required to add the batch generator to the timers class.
extra_args_provider – a function that takes a parser and adds arguments to it. It is used for programs to add their own arguments.
args_defaults – a dictionary from argument-name to argument-value. It is used to set already-parsed arguments.
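The extra_args_provider parameter is described as a function that takes a parser and adds arguments to it. A minimal sketch of such a function using the standard-library argparse module follows. The flag --noise-level and the function name are hypothetical, and returning the parser from the provider is an assumption about the expected contract.

```python
import argparse


def add_denoising_args(parser):
    """Hypothetical extra_args_provider: registers program-specific flags."""
    group = parser.add_argument_group(title="denoising (example)")
    group.add_argument(
        "--noise-level",
        type=float,
        default=0.1,
        help="hypothetical flag, for illustration only",
    )
    # Assumption: the caller expects the (possibly updated) parser back.
    return parser


# Stand-alone demonstration with a plain parser instead of the package's own.
parser = add_denoising_args(argparse.ArgumentParser())
args = parser.parse_args(["--noise-level", "0.3"])
```

A function of this shape would be passed as extra_args_provider so that a program can expose its own options alongside the built-in ones.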
- model.denoising.training.print_datetime(string)
Note that this call will sync across all ranks.
- model.denoising.training.save_checkpoint_and_time(iteration, model, optimizer, lr_scheduler)
- model.denoising.training.setup_model_and_optimizer(model_provider_func)
Setup model and optimizer.
- model.denoising.training.train(forward_step_func, model, optimizer, lr_scheduler, train_data_iterator, valid_data_iterator, test_data_iterator)
Train the model function.
- model.denoising.training.train_step(forward_step_func, data_iterator, model, optimizer, lr_scheduler)
Single training step.
- model.denoising.training.training_log(loss_dict, total_loss_dict, learning_rate, iteration, loss_scale, report_memory_flag, skipped_iter, grad_norm, params_norm, num_zeros_in_grad)
Log training information such as losses and timing.
- model.denoising.training.update_train_iters(args)
model.denoising.utils module
General utilities.
- model.denoising.utils.average_losses_across_data_parallel_group(losses)
Reduce a tensor of losses across all GPUs.
- model.denoising.utils.calc_params_l2_norm(model)
Calculate the L2 norm of the parameters.
- model.denoising.utils.check_adlr_autoresume_termination(iteration, model, optimizer, lr_scheduler)
Check for autoresume signal and exit if it is received.
- model.denoising.utils.get_ltor_masks_and_position_ids(data, eod_token, reset_position_ids, reset_attention_mask, eod_mask_loss)
Build masks and position IDs for a left-to-right model.
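For readers unfamiliar with left-to-right masking, the sketch below builds a causal attention mask and position IDs for a single token sequence in pure Python. It covers only the causal mask and the optional position reset after an end-of-document token; the helper name is hypothetical, and the reset_attention_mask and eod_mask_loss options of the documented function are deliberately left out.

```python
def ltor_mask_and_positions(tokens, eod_token, reset_position_ids=False):
    """Hypothetical helper: causal mask and position IDs for one sequence."""
    n = len(tokens)
    # mask[i][j] is True when position i may attend to position j (j <= i).
    mask = [[j <= i for j in range(n)] for i in range(n)]

    position_ids = []
    pos = 0
    for tok in tokens:
        position_ids.append(pos)
        pos += 1
        if reset_position_ids and tok == eod_token:
            pos = 0  # the next document starts counting from zero
    return mask, position_ids
```

With tokens [5, 6, 0, 7, 8], eod_token=0, and reset_position_ids=True, the positions restart after the end-of-document token, while the mask stays lower-triangular.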
- model.denoising.utils.print_params_min_max_norm(optimizer, iteration)
Print min, max, and norm of all parameters.
- model.denoising.utils.report_memory(name)
Simple GPU memory report.
- model.denoising.utils.unwrap_model(model, module_instances=torch.nn.parallel.DistributedDataParallel)
Module contents
- model.denoising.is_last_rank()
- model.denoising.print_rank_0(message)
If distributed is initialized, print only on rank 0.
- model.denoising.print_rank_last(message)
If distributed is initialized, print only on last rank.