Trainer
This part describes how to control training, including warmup strategies, optimization algorithms, learning rate scheduling, and so on.
Note
Warmup gradually raises the learning rate to smooth the start of training. The supported warmup types are ‘exp’, ‘linear’, and ‘no_scale_lr’.
The initial warmup learning rate equals ‘base_lr’ * ‘warmup_ratio’. After warmup, the learning rate equals ‘base_lr’ * ‘total_batch_size’, where ‘total_batch_size’ equals the batch size per GPU * the number of GPUs.
‘lr’ in the config is the learning rate for a single image. For example, using 8 GPUs with a batch size of 2 per GPU and lr=0.00125 gives a final learning rate of 0.00125 * 16 = 0.02.
Enable warmup by setting ‘warmup_epochs’ or ‘warmup_iter’ in the config. ‘warmup_epochs’ is converted to ‘warmup_iter’.
Setting ‘only_save_latest’ keeps only the latest model and causes ‘save_freq’ to be ignored.
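The learning rate arithmetic in the note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's implementation: the function names `scaled_lr` and `linear_warmup_lr` are hypothetical, the `warmup_ratio` value of 0.001 is an arbitrary example, and the ramp shown is a plain linear interpolation that may differ from how the ‘linear’ warmup type is actually computed.

```python
def scaled_lr(base_lr: float, batch_size_per_gpu: int, num_gpus: int) -> float:
    """Learning rate after warmup: base_lr * total_batch_size."""
    total_batch_size = batch_size_per_gpu * num_gpus
    return base_lr * total_batch_size


def linear_warmup_lr(it: int, warmup_iter: int, base_lr: float,
                     warmup_ratio: float, target_lr: float) -> float:
    """Hypothetical linear ramp from base_lr * warmup_ratio up to target_lr."""
    if warmup_iter <= 0 or it >= warmup_iter:
        # Warmup disabled or already finished: use the full learning rate.
        return target_lr
    start_lr = base_lr * warmup_ratio
    alpha = it / warmup_iter  # fraction of warmup completed, in [0, 1)
    return start_lr + alpha * (target_lr - start_lr)


# The example from the note: 8 GPUs, batch size 2 per GPU, lr=0.00125.
target = scaled_lr(0.00125, batch_size_per_gpu=2, num_gpus=8)
print(f"learning rate after warmup: {target:.5f}")

# Assumed values for illustration only: 100 warmup iterations, warmup_ratio=0.001.
for it in (0, 50, 100):
    lr = linear_warmup_lr(it, warmup_iter=100, base_lr=0.00125,
                          warmup_ratio=0.001, target_lr=target)
    print(f"iter {it:3d}: lr = {lr:.7f}")
```

Keeping ‘lr’ as a per-image value means the same config can be reused when the number of GPUs or the batch size changes, since the scaling is applied from ‘total_batch_size’.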
trainer: # Required.
  max_epoch: 14              # total number of training epochs
  test_freq: 14              # test every 14 epochs (if larger than max_epoch, testing runs only after training finishes)
  save_freq: 1               # save the model every save_freq epochs
  # only_save_latest: False  # if True, keep only the latest model and ignore save_freq
  optimizer:
    type: SGD
    kwargs:
      lr: 0.00125
      momentum: 0.9
      weight_decay: 0.0001
  lr_scheduler:              # lr_scheduler = MultiStepLR(optimizer, milestones=[9,12], gamma=0.1)
    warmup_epochs: 1         # set to 0 to disable warmup
    # warmup_type: exp
    type: MultiStepLR
    kwargs:
      milestones: [9,12]     # epochs at which to decay lr
      gamma: 0.1             # decay rate
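To see what the ‘lr_scheduler’ settings above imply, the following sketch computes a step-decay schedule in plain Python. It assumes 0-indexed epochs, assumes the decay takes effect from each milestone epoch onward (the convention used by PyTorch's MultiStepLR), ignores warmup, and starts from the scaled learning rate of 0.02 from the 8 GPU example. The helper `multistep_lr` is a hypothetical name, not part of the framework.

```python
from bisect import bisect_right


def multistep_lr(epoch: int, initial_lr: float, milestones: list, gamma: float) -> float:
    """Multiply initial_lr by gamma once for every milestone already reached."""
    num_decays = bisect_right(sorted(milestones), epoch)
    return initial_lr * gamma ** num_decays


# Values taken from the config: max_epoch=14, milestones=[9,12], gamma=0.1.
schedule = [multistep_lr(e, 0.02, [9, 12], 0.1) for e in range(14)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.5f}")
```

Under these assumptions the learning rate stays at its initial value for epochs 0 to 8, drops by a factor of 10 at epoch 9, and drops by another factor of 10 at epoch 12.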