Trainer

This part introduces how to control the training including warmup strategies, optimization algorithms, adjusting learning rate, and so on.

Note

  • warmup is used to gradually raise the learning rate for smoothing the training, which contains ‘exp’, ‘linear’, and ‘no_scale_lr’.

  • The initial warmup learning rate equals ‘base_lr’ * ‘warmup_ratio’. The learning rate equals ‘base_lr’ * ‘total_batch_size’ after warmup, and the ‘total_batch_size’ equals ‘batch size’ * ‘gpu’.

  • ‘lr’ in config is the learning rate of a single image. For example, using 8 GPUs with 2 batches and lr=0.00125 makes the final learning rate equal 0.00125 * 16 = 0.02.

  • Starting warmup by setting ‘warmup_epochs’ or ‘warmup_iter’ in the config. ‘warmup_epochs’ will be transformed into ‘warmup_iter’.

  • ‘only_save_latest’ supports only keeping the latest model and will invalidate ‘save_freq’.

trainer: # Required.
    max_epoch: 14 # total epochs for the training
    test_freq: 14 # test every 14 epochs (Only tesing after all training when it is larger than max_epoch)
    save_freq: 1 # save model every save_freq epoches.
    # only_save_latest: False # if True, only keep the latest model and invalidate save_freq.
    optimizer:
        type: SGD
        kwargs:
            lr: 0.00125
            momentum: 0.9
            weight_decay: 0.0001
    lr_scheduler: # lr_scheduler = MultStepLR(optimizer, milestones=[9,14],gamma=0.1)
        warmup_epochs: 1 # set to be 0 to disable warmup.
        # warmup_type: exp
        type: MultiStepLR
        kwargs:
            milestones: [9,12] # epochs to decay lr
            gamma: 0.1 # decay rate