Hooks

Hooks monitor the training process, covering timing, logging, visualization, and so on. Details can be found in ‘hooks.py’.

Common hooks

All hook classes inherit from the ‘Hook’ class. UP supports the following hooks.

train_val_logger
auto_checkpoint
grad_clipper
auto_save_best
reload
memory_checkpoint
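
As a rough illustration of how a hook-based design works in general, the sketch below defines a minimal base class and one subclass. It is not UP's implementation: the `IterTimer` class and the `run` loop are hypothetical, and the callback names are simply borrowed from the before_forward and after_update interfaces mentioned in the Main Process section.

```python
import time


class Hook:
    """Hypothetical base class: every callback is a no-op by default."""

    def before_forward(self, cur_iter):
        pass

    def after_update(self, cur_iter):
        pass


class IterTimer(Hook):
    """Hypothetical hook that records how long each iteration takes."""

    def __init__(self):
        self.durations = []
        self._start = None

    def before_forward(self, cur_iter):
        self._start = time.perf_counter()

    def after_update(self, cur_iter):
        self.durations.append(time.perf_counter() - self._start)


def run(hooks, num_iters, step_fn):
    """Toy training loop that calls every hook around each step."""
    for cur_iter in range(num_iters):
        for hook in hooks:
            hook.before_forward(cur_iter)
        step_fn(cur_iter)  # stands in for forward, backward, and update
        for hook in hooks:
            hook.after_update(cur_iter)


timer = IterTimer()
run([timer], num_iters=3, step_fn=lambda i: None)
print(len(timer.durations))  # 3
```

Because the base class provides empty callbacks, a subclass only overrides the points in the loop it cares about.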

TrainValLogger

‘TrainValLogger’ outputs the training log, which prints losses, elapsed time, remaining time, and so on. During training, UP also writes the log to TensorBoard. The log contains losses, accuracy, and so on.

hooks:
  - type: train_val_logger
    kwargs:
      freq: 2           # log printing frequency
      skip_first_k: 5   # exclude the first 5 epochs from the time statistics
      logdir: log       # tensorboard log path
      summary_writer: tensorboard # choices = [tensorboard, pavi]; with pavi, the log cannot be viewed in tensorboard
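
The effect of skipping the first measurements can be sketched as follows. This is an illustration only, not the logger's code: `EtaMeter` is a hypothetical helper, and it assumes that skip_first_k excludes slow warm-up measurements from the average used to estimate the remaining time.

```python
class EtaMeter:
    """Hypothetical helper: average step time, ignoring the first k steps."""

    def __init__(self, skip_first_k):
        self.skip_first_k = skip_first_k
        self.count = 0      # number of measurements seen so far
        self.total = 0.0    # sum of the measurements that are not skipped
        self.kept = 0       # number of measurements that are not skipped

    def update(self, seconds):
        self.count += 1
        if self.count > self.skip_first_k:
            self.total += seconds
            self.kept += 1

    def average(self):
        return self.total / self.kept if self.kept else 0.0

    def remaining(self, steps_left):
        return self.average() * steps_left


meter = EtaMeter(skip_first_k=2)
for seconds in [9.0, 7.0, 1.0, 3.0]:  # the first two are slow warm-up steps
    meter.update(seconds)
print(meter.average())      # 2.0
print(meter.remaining(10))  # 20.0
```

Without the skip, the average here would be 5.0 seconds and the remaining-time estimate would be more than twice as long.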

AutoSaveBest

‘AutoSaveBest’ keeps the checkpoint with the highest accuracy.

trainer:
  max_epoch: 19              # total number of epochs
  test_freq: 19              # (best ckpt) validation interval
  test_start: 1              # (best ckpt) epoch at which validation starts
  save_freq: 1               # ckpt save interval
  save_start: 1              # epoch at which ckpt saving starts
  save_cur_start: -1         # default -1; a value > 0 is the epoch at which real-time saving starts
  save_cur_freq: 2           # real-time ckpt save interval (epochs)

hooks:
  - type: auto_save_best
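
The idea of keeping only the best checkpoint can be sketched as below. `BestTracker` is a hypothetical helper, the accuracies are made up, and the actual hook may compare a different metric or handle ties differently.

```python
class BestTracker:
    """Hypothetical helper that remembers the best validation result."""

    def __init__(self):
        self.best_metric = float("-inf")
        self.best_epoch = None

    def update(self, epoch, metric):
        """Return True when `metric` beats the best value seen so far."""
        if metric > self.best_metric:
            self.best_metric = metric
            self.best_epoch = epoch
            return True  # the caller would overwrite the best checkpoint here
        return False


tracker = BestTracker()
results = {1: 0.61, 2: 0.70, 3: 0.68}  # made-up accuracy per validated epoch
saved = [epoch for epoch, acc in results.items() if tracker.update(epoch, acc)]
print(saved, tracker.best_epoch)  # [1, 2] 2
```

Only validated epochs can become the best checkpoint, so with test_freq equal to max_epoch, as in the configuration above, there is presumably a single candidate.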

AutoCheckpoint

The checkpoint will be automatically saved when training is stopped.

hooks:
  - type: auto_checkpoint

Gradient Clip

Gradient clipping supports the following three modes.

Predefined norm

Averaged norm

Moving averaged norm

hooks:
  - type: grad_clipper
    kwargs:
      mode: pre_defined  # specific norm
      norm_type: 2
      max_norm: 10

or

hooks:
  - type: grad_clipper
    kwargs:
      mode: average    # average
      norm_type: 2
      tolerance: 2.0   # clip when the norm exceeds 2 times the average

or

hooks:
  - type: grad_clipper
    kwargs:
      mode: moving_average  # sliding average
      momentum: 0.9
      norm_type: 2
      tolerance: 5.0        # clip when the norm exceeds 5 times the moving average
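
How the three modes differ can be sketched in plain Python. This is an illustration under stated assumptions, not the grad_clipper source: `ClipThreshold` and `clip_scale` are hypothetical names, and it assumes that in the two averaging modes the clipping threshold is tolerance times the running statistic of past gradient norms.

```python
class ClipThreshold:
    """Hypothetical helper that returns the max norm to clip against."""

    def __init__(self, mode, max_norm=0.0, tolerance=1.0, momentum=0.9):
        assert mode in ("pre_defined", "average", "moving_average")
        self.mode = mode
        self.max_norm = max_norm
        self.tolerance = tolerance
        self.momentum = momentum
        self.mean = None  # running statistic of the observed gradient norms
        self.count = 0

    def update(self, grad_norm):
        """Fold the latest gradient norm into the running statistic."""
        self.count += 1
        if self.mean is None:
            self.mean = grad_norm
        elif self.mode == "average":
            self.mean += (grad_norm - self.mean) / self.count
        elif self.mode == "moving_average":
            self.mean = self.momentum * self.mean + (1 - self.momentum) * grad_norm

    def threshold(self):
        """Call after at least one update() in the two averaging modes."""
        if self.mode == "pre_defined":
            return self.max_norm
        return self.tolerance * self.mean


def clip_scale(grad_norm, threshold):
    """Factor that rescales gradients so their norm stays within threshold."""
    return min(1.0, threshold / grad_norm) if grad_norm > 0 else 1.0


avg = ClipThreshold("average", tolerance=2.0)
for norm in [1.0, 3.0]:
    avg.update(norm)
print(avg.threshold())                   # 4.0
print(clip_scale(8.0, avg.threshold()))  # 0.5
```

A fixed max_norm is the simplest choice when the typical gradient scale is known in advance, while the averaging modes adapt the threshold to the norms actually observed during training.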

MemoryCheckpoint

Dynamic checkpoint is a GPU memory (video memory) optimization technique that can be applied in the following situations:

The input size of model training can be quantified

The input size of the model is fixed, and the GPU memory usage of the model is the same (or similar)

The input size of the model changes

Modify the hooks field in the configuration file:

hooks:
  - type: memory_checkpoint
    kwargs:
        enable: True
        checkpoint_patterns:
          backbone:
              patterns_mode: level
              level:
                num: 4 # set to 2 for ResNet
          neck:
              patterns_mode: level
              level:
                num: 1
          roi_head:
              patterns_mode: level
              level:
                num: 1
              share_weight_num: 5
        dc_cfg:
          warmup_iters: 30 # number of iterations used to profile the model's GPU memory usage; a larger value collects more information and gives a more accurate prediction model
          max_memory: 8 # upper limit (GB) of GPU memory usage (torch.cuda.memory_allocated) for DC
          debug_freq: 10 # frequency of printing information
          strategy: greedy # memory_time or greedy

warmup_iters

The unit is iterations. It controls how many iterations are used to profile the model's GPU memory usage. The larger the value, the more memory usage information is collected and the more accurate the prediction model is. Profiling also takes considerable time, but the overhead is less than warmup_iters * iter_time.

For a classification task, where GPU memory usage does not change, warmup_iters can be lowered, for example to 10.

For a task with a clear relationship between input_size and GPU memory usage, it can be set to about 30.

If GPU memory usage varies widely for the same input, dynamic checkpoint is in theory not applicable to the task. If you use it anyway, set warmup_iters to a larger value.

max_memory

The unit is GB. It sets the upper limit of GPU memory usage (torch.cuda.memory_allocated) allowed for DC.

When the task is a classification task with a fixed input size, max_memory can be set higher.

If the task is a two-stage detection task, max_memory needs to be lowered. Because the memory usage of this type of task varies greatly even for the same input, a more conservative setting is required.

If an OOM occurs during execution and it is caused by GPU memory fragmentation, lower max_memory, for example by 1 GB or 0.5 GB. If the backbone does not leave enough room for memory optimization, another optimization method may be needed.

debug_freq

The frequency at which the optimization plan is printed once warmup_iters has elapsed.

strategy

The strategy used to generate the optimization plan. Choose either memory_time or greedy.

Main Process

In UP, dynamic checkpoint is managed by the DynamicCheckpointManager class. Function interfaces such as before_forward determine the current execution state of the model and collect the relevant information. The process is as follows:

Step 1: Collect GPU memory data for about 10 to 30 iterations

before_forward: Records the current input size and GPU memory usage, and resets the PyTorch memory statistics. It also gets the set of modules that are currently candidates for checkpointing (by default all Bottleneck, SwinTransformerBlock, Encoder modules, etc.)

after_update: Records the peak GPU memory usage, and calculates the memory the model requires from the start of forward to the end of update
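
One way such records could feed a prediction model is sketched below. The form of UP's prediction model is not described here, so the linear least-squares fit is purely an assumption, and `fit_line`, `predict`, and the sample records are hypothetical.

```python
def fit_line(samples):
    """Least-squares fit of memory = slope * input_size + intercept."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    if var_x == 0:  # every input has the same size: predict the mean
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Made-up warm-up records: (input size, peak memory in GB).
records = [(100, 2.0), (200, 3.0), (300, 4.0)]
slope, intercept = fit_line(records)


def predict(size):
    """Predicted peak memory (GB) for an input of the given size."""
    return slope * size + intercept


print(round(predict(400), 6))  # 5.0
```

With a fixed input size the fit degenerates to the mean of the observed memory, which matches the advice that such tasks need fewer warm-up iterations.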

Step 2: Optimize GPU memory usage

before_forward: Records the current input size. If the cache holds an optimization plan for this input (input sizes are rounded so that similar sizes are merged), that plan is applied directly. Otherwise, the set of modules to checkpoint is generated by the greedy algorithm or another strategy, applied as the optimization plan, and saved in the cache

dc_cast_forward (not in the manager): Checks whether the current module is in the set of modules to checkpoint. If it is, the checkpoint forward is executed; otherwise, forward is executed normally.
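
The plan lookup in step 2 can be sketched without any framework code. Everything named here is hypothetical (`round_size`, `greedy_plan`, `PlanCache`, the per-module savings, and the rounding step of 32), and the greedy rule shown, largest saving first until the predicted memory fits the budget, is one plausible reading of the greedy strategy rather than UP's actual algorithm.

```python
def round_size(size, step=32):
    """Round a size up to a multiple of `step` so similar sizes share a plan."""
    return ((size + step - 1) // step) * step


def greedy_plan(base_memory, savings, budget):
    """Pick modules to checkpoint, largest saving first, until memory fits.

    `savings` maps a module name to the memory (GB) freed by checkpointing it.
    """
    plan, memory = [], base_memory
    for name, saved in sorted(savings.items(), key=lambda kv: -kv[1]):
        if memory <= budget:
            break
        plan.append(name)
        memory -= saved
    return plan


class PlanCache:
    """Hypothetical cache of optimization plans keyed by rounded input size."""

    def __init__(self, budget, predict_memory, savings):
        self.budget = budget
        self.predict_memory = predict_memory
        self.savings = savings
        self.plans = {}

    def get_plan(self, input_size):
        key = round_size(input_size)
        if key not in self.plans:  # no cached plan for this size: build one
            base = self.predict_memory(key)
            self.plans[key] = greedy_plan(base, self.savings, self.budget)
        return self.plans[key]


cache = PlanCache(
    budget=8.0,
    predict_memory=lambda size: size / 100,  # made-up predictor, in GB
    savings={"layer1": 1.0, "layer2": 3.0, "layer3": 2.0},
)
print(cache.get_plan(1190))  # ['layer2', 'layer3']
print(cache.get_plan(1200))  # same plan: both sizes round to 1216
```

At forward time, a module whose name is in the returned plan would take the checkpointed path (in PyTorch, typically through torch.utils.checkpoint.checkpoint), and every other module would run its normal forward.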