Classification

UP supports the whole classification pipeline: training, deployment, and inference.

Configs

This section describes the common configs and how to deploy models.

Data preprocessing

UP supports additional data augmentations for classification, including Mixup, RandomErase, and Cutmix + Mixup. The details are as follows.

Augmentations are declared directly in the config.

Mixup:

mixup: &mixup
  type: torch_mixup
  kwargs:
    alpha: 0.2
    num_classes: 1000
    extra_input: True
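
As a rough illustration of what the mixup config above controls, the sketch below blends two samples with a coefficient drawn from Beta(alpha, alpha) and mixes their one-hot labels the same way. It is a minimal pure-Python sketch of the general mixup technique, not UP's implementation; the function names (one_hot, mixup_pair) are hypothetical.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list of length num_classes."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, num_classes=1000, rng=random):
    """Blend two samples with a Beta(alpha, alpha) coefficient (illustrative only)."""
    lam = rng.betavariate(alpha, alpha)
    # Inputs and one-hot labels are mixed with the same coefficient.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    t_a, t_b = one_hot(y_a, num_classes), one_hot(y_b, num_classes)
    y = [lam * a + (1.0 - lam) * b for a, b in zip(t_a, t_b)]
    return x, y, lam
```

With a small alpha such as 0.2, most sampled coefficients fall close to 0 or 1, so the majority of mixed samples stay close to one of the two originals.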

RandomErase:

rand_erase: &rand_erase
  type: torch_randerase
  kwargs:
    probability: 0.25

Cutmix + Mixup:

cutmix_mixup: &cutmix_mixup
  type: torch_cutmix_mixup
  kwargs:
    mixup_alpha: 0.1
    cutmix_alpha: 1.0
    switch_prob: 0.5
    num_classes: 1000
    extra_input: True
    transform: True
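
The sketch below shows one plausible reading of the parameters above: for each batch, cutmix is chosen with probability switch_prob and mixup otherwise, and each mode draws its coefficient from its own Beta distribution. This follows the common cutmix/mixup recipe and is an assumption about how torch_cutmix_mixup behaves; rand_bbox and choose_mix are hypothetical names.

```python
import math
import random


def rand_bbox(height, width, lam, rng=random):
    """Sample a box whose area is roughly (1 - lam) of the image."""
    ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * ratio), int(width * ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return top, bottom, left, right


def choose_mix(height, width, mixup_alpha=0.1, cutmix_alpha=1.0,
               switch_prob=0.5, rng=random):
    """Pick cutmix or mixup for one batch and return (mode, lam, box)."""
    if rng.random() < switch_prob:
        lam = rng.betavariate(cutmix_alpha, cutmix_alpha)
        top, bottom, left, right = rand_bbox(height, width, lam, rng)
        # Recompute lam from the clipped box so labels match the pasted area.
        lam = 1.0 - (bottom - top) * (right - left) / float(height * width)
        return "cutmix", lam, (top, bottom, left, right)
    lam = rng.betavariate(mixup_alpha, mixup_alpha)
    return "mixup", lam, None
```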

Deploying models

CLSToKestrel is required when converting models to kestrel models. It is configured as follows.

to_kestrel:
  toks_type: cls   # setting toks_type
  model_name: Res50  # prefix of tar-model filename and model_name in meta.json
  add_softmax: False
  pixel_means: [123.675, 116.28, 103.53]
  pixel_stds: [58.395, 57.12, 57.375]
  is_rgb: True
  save_all_label: True
  type: 'UNKNOWN'
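
The pixel_means and pixel_stds values above are the usual ImageNet channel statistics scaled to the 0-255 range (for example, 0.485 * 255 = 123.675). The sketch below shows the per-channel normalization such values are typically used for. How the deployed kestrel model applies them is an assumption here, and normalize_pixel is a hypothetical helper.

```python
PIXEL_MEANS = [123.675, 116.28, 103.53]  # RGB order, matching is_rgb: True
PIXEL_STDS = [58.395, 57.12, 57.375]


def normalize_pixel(rgb, means=PIXEL_MEANS, stds=PIXEL_STDS):
    """Normalize one RGB pixel given in the 0-255 range."""
    return [(v - m) / s for v, m, s in zip(rgb, means, stds)]
```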

High precision baseline

UP provides two high-precision baseline settings for resnet: bag of tricks and resnet strikes.

Bag of tricks

UP applies the training refinements from Bag of Tricks for Image Classification with Convolutional Neural Networks to resnet18 and resnet50. Specifically, it uses 200 epochs, a 5-epoch warmup, cosine ('coslr') learning rate decay, and mixup data augmentation. Mixup is configured as described above, and the 'coslr' learning rate decay is configured as follows.

lr_scheduler:
  warmup_iter: 3130
  warmup_type: linear
  warmup_register_type: no_scale_lr
  warmup_ratio: 0.25
  type: CosineAnnealingLR
  kwargs:
    T_max: 200
    eta_min: 0.0
    warmup_iter: 3130
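
To make the schedule above concrete, the sketch below computes a learning rate per iteration: a linear warmup followed by cosine annealing. Since 3130 warmup iterations cover 5 epochs, an epoch here corresponds to 626 iterations. The sketch assumes that warmup starts at warmup_ratio times the base learning rate, that T_max is counted in epochs, and that the cosine phase spans the iterations left after warmup; these are assumptions, not confirmed details of UP's scheduler, and lr_at is a hypothetical name.

```python
import math


def lr_at(iteration, base_lr, iters_per_epoch, warmup_iter=3130,
          warmup_ratio=0.25, t_max_epochs=200, eta_min=0.0):
    """Linear warmup followed by cosine annealing (illustrative only)."""
    if iteration < warmup_iter:
        start = warmup_ratio * base_lr
        return start + (base_lr - start) * iteration / warmup_iter
    total = t_max_epochs * iters_per_epoch
    progress = (iteration - warmup_iter) / max(total - warmup_iter, 1)
    progress = min(progress, 1.0)
    # Standard cosine annealing from base_lr down to eta_min.
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * progress))
```

For example, lr_at(0, 0.1, 626) gives the warmup start value, and the rate reaches base_lr at iteration 3130 before decaying toward eta_min.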

Resnet strikes

UP applies the training procedure from ResNet strikes back: An improved training procedure in timm to resnet18 and resnet50. Specifically, it uses Rand Augment Increasing, cutmix, mixup, the LAMB optimizer, 'coslr' learning rate decay, and a BCE classification loss. UP provides training configs for 100 epochs and 300 epochs. LAMB is configured as follows.

optimizer:
  type: LAMB
  kwargs:
    lr: 0.008
    weight_decay: 0.02

The BCE classification loss is configured as follows.

- name: post_process
  type: base_cls_postprocess
  kwargs:
    cls_loss:
      type: bce
      kwargs: {}
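
For reference, a BCE classification loss treats each class as an independent binary problem, which also works with the soft targets produced by mixup and cutmix. The sketch below computes it for one sample in a numerically stable form. It illustrates the general loss only; the reduction (a mean over classes) and the name bce_with_logits are assumptions.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over classes for one sample."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)
```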

Rand Augment Increasing is configured as follows. The augmentation can be strengthened by increasing n, m, and magnitude_std.

random_augmentation: &random_augmentation
  type: torch_random_augmentationIncre
  kwargs:
    n: 2  # Number of randomly chosen operations.
    m: 7  # Strength of each operation; the maximum is 10.
    magnitude_std: 0.5  # Standard deviation of the strength.
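
The sketch below shows how n, m, and magnitude_std could interact: n operations are drawn at random, and each gets a strength sampled from a normal distribution centred on m and clipped to the range 0 to 10. The operation names, sampling with replacement, and the clipping rule are assumptions for illustration, not UP's actual operation list or sampling logic.

```python
import random

# Hypothetical operation names, used only for illustration.
OPS = ["rotate", "shear_x", "posterize", "solarize", "contrast", "sharpness"]
MAX_LEVEL = 10.0


def sample_policy(n=2, m=7, magnitude_std=0.5, rng=random):
    """Choose n operations and draw one strength per operation."""
    policy = []
    for name in rng.choices(OPS, k=n):
        level = rng.gauss(m, magnitude_std)
        # Keep the strength inside the valid range [0, MAX_LEVEL].
        level = min(MAX_LEVEL, max(0.0, level))
        policy.append((name, level))
    return policy
```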

Knowledge distillation

UP obtains a high-precision resnet18 model (top-1: 73.04) through knowledge distillation. The teacher is a resnet152 trained with bag of tricks, and the student resnet18 is initialized from weights pretrained on imagenet-1k. The teacher model is configured as follows.

teacher:
  - name: backbone              # backbone = resnet152(frozen_layers, out_layers, out_strides)
    type: resnet152
    kwargs:
      frozen_layers: []
      out_layers: [4]     # layer1...4, commonly named Conv2...5
      out_strides: [32]  # tell the strides of output features
      normalize:
        type: solo_bn
      initializer:
        method: msra
      deep_stem: True
      avg_down: True
  - name: head
    type: base_cls_head
    kwargs:
      num_classes: *num_classes
      in_plane: &teacher_out_channel 2048
      input_feature_idx: -1

Distillation is configured as follows.

mimic:
  mimic_name: res152_to_res18
  mimic_type: kl
  loss_weight: 1.0
  teacher:
    mimic_name: ['head.classifier']
    teacher_weight: /UP/resnet152_tricks/teacher.pth.tar
  student:
    mimic_name: ['head.classifier']
    student_weight: /UP/res18_s/res18.pth.tar
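
With mimic_type: kl, the student's classifier output is pulled toward the teacher's through a KL-divergence term scaled by loss_weight. The sketch below computes KL(teacher || student) between the two softmax distributions for one sample. The direction of the divergence, the absence of a temperature, and the function names are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def kl_distill_loss(student_logits, teacher_logits, loss_weight=1.0):
    """KL(teacher || student) between softmax outputs for one sample."""
    log_t = log_softmax(teacher_logits)
    log_s = log_softmax(student_logits)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return loss_weight * kl
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge.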

Training uses KDRunner as the runner. The config is as follows.

runtime:
  runner:
    type: kd

Illustration of downstream

UP provides example code for downstream classification tasks. A task requires a downstream training dataset and a pretrained model. The dataset is configured as follows.

dataset:
  type: cls
  kwargs:
    meta_type: custom_cls
    meta_file: /cars_im_folder//train.txt
    image_reader:
      type: fs_pillow
      kwargs:
        image_dir: /cars_im_folder/train
        color_mode: RGB
    transformer: [*random_resized_crop, *random_horizontal_flip, *pil_color_jitter, *to_tensor, *normalize]

Loading the pretrained model is configured as follows.

saver: # Required.
  save_dir: res50_car/checkpoints/cls_std     # dir to save checkpoints
  results_dir: res50_car/results_dir/cls_std  # dir to save detection results. i.e., bboxes, masks, keypoints
  auto_resume: True  # find last checkpoint from save_dir and resume from it automatically
  pretrain_model: united-perception/res50/ckpt_latest.pth

Downstream classification tasks use the following settings: the initial learning rate is 0.1 or 0.01 times the pretraining learning rate, training runs for 150 epochs, and the learning rate is multiplied by 0.1 every 50 epochs. Specifically,

optimizer:
  type: SGD
  kwargs:
    lr: 0.01
    nesterov: True
    momentum: 0.9
    weight_decay: 0.0005
lr_scheduler:
  warmup_iter: 0          # no warmup
  warmup_type: linear
  warmup_register_type: no_scale_lr
  warmup_ratio: 0.25
  type: MultiStepLR
  kwargs:
    milestones: [50, 100]     # epochs at which the learning rate is multiplied by gamma
    gamma: 0.1
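
The step schedule above can be checked with a few lines of arithmetic: the learning rate is the base value multiplied by gamma once for every milestone already reached. The sketch below assumes milestones are counted in epochs, as the 150-epoch setting suggests; multistep_lr is a hypothetical helper.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=0.01, milestones=(50, 100), gamma=0.1):
    """Learning rate at a given epoch under step decay."""
    # bisect_right counts how many milestones are <= epoch.
    return base_lr * gamma ** bisect_right(milestones, epoch)
```

Under these settings the rate is 0.01 for epochs 0 to 49, about 0.001 for epochs 50 to 99, and about 0.0001 for epochs 100 to 149.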