数据相关配置与自定义

UP 支持两种格式的数据集:公用数据集(public datasets)和自定义数据集(custom dataset)。

公用数据集

目前,下列形式的公开数据集是被支持的:

CocoDataset

dataset: # Required.
  train:
    dataset:
      type: coco
      kwargs:
        meta_file: coco/annotations/instances_train2017.json
        image_reader:
          type: fs_opencv   # ['fs_opencv', 'fs_pillow', 'ceph_opencv', 'osg']
          kwargs:
            image_dir: coco/train2017
            color_mode: RGB
        transformer: [*flip, *train_resize, *to_tensor, *normalize]
  test:
    dataset:
      type: coco
      kwargs:
        meta_file: &gt_file coco/annotations/instances_val2017.json
        image_reader:
          type: fs_opencv   # ['fs_opencv', 'fs_pillow', 'ceph_opencv', 'osg']
          kwargs:
            image_dir: coco/val2017
            color_mode: RGB
        transformer: [*test_resize, *to_tensor, *normalize]
        evaluator:
          type: COCO               # choices = {'COCO'}
          kwargs:
            gt_file: *gt_file
            iou_types: [bbox]
  batch_sampler:
    type: aspect_ratio_group
    kwargs:
      sampler:
        type: dist
        kwargs: {}
      batch_size: 2
      aspect_grouping: [1,]
  dataloader:
    type: base
    kwargs:
      num_workers: 4
      alignment: 32
  • 你需要设置元文件(meta_file)和图像目录(image_dir)到数据集中,设置数据增广(augmentations)到变换器(transformer)中。

  • UP 将数据集和评估器(evaluator)分开以适应多种不同的数据集和评估器.

  • UP 对不同的数据集支持两种评估器:coco(CocoEvaluator) - CocoDataset, MREvaluator(CustomDataset) - CustomDataset.

在自定义数据集上训练

数据集结构

数据集结构组成示例如下:

datasets
├── image_dir
|   ├── train
|   ├── test
|   └── val
└── annotations
    ├── train.json
    ├── test.json
    └── val.json

标注文件格式

建议以json格式保存标注文件,单张图像标注格式如下:

{
    "filename": "000005.jpg",
    "image_height": 375,
    "image_width": 500,
    "instances": [                // List of labeled entities, optional for test
      {
        "is_ignored": false,
        "bbox": [262,210,323,338], // x1,y1,x2,y2
        "label": 9                 // Label id start from 1, total C classes corresponding  a range of [1, 2, ..., C]
      },
      {
        "is_ignored": false,
        "bbox": [164,263,252,371],
        "label": 9
      },
      {
        "is_ignored": false,
        "bbox": [4,243,66,373],
        "label": 9
      }
    ]
}

精度评估

配置文件示例:

evaluator:
  type: MR
  kwargs:
    gt_file: path/your/test.json
    iou_thresh: 0.5
    num_classes: *num_classes
    data_respective: False

Note

  • UP支持多验证测试集评估,通过设置data_respective为True,支持多数据集分别评估精度,为False时,多数据集统一评估精度

  • 当data_respective被设置为True时,建议将测试集的batch size设置为1,否则会由于不同数据间的padding影响最终精度

UP支持MR模式评估自定义数据集精度,包括两个指标:

配置文件示例

dataset:
  train:
    dataset:
      type: custom
      kwargs:
        num_classes: *num_classes
        meta_file: # fill in your own train annotation file
        image_reader:
          type: fs_opencv
          kwargs:
            image_dir: # fill in your own train data path
            color_mode: RGB
        transformer: [*flip, *resize, *to_tensor, *normalize]
  test:
    dataset:
      type: custom
      kwargs:
        num_classes: *num_classes
        meta_file: # fill in your own test annotation file
        image_reader:
          type: fs_opencv
          kwargs:
            image_dir: # fill in your own test data path
            color_mode: RGB
        transformer: [*resize, *to_tensor, *normalize]
        evaluator:
          type: MR # fill in your own evaluator
          kwargs:
            gt_file: # fill in your own test annotation file
            iou_thresh: 0.5
            num_classes: *num_classes
  batch_sampler:
    type: aspect_ratio_group
    kwargs:
      sampler:
        type: dist
        kwargs: {}
      batch_size: 2
      aspect_grouping: [1,]
  dataloader:
    type: base
    kwargs:
      num_workers: 4
      alignment: 1

Note

  • 数据集与评估方法类型需要设定为 custom and MR

  • 需要设置num_classes参数