数据相关配置与自定义
UP 支持两种格式的数据集:公用数据集(public datasets)和自定义数据集(custom dataset)。
公用数据集
目前,下列形式的公开数据集是被支持的:
CocoDataset
dataset: # Required. train: dataset: type: coco kwargs: meta_file: coco/annotations/instances_train2017.json image_reader: type: fs_opencv # ['fs_opencv', 'fs_pillow', 'ceph_opencv', 'osg'] kwargs: image_dir: coco/train2017 color_mode: RGB transformer: [*flip, *train_resize, *to_tensor, *normalize] test: dataset: type: coco kwargs: meta_file: >_file coco/annotations/instances_val2017.json image_reader: type: fs_opencv # ['fs_opencv', 'fs_pillow', 'ceph_opencv', 'osg'] kwargs: image_dir: coco/val2017 color_mode: RGB transformer: [*test_resize, *to_tensor, *normalize] evaluator: type: COCO # choices = {'COCO'} kwargs: gt_file: *gt_file iou_types: [bbox] batch_sampler: type: aspect_ratio_group kwargs: sampler: type: dist kwargs: {} batch_size: 2 aspect_grouping: [1,] dataloader: type: base kwargs: num_workers: 4 alignment: 32
你需要设置元文件(meta_file)和图像目录(image_dir)到数据集中,设置数据增广(augmentations)到变换器(transformer)中。
UP 将数据集和评估器(evaluator)分开以适应多种不同的数据集和评估器.
UP 对不同的数据集支持两种评估器:coco(CocoEvaluator) - CocoDataset, MREvaluator(CustomDataset) - CustomDataset.
在自定义数据集上训练
数据集结构
数据集结构组成示例如下:
datasets ├── image_dir | ├── train | ├── test | └── val └── annotations ├── train.json ├── test.json └── val.json
标注文件格式
建议以json格式保存标注文件,单张图像标注格式如下:
{ "filename": "000005.jpg", "image_height": 375, "image_width": 500, "instances": [ // List of labeled entities, optional for test { "is_ignored": false, "bbox": [262,210,323,338], // x1,y1,x2,y2 "label": 9 // Label id start from 1, total C classes corresponding a range of [1, 2, ..., C] }, { "is_ignored": false, "bbox": [164,263,252,371], "label": 9 }, { "is_ignored": false, "bbox": [4,243,66,373], "label": 9 } ] }
精度评估
配置文件示例:
evaluator: type: MR kwargs: gt_file: path/your/test.json iou_thresh: 0.5 num_classes: *num_classes data_respective: FalseNote
UP支持多验证测试集评估,通过设置data_respective为True,支持多数据集分别评估精度,为False时,多数据集统一评估精度
当data_respective被设置为True时,建议将测试集的batch size设置为1,否则会由于不同数据间的padding影响最终精度
UP支持MR模式评估自定义数据集精度,包括两个指标:
MR@FPPI=xxx: Miss rate while FPPI reaches some value.
Score@FPPI=xxx: Confidence score while FPPI reaches some value.
配置文件示例
dataset: train: dataset: type: custom kwargs: num_classes: *num_classes meta_file: # fill in your own train annotation file image_reader: type: fs_opencv kwargs: image_dir: # fill in your own train data path color_mode: RGB transformer: [*flip, *resize, *to_tensor, *normalize] test: dataset: type: custom kwargs: num_classes: *num_classes meta_file: # fill in your own test annotation file image_reader: type: fs_opencv kwargs: image_dir: # fill in your own test data path color_mode: RGB transformer: [*resize, *to_tensor, *normalize] evaluator: type: MR # fill in your own evaluator kwargs: gt_file: # fill in your own test annotation file iou_thresh: 0.5 num_classes: *num_classes batch_sampler: type: aspect_ratio_group kwargs: sampler: type: dist kwargs: {} batch_size: 2 aspect_grouping: [1,] dataloader: type: base kwargs: num_workers: 4 alignment: 1Note
数据集与评估方法类型需要设定为 custom and MR
需要设置num_classes参数