YOLOV and YOLOV++ for video object detection.

Update

April 21st, 2024: Our enhanced model now achieves 92.9 AP50 (without post-processing) on the ImageNet VID dataset, thanks to a more robust backbone and algorithm improvements. It maintains a processing time of 26.5 ms per image during batch inference on a 3090 GPU. Code release is forthcoming.
May 8th, 2024: We released the code, logs, and weights for YOLOV++.

Introduction

The YOLOV series is a family of high-performance video object detectors. Please refer to the YOLOV paper on arXiv for more details. The paper for YOLOV++ will be released soon.

This repo is a PyTorch implementation of YOLOV and YOLOV++, based on YOLOX.

YOLOX Pretrained Models on ImageNet VID

Model	size	mAP@50^val	Speed 2080Ti(batch size=1) (ms)	Speed 3090(batch size=32) (ms)	weights
YOLOX-s	576	69.5	9.4	1.4	google
YOLOX-l	576	76.1	14.8	4.2	google
YOLOX-x	576	77.8	20.4	-	google
YOLOX-SwinTiny	576	79.2	19.0	5.5	google
YOLOX-SwinBase	576	86.5	24.9	11.8	google
YOLOX-FocalLarge	576	89.7	42.2	25.7	-

Main results of YOLOV++

Model	size	mAP@50^val	Speed 3090(batch size=32) (ms)	weights	logs
YOLOV++ s	576	78.7	5.3	google	link
YOLOV++ l	576	84.2	7.6	google	-
YOLOV++ SwinTiny	576	85.6	8.4	google	link
YOLOV++ SwinBase	576	90.7	15.9	google	link
YOLOV++ FocalLarge	576	92.9	27.6	google	link
YOLOV++ FocalLarge + Post	576	93.2	-	-	-

Main results of YOLOV

Model	size	mAP@50^val	Speed 2080Ti(batch size=1) (ms)	weights
YOLOV-s	576	77.3	11.3	google
YOLOV-l	576	83.6	16.4	google
YOLOV-x	576	85.5	22.7	google
YOLOV-x + post	576	87.5	-	-

TODO

Finish Swin-Transformer based experiments.
Release updated code, model and log.

Quick Start

Installation

Install YOLOV from source.

git clone git@github.com:YuHengsss/YOLOV.git
cd YOLOV

Create conda env.

conda create -n yolov python=3.7

conda activate yolov

pip install -r requirements.txt

pip3 install -v -e .

Demo

Step1. Download the pretrained weights.

Step2. Run yolov demos. For example:

python tools/vid_demo.py -f [path to your yolov exp files] -c [path to your yolov weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result

For online mode, taking yolov_l as an example, you can run:

python tools/yolov_demo_online.py -f ./exps/yolov/yolov_l_online.py -c [path to your weights] --path /path/to/your/video --conf 0.25 --nms 0.5 --tsize 576 --save_result

For yolox models, please use python tools/demo.py for inference.

Reproduce our results on VID

Step1. Download datasets and weights:

Download the ILSVRC2015 DET and ILSVRC2015 VID datasets from ImageNet and organise them as follows:

path to your datasets/ILSVRC2015/
path to your datasets/ILSVRC/

Download our COCO-style annotations for training, the FGFA version of the training annotations, and the video sequences. Then place them at the following paths:

YOLOV/annotations/vid_train_coco.json
YOLOV/annotations/ILSVRC_FGFA_COCO.json
YOLOV/yolox/data/datasets/train_seq.npy

Change the data_dir in the exp files to [path to your datasets] and download our weights.
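Before moving on to Step2, it can help to confirm that the downloaded files ended up where the code expects them. The following is a minimal sketch of such a check, not part of the repository: the helper name find_missing_files is hypothetical, and the train_seq.npy location is assumed to match the path used in the numpy.load call in the "Annotation format" section.

```python
from pathlib import Path

# Relative paths taken from this README; adjust them if your checkout differs.
EXPECTED_FILES = [
    "annotations/vid_train_coco.json",
    "annotations/ILSVRC_FGFA_COCO.json",
    "yolox/data/datasets/train_seq.npy",  # assumed location, see lead-in
]


def find_missing_files(repo_root, expected=EXPECTED_FILES):
    """Return the expected files that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the root of the YOLOV checkout.
    missing = find_missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All annotation and sequence files are in place.")
```

Run it from the root of the YOLOV checkout. It only checks that the files exist, not that their contents are valid.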

Step2. Generate predictions and convert them to IMDB style for evaluation.

python tools/val_to_imdb.py -f exps/yolov/yolov_x.py -c [path to your weights]/yolov_x.pth --fp16 --output_dir ./yolov_x.pkl

Evaluation process:

python tools/REPPM.py --repp_cfg ./tools/yolo_repp_cfg.json --predictions_file ./yolov_x.pkl --evaluate --annotations_filename ./annotations/annotations_val_ILSVRC.txt --path_dataset [path to your dataset] --store_imdb --store_coco  (--post)

The optional --post flag enables the post-processing method. You will then get:

{'mAP_total': 0.8758871720817065, 'mAP_slow': 0.9059275666099181, 'mAP_medium': 0.8691557352372217, 'mAP_fast': 0.7459511040452989}
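The evaluation prints mAP values as fractions between 0 and 1, while the result tables in this README report percentages. As a small convenience sketch (the helper name to_percent is hypothetical and not part of the repository), the printed dictionary can be converted like this:

```python
def to_percent(metrics, digits=1):
    """Convert mAP fractions (0-1) into percentages rounded to `digits` decimals."""
    return {name: round(value * 100, digits) for name, value in metrics.items()}


# The dictionary printed by the evaluation command above.
metrics = {
    "mAP_total": 0.8758871720817065,
    "mAP_slow": 0.9059275666099181,
    "mAP_medium": 0.8691557352372217,
    "mAP_fast": 0.7459511040452989,
}

for name, value in to_percent(metrics).items():
    print(f"{name:<12}{value:5.1f}")
```

The slow, medium, and fast entries presumably break the score down by object motion speed, as is common in VID evaluation. That reading is an assumption and is not stated in this README.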

Training example

python tools/vid_train.py -f exps/yolov/yolov_s.py -c weights/yoloxs_vid.pth --fp16

Rough testing

python tools/vid_eval.py -f exps/yolov/yolov_s.py -c weights/yolov_s.pth --tnum 500 --fp16

--tnum sets the number of sequences used for testing.

Annotation format

Training base detector

The train_coco.json is a COCO-format annotation file. When training the base detector on your own dataset, convert your annotations to COCO format.
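For readers unfamiliar with the COCO layout, the sketch below builds a minimal detection annotation file with the three standard top-level keys (images, annotations, categories). The helper name build_coco_annotations, the file names, and the class names are hypothetical. Whether this repository's data loader needs additional fields is not confirmed here, so treat it as a starting point.

```python
import json


def build_coco_annotations(images, boxes, class_names):
    """Build a minimal COCO-style detection dictionary.

    images: list of (file_name, width, height)
    boxes:  list of (image_index, class_index, x, y, w, h), where (x, y) is
            the top-left corner of the box in pixels
    """
    coco = {
        "images": [
            {"id": i + 1, "file_name": name, "width": w, "height": h}
            for i, (name, w, h) in enumerate(images)
        ],
        "categories": [
            {"id": c + 1, "name": name} for c, name in enumerate(class_names)
        ],
        "annotations": [],
    }
    for ann_id, (img_idx, cls_idx, x, y, w, h) in enumerate(boxes, start=1):
        coco["annotations"].append({
            "id": ann_id,
            "image_id": img_idx + 1,      # must match an entry in "images"
            "category_id": cls_idx + 1,   # must match an entry in "categories"
            "bbox": [x, y, w, h],         # COCO boxes are [x, y, width, height]
            "area": w * h,
            "iscrowd": 0,
        })
    return coco


# Hypothetical two-frame example with a single box per frame.
example = build_coco_annotations(
    images=[("frames/000000.JPEG", 1280, 720), ("frames/000001.JPEG", 1280, 720)],
    boxes=[(0, 0, 100, 150, 200, 80), (1, 1, 400, 300, 50, 60)],
    class_names=["car", "dog"],
)
print(json.dumps(example, indent=2)[:200])
```

Writing the result with json.dump gives a file that follows the standard COCO detection layout.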

Training YOLOV Series

The train_seq.npy and val_seq.npy files are numpy arrays of lists. They can be loaded using the following command:

numpy.load('./yolox/data/datasets/train_seq.npy',allow_pickle=True)

Each list contains the paths to all images in one video. The specific annotations (the XML annotations of the VID dataset) are loaded via these image paths; refer to get_annotation(self, path, test_size) in YOLOV/yolox/data/datasets/vid.py (line 125 at commit f5a57dd) for more details.
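To build a sequence file for your own videos, the "numpy array of lists" layout described above can be reproduced roughly as follows. This is a sketch under assumptions: the frame paths and the toy_seq.npy file name are hypothetical, and the exact path convention expected by vid.py should be checked against the code.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in data: two videos, each a list of frame paths.
videos = [
    ["video_a/000000.JPEG", "video_a/000001.JPEG", "video_a/000002.JPEG"],
    ["video_b/000000.JPEG", "video_b/000001.JPEG"],
]

# Use a 1-D object array so that every element stays a plain Python list,
# even when the videos have different numbers of frames.
seqs = np.empty(len(videos), dtype=object)
for i, frames in enumerate(videos):
    seqs[i] = frames

path = os.path.join(tempfile.mkdtemp(), "toy_seq.npy")
np.save(path, seqs, allow_pickle=True)

# Object arrays are stored with pickle, so allow_pickle=True is required here.
loaded = np.load(path, allow_pickle=True)
for frames in loaded:
    print(len(frames), "frames, first:", frames[0])
```

Because np.load disables pickle by default, omitting allow_pickle=True raises an error for files like this one.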

Acknowledgements


Cite YOLOV and YOLOV++

If the YOLOV series is helpful for your research, please cite the following paper:

@article{shi2022yolov,
  title={YOLOV: Making Still Image Object Detectors Great at Video Object Detection},
  author={Shi, Yuheng and Wang, Naiyan and Guo, Xiaojie},
  journal={arXiv preprint arXiv:2208.09686},
  year={2022}
}
