Multiple Object Tracking and Segmentation in Complex Environments

Four challenges in long video, occluded object, diverse motion and open-world

October 24th, 9:00 am (UTC+3), ECCV 2022 Online Workshop

News

[October 22] Technical reports from the top teams in all four challenges are now available! Thanks to the teams for sharing.

[July 11] The UVO Challenge is open today! [Dataset Download] [Evaluation Server Image] [Evaluation Server Video]

[July 10] The YouTubeVIS: Long Video Challenge is open today! [Dataset Download] [Evaluation Server]

[July 5] The OVIS Challenge is open today! [Dataset Download] [Evaluation Server]

[July 4] The DanceTrack Challenge is open today! [Dataset Download] [Evaluation Server]

[July 3] Competition Phase 1 is postponed to July 11, 2022 (00:01 am UTC). We apologize for the delay.

Overview

Abstract

Multiple object tracking and segmentation aims to localize and associate objects of interest over time, and serves as a fundamental technology in many practical applications, such as visual surveillance, public security, video analysis, and human-computer interaction.

Computer vision systems today achieve strong performance in simple tracking and segmentation scenes, such as the MOT and DAVIS datasets, but are not as robust as the human visual system, especially in complex environments.

To advance the performance of current vision systems in complex environments, our workshop explores four settings of multiple object tracking and segmentation: (a) long video, (b) occluded object, (c) diverse motion, and (d) open-world.

The four challenges are:
  • 4th YouTubeVIS and Long Video Instance Segmentation Challenge
  • 2nd Occluded Video Instance Segmentation Challenge
  • 1st Multiple People Tracking in Group Dance Challenge
  • 2nd Open-World Video Object Detection and Segmentation Challenge

Challenges

YouTubeVIS: Long Video
Video Instance Segmentation (VIS) extends the instance segmentation task from the image domain to the video domain, aiming at simultaneous detection, segmentation, and tracking of object instances in videos. We extend VIS with additional long videos for validation and testing, consisting of:
  • 141 additional long videos: 71 for validation, 70 for testing
  • 259 additional unique video instances with an average duration of 49.8s
  • 9304 additional high-quality instance masks

The additional long videos (L) are evaluated separately from the previous short videos. We use average precision (AP_L) at different intersection-over-union (IoU) thresholds and average recall (AR_L) as our evaluation metrics. In video instance segmentation, the IoU is the sum of intersection areas over the sum of union areas across the video. For more details about the dataset, please refer to our paper or website.
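The video-level IoU described above can be sketched as follows. This is a minimal illustration, not the official evaluation code; it assumes binary per-frame masks stored as NumPy boolean arrays, with an all-zero mask on frames where a track is absent (the function name and mask representation are our assumptions):

```python
import numpy as np

def video_iou(pred_masks, gt_masks):
    """Video-level IoU between a predicted and a ground-truth track.

    pred_masks, gt_masks: lists of HxW boolean arrays, one per frame.
    Intersections and unions are summed over all frames *before* dividing,
    so a track must overlap well throughout the video to score high.
    """
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(pred_masks, gt_masks))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(pred_masks, gt_masks))
    return inter / union if union > 0 else 0.0
```

Because the division happens once per video rather than per frame, a prediction that drifts off an object in later frames is penalized even if its early frames match perfectly.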

Dataset Download
Evaluation Server
OVIS
Occluded Video Instance Segmentation (OVIS) is a new large-scale benchmark dataset designed for perceiving object occlusions in videos, which reveal the complexity and diversity of real-world scenes. OVIS consists of:
  • 901 videos with severe object occlusions
  • 25 commonly seen semantic categories
  • 5,223 unique instances with an average duration of 10.05s
  • 296k high-quality instance masks

We use average precision (AP) at different intersection-over-union (IoU) thresholds and average recall (AR) as our evaluation metrics. In video instance segmentation, the IoU is the sum of intersection areas over the sum of union areas across the video. For more details about the dataset, please refer to our paper or website.

Dataset Download
Evaluation Server
DanceTrack
DanceTrack is a multi-human tracking dataset with two emphasized properties: (1) uniform appearance: humans have highly similar, nearly indistinguishable appearances; (2) diverse motion: humans move in complex patterns and their relative positions change frequently. DanceTrack consists of:
  • 100 videos of group dance: 40 training videos, 25 validation videos, and 35 test videos
  • 990 unique instances with an average duration of 52.9s
  • 877k high-quality bounding boxes

We use Higher Order Tracking Accuracy (HOTA) as the main metric, AssA and IDF1 to measure association performance, and DetA and MOTA to measure detection quality. For more details about the dataset, please refer to our paper or website.
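HOTA and AssA are computed with official evaluation toolkits and are too involved to reproduce here. As a simpler illustration of the detection-quality side, the classic MOTA score can be written directly from aggregate counts; the count arguments below are placeholders for illustration, since in practice they come from per-frame box matching against the ground truth:

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """CLEAR-MOT accuracy: MOTA = 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth objects summed over all
    frames. MOTA can be negative when errors outnumber ground-truth
    objects, and is undefined when there are no ground-truth objects.
    """
    if num_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt
```

Note that MOTA weights identity switches the same as detection errors, which is one motivation for reporting HOTA and IDF1 alongside it on DanceTrack, where association is the hard part.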

Dataset Download
Evaluation Server
UVO
The Unidentified Video Objects (UVO) benchmark aims at developing computer vision models that can detect and segment all objects that appear in images or videos regardless of their semantic concepts, known or unknown. Highlights of UVO include:
  • high-quality instance masks annotated at 30 fps on 1,024 YouTube videos and at 1 fps on 10,337 videos from the Kinetics dataset
  • ALL objects in each video annotated, 13.5 objects per video on average
  • 57% of objects not covered by COCO categories

We use average precision (AP) at different intersection-over-union (IoU) thresholds and average recall (AR) as our evaluation metrics. In video instance segmentation, the IoU is the sum of intersection areas over the sum of union areas across the video. For more details about the dataset, please refer to our paper or website.

Dataset Download
Evaluation Server Image
Evaluation Server Video
Competition Schedule
Competition Phase 1 (submission of validation results opens): July 11, 2022 (00:01 am UTC)
Competition Phase 2 (submission of test results opens): September 01, 2022 (00:01 am UTC)
Deadline for submitting final predictions: October 01, 2022 (11:59 pm UTC)
Decisions to participants: October 05, 2022 (11:59 pm UTC)
Top Teams

(* equal contribution)

YouTubeVIS: Long Video
  • 1st place, team IIG: Yong Liu1,2, Jixiang Sun1, Yitong Wang2, Cong Wei1, Yansong Tang1, Yujiu Yang1 (1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2ByteDance Inc.). Technical report: IIG
  • 2nd place, team ByteVIS: Junfeng Wu1, Yi Jiang2, Qihao Liu3, Xiang Bai1, Song Bai2 (1Huazhong University of Science and Technology; 2ByteDance; 3Johns Hopkins University). Technical report: ByteVIS

OVIS
  • 1st place, team BeyondSOTA: Fengliang Qi, Jing Xian, Zhuang Li, Bo Yan, Yuchen Hu, Hongbin Wang (Ant Group). Technical report: BeyondSOTA
  • 2nd place, team IIG: Yong Liu1,2, Jixiang Sun1, Yitong Wang2, Cong Wei1, Yansong Tang1, Yujiu Yang1 (1Tsinghua Shenzhen International Graduate School, Tsinghua University; 2ByteDance Inc.). Technical report: IIG

DanceTrack
  • 1st place, team MOTRv2: Yuang Zhang1,2, Tiancai Wang1, Weiyao Lin2, Xiangyu Zhang1 (1MEGVII Technology; 2Shanghai Jiao Tong University). Technical report: MOTRv2
  • 2nd place, team C-BIoU: Fan Yang, Shigeyuki Odashima, Shoichi Masui, Shan Jiang (Fujitsu Research). Technical report: C-BIoU
  • 2nd place, team mt_iot: Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma (Meituan). Technical report: mt_iot
  • 3rd place, team DLUT_IIAU: Guangxin Han1, Mingzhan Yang1, Yanxin Liu1, Shiyu Zhu2, Yuzhuo Han2, Xu Jia1, Huchuan Lu1 (1Dalian University of Technology; 2Honor Device Co., Ltd). Technical report: DLUT_IIAU

UVO
  • 1st place, team TAL-BUPT: Jiajun Zhang*1, Boyu Chen*2, Zhilong Ji2, Jinfeng Bai2, Zonghai Hu1 (1Beijing University of Posts and Telecommunications; 2Tomorrow Advancing Life). Technical report: TAL-BUPT

Workshop

Invited Speakers
Workshop Schedule

October 24th, 9:00 am - 1:00 pm (UTC+3)

Time Speaker Topic
9:00-9:10 am Organizers Welcome
9:10-9:40 am Invited speaker 1 Recognizing objects over a long time and in a large vocabulary
9:40-10:10 am YouTubeVIS: Long Video winning teams Solutions for the 4th YouTubeVIS and Long Video Instance Segmentation Challenge
10:10-10:40 am Invited speaker 2 Learning Robust Multiple Object Tracking and Segmentation
10:40-11:10 am OVIS winning teams Solutions for the 2nd Occluded Video Instance Segmentation Challenge
11:10-11:20 am Organizers Break
11:20-11:50 am DanceTrack winning teams Solutions for the 1st Multiple People Tracking in Group Dance Challenge
11:50-12:20 pm UVO winning teams Solutions for the 2nd Open-World Video Object Detection and Segmentation Challenge
12:20-1:00 pm Organizers Closing

Organizers