About Me
Since 2019.09, I have been a PhD student in the Department of Computer Science, Beijing Institute of Technology, advised by Prof. Jianbing Shen.
Since 2022.06, I have been a research intern in the Foundation Model Group at MEGVII Technology, supervised by Tiancai Wang and Xiangyu Zhang.
From 2024.09 to 2025.01, I was a visiting student at Mohamed bin Zayed University of Artificial Intelligence, supervised by Prof. Rao Muhammad Anwer and Prof. Fahad Shahbaz Khan.
From 2021.05 to 2022.05, I was a research intern at the Inception Institute of Artificial Intelligence, UAE, supervised by Xingping Dong and Prof. Ling Shao.
In 2019.06, I received my Bachelor’s degree from the Class of Xu, Beijing Institute of Technology.
My current research interests are (1) vision-language learning, (2) multi-modal LLMs, and (3) embodied agents. My ultimate goal is to build a human-like agent that perceives real-life environments and makes decisions.
News
- 2025.02: One paper (DrivingSphere) was accepted to CVPR 2025.
- 2024.12: One paper (NuPrompt) was accepted to AAAI 2025.
- 2024.07: One paper (Merlin) was accepted to ECCV 2024.
- 2024.05: I was awarded the Excellent Doctoral Thesis Seedling Fund (优秀博士论文育苗基金).
- 2024.01: One paper (TopoMLP) was accepted to ICLR 2024.
- 2023.10: I was awarded the National Scholarship!
Publications
Preprint Papers:

RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
|In Submission|
Dongming Wu, Yanping Fu, Saike Huang, Yingfei Liu, Fan Jia, Nian Liu, Feng Dai, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jianbing Shen
- We present a large-scale reasoning-based affordance segmentation benchmark RAGNet and introduce a comprehensive affordance-based grasping framework AffordanceNet.
- Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
Yifan Bai*, Dongming Wu*, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang (*Equal Contributions)
|2024|Paper|
- Bootstrapping Referring Multi-Object Tracking
Yani Zhang, Dongming Wu, Wencheng Han, Xingping Dong
|2024|Paper|Code|
Conference Papers:

DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
|CVPR 2025|Paper|Code|
Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen
- DrivingSphere is a novel geometry-aware closed-loop simulation framework that captures 2D visual and 3D geometric properties while seamlessly integrating with vision-based end-to-end driving agents.


Merlin: Empowering Multimodal LLMs with Foresight Minds
|ECCV 2024|Paper|Code|
En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao
- Merlin is a multimodal LLM that generates natural language responses closely linked to object trajectories across multiple images.

TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning
|ICLR 2024|Paper|Code|
Dongming Wu, Jiahao Chang, Fan Jia, Yingfei Liu, Tiancai Wang, Jianbing Shen
- TopoMLP is the 1st-place solution for the 1st OpenLane Topology in Autonomous Driving Challenge. It follows a first-detect-then-reason philosophy for better topology prediction.

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
|ICCV 2023|Paper|Code|
Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen
- OnlineRefer is the first to challenge the widespread belief that only offline models can deal well with RVOS and makes online RVOS great again.

Referring Multi-Object Tracking
|CVPR 2023|Paper|Code|
Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen
- RMOT is a new referring understanding task that detects and tracks an arbitrary number of objects following a human instruction. We propose the first RMOT benchmark, Refer-KITTI, and a baseline model, TransRMOT.

Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation
|CVPR 2022|Paper|
Dongming Wu, Xingping Dong, Ling Shao, Jianbing Shen
Journal Papers:
- Person re-identification by context-aware part attention and multi-head collaborative learning (TIFS)
Dongming Wu, Mang Ye, Gaojie Lin, Xin Gao, Jianbing Shen
|2021|Paper|
- Reducing estimation bias via triplet-average deep deterministic policy gradient (TNNLS)
Dongming Wu, Xingping Dong, Jianbing Shen, Steven CH Hoi
|2020|Paper|
Technical Report:
- The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge
Dongming Wu, Fan Jia, Jiahao Chang, Zhuoling Li, Jianjian Sun, Chunrui Han, Shuailin Li, Yingfei Liu, Zheng Ge, Tiancai Wang
|2023|Paper|Code|
Honors
- Excellent Doctoral Thesis Seedling Fund (优秀博士论文育苗基金), Beijing Institute of Technology.
- National Scholarship, Ministry of Education of China.
- 1st place in the OpenLane Topology track of the CVPR 2023 Autonomous Driving Challenge ($15,000), Shanghai AI Lab and Huawei.
- ChinaCentury (华瑞世纪) Scholarship, Beijing Institute of Technology.
Service
Invited Reviewer for conferences:
- CVPR 2023, 2024, 2025
- ICCV 2023
- ECCV 2024
- ICLR 2025
- AAAI 2025
Invited Reviewer for journals:
- International Journal of Computer Vision (IJCV)
- IEEE Transactions on Image Processing (TIP)
- IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
- IEEE Transactions on Multimedia (TMM)
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- IEEE Transactions on Intelligent Vehicles (TIV)
- Pattern Recognition (PR)
- Neurocomputing