Dongming Wu

I will join MMLab at The Chinese University of Hong Kong as a Postdoctoral Fellow, working with Prof. Xiangyu Yue.

In June 2025, I received my PhD degree from the Department of Computer Science, Beijing Institute of Technology, advised by Prof. Jianbing Shen.
In June 2019, I received my Bachelor's degree from the Xu Teli Class at the same university.

Research Interests

My current research interests lie in vision-language learning, multimodal large language models (MLLMs), and embodied agents. During my graduate studies, I focused on building intelligent perception models that understand visual and linguistic information. More recently, I have been exploring decision-making systems capable of actively interacting with both humans and dynamic environments. Ultimately, my goal is to develop human-like agents that can perceive real-world environments and make autonomous decisions, moving us closer to artificial general intelligence (AGI). Two articles that have deeply inspired my thinking are The Bitter Lesson and The Second Half.

I am always open to collaboration and discussions about the latest advancements in the field. Feel free to reach out!

News

  • 🎉 2025.06: One paper (RAGNet) is accepted by ICCV 2025.
  • 🎓 2025.06: I successfully defended my Ph.D. thesis and was awarded Outstanding Graduate of Beijing (北京市优秀毕业生).
  • 2025.02: One paper (DrivingSphere) is accepted by CVPR 2025.
  • 2024.12: One paper (NuPrompt) is accepted by AAAI 2025.
  • 2024.07: One paper (Merlin) is accepted by ECCV 2024.
  • 2024.05: I was awarded the Excellent Doctoral Thesis Seedling Fund (优秀博士论文育苗基金).
  • 2024.01: One paper (TopoMLP) is accepted by ICLR 2024.

Experience

DexMal

Research Intern

Mentor: Yingfei Liu and Tiancai Wang

MBZUAI

Visiting Student

Mentor: Prof. Rao Muhammad Anwer and Prof. Fahad Shahbaz Khan

MEGVII

Research Intern

Mentor: Tiancai Wang and Xiangyu Zhang

IIAI

Research Intern

Mentor: Xingping Dong and Prof. Ling Shao

Publications

Preprint Papers


Conference Papers

ICCV 2025
A Large-Scale Reasoning-based Affordance Segmentation Dataset and Model for Universal Robot Grasping
| ICCV 2025 | Code |
Dongming Wu, Yanping Fu, Saike Huang, Yingfei Liu, Fan Jia, Nian Liu, Feng Dai, Tiancai Wang, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jianbing Shen
  • We present RAGNet, a large-scale reasoning-based affordance segmentation benchmark, and AffordanceNet, a comprehensive affordance-based grasping framework for universal robot grasping.

CVPR 2025
DrivingSphere: Building a High-fidelity 4D World for Closed-loop Simulation
| CVPR 2025 | Paper | Code |
Tianyi Yan, Dongming Wu, Wencheng Han, Junpeng Jiang, Xia Zhou, Kun Zhan, Cheng-zhong Xu, Jianbing Shen
  • DrivingSphere is a novel geometry-aware closed-loop simulation framework that captures 2D visual and 3D geometric properties while seamlessly integrating with vision-based end-to-end driving agents.

AAAI 2025
Language Prompt for Autonomous Driving
| AAAI 2025 | Paper | Code |
Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen
  • NuPrompt is a large-scale language prompt set for driving scenes, built upon nuScenes, which pairs object-centric language descriptions with 3D, multi-view, and multi-frame objects. We also propose a prompt-based perception baseline, PromptTrack.

ECCV 2024
Merlin: Empowering Multimodal LLMs with Foresight Minds
| ECCV 2024 | Paper | Code |
En Yu, Liang Zhao, Yana Wei, Jinrong Yang, Dongming Wu, Lingyu Kong, Haoran Wei, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Wenbing Tao
  • Merlin is a groundbreaking model capable of generating natural language responses that are intricately linked with object trajectories across multiple images.

ICLR 2024
TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning
| ICLR 2024 | Paper | Code |
Dongming Wu, Jiahao Chang, Fan Jia, Yingfei Liu, Tiancai Wang, Jianbing Shen
  • TopoMLP is the 1st-place solution of the 1st OpenLane Topology in Autonomous Driving Challenge. It suggests a first-detect-then-reason philosophy for better topology prediction.

ICCV 2023
OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation
| ICCV 2023 | Paper | Code |
Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen
  • OnlineRefer is the first work to challenge the widespread belief that only offline models can handle RVOS well, and it makes online RVOS great again.

CVPR 2023
Referring Multi-Object Tracking
| CVPR 2023 | Paper | Code |
Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen
  • RMOT is a new referring understanding task that requires detecting and tracking an arbitrary number of objects following human language instructions. We propose the first RMOT benchmark, Refer-KITTI, and a baseline model, TransRMOT.

CVPR 2022
Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation
| CVPR 2022 | Paper |
Dongming Wu, Xingping Dong, Ling Shao, Jianbing Shen


Journal Papers


Technical Reports

Honors & Awards

Service

Conferences:

Journals: