Dong Jing

Dong Jing - 荆栋

Ph.D. Candidate, GSAI at Renmin University of China

Hi there! I am a third-year Ph.D. candidate at GSAI, Renmin University of China, advised by Prof. Zhiwu Lu. I am currently working closely with Prof. Mingyu Ding at UNC. Before that, I received my Bachelor's and Master's degrees from Beijing Jiaotong University, where I was advised by Prof. Shuo Zhang. I feel truly fortunate and honored to have worked with all three of my advisors.

My research focuses on Embodied AI. My goal is to build brains for robots so that they can act in the physical world with human-level intelligence. Humanity is standing at a singular moment in history: productivity is being reshaped at an unprecedented pace. Claude and GPT, today's leading LLMs, have already achieved astonishing intelligence inside the screen, yet the physical world remains beyond their reach. Such super-brains will need capable bodies. They should grow toward human and even superhuman levels, not only mentally but also physically.

Unfortunately, today's Vision-Language-Action (VLA) models lag far behind LLMs. Pessimistically speaking, they are still far from the GPT-3 moment of Embodied AI. A key reason is that action is a modality fundamentally different from language. In my view, the action modality has not yet shown enough potential for emergent capabilities through scaling. How to properly understand and model action is an often-overlooked yet central question in Embodied AI. I will devote myself to this topic in the coming years.

I expect to graduate in June 2027 and am actively exploring opportunities in industry. Feel free to reach out!

Research Publications

(# denotes equal contribution. Full list on Google Scholar.)

🤖 Embodied AI

Mixture of Horizons in Action Chunking

Dong Jing, Gang Wang, Jiaqi Liu, Weiliang Tang, Zelong Sun, Yunchao Yao, Zhenyu Wei, Yun-Hui Liu, Zhiwu Lu, Mingyu Ding

🔥ICML 2026🔥 [paper] [project] [code]

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

Dong Jing#, Jingchen Nie#, Tianqi Zhang#, Jiaqi Liu, Huaxiu Yao, Zhiwu Lu, Mingyu Ding

🔥Preprint🔥 releasing within a week

Learning Action Priors for Cross-embodiment Robot Manipulation

Dong Jing, Tianqi Zhang, Jiaqi Liu, Yu Fang, Jinman Zhao, Li Erran Li, Zhiwu Lu, Mingyu Ding

🔥Preprint🔥 releasing within a week

When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs

Yu Fang, Yuchun Feng, Dong Jing, Jiaqi Liu, Yue Yang, Zhenyu Wei, Daniel Szafir, Mingyu Ding

🔥Preprint🔥 [paper]

Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation

Weiliang Tang#, Dong Jing#, Jia-Hui Pan, Zhiwu Lu, Yun-Hui Liu, Li Erran Li, Mingyu Ding, Chi-Wing Fu

Preprint [paper]

🎨 Multimodal Learning

FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding

Dong Jing#, Xiaolong He#, Yutian Luo, Nanyi Fei, Guoxing Yang, Wei Wei, Huiwen Zhao, Zhiwu Lu

NeurIPS 2024 [paper] [code]

Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits

Zelong Sun, Jiahui Wu, Ying Ba, Dong Jing, Zhiwu Lu

🔥CVPR 2026🔥 [paper]

Adaptive Task Balancing for Visual Instruction Tuning via Inter-Task Contribution and Intra-Task Difficulty

Yanqi Dai, Yong Wang, Zebin You, Dong Jing, Xiangxiang Chu, Zhiwu Lu

🔥WWW 2026🔥 [paper]

Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval

Zelong Sun, Dong Jing, Guoxing Yang, Nanyi Fei, Zhiwu Lu

AAAI 2025 [paper]

📷 3D Low-level Vision

Awards

Service

Reviewer for ICML, NeurIPS, CVPR, ICCV, ECCV, AAAI, ACM MM, CoRL, TOIS, TMLR, IROS