Haodong Yan Homepage

Biography

I am a PhD student in Intelligent Transportation at The Hong Kong University of Science and Technology (Guangzhou), where I work with Prof. Haoang Li in the IRPN Lab and am co-supervised by Prof. Ying-Cong Chen. My research focuses on foundation models for embodied intelligence, with current interests in Video-Action Models, World Models, and Video Generation.

Before starting my PhD, I received an M.Eng. in Mechanical Engineering and a second bachelor's degree in Computer Science and Technology from Xi'an Jiaotong University. I am interested in building models that can understand dynamics, predict future interactions, and support downstream reasoning and decision making for robotic systems.

News

2026 Released GigaWorld-Policy-0.5, a technical report on efficient action-centered World Action Models, on arXiv.
2026 Released RoboMemArena, a comprehensive robotic memory benchmark, on arXiv.
2026 Released three preprints on arXiv: DualCoT-VLA, VLA-OPD, and a benchmark study of practical Vision-Language-Action models.
2026 Our paper S-VAM was accepted to ECCV 2026.
2026 Our paper SCAR was accepted to CVPR 2026.
2024 Began my PhD in Intelligent Transportation at HKUST (Guangzhou).
2024 GazeMoDiff was published at Pacific Graphics 2024, and our work on photometric bundle adjustment appeared at IROS 2024.
2023 Completed research internships with the University of Stuttgart and the Technical University of Munich.

Selected Publications

GigaWorld-Policy-0.5: A Faster and Stronger WAM Empowered by AutoResearch

GigaWorld Team, Angen Ye, Angyuan Ma, Boyuan Wang, Chaojun Ni, Fangzheng Ye, Guan Huang, Guo Li, Guosheng Zhao, Haodong Yan, Hengtao Li, Jiwen Lu, Kai Wang, Mingming Yu, Qitang Hu, Qiuping Deng, Songling Liu, Xiaoyu Tian, Xiaofeng Wang, Xinyu Zhou, Xiuwei Xu, Xinze Chen, Yang Wang, Yejun Zeng, Yifan Chang, Yun Ye, Zhenyu Wu, Zhanqian Wu, Zheng Zhu

arXiv Technical Report, 2026

An action-centered World Action Model that uses future visual dynamics during training while enabling efficient action-only inference for real-time robot control.

Paper Project

S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight

Haodong Yan, Zhide Zhong, Jiaguan Zhu, Junjie He, Weilin Yuan, Wenxuan Song, Xin Gong, Yingjie Cai, Guanyi Zhao, Xu Yan, Bingbing Liu, Ying-Cong Chen, Haoang Li

ECCV 2026

A shortcut video-action model that self-distills geometric and semantic foresight from diffusion-based generation to enable efficient and precise robot manipulation.

Paper Project Code

CARE: Contextually-Aligned and Realistic 4D Scene Generation from a Single Image and Text

Haodong Yan, Pengxu Hou, Zhide Zhong, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li

Preprint · under review at IEEE Transactions on Image Processing

A 4D scene generation framework that takes a single image and a text prompt and combines scene extension with dynamics synthesis to produce contextually aligned, realistic physical interactions.

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation

Zhide Zhong*, Haodong Yan*, Junfeng Li, Junjie He, Tianran Zhang, Haoang Li (* equal contribution)

arXiv 2026

A post-training framework that bridges offline supervised fine-tuning and online reinforcement learning through on-policy distillation for vision-language-action models.

Paper

DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models

Zhide Zhong, Junfeng Li, Junjie He, Haodong Yan, Xin Gong, Guanyi Zhao, Yingjie Cai, Jiantao Gao, Xu Yan, Bingbing Liu, Yingcong Chen, Liuqing Yang, Haoang Li

arXiv 2026

A dual-stream reasoning framework that couples visual and linguistic chain-of-thought with parallel inference for fine-grained robotic manipulation.

Paper

RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark

Huashuo Lei, Wenxuan Song, Huarui Zhang, Jieyuan Pei, Jiayi Chen, Haodong Yan, Han Zhao, Pengxiang Ding, Zhipeng Zhang, Lida Huang, Donglin Wang, Yan Wang, Haoang Li

arXiv 2026

A large-scale robotic memory benchmark of 26 long-horizon, mostly memory-dependent tasks averaging more than 1,000 steps each, together with PrediMem, a dual-system VLA whose VLM planner manages a predictive memory bank.

Paper

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline

Wenxuan Song, Jiayi Chen, Xiaoquan Sun, Huashuo Lei, Yikai Qin, Wei Zhao, Pengxiang Ding, Han Zhao, Tongxin Wang, Pengxu Hou, Zhide Zhong, Haodong Yan, Donglin Wang, Jun Ma, Haoang Li

ICRA 2026

A comprehensive benchmark and lightweight baseline that study how to make vision-language-action models more practical across diverse robot embodiments and real-world settings.

Paper

Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation

Haodong Yan, Hang Yu, Zhide Zhong, Weilin Yuan, Xin Gong, Zehang Luo, Chengxi Heyu, Junfeng Li, Wenxuan Song, Shunbo Zhou, Haoang Li

CVPR 2026

A scalable structure- and contact-aware representation for generating realistic hand-object interaction videos that generalize to open-world scenarios.

Paper Project Code

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

Wenxuan Song, Ziyang Zhou, Han Zhao, Jiayi Chen, Pengxiang Ding, Haodong Yan, Yuxin Huang, Feilong Tang, Donglin Wang, Haoang Li

AAAI 2026 · Outstanding Paper Award

A reconstructive vision-language-action model that improves robot perception by reconstructing task-relevant visual regions for downstream manipulation.

Paper Project Code

FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models

Zhide Zhong*, Haodong Yan*, Junfeng Li, Xiangchen Liu, Xin Gong, Tianran Zhang, Wenxuan Song, Jiayi Chen, Xinhu Zheng, Hesheng Wang, Haoang Li (* equal contribution)

arXiv 2025

A visual chain-of-thought framework for motion reasoning in VLAs that predicts future dynamics before generating the final action.

Paper Project

GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction

Haodong Yan, Zhiming Hu, Syn Schmitt, Andreas Bulling

Pacific Graphics 2024

A multimodal diffusion framework that uses gaze to improve stochastic human motion prediction.

Paper Project Code

Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments

Cheng Lei, Junpeng Hu, Haodong Yan, Mariia Gladkova, Tianyu Huang, Yun-Hui Liu, Daniel Cremers, Haoang Li

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

A photometric bundle adjustment method that accounts for material and illumination properties in challenging non-Lambertian scenes.

Paper Project

A Graph Embedded in Graph Framework with Dual-sequence Input for Efficient Anomaly Detection of Complex Equipment under Insufficient Samples

Haodong Yan, Fudong Li, Jinglong Chen, Zijun Liu, Jun Wang, Yong Feng, Xinwei Zhang

Reliability Engineering & System Safety, 2023

A graph-based anomaly detection framework for complex equipment in limited-data settings.

Paper

Memory-augmented Skip-connected Autoencoder for Unsupervised Anomaly Detection of Rocket Engines with Multi-source Fusion

Haodong Yan, Zijun Liu, Jinglong Chen, Yong Feng, Jun Wang

ISA Transactions, 2023

An unsupervised anomaly detection model that combines memory augmentation and multi-scale skip connections for rocket engine monitoring.

Paper Code

Virtual Sensor-based Imputed Graph Attention Network for Anomaly Detection of Equipment with Incomplete Data

Haodong Yan, Jun Wang, Jinglong Chen, Zijun Liu, Yong Feng

Journal of Manufacturing Systems, 2022

A graph-based framework that imputes missing sensor readings and performs anomaly detection on incomplete multi-sensor equipment data.

Paper

Education

PhD in Intelligent Transportation, The Hong Kong University of Science and Technology (Guangzhou)

Sep 2024 - Present

M.Eng. in Mechanical Engineering, Xi'an Jiaotong University

Sep 2021 - Jul 2024 · GPA 3.73/4.0 · Rank 1/281

B.Eng. in Mechanical Engineering, Xi'an Jiaotong University

Sep 2017 - Jan 2021

Second Bachelor's Degree in Computer Science and Technology, Xi'an Jiaotong University

Jul 2019 - Jul 2021

Experience & Awards

Internship, Institute for Visualisation and Interactive Systems, University of Stuttgart

Jul 2023 - Oct 2023

Remote Internship, Computer Vision Group, Technical University of Munich

Apr 2023 - Dec 2023

National Scholarship, Ministry of Education, China

2023 and 2022

National First Prize, Chinese University Students Mechanical Innovation Competition

2020

Interests

Recently, I have been enjoying rock climbing, mainly bouldering at grades V2-V3 and routes around 5.10a-5.10b.