Selected Publications
S-VAM: Shortcut Video-Action Model by Self-Distilling Geometric and Semantic Foresight
Haodong Yan, Zhide Zhong, Jiaguan Zhu, Junjie He, Weilin Yuan, Wenxuan Song, Xin Gong, Yingjie Cai, Guanyi Zhao, Xu Yan, Bingbing Liu, Ying-Cong Chen, Haoang Li
arXiv 2026
A shortcut video-action model that self-distills geometric and semantic foresight from diffusion-based generation to enable efficient and precise robot manipulation.
Paper / Project / Code
CARE: Contextually-Aligned and Realistic 4D Scene Generation from a Single Image and Text
Haodong Yan, Pengxu Hou, Zhide Zhong, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li
Preprint · under review at IEEE Transactions on Image Processing
A 4D scene generation framework from a single image and text prompt that combines scene extension and dynamics synthesis for contextually aligned and realistic physical interactions.
VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation
Zhide Zhong, Haodong Yan, Junfeng Li, Junjie He, Tianran Zhang, Haoang Li
arXiv 2026
A post-training framework that bridges offline supervised fine-tuning and online reinforcement learning through on-policy distillation for vision-language-action models.
Paper
DualCoT-VLA: Visual-Linguistic Chain of Thought via Parallel Reasoning for Vision-Language-Action Models
Zhide Zhong, Junfeng Li, Junjie He, Haodong Yan, Xin Gong, Guanyi Zhao, Yingjie Cai, Jiantao Gao, Xu Yan, Bingbing Liu, Ying-Cong Chen, Liuqing Yang, Haoang Li
arXiv 2026
A dual-stream reasoning framework that couples visual and linguistic chain-of-thought with parallel inference for fine-grained robotic manipulation.
Paper
Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline
Wenxuan Song, Jiayi Chen, Xiaoquan Sun, Huashuo Lei, Yikai Qin, Wei Zhao, Pengxiang Ding, Han Zhao, Tongxin Wang, Pengxu Hou, Zhide Zhong, Haodong Yan, Donglin Wang, Jun Ma, Haoang Li
ICRA 2026
A comprehensive benchmark and lightweight baseline that study how to make vision-language-action models more practical across diverse robot embodiments and real-world settings.
Paper
Open-world Hand-Object Interaction Video Generation Based on Structure and Contact-aware Representation
Haodong Yan, Hang Yu, Zhide Zhong, Weilin Yuan, Xin Gong, Zehang Luo, Chengxi Heyu, Junfeng Li, Wenxuan Song, Shunbo Zhou, Haoang Li
CVPR 2026
A scalable structure- and contact-aware representation for generating realistic hand-object interaction videos that generalize to open-world scenarios.
Paper / Project / Code
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
Wenxuan Song, Ziyang Zhou, Han Zhao, Jiayi Chen, Pengxiang Ding, Haodong Yan, Yuxin Huang, Feilong Tang, Donglin Wang, Haoang Li
AAAI 2026
Outstanding Paper Award
A reconstructive vision-language-action model that improves robot perception by reconstructing task-relevant visual regions for downstream manipulation.
Paper / Project / Code
FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
Zhide Zhong, Haodong Yan, Junfeng Li, Xiangchen Liu, Xin Gong, Tianran Zhang, Wenxuan Song, Jiayi Chen, Xinhu Zheng, Hesheng Wang, Haoang Li
arXiv 2025
A visual chain-of-thought framework for motion reasoning in vision-language-action models that predicts future dynamics before generating the final action.
Paper / Project
GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction
Haodong Yan, Zhiming Hu, Syn Schmitt, Andreas Bulling
Pacific Graphics 2024
A multimodal diffusion framework that uses gaze to improve stochastic human motion prediction.
Paper / Project / Code
Physically-Based Photometric Bundle Adjustment in Non-Lambertian Environments
Cheng Lei, Junpeng Hu, Haodong Yan, Mariia Gladkova, Tianyu Huang, Yun-Hui Liu, Daniel Cremers, Haoang Li
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024
Photometric bundle adjustment with material and illumination awareness for challenging non-Lambertian scenes.
Paper / Project
A Graph Embedded in Graph Framework with Dual-sequence Input for Efficient Anomaly Detection of Complex Equipment under Insufficient Samples
Haodong Yan, Fudong Li, Jinglong Chen, Zijun Liu, Jun Wang, Yong Feng, Xinwei Zhang
Reliability Engineering & System Safety, 2023
A graph-based anomaly detection framework designed for complex equipment under limited-data settings.
Paper
Memory-augmented Skip-connected Autoencoder for Unsupervised Anomaly Detection of Rocket Engines with Multi-source Fusion
Haodong Yan, Zijun Liu, Jinglong Chen, Yong Feng, Jun Wang
ISA Transactions, 2023
An unsupervised anomaly detection model that combines memory augmentation and multi-scale skip connections for rocket engine monitoring.
Paper / Code
Virtual Sensor-based Imputed Graph Attention Network for Anomaly Detection of Equipment with Incomplete Data
Haodong Yan, Jun Wang, Jinglong Chen, Zijun Liu, Yong Feng
Journal of Manufacturing Systems, 2022
A graph-based framework that imputes missing sensor readings and performs anomaly detection on incomplete multi-sensor equipment data.
Paper