Wenxuan Wang (王文轩)

I am currently a third-year PhD student at the Institute of Automation, Chinese Academy of Sciences, jointly trained with the Beijing Academy of Artificial Intelligence and co-supervised by Prof. Jing Liu and Dr. Xinlong Wang. My research interests span Foundation Models, Native Multimodal Models, Generative Models, and Visual Grounding.

Email  /  Google Scholar  /  Github  /  Curriculum Vitae

Education

  • University of Science and Technology Beijing, Sept. 2016 - Jun. 2020, B.S. in the School of Automation.
  • University of Science and Technology Beijing, Sept. 2020 - Jun. 2023, M.S. in the School of Automation.
  • Institute of Automation, Chinese Academy of Sciences, Sept. 2023 - Jun. 2026, Ph.D. in the Zidongtaichu Foundation Model Research Center.
  • Beijing Academy of Artificial Intelligence, Sept. 2023 - Jun. 2026, Ph.D. in the Multimodal Large Model Research Center.
Recent Projects

    * indicates equal contribution

    Emu3.5: Native Multimodal Models are World Learners
    Yufeng Cui*, Honghao Chen*, Haoge Deng*, Xu Huang*, Xinghang Li*, Jirong Liu*, Yang Liu*, Zhuoyan Luo*, Jinsheng Wang*, Wenxuan Wang*, Yueze Wang*, Chengyuan Wang*, Fan Zhang*, Yingli Zhao*, Ting Pan, Xianduo Li, Zecheng Hao, Wenxuan Ma, Zhuo Chen, Yulong Ao, Tiejun Huang, Zhongyuan Wang, Xinlong Wang
    arXiv, 2025
    [paper] [Page] [Code]

    a large-scale multimodal world model that natively predicts the next state across vision and language

First-author Publications

    * indicates equal contribution

    End-to-End Vision Tokenizer Tuning
    Wenxuan Wang*, Fan Zhang*, Yufeng Cui*, Haiwen Diao*, Zhuoyan Luo, Huchuan Lu, Jing Liu, Xinlong Wang
    NeurIPS, 2025
    [paper]

    an end-to-end vision tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks

    Towards Unified Referring Expression Segmentation Across Omni-Level Visual Target Granularities
    Jing Liu*, Wenxuan Wang*, Yisi Zhang, Yepeng Tang, Xingjian He, Longteng Guo, Tongtian Yue, Xinlong Wang
    arXiv, 2025
    [paper]

    unifies the referring expression segmentation (RES) task across omni-level visual target granularities

    Image Difference Grounding with Natural Language
    Wenxuan Wang*, Zijia Zhao*, Yisi Zhang*, Yepeng Tang, Erdong Hu, Xinlong Wang, Jing Liu
    arXiv, 2025
    [paper]

    pushes towards precisely localizing visual differences between images based on user instructions

    Diffusion Feedback Helps CLIP See Better
    Wenxuan Wang*, Quan Sun*, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang
    ICLR, 2025
    [paper] [Page] [Code]

    leverages generative feedback from text-to-image diffusion models to optimize CLIP representations, using only images without paired text

    Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions
    Wenxuan Wang*, Yisi Zhang*, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu
    ACL, 2024 (Findings)
    [paper]

    moves beyond literal descriptions towards intention-driven vision-language understanding, extending classic visual grounding to human intention interpretation

    Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
    Wenxuan Wang*, Tongtian Yue*, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu
    CVPR, 2024
    [paper] [Page] [Code]

    extends referring expression segmentation to the finer-grained part level

    CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
    Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li
    IEEE-TMM, 2024
    [paper]

    a new cross-modality masked self-distillation framework for the referring image segmentation task

Co-author Publications

    * indicates equal contribution

    EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
    Haiwen Diao*, Xiaotong Li*, Yufeng Cui*, Yueze Wang*, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang
    ICCV, 2025 (highlight)
    [paper] [Code]

    improved baselines for encoder-free vision-language models

    Unified Vision-Language-Action Model
    Yuqi Wang, Xinghang Li, Wenxuan Wang, Junbo Zhang, Yingyan Li, Yuntao Chen, Xinlong Wang, Zhaoxiang Zhang
    arXiv, 2025
    [paper] [Page] [Code]

    unified vision-language-action model for embodied intelligence

Honors and Awards

  • 2025 National Scholarship (Ph.D.)
  • 2022 National Scholarship (Master)



    © Wenxuan Wang | Last updated: November 3, 2025