I am a first-year PhD student at the College of Engineering, Westlake University, advised by Prof. Huan Wang.
My research interests include:
- Video Large Language Models
- Multi-modal Models
- Efficient AI
- Image Restoration
🔥 News
- 2025.11: 🌟 [Preprint] Our new work OmniZip has been released. OmniZip is an audio-guided token compression method for fast OmniLLMs. Code is available at Repo.
- 2025.10: [Preprint] We have released the preprint of StreamingTom, the first token compression method for efficient streaming video understanding!
- 2025.09: 🎉 [NeurIPS’25] Poison as Cure and Hilitom are accepted by NeurIPS 2025!
- 2025.08: 🌟 [Survey] We are excited to present the first systematic review of multimodal long-context token compression methods: “When Tokens Talk Too Much”. arXiv Repo
- 2025.07: 🎉 [Award] Received the “2025 Westlake University Xinrui Award” (西湖大学博士研究生新锐奖).
- 2025.05: [Preprint] We have released the preprint of Hilitom: “Holistic Token Merging for Fast Video Large Language Models”.
- 2025.03: [Preprint] We introduce VidKV, a plug-and-play 1.x-bit KV Cache quantization for VideoLLMs. Code is available at VidKV.
- 2025.02: 🎉 [CVPR’25] DyCoke is accepted by CVPR’25! DyCoke is a plug-and-play token compression method for fast VideoLLMs.
- 2025.02: [Preprint] We have released the preprint of our paper Poison as Cure. We propose a novel visual adversarial perturbation (VAP) method to mitigate the hallucination issue in VLMs.
- 2025.01: 🎉 [ICLR’25] MGFR: “Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model” is accepted by ICLR 2025 as a Spotlight! The Reface-HQ dataset is also released!
- 2024.11: [Preprint] We have released the preprint of our paper DyCoke.
📝 Publications

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models.
Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang.
Project

Overcoming False Illusions in Blind Face Restoration with Multi-Modal Guided Diffusion Model. ICLR 2025, Spotlight
Keda Tao, Jinjin Gu, Yulun Zhang, Xiucheng Wang, Nan Cheng.

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models.
Keda Tao, Can Qin, Haoxuan Yu, Yang Sui, Huan Wang.
Project

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios.
Kele Shao*, Keda Tao*, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, Huan Wang. (* Equal Contribution)
Project
- Xiucheng Wang*, Keda Tao*, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin (Sherman) Shen. RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction. TCCN, 2024. (* Equal Contribution)
- Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu. PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents. arXiv, 2025.
🏭 Internships
- 2023.05 - 2024.05, UNIC Lab, Xidian University, China.
- 2024.06 - present, ENCODE Lab, Westlake University, China.