I am a first-year PhD student at the College of Engineering, Westlake University, advised by Prof. Huan Wang.

My research interests include:

  • Video Large Language Models
  • Multi-modal Models
  • Efficient AI
  • Image Restoration

🔥 News

  • 2025.11: 🌟 [Preprint] Our new work, OmniZip, has been released. OmniZip is an audio-guided token compression method for fast OmniLLMs. Code is available at Repo.
  • 2025.10: [Preprint] We have released the preprint of StreamingTom, the first token compression method for efficient streaming video understanding!
  • 2025.09: 🎉 [NeurIPS’25] Poison as Cure and HoliTom are accepted by NeurIPS 2025!
  • 2025.08: 🌟 [Survey] We are excited to present the first systematic review of multimodal long-context token compression methods: “When Tokens Talk Too Much”. arXiv Repo
  • 2025.07: 🎉 [Award] Received the “2025 Westlake University Xinrui Award” (西湖大学博士研究生新锐奖, PhD Student Rising Star Award).
  • 2025.05: [Preprint] We have released the preprint of HoliTom: “Holistic Token Merging for Fast Video Large Language Models”.
  • 2025.03: [Preprint] We introduce VidKV, a plug-and-play 1.x-bit KV cache quantization method for VideoLLMs. Code is available at VidKV.
  • 2025.02: 🎉 [CVPR’25] DyCoke is accepted by CVPR’25! DyCoke is a plug-and-play token compression method for fast VideoLLMs.
  • 2025.02: [Preprint] We have released the preprint of our paper Poison as Cure. We propose a novel visual adversarial perturbation (VAP) method to mitigate the hallucination issue in VLMs.
  • 2025.01: 🎉 [ICLR’25] MGFR: “Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model” is accepted as a Spotlight at ICLR 2025! The Reface-HQ dataset is also released!
  • 2024.11: [Preprint] We have released the preprint of our paper DyCoke.

📝 Publications

arXiv

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models.
Keda Tao, Kele Shao, Bohan Yu, Weiqiang Wang, Jian Liu, Huan Wang.

Project | Hugging Face

ICLR 2025 Spotlight
CVPR 2025

DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models.
Keda Tao, Can Qin, Haoxuan Yu, Yang Sui, Huan Wang.

Project | Hugging Face

Survey

When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios.
Kele Shao*, Keda Tao*, Kejia Zhang, Sicheng Feng, Mu Cai, Yuzhang Shang, Haoxuan You, Can Qin, Yang Sui, Huan Wang. (* Equal Contribution)

Project | Hugging Face

  • Keda Tao, Haoxuan Yu, Yang Sui, Can Qin, Huan Wang. Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models. arXiv, 2025. [GitHub] [Page]
  • Kejia Zhang, Keda Tao, Jiasheng Tang, Huan Wang. Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs. NeurIPS, 2025. [GitHub]
  • Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang. HoliTom: Holistic Token Merging for Fast Video Large Language Models. NeurIPS, 2025. [GitHub]
  • Xiucheng Wang*, Keda Tao*, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin (Sherman) Shen. RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction. TCCN, 2024. (* Equal Contribution)
  • Haoyu Chen, Keda Tao, Yizao Wang, Xinlei Wang, Lei Zhu, Jinjin Gu. PhotoArtAgent: Intelligent Photo Retouching with Language Model-Based Artist Agents. arXiv, 2025.
  • Sicheng Feng, Keda Tao, Huan Wang. Is Oracle Pruning the True Oracle? arXiv, 2025. [GitHub]

🏭 Internships

  • 2023.05 - 2024.05, UNIC Lab, Xidian University, China.
  • 2024.06 - present, ENCODE Lab, Westlake University, China.