Kangrui Cen

Hi! I am currently a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang. Before that, I received my Bachelor degree in Computer Science from Shanghai Jiao Tong University, where I am a member of John Hopcroft Honors Class.

Previously, I'm honored to collaborate with Prof. Ming-Hsuan Yang at UCM, Dr. Kelvin C.K. Chan at Google DeepMind, and Prof. Xiaohong Liu at SJTU.

Email / Google Scholar / Github / HuggingFace

News

[2026.04] 🎉 Two papers (co-first author) are accepted to ICML 2026!
[2026.02] 🎉 MICo-150K is accepted to CVPR 2026! The datasets are open-sourced!
[2026.02] 🎓 I will start my PhD journey at The Hong Kong Polytechnic University in Sept. 2026!
[2026.02] 🔥 We've released LayerT2V v2 on arxiv. We put forward a brand new multi-layer video representations and adopted Wan Video as our backbone.
[2026.01] 🔥 Release UniMRG on arxiv, an effective architecture-agnostic post-training method that enhances the understanding capabilities of UMMs.
[2025.12] 🔥 We've released MICo-150K, a large-scale dataset with identity consistency, and MICo-Bench.
[2025.08] 🔥 Release LayerT2V on arxiv, the first approach for generating video by compositing background and foreground objects layer by layer.
[2025.07] 🧑🏻‍💻 I am now a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang!
[2025.06] 🎓 I graduated from Shanghai Jiao Tong University with Bachelor of Science Degree (Computer Science, Zhiyuan Honors), as an outstanding graduate student!

Research Interests

I am broadly interested in Computer Vision, especially Image/Video Editing/Enhancement/Generation, and Vision Language Models. Creating a super-intelligent entity that is capable of seeing, drawing and thinking like humans, or even superior to humans is our mission.

Papers

LayerT2V: A Unified Multi-Layer Video Generation Framework

Guangzhao Li^*, Kangrui Cen^*, Baixuan Zhao, Yi Xin, Siqi Luo, Guangtao Zhai, Lei Zhang, Xiaohong Liu

ICML 2026

Abstract

Text-to-video generation has advanced rapidly, but existing methods typically output only the final composited video and lack editable layered representations... Extensive experiments demonstrate that LayerT2V substantially outperforms prior methods in visual fidelity, temporal consistency, and cross-layer coherence.

Project Page Code arXiv

Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation

Zihan Su^*, Hongyang Wei^*, Kangrui Cen^*, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu

ICML 2026

Abstract

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. In this work, we propose UniMRG, a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks...

Project Page Code arXiv

MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition

Xinyu Wei^*, Kangrui Cen^*, Hongyang Wei^*, Zhen Guo, Bairui Li, Zeqing Wang, Jinrui Zhang, Lei Zhang

CVPR 2026

Abstract

In controllable image generation, synthesizing coherent and consistent images from multiple reference inputs, i.e., Multi-Image Composition (MICo), remains a challenging problem... Our baseline model matches Qwen-Image-2509 in 3-image composition while supporting arbitrary multi-image inputs.

Project Page Code arXiv Dataset

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark

Yi Xin, Jianjiang Yang, Siqi Luo, Yuntao Du, Qi Qin, Kangrui Cen, Yangfan He, Bin Fu, Xiaokang Yang, Guangtao Zhai, Ming-Hsuan Yang, Xiaohong Liu

Under Review

Abstract

Pre-trained vision models (PVMs) have demonstrated remarkable adaptability across a wide range of downstream vision tasks... This paper presents a comprehensive survey of the latest advancements in the visual PEFT field, systematically reviewing current methodologies...

Resources arXiv

Experience

Oppo Research Institute

2025.07 ~ Present | Shenzhen, Guangdong, China
Research Intern
Supervisor: Prof. Lei Zhang

Google DeepMind

2024.06 ~ 2024.12 | Seattle, WA, USA
Remote Collaborator
Supervisor: Dr. Kelvin C.K. Chan; Prof. Ming-Hsuan Yang

University of California, Merced

2024.04 ~ 2025.01 | Merced, CA, USA
Exchange Scholar
Supervisor: Prof. Ming-Hsuan Yang

Shanghai Jiao Tong University

2021.09 ~ 2025.06 | Shanghai, China
B.S. in Computer Science (Zhiyuan Honors Program, John Hopcroft Class)

Honors

Outstanding Graduates of Shanghai Jiao Tong University, 2025
Merit Scholarship, B Level (top 10%), SJTU, 2022, 2023
Meritorious Winner of MCM/ICM (top 7%), 2022
Zhiyuan Honors Scholarship (top 5%), SJTU, 2021, 2022, 2023