Kangrui Cen

Hi! I am currently a research intern at OPPO Research Institute, supervised by Prof. Lei Zhang. Before that, I received my Bachelor's degree in Computer Science from Shanghai Jiao Tong University, where I was a member of the John Hopcroft Honors Class.

Previously, I was honored to collaborate with Prof. Ming-Hsuan Yang at UC Merced, Dr. Kelvin C.K. Chan at Google DeepMind, and Prof. Xiaohong Liu at SJTU.


News

Research Interests

I am broadly interested in computer vision, especially image and video editing, enhancement, and generation, as well as vision-language models. Our mission is to create a super-intelligent entity that can see, draw, and think like humans, or even surpass them.

Papers

LayerT2V v2
LayerT2V: A Unified Multi-Layer Video Generation Framework
Guangzhao Li*, Kangrui Cen*, Baixuan Zhao, Yi Xin, Siqi Luo, Guangtao Zhai, Lei Zhang, Xiaohong Liu
Under Review
Abstract

Text-to-video generation has advanced rapidly, but existing methods typically output only the final composited video and lack editable layered representations... Extensive experiments demonstrate that LayerT2V substantially outperforms prior methods in visual fidelity, temporal consistency, and cross-layer coherence.

UniMRG
Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation
Zihan Su*, Hongyang Wei*, Kangrui Cen*, Yong Wang, Guanhua Chen, Chun Yuan, Xiangxiang Chu
Under Review
Abstract

Unified Multimodal Models (UMMs) integrate both visual understanding and generation within a single framework. In this work, we propose UniMRG, a simple yet effective architecture-agnostic post-training method. UniMRG enhances the understanding capabilities of UMMs by incorporating auxiliary generation tasks...

MICo-150K
MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition
Xinyu Wei*, Kangrui Cen*, Hongyang Wei*, Zhen Guo, Bairui Li, Zeqing Wang, Jinrui Zhang, Lei Zhang
CVPR 2026
Abstract

In controllable image generation, synthesizing coherent and consistent images from multiple reference inputs, i.e., Multi-Image Composition (MICo), remains a challenging problem... Our baseline model matches Qwen-Image-2509 in 3-image composition while supporting arbitrary multi-image inputs.

PEFT Survey
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin, Jianjiang Yang, Siqi Luo, Yuntao Du, Qi Qin, Kangrui Cen, Yangfan He, Bin Fu, Xiaokang Yang, Guangtao Zhai, Ming-Hsuan Yang, Xiaohong Liu
Under Review
Abstract

Pre-trained vision models (PVMs) have demonstrated remarkable adaptability across a wide range of downstream vision tasks... This paper presents a comprehensive survey of the latest advancements in the visual PEFT field, systematically reviewing current methodologies...

Experience

OPPO Research Institute
2025.07 ~ Present | Shenzhen, Guangdong, China
Research Intern
Supervisor: Prof. Lei Zhang
Google DeepMind
2024.06 ~ 2024.12 | Seattle, WA, USA
Remote Collaborator
Supervisors: Dr. Kelvin C.K. Chan and Prof. Ming-Hsuan Yang
University of California, Merced
2024.04 ~ 2025.01 | Merced, CA, USA
Exchange Scholar
Supervisor: Prof. Ming-Hsuan Yang
Shanghai Jiao Tong University
2021.09 ~ 2025.06 | Shanghai, China
B.S. in Computer Science (Zhiyuan Honors Program, John Hopcroft Class)

Honors