Research in visual generation, multimedia intelligence, and generative compression.

Qi Mao is a Professor with the State Key Laboratory of Media Convergence and Communication and the School of Information and Communication Engineering, Communication University of China.

Communication University of China · Ph.D., Peking University · Principal Investigator, MIPG

Portrait of Qi Mao

Current Focus

Controllable image and video generation, multimedia intelligence, and image-video compression.

About

Academic profile

I am currently a Professor at the School of Information and Communication Engineering and the State Key Laboratory of Media Convergence and Communication, Communication University of China.

I received my Ph.D. degree from Peking University in July 2021, where I was affiliated with the Institute of Digital Media and worked with Prof. Wen Gao and Prof. Siwei Ma.

Prior to that, I obtained both the B.E. degree in Digital Media Technology and the B.A. degree in Journalism from Communication University of China in 2016.

I was also a visiting Ph.D. student at the Vision and Learning Lab, University of California, Merced, under the supervision of Prof. Ming-Hsuan Yang.

I also worked as a visiting scholar with the National University of Singapore (NUS), where I collaborated with Prof. Mike Zheng Shou.

My research interests include controllable image and video generation, image editing, multimedia intelligence, and image-video compression based on generative models.

News

News and updates

Feb 2026

One paper was accepted to CVPR 2026

Our paper Generative Neural Video Compression via Video Diffusion Prior was accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026).

Nov 2025

One paper was accepted to WACV 2026

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models was accepted to the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026).

Oct 2025

Undergraduate recruitment remains open

The group welcomes undergraduate students with interests in generative AI, agents, and multimedia intelligence.

Sep 2025

TIP publication on multimodal large foundation models

Our work on ultra-low bitrate image compression enabled by multimodal large foundation models appeared in IEEE Transactions on Image Processing.

Lab

Multimedia Intelligent Processing Group

Research Group

I lead MIPG, the Multimedia Intelligent Processing Group at Communication University of China. The group conducts research in visual generation, image and video editing, multimedia intelligence, and generative compression.

Lab Location

F15, State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China.

Mailing Address

School of Information and Communication Engineering, Communication University of China, No. 1 Dingfuzhuang East Street, Chaoyang District, Beijing 100024, China.

Group Website

The MIPG homepage provides additional information about group members, projects, publications, and opportunities for prospective students.

Visit team site

Research

Research areas

Controllable Image and Video Generation

Research on controllable generation, localized editing, semantic alignment, and instruction-aware synthesis for visual content creation.

Generative Compression

Image and video compression with generative priors, multimodal knowledge, and human-machine collaborative perception.

Multimedia Intelligence

Visual communication, representation learning, and intelligent multimedia systems that connect media understanding with generation.

Collaboration

Maintaining long-term collaborations with Peking University, UC Merced, NUS, City University of Hong Kong, and other leading institutions.

Publications

Selected publications

View full Google Scholar profile

2026

Generative Neural Video Compression via Video Diffusion Prior

Qi Mao, Hao Cheng, Tinghan Yang, Libiao Jin, Siwei Ma. CVPR 2026.

UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models

Lan Chen, Yuchao Gu, Qi Mao. WACV 2026.

2025

Exploring Multimodal Knowledge for Image Compression via Large Foundation Models

Junlong Gao, Zhimeng Huang, Qi Mao (*), Siwei Ma, Chuanmin Jia. IEEE Transactions on Image Processing, 34:5904-5919.

StarVid: Enhancing Semantic Alignment in Video Diffusion Models via Spatial and Syntactic Guided Attention Refocusing

Yuanhang Li, Qi Mao (*), Lan Chen, Zhen Fang, Lei Tian, Xinyan Xiao, Libiao Jin, Hua Wu. IEEE Transactions on Multimedia.

Edit Transfer: Learning Image Editing via Vision In-Context Relations

Lan Chen, Qi Mao, Yuchao Gu, Mike Zheng Shou. arXiv preprint.

Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model

Qi Mao, Lan Chen, Yuchao Gu, Mike Zheng Shou, Ming-Hsuan Yang. arXiv preprint.

2024

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou. ACM MM 2024.

Extreme Image Compression using Fine-tuned VQGANs

Qi Mao, Tinghan Yang, Yinuo Zhang, Zijian Wang, Meng Wang, Shiqi Wang, Siwei Ma. DCC 2024.

Unifying Generation and Compression: Ultra-low Bitrate Image Coding Via Multi-stage Transformer

Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma. ICME 2024.

Beyond Aligned Target Face: StyleGAN-Based Face-Swapping via Inverted Identity Learning

Yuanhang Li, Qi Mao, Libiao Jin. ICMEW 2024.

2023

Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision

Qi Mao, Chongyu Wang, Meng Wang, Shiqi Wang, Ruijie Chen, Libiao Jin, Siwei Ma. IEEE Transactions on Image Processing, 33:408-422.

Enhancing Style-Guided Image-to-Image Translation via Self-Supervised Metric Learning

Qi Mao, Siwei Ma. IEEE Transactions on Multimedia, 25:8511-8526.

2022

Conceptual Compression via Deep Structure and Texture Synthesis

Jianhui Chang, Zhenghui Zhao, Chuanmin Jia, Shiqi Wang, Lingbo Yang, Qi Mao, Jian Zhang, Siwei Ma. IEEE Transactions on Image Processing, 31:2809-2823.

Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang. International Journal of Computer Vision.

Earlier

DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Kumar Singh, Ming-Hsuan Yang. International Journal of Computer Vision, 128(10-11):2402-2417.

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

Qi Mao*, Hsin-Ying Lee*, Hung-Yu Tseng*, Siwei Ma, Ming-Hsuan Yang. CVPR 2019.

Students

Prospective Ph.D., M.S., and undergraduate students are welcome to apply.

I welcome inquiries from highly motivated students interested in controllable generation, image and video editing, generative compression, and multimedia intelligence.

  • Please send your CV, transcripts (if applicable), and a brief statement of research interests to cuc_mipg@163.com.
  • Applicants with strong coding ability, solid mathematical preparation, and a clear research motivation are especially encouraged to contact me.
  • Students from related areas are welcome, including computer vision, graphics, multimedia, machine learning, and digital media technology.
  • For additional updates from the group, please follow the WeChat official account: cuc-mipg.

Projects

Selected projects and resources

Contact

Links and profiles