Preprint

“*” denotes equal contribution and “#” denotes the corresponding author.

Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection

Xinbin Yuan, Zhaohui Zheng, Yuxuan Li, Xialei Liu, Li Liu, Xiang Li, Qibin Hou#, Ming-Ming Cheng#

Arxiv, 2025

[Arxiv] [Code] [Zhihu] [PaperWithCode]

Selected Journal Publications (Google Scholar)

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-Time Object Detection

Yuming Chen, Xinbin Yuan, Ruiqi Wu, Jiabao Wang, Qibin Hou#, Ming-Ming Cheng

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 47(6), 4240-4252, 2025

[Arxiv] [Code]

Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition

Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng#, Jiashi Feng

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

[Arxiv] [Code]

CamoFormer: Masked Separable Attention for Camouflaged Object Detection

Bowen Yin*, Xuying Zhang*, Deng-Ping Fan, Shaohui Jiao, Ming-Ming Cheng, Luc Van Gool, Qibin Hou#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024

[Arxiv] [Code]

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

[Arxiv] [Code]

Deeply Supervised Salient Object Detection with Short Connections

Qibin Hou, Ming-Ming Cheng, Xiaowei Hu, Ali Borji, Zhuowen Tu, Philip Torr

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019

[Arxiv] [Code]

Selected Conference Publications (Google Scholar)

Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

Xinbin Yuan, Jian Zhang, Kaixin Li, Zhuoxuan Cai, Lujian Yao, Jie Chen, Enguang Wang, Qibin Hou#, Jinwei Chen, Peng-Tao Jiang, Bo Li#

Neural Information Processing Systems (NeurIPS), 2025

[Arxiv] [Code]

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

Xuying Zhang*, Yutong Liu*, Yangguang Li, Renrui Zhang, Yufei Liu, Kai Wang, Wanli Ouyang, Zhiwei Xiong, Peng Gao, Qibin Hou#, Ming-Ming Cheng

IEEE International Conference on Computer Vision (ICCV), 2025

[Arxiv] [Code]

Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li, Yuxuan Li, Quansheng Zeng, Wenhai Wang, Qibin Hou#, Ming-Ming Cheng

IEEE International Conference on Computer Vision (ICCV), 2025

[Arxiv] [Code]

DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou#

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

[Arxiv] [Code]

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Yupeng Zhou, Daquan Zhou#, Ming-Ming Cheng, Jiashi Feng, Qibin Hou#

Neural Information Processing Systems (NeurIPS), 2024

[Arxiv] [Project] [Code]

OPUS: Occupancy Prediction Using a Sparse Set

Jiabao Wang*, Zhaojiang Liu*, Qiang Meng, Liujiang Yan, Ke Wang, Jie Yang, Wei Liu, Qibin Hou#, Ming-Ming Cheng

Neural Information Processing Systems (NeurIPS), 2024

[Arxiv] [Code]

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou#

International Conference on Learning Representations (ICLR), 2024

[Arxiv] [Code]

SRFormer: Permuted Self-Attention for Single Image Super-Resolution

Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou#

IEEE International Conference on Computer Vision (ICCV), 2023

[Arxiv] [Code]

Coordinate Attention for Efficient Mobile Network Design

Qibin Hou, Daquan Zhou, Jiashi Feng

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021

[Arxiv] [Code]