About Me

I am a technical lead at the Beijing Academy of Artificial Intelligence (BAAI), where I founded and lead the vision and multimodal research center (a.k.a. BAAI Vision). I received my PhD from The University of Adelaide, supervised by Prof. Chunhua Shen. Before that, I obtained my Bachelor's degree from Tongji University.

My research interests lie in computer vision and foundation models. I work on visual perception (SOLO, SOLOv2), visual representation (DenseCL, EVA), visual generalists (Painter, SegGPT), multimodal representation (EVA-CLIP, Uni3D) and multimodal generalists (Emu, Emu2, Emu3).


Contact

We are always looking for full-time researchers, engineers and interns at BAAI. Feel free to shoot me an email if you are interested!

Email: wangxinlong@baai.ac.cn


News

[Sept.2024] We have released Emu3, new state-of-the-art multimodal models trained solely with next-token prediction.
[Sept.2024] EVE for encoder-free VLMs is accepted by NeurIPS 2024 as Spotlight.
[Feb.2024] Emu2 and CapsFusion are accepted by CVPR 2024.
[Feb.2024] We have released EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18 billion parameters.
[Jan.2024] Emu and Uni3D are accepted by ICLR 2024. Uni3D is selected for Spotlight presentation.
[Dec.2023] We have released Emu2, the largest open generative multimodal models, which achieve a new state of the art on multimodal understanding and generation tasks.
[Sept.2023] EVA is one of the most influential papers (7th/2359) in CVPR 2023.
[Jul.2023] SegGPT is accepted by ICCV 2023.
[Jul.2023] We have released Emu, a multimodal generalist that can seamlessly generate images and text in a multimodal context.
[Feb.2023] Painter and EVA are accepted by CVPR 2023.
[Dec.2022] We have released Painter, a generalist model using "image" as the general-purpose interface.
[Nov.2022] We have released EVA, the best 1B Vision Foundation Model to date. All the code and models are available.


Selected Publications

  • Unveiling Encoder-Free Vision-Language Models
    Haiwen Diao, Yufeng Cui, Xiaotong Li, Yueze Wang, Huchuan Lu, Xinlong Wang
    Advances in Neural Information Processing Systems (NeurIPS), 2024
    Spotlight
    [arXiv] [code] [models]

  • Generative Multimodal Models are In-Context Learners
    Quan Sun*, Yufeng Cui*, Xiaosong Zhang*, Fan Zhang*, Qiying Yu*, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    [project page] [arXiv] [code] [demo] [models] [video demo]

  • Uni3D: Exploring Unified 3D Representation at Scale
    Junsheng Zhou*, Jinsheng Wang*, Baorui Ma*†, Yu-Shen Liu, Tiejun Huang, Xinlong Wang
    International Conference on Learning Representations (ICLR), 2024
    Spotlight (5% acceptance rate)
    [arXiv] [code] [models]

  • Generative Pretraining in Multimodality
    Quan Sun*, Qiying Yu*, Yufeng Cui*, Fan Zhang*, Xiaosong Zhang*, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang
    International Conference on Learning Representations (ICLR), 2024
    [arXiv] [code] [demo]

  • SegGPT: Segmenting Everything In Context
    Xinlong Wang*, Xiaosong Zhang*, Yue Cao*, Wen Wang, Chunhua Shen, Tiejun Huang
    IEEE International Conference on Computer Vision (ICCV), 2023
    [arXiv] [code] [demo]

  • Images Speak in Images: A Generalist Painter for In-Context Visual Learning
    Xinlong Wang*, Wen Wang*, Yue Cao*, Chunhua Shen, Tiejun Huang
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    [arXiv] [code]

  • EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
    Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    Highlight (2.6% acceptance rate)
    [arXiv] [code]

  • FreeSOLO: Learning to Segment Objects without Annotations
    Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    [arXiv] [bibtex] [code]

  • SOLO: A Simple Framework for Instance Segmentation
    Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei Li
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
    [arXiv] [bibtex] [demo] [code] [code@adet]

  • BoxInst: High-Performance Instance Segmentation with Box Annotations
    Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    [arXiv] [demo] [code]

  • End-to-End Video Instance Segmentation with Transformers
    Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    Oral (4.3% acceptance rate)
    [arXiv] [code]

  • Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei Li
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    Oral (4.3% acceptance rate)
    [arXiv] [bibtex] [code] [usage@adet]

  • SOLOv2: Dynamic and Fast Instance Segmentation
    Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen
    Advances in Neural Information Processing Systems (NeurIPS), 2020
    [arXiv] [bibtex] [demo] [code] [code@adet]

  • SOLO: Segmenting Objects by Locations
    Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei Li
    European Conference on Computer Vision (ECCV), 2020
    [arXiv] [bibtex] [code]

  • Associatively Segmenting Instances and Semantics in Point Clouds
    Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
    [arXiv] [bibtex] [code]

  • Repulsion Loss: Detecting Pedestrians in a Crowd
    Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018
    [arXiv] [bibtex]

Professional Activities

  • Journal Reviewer
    IEEE Transactions on Pattern Analysis and Machine Intelligence, Nature Communications, IEEE Transactions on Image Processing, IEEE Transactions on Multimedia, IEEE Transactions on Robotics, Neurocomputing, Pattern Recognition, Transactions on Machine Learning Research

  • Conference Reviewer
    CVPR, ECCV, ICCV, ICLR, NeurIPS, ICML, AAAI

  • Conference Area Chair
    ICLR, ICCV