Jiahao Zhan 「詹佳豪」

| CV | Email | Google Scholar | Github |

I am a junior at Fudan University, majoring in Artificial Intelligence. I am also a research intern at Shanghai Qi Zhi Institute, advised by Hang Zhao. Previously, I worked with Dequan Wang at Shanghai AI Lab.

Focus: My research interests lie in the integration of multiple modalities, including text, vision, and trajectory. I am also excited about the creative ability of diffusion models and their applications in generative AI.

Email: 22307140116 [AT] m.fudan.edu.cn


  Publications

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving
Qiao Sun*, Huimin Wang*, Jiahao Zhan, Fan Nie, Xin Wen, Leimeng Xu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao
2024
(Accepted by ICRA 2025)

webpage | pdf | abstract | bibtex | arXiv

Large real-world driving datasets have sparked significant research into various aspects of data-driven motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. These planners promise better generalizations on complicated and few-shot cases than previous methods. However, experiment results show that many of these approaches produce limited generalization abilities in planning performance due to overly complex designs or training paradigms. In this paper, we review and benchmark previous methods focusing on generalizations. The experimental results indicate that as models are appropriately scaled, many design elements become redundant. We introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner that uses a Vision Transformer (ViT) encoder and a mixture-of-experts (MoE) causal Transformer architecture. The MoE backbone addresses modality collapse and reward balancing by expert routing during training. Extensive experiments on the NuPlan dataset show that our method generalizes better than previous approaches across different test sets and closed-loop simulations. Furthermore, we assess its scalability on billions of real-world urban driving scenarios, demonstrating consistent accuracy improvements as both data and model size grow.

  @misc{sun2024generalizingmotionplannersmixture,
    title={Generalizing Motion Planners with Mixture of Experts for Autonomous Driving}, 
    author={Qiao Sun and Huimin Wang and Jiahao Zhan and Fan Nie and Xin Wen and Leimeng Xu and Kun Zhan and Peng Jia and Xianpeng Lang and Hang Zhao},
    year={2024},
    eprint={2410.15774},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2410.15774}, 
}

Do Large Multimodal Models Cover Academic Journal Covers?
Jin Gao, Jiahao Zhan, Chongxuan Li, Dequan Wang
2023

abstract | bibtex | arXiv

The field of Large Multimodal Models (LMMs) has experienced significant growth, primarily in creative arts and daily scenarios. However, their ability to comprehend abstract and conceptual content remains a question. Academic covers, with their comprehensive visual and textual summaries, offer promising datasets for evaluating LMMs. This paper assesses recent multimodal models' capabilities in interpreting and generating academic covers. We introduce the Multimodal Academic Cover (MAC) benchmark, comprising a collection of 5872 cover images, cover stories, and relevant articles from 40 leading academic journals. Bidirectional generative tasks, Image2Text and Text2Image, are devised to evaluate authenticity and creativity in producing cover images and stories. We evaluate existing LMMs on MAC, including DALL·E 3, GPT-4V, Gemini, CogView-3, GLM-4V, LLaVA, LLaMA-adapter, MiniGPT4, among others. Additionally, we propose a novel approach, MultiAgent Linkage (MAL), to enhance LMMs' conceptual comprehension in the long-context window. In-context learning techniques such as few-shot learning are explored to leverage LMMs effectively. The benchmark, prompts, and codes will be released soon.

    bibtex: to be released
  
  Projects

Shape Completion and Reconstruction of Sweet Peppers Challenge (ECCV workshop)

Third Prize


Intel LLM-based Application Innovation Contest (Team Leader)

Press conference

Second Prize





Website template from here and here