Faculty Profile
Jiangsu Du (杜江溯), Associate Professor (talent recruitment track) and doctoral supervisor. He received his B.S. from Wuhan University, his M.S. from the University of Edinburgh (Edinburgh Parallel Computing Centre), and his Ph.D. from Sun Yat-sen University (National Supercomputer Center in Guangzhou), and spent an extended research stay at the HPC-AI Lab of the National University of Singapore. His research has been published in leading parallel-systems conferences and journals, including OSDI, PPoPP, SC, ASPLOS, VLDB, ICS, INFOCOM, TPDS, and TACO. His work has been supported by an NSFC Young Scientists Fund grant, the Guangdong Regional Joint Fund, the Baidu Songguo Fund, the Alibaba Innovative Research (AIR) program, and the CAST Young Talent Program for science and technology think tanks.
His research centers on training and inference systems for large models (MLSys / AI infrastructure), targeting low-level optimization of the compute stack. It combines algorithm-level techniques (efficient sampling, optimizer restructuring, elimination of redundant computation) with system-level techniques (kernel optimization for GPUs and domestic accelerators, resource scheduling, compute-storage co-design, and network and architecture design), and covers mainstream model architectures such as large language models, multimodal models, diffusion models, and Mixture-of-Experts (MoE) models. The goal is to serve the large-scale compute demands of general artificial intelligence and AI for Science, achieving maximal acceleration together with practical engineering deployment. The group values open-source engineering as highly as publications, and currently maintains research collaborations with WeChat, Tencent, and Alibaba.
He is recruiting self-motivated M.S. and Ph.D. students, and also welcomes interested undergraduates and research interns. Ideally, you: are genuinely interested in low-level system optimization rather than stopping at API calls; are willing to work through complex problems and tackle the hard parts of the domestic accelerator ecosystem; have prior research or competition experience in high-performance computing or computer systems (preferred); or, if you lack such experience, are logically rigorous and a strong self-learner.
Education
2018.09 - 2022.12, Sun Yat-sen University, Computer Science and Technology, Ph.D.
2021.09 - 2022.09, National University of Singapore, Visiting Scholar
2016.09 - 2017.09, University of Edinburgh (Edinburgh Parallel Computing Centre), High Performance Computing, M.S.
2012.09 - 2016.07, Wuhan University, Spatial Information and Digital Technology, B.S.
Selected Publications
Conference Papers
- [OSDI 2026] Jiangsu Du, Hongbin Zhang, Taosheng Wei, Zhenyi Zheng, Jiazhi Jiang, Kaiyi Wu, Zhiguang Chen, and Yutong Lu, “Efficient LLM Serving on Commodity GPU Clusters with Data-Reduced Cross-Instance Orchestration”.
- [SC 2025] Yuhao Gu, Haoquan Chen, Xianjie Chen, Jiangsu Du, Zhiguang Chen, Nong Xiao, Xianwei Zhang, and Yutong Lu, “coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability”.
- [WWW 2025] Yuhao Gu, Junyu Chen, Jiangsu Du, Xiaoxi Zhang, and Xianwei Zhang, “ORFA: A WebAssembly-based Runtime to Optimize Remote Procedure Calls with Complete Expressiveness”.
- [VLDB 2025] Qingyin Lin, Jiangsu Du*, Rui Li, Zhiguang Chen, Wenguang Chen, and Nong Xiao, “IncrCP: Decomposing and Orchestrating Incremental Checkpoints for Effective Recommendation Model Training”.
- [ASPLOS 2025] Shenggan Cheng, Shengjie Lin, Lansong Diao, Hao Wu, Siyu Wang, Chang Si, Ziming Liu, Xuanlei Zhao, Jiangsu Du, Wei Lin, and Yang You, “Concerto: Automatic Communication Optimization and Scheduling for Large-Scale Deep Learning”.
- [PPoPP 2024] Jiangsu Du, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, Yutong Lu, “Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference”.
- [SC 2024] Yuanxin Wei, Jiangsu Du*, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang*, Nong Xiao, Yutong Lu, “APTMoE: Affinity-Aware Pipeline Tuning for MoE Models on Bandwidth-Constrained GPU Nodes”.
- [INFOCOM 2024] Shengyuan Ye, Jiangsu Du*, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen*, “Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference”.
- [ICCD 2025] Xiao Shi, Jiangsu Du, Zhiguang Chen*, Yutong Lu*, “AuLoRA: Fine-Grained Loading and Computation Orchestration for Efficient LoRA LLM Serving”.
- [ICCD 2025] Jianghui Wei, Ye Huang, Yuhui Zhou, Jiazhi Jiang, and Jiangsu Du*, “Ghirorah: Fast LLM Inference on Edge with Speculative Decoding and Hetero-Core Parallelism”.
- [ICPP 2025] Hongbin Zhang, Taosheng Wei, Zhenyi Zheng, Jiangsu Du*, Zhiguang Chen*, Yutong Lu, “TD-Pipe: Temporally-Disaggregated Pipeline Parallelism Architecture for High-Throughput LLM Inference”.
- [DATE 2024] Yuanxin Wei, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang*, Jiangsu Du*, Yutong Lu, “Communication-Efficient Model Parallelism for Distributed In-situ Transformer Inference”.
- [NPC 2024] Yu Li, Yuanxin Wei, Jiangsu Du, Dan Huang, Nong Xiao, “Understanding the Inference Performance of Spatial Temporal Diffusion Transformer”.
- [ICCD 2023] Jiazhi Jiang, Rui Tian, Jiangsu Du, Dan Huang, Yutong Lu, “MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform”.
- [DATE 2023] Jiazhi Jiang, Zhijian Huang, Dan Huang, Jiangsu Du, Yutong Lu, “Accelerating Inference of 3D-CNN on ARM Many-core CPU via Hierarchical Model Partition”.
- [ICS 2022] Jiangsu Du, Jiazhi Jiang, Yang You, Dan Huang, Yutong Lu, “Handling Heavy-tailed Input of Transformer Inference on GPUs”.
- [ICCD 2020] Jiangsu Du, Minghua Shen, Yunfei Du. “A Distributed In-Situ CNN Inference System for IoT Applications”.
- [ICPP 2022] Jiazhi Jiang, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, Yutong Lu, “Characterizing and Optimizing Transformer Inference on ARM Many-core Processor”.
Journal Papers
- [TPDS] Jiangsu Du, Xin Zhu, Minghua Shen, Yunfei Du, Yutong Lu, Nong Xiao, and Xiangke Liao, “Co-designing Transformer Architectures for Distributed Inference with Low Communication”.
- [TPDS] Jiangsu Du, Yuanxin Wei, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, and Yutong Lu, “Model Parallelism Optimization for Distributed Inference via Decoupled CNN Structure”.
- [TACO] Wenxuan Pan, Zejia Lin, Jiangsu Du, and Xianwei Zhang, “HuntKTm: Hybrid Scheduling and Automatic Management for Efficient Kernel Execution on Modern GPUs”.
- [TACO] Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu, “Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs”.
- [JCST] Jiangsu Du, Dongsheng Li, Yingpeng Wen, Jiazhi Jiang, Dan Huang, Xiangke Liao, and Yutong Lu, “SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems”.
- [IOTJ] Jiangsu Du, Yunfei Du, Dan Huang, Yutong Lu, and Xiangke Liao, “Enhancing Distributed In-Situ CNN Inference in the Internet of Things”.
- [TPDS] Rui Tian, Jiazhi Jiang, Jiangsu Du, Dan Huang, Yutong Lu, “Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform”.
- [TPDS] Jiazhi Jiang, Jiangsu Du, Dan Huang, Zhiguang Chen, Yutong Lu, Xiangke Liao, “Full-Stack Optimizing Transformer Inference on ARM Many-Core CPU”.
For a complete list of publications, see the Google Scholar profile: https://scholar.google.com/citations?user=GayKRzEAAAAJ&hl=en&oi=ao