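Faculty Profile:
Dan Huang (黄聃), Professor, from Shaoyang, Hunan. Received a Ph.D. in Computer Engineering from the University of Central Florida, USA, in August 2018. Conducted research at Oak Ridge National Laboratory, USA, from November 2015 to August 2016. Currently at the National Supercomputer Center in Guangzhou, working on the research and implementation of technologies, systems, and applications for the convergence of supercomputing with big data and artificial intelligence. Research results have been published in journals and conferences including IEEE TC, IEEE TPDS, SC, PPoPP, ICML, ACL, IPDPS, ICDCS, ICS, and ICPP.
Prospective master's and Ph.D. students interested in applying to Sun Yat-sen University are welcome to contact me.
Master's training path: explore one domain application, develop one piece of system software, and become familiar with one hardware device (1+1+1).
Ph.D. training path: building on the master's path, independently identify problems, diagnose and analyze them, propose technical solutions, build and validate systems, write and revise papers, design and deliver slide presentations, and organize project materials.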
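Research Areas:
High-performance computing / parallel computing systems and application optimization; high-performance AI computing systems; parallel processing and storage I/O systems for scientific data
Three key factors of system research:
Abstractions, Principles, Techniques.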
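Systems developed by the group:
1. RTAI framework series: an efficient streaming framework for co-developing and co-running HPC and AI workloads on supercomputing platforms
->Related systems in China and abroad: Ray, Google Pathways, Parsl, Radical, DeepDriveMD, Colmena
->Helped a team of drug design experts scale an HPC-AI converged drug design workflow to 12.45 million processor cores
2. ParM: a heterogeneous parallel programming model for domestic processors
->Related systems in China and abroad: Kokkos, RAJA
3. HeteroHC: a GPU-based parallel workflow for gene sequence alignment
->Related systems in China and abroad: GATK HaplotypeCaller, Samtools, NVIDIA Clara Parabricks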
Education:
(1) 2014-8 to 2018-8, University of Central Florida, Ph.D.
(2) 2012-8 to 2014-8, Georgia State University, Master's
(3) 2007-9 to 2010-6, Southeast University, Master's
(4) 2003-9 to 2007-6, Jilin University, Bachelor's
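Awards and Honors:
Guangdong Provincial Science and Technology Progress Award (Special Prize), Institute of Electronics Science and Technology Progress Award (First Prize), Guangdong Institute of Communications Special Prize, among others
Journal and Conference Program Committees: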
IEEE TPDS (Reviewer Board), SC, IPDPS, Cluster, CCGrid, HiPC, DRBSD@SC Workshop
Selected Publications:
Driven by the system development described above, the group has published the following papers:
DBLP: https://dblp.org/pid/41/3836-1.html
Google Scholar: https://scholar.google.com/citations?user=Bo6PwnQAAAAJ&hl=en
2026:
Fu, Yujia, Heming Zhong, Dan Huang, and Yutong Lu. 2026. “FLARE: Fine-Grained Length-Aware Routing for Resource-Efficient Heterogeneous LLM Serving.” In The 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026). (CCF A)
Yu, Mingkun, Heming Zhong, Jiazhi Jiang, Dan Huang, and Yutong Lu. 2026. “PolyKAN: A High-Performance and Universal GPU Operator Library for Polynomial Kolmogorov-Arnold Networks.” In The International Conference on Supercomputing (ICS 2026). (CCF B)
Jiang, Jiazhi, Xijia Yao, Jiayu Chen, Jinhui Wei, Dan Huang, and Yutong Lu. 2026. “ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration.” In PPoPP, 232–244. (CCF A)
Wei, Jinhui, Shenggan Cheng, Wei Zhu, Jiazhi Jiang, Dan Huang, Zhiguang Chen, Jiangsu Du, and Yutong Lu. 2026. “Dynamic Latency-Throughput Balancing in Distributed Large Model Inference with Interleaved Parallelism.” ACM Transactions on Architecture and Code Optimization 23 (1): 1–26. (CCF A)
2025:
Huang, Han, Jiabin Xie, Guangnan Feng, Xianwei Zhang, Dan Huang, Zhiguang Chen, and Yutong Lu. 2025. “HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA.” In SC Conference, 1816–1829. (CCF A)
Jiang, Jiazhi, Yao Chen, Zining Zhang, et al. 2025. “Efficient KV Cache Spillover Management on Memory-Constrained GPU for LLM Inference.” IEEE Transactions on Parallel and Distributed Systems 37 (1): 90–105. (CCF A)
Jiang, Jiazhi, Xiao Liu, Jiangsu Du, Dan Huang, and Yutong Lu. 2025. “Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment.” In PACT, 57–70. IEEE. (CCF B)
Zhong, Heming, Jinhui Wei, Yujia Fu, Dan Huang, and Yutong Lu. 2025. “IasRT: Interference-Aware GPU Scheduling for Real-Time DNN Inference.” In ICCD, 372–379. IEEE. (CCF B)
Zhong, Heming, Xiaojian Pan, Zengquang He, Haoling Wang, Dan Huang, and Zhiguang Chen. 2025. “GPU Acceleration for DNA Sequence Alignment Algorithm and Its Application.” CCF Transactions on High Performance Computing 7 (2): 169–177.
2024:
Jiang, Jiazhi, Dan Huang, Hu Chen, Yutong Lu, and Xiangke Liao. "HTDcr: a job execution framework for high-throughput computing on supercomputers." Science China Information Sciences (SCIS) 67, no. 1 (2024): 112104. (CCF A, domestic journal)
Du, Jiangsu, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, and Yutong Lu. "Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference." In Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), pp. 42–54. 2024. (CCF A conference)
Huang, Han, Tengyang Zheng, Tianxing Yang, Yang Ye, Siran Liu, Zhe Tang, Shengyou Lu, Guangnan Feng, Zhiguang Chen, and Dan Huang. "Critique of “Productivity, Portability, Performance Data-Centric Python” by SCC Team From Sun Yat-sen University." IEEE Transactions on Parallel and Distributed Systems (TPDS) (2024). (CCF A journal)
Tian, Rui, Jiazhi Jiang, Jiangsu Du, Dan Huang, and Yutong Lu. "Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform." IEEE Transactions on Parallel and Distributed Systems (TPDS) (2024). (CCF A journal)
Jiang, Jiazhi, Hongbin Zhang, Deyin Liu, Jiangsu Du, Xiaojiao Yao, Jinhui Wei, Pin Chen, Dan Huang, and Yutong Lu. "Efficient Coupling Streaming AI and Ensemble Simulations on HPC Clusters." In European Conference on Parallel Processing (Euro-Par), pp. 313–328. 2024. (CCF B conference)
Lin, Peijia, Pin Chen, Rui Jiao, Qing Mo, Jianhuan Cen, Wenbing Huang, Yang Liu, Dan Huang, and Yutong Lu. "Equivariant Diffusion for Crystal Structure Prediction." In Forty-first International Conference on Machine Learning (ICML). 2024. (CCF A conference)
Hu, Nan, Yutong Lu, Zhuo Tang, Zhiyong Liu, Dan Huang, and Zhiguang Chen. "Topo: Towards a Fine-grained Topological Data Processing Framework on Tianhe-3 Supercomputer." Journal of Parallel and Distributed Computing (JPDC) (2024): 104926. (CCF B journal)
Wen, Yingpeng, Zhilin Qiu, Dongyu Zhang, Dan Huang, Nong Xiao, and Liang Lin. "Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method." International Journal of Parallel Programming 52, no. 3 (2024): 125–146.
Wei, Yuanxin, Shengyuan Ye, Jiazhi Jiang, Xu Chen, Dan Huang, Jiangsu Du, and Yutong Lu. "Communication-Efficient Model Parallelism for Distributed In-Situ Transformer Inference." In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1–6. 2024. (CCF B conference)
Wen, Yingpeng, Weijiang Yu, Fudan Zheng, Dan Huang, and Nong Xiao. "AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts." IEEE Transactions on Geoscience and Remote Sensing (2024).
Du, Jiang-Su, Dong-Sheng Li, Ying-Peng Wen, Jia-Zhi Jiang, Dan Huang, Xiang-Ke Liao, and Yu-Tong Lu. "SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems." Journal of Computer Science and Technology (JCST) 39, no. 2 (2024): 384–400. (CCF B, domestic journal)
2023:
Wen, Yingpeng, Weijiang Yu, Dongsheng Li, Jiangsu Du, Dan Huang, and Nong Xiao. "CosNAS: Enhancing estimation on cosmological parameters via neural architecture search." New Astronomy 99 (2023): 101955.
Jiang, Jiazhi, Jiangsu Du, Dan Huang, Zhiguang Chen, Yutong Lu, and Xiangke Liao. "Full-stack optimizing transformer inference on ARM many-core CPU." IEEE Transactions on Parallel and Distributed Systems (TPDS) 34, no. 7 (2023): 2221–2235. (CCF A journal)
Jiang, Jiazhi, Zijian Huang, Dan Huang, Jiangsu Du, Lin Chen, Ziguan Chen, and Yutong Lu. "Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure." ACM Transactions on Architecture and Code Optimization (TACO) 20, no. 3 (2023): 1–21. (CCF A journal)
Zheng, Jiang, Jiazhi Jiang, Jiangsu Du, Dan Huang, and Yutong Lu. "Optimizing massively parallel sparse matrix computing on ARM many-core processor." Parallel Computing 117 (2023): 103035. (CCF B journal)
Du, Jiangsu, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, and Yutong Lu. "Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs." ACM Transactions on Architecture and Code Optimization (TACO) 20, no. 4 (2023): 1–22. (CCF A journal)
Jiang, Jiazhi, Rui Tian, Jiangsu Du, Dan Huang, and Yutong Lu. "MixRec: Orchestrating Concurrent Recommendation Model Training on CPU-GPU platform." In 2023 IEEE 41st International Conference on Computer Design (ICCD), pp. 366–374. 2023. (CCF B conference)
Zhu, Wen-long, Jia-zhi Jiang, Dan Huang, and Nong Xiao. "ParM: A heterogeneous programming model for domestic processors." Computer Engineering & Science (计算机工程与科学) 45, no. 09 (2023): 1521. (CCF B, Chinese-language journal)
2022:
Jiang, Jiazhi, Jiangsu Du, Dan Huang, Dongsheng Li, Jiang Zheng, and Yutong Lu. "Characterizing and optimizing transformer inference on ARM many-core processor." In Proceedings of the 51st International Conference on Parallel Processing (ICPP), pp. 1–11. 2022. (CCF B conference)
Huang, Dan, Zhenlu Qin, Qing Liu, Norbert Podhorszki, and Scott Klasky. "Identifying challenges and opportunities of in-memory computing on large HPC systems." Journal of Parallel and Distributed Computing (JPDC) 164 (2022): 106–122. (CCF B journal)
Du, Jiangsu, Jiazhi Jiang, Yang You, Dan Huang, and Yutong Lu. "Handling heavy-tailed input of transformer inference on GPUs." In Proceedings of the 36th ACM International Conference on Supercomputing (ICS), pp. 1–11. 2022. (CCF B conference)
Chen, Lin, Raphael C-W Phan, Zhili Chen, and Dan Huang. "Persistent items tracking in large data streams based on adaptive sampling." In IEEE INFOCOM 2022-IEEE Conference on Computer Communications, pp. 1948–1957. 2022. (CCF A conference)
Jiang, Jiazhi, Dan Huang, Jiangsu Du, Yutong Lu, and Xiangke Liao. "Optimizing small channel 3D convolution on GPU with tensor core." Parallel Computing 113 (2022): 102954. (CCF B journal)
Du, Jiangsu, Yunfei Du, Dan Huang, Yutong Lu, and Xiangke Liao. "Enhancing Distributed In-Situ CNN Inference in the Internet of Things." IEEE Internet of Things Journal 9, no. 17 (2022): 15511–15524.
2021:
Li, Dongsheng, Dan Huang, Zhiguang Chen, and Yutong Lu. "Optimizing massively parallel Winograd convolution on ARM processor." In Proceedings of the 50th International Conference on Parallel Processing (ICPP), pp. 1–12. 2021. (CCF B conference)
2020:
Huang, Dan, and Yutong Lu. "Improving the efficiency of HPC data movement on container-based virtual cluster." CCF Transactions on High Performance Computing 2, no. 1 (2020): 67–80.
Huang, Dan, Jun Wang, Qing Liu, Nong Xiao, Huafeng Wu, and Jiangling Yin. "Enhancing proportional IO sharing on containerized big data file systems." IEEE Transactions on Computers (TC) 70, no. 12 (2020): 2083–2097. (CCF A journal)
Huang, Dan, Zhenlu Qin, Qing Liu, Norbert Podhorszki, and Scott Klasky. "A comprehensive study of in-memory computing on large HPC systems." In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), pp. 987–997. 2020. (CCF B conference)
2019 and earlier:
Luo, Huizhang, Dan Huang, Qing Liu, Zhenbo Qiao, Hong Jiang, Jing Bi, Haitao Yuan, Mengchu Zhou, Jinzhen Wang, and Zhenlu Qin. "Identifying latent reduced models to precondition lossy compression." In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 293–302. 2019. (CCF B conference)
Huang, Dan, Qing Liu, Scott Klasky, Jun Wang, Jong Youl Choi, Jeremy Logan, and Norbert Podhorszki. "Harnessing data movement in virtual clusters for in-situ execution." IEEE Transactions on Parallel and Distributed Systems (TPDS) 30, no. 3 (2018): 615–629. (CCF A journal)
Huang, Dan, Qing Liu, Jong Choi, Norbert Podhorszki, Scott Klasky, Jeremy Logan, George Ostrouchov, Xubin He, and Matthew Wolf. "Can I/O variability be reduced on QoS-less HPC storage systems?" IEEE Transactions on Computers (TC) 68, no. 5 (2018): 631–645. (CCF A journal)
Huang, Dan, Jun Wang, and Dezhi Han. "Performance Evaluation and Analysis for MPI-Based Data Movement in Virtual Switch Network." In 2018 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1–4. 2018.
Wang, Jun, Xuhong Zhang, Junyao Zhang, Jiangling Yin, Dezhi Han, Ruijun Wang, and Dan Huang. "Deister: A light-weight autonomous block management in data-intensive file systems using deterministic declustering distribution." Journal of Parallel and Distributed Computing (JPDC) 108 (2017): 3–13. (CCF B journal)
Wang, Jun, Dan Huang, Huafeng Wu, Jiangling Yin, Xuhong Zhang, Xunchao Chen, and Ruijun Wang. "SideIO: A Side I/O system framework for hybrid scientific workflow." Journal of Parallel and Distributed Computing (JPDC) 108 (2017): 45–58. (CCF B journal)
Huang, Dan, Dezhi Han, Jun Wang, Jiangling Yin, Xunchao Chen, Xuhong Zhang, Jian Zhou, and Mao Ye. "Achieving load balance for parallel data access on distributed file systems." IEEE Transactions on Computers (TC) 67, no. 3 (2017): 388–402. (CCF A journal)
Huang, Dan, Jun Wang, Qing Liu, Xuhong Zhang, Xunchao Chen, and Jian Zhou. "DFS-container: Achieving containerized block I/O for distributed file systems." In Proceedings of the 2017 Symposium on Cloud Computing (SoCC poster), p. 660. 2017.
Chen, Xunchao, Navid Khoshavi, Jian Zhou, Dan Huang, Ronald F. DeMara, Jun Wang, Wujie Wen, and Yiran Chen. "AOS: Adaptive overwrite scheme for energy-efficient MLC STT-RAM cache." In Proceedings of the 53rd Annual Design Automation Conference (DAC), pp. 1–6. 2016. (CCF A conference)
Chen, Xunchao, Navid Khoshavi, Ronald F. DeMara, Jun Wang, Dan Huang, Wujie Wen, and Yiran Chen. "Energy-aware adaptive restore schemes for MLC STT-RAM cache." IEEE Transactions on Computers (TC) 66, no. 5 (2016): 786–798. (CCF A journal)
Huang, Dan, Jun Wang, Qing Liu, Jiangling Yin, Xuhong Zhang, and Xunchao Chen. "Experiences in using OS-level virtualization for block I/O." In Proceedings of the 10th Parallel Data Storage Workshop, pp. 13–18. 2015.
Yin, Jiangling, Jun Wang, Jian Zhou, Tyler Lukasiewicz, Dan Huang, and Junyao Zhang. "Opass: Analysis and optimization of parallel data access on distributed file systems." In 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 623–632. 2015. (CCF B conference)
Tan, Song, Wenzhan Song, Dan Huang, Qifen Dong, and Lang Tong. "Distributed software emulator for cyber-physical analysis in smart grid." IEEE Transactions on Emerging Topics in Computing 5, no. 4 (2014): 506–517.



