Yushun Zhang

Ph.D. student,
School of Data Science,
The Chinese University of Hong Kong, Shenzhen, China

Email: yushunzhang [AT] link.cuhk.edu.cn

Google Scholar    /    GitHub    /    Twitter    /    Weibo

About me

I'm a Ph.D. student in the School of Data Science at The Chinese University of Hong Kong, Shenzhen, China. I'm very proud to be advised by Prof. Zhi-Quan (Tom) Luo. I'm also very fortunate to work closely with Prof. Ruoyu Sun. Previously, I completed my undergraduate studies in the Department of Mathematics at the Southern University of Science and Technology (SUSTech).

My research focuses on optimization, deep learning, and especially large language models. I am interested in important and practical problems with an optimization flavor.

Personal update: I'm very excited to join the pre-training team at Moonshot AI (Kimi) as a summer intern in 2025. Team lead: Jianlin Su.

Biography

  • 2019 - Present: Ph.D. student, Data Science, The Chinese University of Hong Kong, Shenzhen
  • 2015 - 2019: B.S., Mathematics, Southern University of Science and Technology
  • 2012 - 2015: Shenzhen Foreign Language School
  • 2009 - 2012: Shenzhen Foreign Language School, Branch

Major Research Projects

(My full publication list can be found on Google Scholar.)

Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong*, Yushun Zhang*, Zhi-Quan Luo, Jianfeng Yao, Ruoyu Sun
(*: Equal contribution. Alphabetically ordered.)
Preprint

XX^t Can Be Faster
Dmitry Rybin, Yushun Zhang, Zhi-Quan Luo
Preprint

Finite Horizon Optimization: Framework and Applications
Yushun Zhang, Dmitry Rybin, Zhi-Quan Luo
Preprint

Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang*, Congliang Chen*, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
(*: Equal contribution.)
(This work was acknowledged in the ICLR 2025 Test of Time Award speech for its contribution to reducing the memory footprint of Adam.)
(See the official ICLR video at approximately 47:10 for more details.)
ICLR 2025

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo
NeurIPS 2024

Adam Can Converge Without Any Modification on Update Rules
Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo
(This work was acknowledged in the ICLR 2025 Test of Time Award speech as providing a convergence guarantee for the Adam optimizer.)
(See the official ICLR video at approximately 48:55 for more details.)
NeurIPS 2022 (Spotlight)

Does Adam Converge and When?
Yushun Zhang, Congliang Chen, Zhi-Quan Luo
ICLR Blog Track 2022

When Expressivity Meets Trainability: Fewer than n Neurons Can Work
Jiawei Zhang*, Yushun Zhang*, Mingyi Hong, Ruoyu Sun, Zhi-Quan Luo
(*: Equal contribution. Alphabetically ordered.)
NeurIPS 2021

Invited Talks

Dec 2024: I gave a talk at Tsinghua University, hosted by Kaifeng Lyu. Thanks Kaifeng for the invitation!

  • Topic: Adam for Transformers: Why and Why Not

  • Slides can be seen here

Oct 2024: I gave a talk at the University of Minnesota, hosted by Prof. Mingyi Hong. Thanks Prof. Hong for the invitation!

Oct 2024: I gave a talk at the INFORMS Annual Meeting, Seattle, hosted by Jianhao Ma. Thanks Jianhao for the invitation!

Sep 2023: I gave a talk at Tsinghua University, hosted by Prof. Jian Li. Thanks Prof. Li for the invitation!

Jan 2023: I gave a talk at Google Brain, hosted by Dr. Diederik P. Kingma. Thanks Dr. Kingma for the invitation!

  • Topic: Adam Can Converge Without Any Modification on Update Rules

  • Slides can be seen here

Awards

Dec 2023: Duan Yongping Outstanding Research Award (1st place)

Dec 2023: Teaching Assistant Award, School of Data Science

Aug 2022: Best Paper Presentation Award (1st place), 2nd Doctoral and Postdoctoral Daoyuan Academic Forum

  • Topic: Does Adam Converge and When?

  • Slides can be seen here.

  • A short version of this talk can be viewed here.

Jul 2021: Best Paper Presentation Award (1st place), 3rd Tsinghua-Berkeley Workshop on Learning Theory

  • Topic: When Expressivity Meets Trainability: Width < n Can Work

  • A short version of this talk can be viewed here.

Jun 2019: Magna cum laude, SUSTech

Jun 2019: Outstanding Graduation Thesis, SUSTech

Sep 2018: Scholarship Award for Excellence, Department of Mathematics, SUSTech (top 10 students)

Services

Reviewer

I serve as a reviewer for machine learning conferences including NeurIPS, ICLR, ICML, COLT, AISTATS, as well as journals including JMLR and TMLR.

Social Activities

I hosted a session named “Optimization Issues in Recent AI Models” at the INFORMS Annual Meeting, Oct 2024.

Teaching Assistant (most recent first)

  • DDA 4300: Optimization for Machine Learning, by Prof. Yinyu Ye (2023 Spring)

  • DDA 6060: Machine Learning, by Prof. Hongyuan Zha & Prof. Shuang Li (2022 Spring)

  • DDA 4002: Multivariate Statistics, by Prof. Zhaoyuan Li (2021 Autumn)

  • DDA 4250: Mathematics for Deep Learning, by Prof. Arnulf Jentzen (2021 Spring)

  • MFE 5100: Optimization, by Prof. Zizhuo Wang (2020 Autumn)

  • STA 2002: Probability and Statistics, by Prof. Xinyun Chen (2020 Summer)

  • CSC 4020: Fundamentals of Machine Learning, by Prof. Hongyuan Zha (2020 Spring)

  • MAT 2040: Linear Algebra, by Prof. Shenghao Yang (2019 Autumn)

  • MAT 7035: Computational Statistics, by Prof. Guoliang Tian (SUSTech) (2018 Autumn)

  • MA 204: Mathematical Statistics, by Prof. Guoliang Tian (SUSTech) (2018 Spring)

Experiences

2009 - 2012: I spent the best three years at the Shenzhen Foreign Language School, Branch.