Paper:

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey (Duan et al.)