Large Language Models (LLMs) have demonstrated remarkable efficacy across diverse applications. While empirical evidence suggests that the multi-layer Transformer architecture with self-attention plays a pivotal role, it remains a mystery why and how these components work together to find better representations that enable many downstream tasks. In this talk, I will introduce our analysis, which characterizes the training dynamics of the self-attention and project-in MLP layers in a mathematically rigorous manner and provides a hypothesis on how tokens can be combined automatically to form a latent hierarchy. Our findings lead to insights into LLMs, such as contextual sparsity and the low-rankness of gradients, which in turn lead to novel approaches for more efficient pre-training, fine-tuning, and inference of LLMs, such as Deja Vu, H2O, StreamingLLM, and GaLore.
Yuandong Tian is a Research Scientist and Senior Manager at Meta AI Research (FAIR), working on more efficient training and inference of Large Language Models (LLMs), LLM understanding, optimization, and reinforcement learning. He has been the main mentor of the recent works StreamingLLM and GaLore, which improve the training and inference of LLMs, and the project lead for the OpenGo project, which beat professional players with a single GPU during inference. He is the first-author recipient of a 2021 ICML Outstanding Paper Honorable Mention and a 2013 ICCV Marr Prize Honorable Mention, and also received the 2022 CGO Distinguished Paper Award. Prior to that, he worked on the Google Self-driving Car team from 2013 to 2014 and received his Ph.D. from the Robotics Institute at Carnegie Mellon University in 2013. He has served as an area chair for NeurIPS, ICML, AAAI, CVPR, and AISTATS.