Date: 23/07/2025 12:00
Location: Seminar room, Departamento de Estadística e I.O.
Group: G.I.R. PEM
Abstract:
Large Language Models (LLMs) have revolutionized natural language processing, but much of their training pipeline remains opaque or framed purely in engineering terms. In this talk, I’ll present a statistical and machine learning perspective on how LLMs are trained, structured around three core phases: autoregressive pre-training, supervised fine-tuning, and reinforcement learning from human feedback (RLHF). We’ll formalize the learning objectives behind each stage, explore consistency guarantees under ideal assumptions, and reinterpret the full pipeline as a sequence of conditional distribution estimators and value-based policy updates. The aim is to demystify LLMs by grounding them in tools familiar to statisticians and ML theorists, without reference to architecture or optimization heuristics.
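
For orientation, a minimal sketch of the kind of objectives the abstract alludes to, written in illustrative notation of my own (the symbols \theta, r, \beta and \pi_{\mathrm{ref}} are assumptions here, not the speaker's formulation): pre-training and supervised fine-tuning both minimize a conditional negative log-likelihood over token sequences, while RLHF maximizes an expected reward regularized toward the fine-tuned reference policy.

  % Autoregressive pre-training / supervised fine-tuning: conditional maximum likelihood
  \hat{\theta} = \arg\min_{\theta} \; \mathbb{E}\Big[ -\sum_{t=1}^{T} \log p_{\theta}(x_t \mid x_{<t}) \Big]

  % RLHF: expected reward with a KL penalty toward the reference (fine-tuned) policy
  \max_{\pi} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big] \;-\; \beta\, \mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\big)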