Overview
This course covers the foundational ideas behind large language models (LLMs), including core machine learning concepts, the Transformer architecture, and notable LLMs such as BERT, T5, and GPT. Students will learn about inputs, input embeddings, masked multi-head attention, positional encoding, and Transformer hyperparameters. Teaching consists of lecture-style explanations with slides. The course is intended for anyone who wants to understand the fundamentals of large language models and their applications in machine learning.
Syllabus
Intro
Foundations of Machine Learning
The Transformer Architecture
Transformer Decoder Overview
Inputs
Input Embedding
Masked Multi-Head Attention
Positional Encoding
Skip Connections and Layer Norm
Feed-Forward Layer
Transformer Hyperparameters and Why They Work So Well
Notable LLM: BERT
Notable LLM: T5
Notable LLM: GPT
Notable LLM: Chinchilla and Scaling Laws
Notable LLM: LLaMA
Why include code in LLM training data?
Instruction Tuning
Notable LLM: RETRO
Taught by
The Full Stack