CMSC 848O, Spring 2025, UMD
Schedule
Make sure to reload this page to ensure you're seeing the latest version.
Week 1 (1/27-29): introduction, neural language models
- Course introduction // [slides]
- No associated readings!
- Language model basics // [slides]
- [reading] Jurafsky & Martin, 3.1-3.5 (language modeling)
- [reading] Jurafsky & Martin, 7 (neural language models)
Week 2 (2/4-6): attention, Transformers, scaling
- Transformer language models // [notes]
- [reading] Attention Is All You Need (Vaswani et al., NeurIPS 2017; the paper that introduced Transformers)
- [optional reading] An easy-to-read blog post on attention
- [optional reading] Training Compute-Optimal Large Language Models (Hoffmann et al., 2022)
Week 3 (2/11-13): LLM post-training, usage, and evaluation
- Instruction tuning & RLHF // [notes]
- [reading] Instruction tuning (Wei et al., 2022, FLAN)
- [reading] Reinforcement learning from human feedback (Ouyang et al., 2022, RLHF)
Week 4 through the end of the semester: student presentations & discussion of research papers
- Topics (papers to be posted soon!)
- Extending LLMs from short context to long context: continual pretraining, mid-training, post-training
- Efficient attention mechanisms: pros and cons
- Architectural modifications: state space models (e.g., Mamba) and hybrid models (e.g., Jamba)
- Efficient implementations of vanilla attention: flash attention, ring attention
- Evaluation of long-context language models: perplexity, point-wise retrieval, summarization, QA, etc.
- Synthetic data generation for long-context instruction following and reasoning
- Generating long outputs from long inputs
- Long context vs. RAG