Following the Mathematics section, this section moves on to large language models (LLMs), the foundation of ChatGPT and other GPT-style systems.
First, watch [1hr Talk] Intro to Large Language Models by Andrej Karpathy.
Then watch Large Language Models in Five Formulas by Alexander Rush (Cornell Tech).
Next, watch Neural Networks: Zero to Hero by Andrej Karpathy.
It starts with explaining and coding backpropagation from scratch and ends with writing GPT from scratch.
He just released a new video → Let’s build the GPT Tokenizer
You can also look at GPT in 60 Lines of NumPy | Jay Mody while you’re at it.
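If you want to warm up with code right away, here is a minimal sketch of the scalar autograd engine the Zero to Hero series builds first, in the spirit of Karpathy's micrograd. This is an illustrative reconstruction, not his exact code; the class and method names are my own choices.

```python
import math

class Value:
    """A scalar that tracks its computation graph for backprop (micrograd-style sketch)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad          # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t ** 2) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# A tiny neuron: y = tanh(w*x + b)
x, w, b = Value(0.5), Value(-2.0), Value(1.0)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad, x.grad)  # output and gradients via backprop
```

Everything in the series, up to GPT, is built on exactly this chain-rule machinery, just vectorized.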
Free LLM boot camp
Full Stack Deep Learning has released its paid **LLM Bootcamp** for free.
It teaches prompt engineering, LLMOps, UX for LLMs, and how to launch an LLM app in an hour.
Now that you’re itching to build after this boot camp,
Build with LLMs
Want to build apps with LLMs?
Watch Application Development using Large Language Models by Andrew Ng
Read Building LLM applications for production by Chip Huyen
As well as Patterns for Building LLM-based Systems & Products by Eugene Yan
Refer to the OpenAI Cookbook for recipes.
Use Vercel AI templates to get started.
Participate in hackathons
lablab.ai has new AI hackathons every week. Let me know if you want to team up!
If you want to go deeper into the theory and understand how everything works:
Read papers
Read Understanding Large Language Models, a great article by Sebastian Raschka where he lists papers you should read.
He also recently published another article with papers you should read in January 2024, covering Mistral models.
Follow his Substack, Ahead of AI.
Write Transformers from scratch.
Read The Transformer Family Version 2.0 | Lil’Log for an overview.
Choose whichever format suits you best and implement it from scratch; a minimal self-attention sketch follows the resource lists below.
Paper
- Attention Is All You Need
- The Illustrated Transformer
- The Annotated Transformer by Harvard
- Thinking Like Transformers
Blogs
- Creating a Transformer From Scratch — Part One: The Attention Mechanism (part 2) (code)
- Understanding and Coding the Self-Attention Mechanism of Large Language Models From Scratch by Sebastian Raschka, PhD
- Transformers from scratch
Videos
- Coding a Transformer from scratch on PyTorch, with full explanation, training and inference
- NLP: Implementing BERT and Transformers from Scratch
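As promised above, here is a minimal sketch of the heart of the paper, scaled dot-product self-attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, in NumPy. Single head, no masking; the shapes and random weights are toy assumptions, just to fix the mechanics in your head.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project inputs to queries/keys/values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) attention logits
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ V                         # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))        # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # (4, 8)
```

A full Transformer block adds multi-head projections, residual connections, layer norm, and an MLP on top of this core.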
You can code transformers from scratch now. But there’s still more.
Watch these Stanford CS25 — Transformers United videos.
Some good blogs
- Gradient Descent into Madness — Building an LLM from scratch
- The Illustrated Transformer — Jay Alammar
- Some Intuition on Attention and the Transformer by Eugene Yan
- Speeding up the GPT — KV cache | Becoming The Unbeatable (sketched in code right after this list)
- Beyond Self-Attention: How a Small Language Model Predicts the Next Token
- Llama from scratch (or how to implement a paper without crying) | Brian Kitano
- Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch
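To make the KV-cache idea concrete before you read the post above: during autoregressive decoding, the keys and values of past tokens never change, so you cache them and compute K/V only for the newest token. A minimal NumPy sketch (single head, illustrative shapes):

```python
import numpy as np

def decode_step(x_new, Wq, Wk, Wv, cache):
    """One autoregressive step with a KV cache: compute K/V for the newest
    token only, append to the cache, and attend over the full prefix."""
    q = x_new @ Wq                        # (1, d) query for the new token
    k = x_new @ Wk                        # (1, d) key for the new token only
    v = x_new @ Wv
    cache["K"] = np.vstack([cache["K"], k]) if cache["K"] is not None else k
    cache["V"] = np.vstack([cache["V"], v]) if cache["V"] is not None else v
    scores = q @ cache["K"].T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over all cached positions
    return weights @ cache["V"]           # attention output for the new token

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"K": None, "V": None}
for _ in range(5):                        # 5 decoding steps
    out = decode_step(rng.normal(size=(1, d)), Wq, Wk, Wv, cache)
print(cache["K"].shape)                   # (5, 8): keys accumulated across steps
```

The cache trades memory for compute: keys and values for the prefix are stored once instead of being recomputed at every decoding step.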
Watch Umar Jamil
He has fantastic in-depth videos explaining papers, and he also shows you the code. (A minimal LoRA sketch follows the list below.)
- LoRA: Low-Rank Adaptation of Large Language Models — Explained visually + PyTorch code from scratch
- Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer
- Attention is all you need (Transformer) — Model explanation (including math), Inference and Training
- LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
- Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)
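As mentioned above, here is the core LoRA idea in a few lines before you watch: freeze the pretrained weight W and learn a low-rank update, so the effective weight becomes W + (α/r)·BA. A minimal PyTorch sketch; this is an illustrative layer, not the code from the videos.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained Linear with a trainable low-rank update:
    W_eff = W + (alpha / r) * B @ A, where A is (r, in) and B is (out, r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                 # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zeros: no change at init
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction x A^T B^T.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
x = torch.randn(2, 768)
print(layer(x).shape)   # torch.Size([2, 768]); only A and B receive gradients
```

In practice LoRA is applied to the attention projection matrices, and after training the update B @ A can be merged back into W so inference costs nothing extra.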
Here are some more LLM links; this list is not exhaustive. Look at the LLM Syllabus for a more comprehensive syllabus for LLMs.
Learn how to run open-source models.
Use ollama: Get up and running with Llama 2, Mistral, and other large language models locally
They recently released Python and JavaScript libraries.
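Here is a minimal sketch of the Python client, assuming the ollama server is running and you have already pulled a model (the model name is just an example):

```python
import ollama

# Single-turn chat with a locally running model
# (requires `ollama pull llama2` beforehand).
response = ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Explain the KV cache in one paragraph."}],
)
print(response["message"]["content"])

# Streaming variant: tokens arrive as they are generated.
for chunk in ollama.chat(
    model="llama2",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
```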
Prompt Engineering
Read Prompt Engineering | Lil’Log
ChatGPT Prompt Engineering for Developers by Isa Fulford (OpenAI) and Andrew Ng
DeepLearning.ai also has other short courses you can enroll in for free.
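The habits those resources teach, delimiting the input, stating the task precisely, and pinning down the output format, look roughly like this with the OpenAI Python SDK. The model choice and prompt below are illustrative, not prescribed by the course.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Prompt-engineering basics: delimit the input, state the task precisely,
# and pin down the output format.
text = "LLMs are neural networks trained on large text corpora to predict the next token."
prompt = f"""Summarize the text delimited by ### in exactly one sentence,
then list at most three keywords as a JSON array.

###
{text}
###"""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model; substitute whatever you have access to
    messages=[{"role": "user", "content": prompt}],
    temperature=0,          # keep structured outputs near-deterministic
)
print(response.choices[0].message.content)
```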
Fine-tuning LLMs
Read the Hugging Face fine-tuning guide.
A good guidebook: Fine-Tuning — The GenAI Guidebook
Check out axolotl.
This is a good article: Fine-tune a Mistral-7b model with Direct Preference Optimization | by Maxime Labonne
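For orientation, the Trainer workflow from the Hugging Face guide condenses to roughly this. The model, dataset, and hyperparameters below are illustrative placeholders, not the guide's exact choices.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Condensed Hugging Face fine-tuning loop: dataset -> tokenize -> Trainer.
dataset = load_dataset("imdb", split="train[:2000]").train_test_split(test_size=0.2)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=16,
                         evaluation_strategy="epoch")
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
```

The Mistral DPO article above builds on this same stack, with a causal-LM model class and the TRL library on top.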
RAG
A great article by Anyscale: Building RAG-based LLM Applications for Production
A comprehensive overview of Retrieval Augmented Generation by Aman Chadha
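The retrieval loop those articles describe, embed documents, embed the query, fetch the most similar chunks, and stuff them into the prompt, fits in a few lines. A minimal sketch using sentence-transformers and plain cosine similarity in place of a vector database; the documents, model name, and query are toy examples.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Minimal RAG retrieval step: embed documents once, embed the query,
# pick the most similar chunks, and add them to the generation prompt.
docs = [
    "The KV cache stores keys and values so decoding avoids recomputing the prefix.",
    "LoRA fine-tunes a model by learning low-rank weight updates.",
    "RAG retrieves relevant chunks and adds them to the prompt as context.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, normalize_embeddings=True)   # (n_docs, dim), unit norm

def retrieve(query, k=2):
    q = model.encode([query], normalize_embeddings=True)
    scores = doc_emb @ q.T                                 # cosine similarity via dot product
    top = np.argsort(-scores[:, 0])[:k]
    return [docs[i] for i in top]

context = "\n".join(retrieve("How does retrieval augmented generation work?"))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: How does RAG work?"
print(prompt)  # pass this prompt to any LLM to complete the RAG loop
```

A vector index such as the HNSW-based stores covered in the RAG video above replaces the brute-force dot product once the corpus grows.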
Working through all of these will help you get up to speed and understand LLMs in detail.