How do large language models learn and make predictions?

Large language models learn by reading lots of words and figuring out patterns, kind of like learning how to talk by listening to a lot of conversations.

Imagine you're playing with building blocks, and every time you see a certain block, you know which one comes next. That's how large language models work: they look at the words so far and figure out which word usually comes next.

How They Learn

Think of it like this: you read many stories, some about animals, some about space, some about your favorite toy. Over time, you learn that "The cat sat on the" is often followed by "mat." A large language model does something similar, but with billions of sentences.
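
Here is a tiny sketch of that idea in Python. It is only a toy: it keeps a tally of which word follows which, while real large language models learn their patterns with neural networks rather than a counting table. The function name learn_patterns and the three example stories are made up for illustration.

```python
from collections import Counter, defaultdict

def learn_patterns(sentences):
    """Count which word follows each word in the example sentences."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word right after it.
        for current_word, next_word in zip(words, words[1:]):
            follows[current_word][next_word] += 1
    return follows

# Made-up training stories (a real model reads vastly more text).
stories = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "the cat sat on the sofa",
]

patterns = learn_patterns(stories)
print(patterns["the"].most_common())
# prints [('cat', 3), ('mat', 2), ('sofa', 1)]
```

After "the", the toy has seen "cat" most often, then "mat", then "sofa". The more stories it reads, the better its tally matches how people really write.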

How They Make Predictions

Now imagine you're trying to finish a sentence, like a puzzle, and you have clues from what came before. Large language models use those clues to guess what comes next, just like you would. If they've seen "The dog ran fast," they might predict "and jumped over the fence," guessing one word at a time.
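
The guessing game can be sketched in Python too. This toy only looks at the single word right before the blank, while a real model uses many earlier words as clues. The names count_followers, predict_next, and finish_sentence, and the example stories, are invented for this sketch.

```python
from collections import Counter, defaultdict

def count_followers(sentences):
    """Tally which word comes right after each word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follows[current_word][next_word] += 1
    return follows

def predict_next(follows, word):
    """Return the word seen most often after `word`, or None if unseen."""
    options = follows.get(word)
    if not options:
        return None
    return options.most_common(1)[0][0]

def finish_sentence(follows, start, max_words=3):
    """Add up to `max_words` guesses, one word at a time."""
    words = start.lower().split()
    for _ in range(max_words):
        guess = predict_next(follows, words[-1])
        if guess is None:
            break  # The toy has never seen this word, so it stops.
        words.append(guess)
    return " ".join(words)

# Made-up training stories.
stories = [
    "the dog ran fast and jumped over the fence",
    "the dog ran fast and jumped high",
    "the cat ran away",
]

follows = count_followers(stories)
print(predict_next(follows, "ran"))            # prints fast
print(finish_sentence(follows, "the dog ran")) # prints the dog ran fast and jumped
```

Each guess becomes a new clue for the next guess, which is how one predicted word at a time can grow into a whole sentence.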

It's not magic. It's just learning from lots of examples, like how you learn to read by reading many books!

Examples

  1. A child learns to read by looking at many books and guessing what comes next in a story.
  2. A language model trains on billions of sentences and predicts the next word in a sentence.
  3. When you type a message, the model suggests words based on patterns it has seen before.
