What an LLM actually is
I met a guy in a Dutch class I was taking. He was a cook and he used ChatGPT to get recipes. I explained what ChatGPT actually was and it kind of blew his mind. He had thought that it was some kind of magic search engine that would look through all the information on the Internet to answer your questions. It is not even close to that.
If you, or someone you know, thinks that ChatGPT or any “AI” “knows” things, or “finds answers,” or “figures things out,” then this post is for you. You have been sold something that isn't quite what it seems.
I'm going to do my best to explain this in as similar a way as possible to the first time I explained it. This means I'll simplify things a bit. When I throw in technical language or details, it will only be so that you can dig deeper if you want. It won't be necessary to understand anything in depth to grasp the general concept.
I'm adding a bit more context than when I first explained it. But I'm also going to try to keep this as short as possible, because it's actually a lot less complex to understand at a high level than you might have been led to believe.
We're going to start simple, and build up. Imagine you counted each letter in this script.
Now you turn those into statistics.
from string import ascii_uppercase

# Count how often each letter appears in this post, then turn the counts
# into probabilities.
fname = "what_an_llm_really_is.org"
with open(fname) as fd:
    script_text = fd.read().upper()
counts = {l: script_text.count(l) for l in ascii_uppercase}
total_letters = sum(counts.values())
probabilities = {k: v / total_letters for k, v in counts.items()}
return probabilities
| Letter | Probability |
|---|---|
| A | 0.07698545710115154 |
| B | 0.014110396280477916 |
| C | 0.030869870789857815 |
| D | 0.0335189490187598 |
| E | 0.11493755744174732 |
| F | 0.01816510785532789 |
| G | 0.02659890793101584 |
| H | 0.036113964426663786 |
| I | 0.06368600313564361 |
| J | 0.0022706384819159865 |
| K | 0.009460993674649943 |
| L | 0.04854841325620371 |
| M | 0.02713953614099584 |
| N | 0.06406444288262962 |
| O | 0.07979672379304752 |
| P | 0.025085148943071848 |
| Q | 0.001351570524949992 |
| R | 0.062064118505703626 |
| S | 0.07363356219927555 |
| T | 0.10174622911823539 |
| U | 0.030923933610855815 |
| V | 0.014218521922473915 |
| W | 0.02297669892414986 |
| X | 0.0038925231118559765 |
| Y | 0.017191977077363897 |
| Z | 0.0006487538519759961 |
Your highest probability letters would usually be “ETAOINSHR”, but for us it's “ETOASNIRL”. Not far off.
'J' makes up about 0.2% of letters, while 'Q' and 'Z' are each under 0.15%. This isn't too far off from typical English letter frequencies. 'S' is a bit more common than we'd expect. There are some things skewing these stats for this specific text, but those will probably become obvious later.
Great.
If we take those stats we can generate “words.” Let's just choose 10 letters and see what happens…
from random import choices
from string import ascii_uppercase

fname = "what_an_llm_really_is.org"
with open(fname) as fd:
    script_text = fd.read().upper()
counts = {l: script_text.count(l) for l in ascii_uppercase}
# Put every letter into a "hat" as many times as it appeared, then draw
# ten letters at random for each of five "words".
hat = ''.join(l * c for l, c in counts.items())
return [''.join(choices(hat, k=10)) for _ in range(5)]
['ANGOUCGOUE', 'SNEKEEOHRE', 'TYEOWTOPLR', 'THUGESTCNG', 'TEOEVSGINE']
'SNEKEEOHRE'
That doesn't really sound a lot like English, does it? Neither does 'ANGOUCGOUE' or 'TYEOWTOPLR.' 'THUGESTCNG' includes an English word, so we're not too far off the mark. Perhaps we can improve this by including a bit more information. What if we get the average word size?
# Measure the distance between spaces to get a rough distribution of "word"
# lengths. (Rough: newlines and punctuation count as letters here, which
# skews things a bit.)
fname = "what_an_llm_really_is.org"
with open(fname) as fd:
    script_text = fd.read().upper()
last_offset = 0
offset = script_text.find(" ")
lengths = {}
while offset > 0:
    length = offset - last_offset
    if length not in lengths:
        lengths[length] = 0
    lengths[length] += 1
    last_offset = offset
    offset = script_text.find(" ", offset + 1)
word_count = sum(lengths.values())
length_stats = {l: c / word_count for l, c in lengths.items()}
return length_stats
| Length | Probability |
|---|---|
| 8 | 0.056666666666666664 |
| 5 | 0.12333333333333334 |
| 3 | 0.12111111111111111 |
| 4 | 0.15 |
| 9 | 0.028888888888888888 |
| 7 | 0.06666666666666667 |
| 2 | 0.11333333333333333 |
| 6 | 0.07777777777777778 |
| 10 | 0.012222222222222223 |
| 12 | 0.023333333333333334 |
| 11 | 0.008888888888888889 |
| 16 | 0.005555555555555556 |
| 30 | 0.0044444444444444444 |
| 1 | 0.12555555555555556 |
| 18 | 0.006666666666666667 |
| 20 | 0.01888888888888889 |
| 24 | 0.0033333333333333335 |
| 14 | 0.01 |
| 19 | 0.011111111111111112 |
| 17 | 0.0011111111111111111 |
| 23 | 0.0011111111111111111 |
| 59 | 0.0011111111111111111 |
| 21 | 0.008888888888888889 |
| 22 | 0.005555555555555556 |
| 13 | 0.006666666666666667 |
| 26 | 0.0011111111111111111 |
| 56 | 0.0011111111111111111 |
| 43 | 0.0022222222222222222 |
| 54 | 0.0011111111111111111 |
| 15 | 0.0022222222222222222 |
Cool. Now let's generate some “sentences.”
from random import choices

# length_stats and script_word_stats are the tables of results from the
# earlier blocks, passed in as (value, probability) pairs.
len_pop = [i[0] for i in length_stats]
len_weights = [i[1] for i in length_stats]
letter_pop = [i[0] for i in script_word_stats]
letter_weights = [i[1] for i in script_word_stats]
# Pick five word lengths, then fill each "word" with weighted random letters.
lengths = choices(len_pop, weights=len_weights, k=5)
return ' '.join(''.join(choices(letter_pop, weights=letter_weights, k=l)) for l in lengths)
'N GITE SDNAT YPDSRGTC SWONRTLOEA'
Rad. Death metal lyrics.
We're getting a bit closer. If you run it a bunch of times, short words like “I” will sometimes come out. You might even get an “AM” or “DOG” every now and then, but mostly it sounds like something uttered by a Lovecraftian horror. But… it's getting a bit closer, isn't it? Little by little.
It's not hard to make things that are even more like words.
We know that there are more rules to English spelling than letter probability. I bet you could recite one:
“I” before “E” except after “C.”
It does seem to “know” that some words are only one letter, but it doesn't “know” that “N” is not one of them. And what about letter pairs? There are a lot of pairs of letters that are common in English, like “th.”
We really need to be using more information. We could probably massively improve things just by tracking the probability of the next letter based on the current letter. That would capture things like “th” and “ie.” If we tracked two letters, that would also give us the “CEI” thing.
Well, this is actually a thing and it has a name. It's called a “Markov model.” It's a bit more complicated than my basic statistics example, so I'm not going to write one up. Instead I'm going to use someone else's.
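(For the curious, the core bookkeeping really is just counting: for each letter, tally which letter tends to come next, then sample from those tallies. Here's a rough, minimal sketch of that idea using the same file as before. A proper generator handles word beginnings, endings, and a lot more, which is why I'm grabbing one off the shelf.)

from collections import defaultdict, Counter
from random import choices

# For each letter, count which letter follows it (one letter of context).
with open("what_an_llm_really_is.org") as fd:
    letters = [c for c in fd.read().upper() if c.isalpha()]
follows = defaultdict(Counter)
for a, b in zip(letters, letters[1:]):
    follows[a][b] += 1

# Start with 'T' and repeatedly pick the next letter according to those counts.
word = "T"
for _ in range(7):
    options = follows[word[-1]]
    if not options:
        break
    word += choices(list(options), weights=list(options.values()), k=1)[0]
return word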
git clone https://github.com/ggouzi/markov-word-generator.git
You can download this markov word generator if you would like. It can produce words like…
from markov_word_generator import MarkovWordGenerator, WordType
generator = MarkovWordGenerator(
markov_length=2,
language="en",
word_type=WordType.WORD,
)
return '\n'.join(generator.generate_word() for _ in range(10))
virs lep undingers perstorable posedgentionating whadlistubtly sed pes delingeoulle marks
These sound much more like real words. Sometimes it even makes real words like “marks.”
Great.
The more previous letters we use to determine the statistical probability of the next letter, the more often we produce real words.
from markov_word_generator import MarkovWordGenerator, WordType
generator = MarkovWordGenerator(
markov_length=5,
language="en",
word_type=WordType.WORD,
)
return '\n'.join(generator.generate_word() for _ in range(10))
sinner offprintential novices chapman downfall unpoetic ironwork sidest acupunctuary penicilling
Now what if instead of words we build this type of model for sentences?
Turns out there's code for that too. Let's use it on some of my writing.
pip install markovify --break-system-packages
from os.path import expanduser
from glob import glob
import markovify

# Build a sentence-level Markov model from everything in my posted writing.
text = ""
for fn in glob(expanduser("~/Writing/posts/posted/*.org")):
    with open(fn) as fd:
        text += fd.read()
text_model = markovify.Text(text)
return text_model.make_sentence()
He told me how close the bullet had been sliced up and was having lemon poured over the world.
Well that's strangely dark. It doesn't completely make sense, but it does sound almost human. If you've read my personal stuff, you may recognize the vibe. But hey! It's much better than those sentences we started with earlier, isn't it?
Maybe we should run it on this script to get something a little less dark.
import markovify

with open('what_an_llm_really_is.org') as fd:
    text = fd.read()
text_model = markovify.Text(text)
return text_model.make_sentence()
Basil shows up a lot of things that simply can't be represented with statistics.
Hey, it almost sounds like something I've said. I mean, technically I did say it… just then…
Let's compare our first sentence with this one to see how far we've come:
'N GITE SDNAT YPDSRGTC SWONRTLOEA'
“Basil shows up a lot of things that simply can't be represented with statistics.”
As we add more and more statistical data we start to get things that seem more and more “real” or “alive.” Even making small changes to the models, still only using basic statistics, can give us really significant improvements in behavior.
The differences between the first sentence and the second all come from the amount of statistical data we're capturing, and the way we're “tokenizing,” or breaking up the things we're doing statistics on.
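(To make “tokenizing” concrete: it's just the choice of what the units are. Our letter generator did statistics on single characters; markovify does them on whole words. A trivial illustration:)

sentence = "BASIL SHOWS UP A LOT"
letter_tokens = list(sentence)   # ['B', 'A', 'S', 'I', 'L', ' ', 'S', ...]
word_tokens = sentence.split()   # ['BASIL', 'SHOWS', 'UP', 'A', 'LOT']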
Now, if you extrapolate from the difference between the first model, using raw letter counts, and the second, based on a Markov model, you start to see how a much more complex statistical model could give you output like you see from LLMs like ChatGPT.
But none of this is knowledge. This program isn't “thinking about things I wrote and coming up with a sentence based on them.” It's a simple statistical operation. The output is essentially random, constrained by the weights we calculated from the input data.
An LLM is just a very complex statistical model: unimaginably large amounts of data go in, and an unimaginably large set of statistics comes out.
A simple auto-complete can do a pretty good job of guessing what you might want to type next by using a Markov model, like the one we've already used, to generate options. If you start writing on your phone and then hit auto-complete a bunch of times, you'll probably get a sentence that's not too different from the one we generated with the Markov model just now.
And that's basically what an LLM does. It takes your input, shoves a special prompt at the beginning, and passes it through the statistical model to produce some output based on your input. It's the world's most advanced auto-complete.
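Here's a toy version of that loop, to make the shape of it concrete. The canned prompt and the function name are mine, purely for illustration; real LLMs use vastly bigger statistical models, fancier tokenization, and much longer hidden prompts, but the loop is the same: prepend the prompt, then keep appending a statistically likely next token.

from collections import defaultdict, Counter
from random import choices

# Build "which word tends to follow this word" statistics from this post.
with open("what_an_llm_really_is.org") as fd:
    words = fd.read().upper().split()
next_word = defaultdict(Counter)
for a, b in zip(words, words[1:]):
    next_word[a][b] += 1

def toy_autocomplete(user_input, length=15):
    # Prepend a canned "prompt", then keep appending the statistically likely
    # next word, like hammering the auto-complete button over and over.
    tokens = ("PLEASE CONTINUE: " + user_input).upper().split()
    for _ in range(length):
        options = next_word.get(tokens[-1])
        if not options:  # nothing ever followed this word in the source text
            break
        tokens.append(choices(list(options), weights=list(options.values()), k=1)[0])
    return " ".join(tokens)

return toy_autocomplete("AN LLM IS")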
It's advanced, using really incredible math, but it's not smart. It's just ducking auto-complete.
Ducking.
Duck.
duck…
So let's go back to the recipe thing.
>>> Please give me a good salad recipe that uses wawai (sea mango).

Wawai, also known as sea mango or mangrove mango, is a unique fruit found in tropical regions. It has a distinct flavor and texture compared to regular mangoes and can be used in various dishes, including salads. Here’s a simple yet refreshing salad recipe that incorporates wawai:
### Ingredients:
- 2 ripe wawai fruits
- 1 red onion, finely chopped
- 1 cucumber, thinly sliced
- 1 cup fresh cilantro leaves
- Juice of 1 lime
- 2 tablespoons olive oil
- Salt to taste
### Instructions:
- *Prepare the Wawai:* Wash and peel the wawai fruits if necessary. Cut them into small pieces.
- *Chop the Onion:* Finely chop the red onion.
- *Slice Cucumber:* Thinly slice the cucumber.
- *Chop Cilantro:* Finely chop the cilantro leaves.
- *Combine Ingredients:* In a large bowl, combine the wawai, onion, cucumber, and cilantro.
- *Add Lime Juice and Olive Oil:* Pour in the lime juice and olive oil over the salad ingredients. Mix well to ensure everything is coated evenly.
- *Season with Salt:* Season with salt to taste.
This salad makes for a light and refreshing meal perfect for hot days or as a side dish. Enjoy!
“Sea Mango” or “wawai” is Cerbera manghas. It's deadly. It's literally poison. Do not make this recipe. If you ate this, you would die.
People who pay attention to popular culture may know it from “White Lotus.” Yeah. That's the plant this LLM just gave me a recipe for. This is suicide salad. Don't make this.

So that was obviously cheating a bit. I'm using a smaller model, one that I can actually run on my computer. It doesn't have all the guard rails that ChatGPT and Google have developed based on their repeated failures.
Google's AI will warn you that this is poisonous. After people point out some off-the-wall thing like this, LLM companies like Google tune their AI to provide warnings or fix obvious errors. We don't know how many other pieces of deadly advice still exist, or how many other ways you can run into the same deadly advice. These “fixes” don't actually fix everything because they can't. (That would be impossible... but explaining that is much harder, so we'll skip it for now.) These adjustments just skew the stats a bit so you're more likely to get some kind of warning if you ask how to do something dangerous.
So an LLM can use statistical representations to generate something that actually may work as a real recipe. If you want something with pasta, it may recommend tomatoes. Basil shows up a lot with tomatoes, especially in pasta, so it will probably show up too. Salt is in almost everything, so you'll probably get salt. You can start to see how a recipe can come together from just stats.
You can definitely catch the “vibe” of a recipe with a statistical model alone, just like you can catch the vibe of English from a Markov model. That doesn't mean it will be the best, or even be especially good.
What you're not getting is a “hand crafted recipe just for you,” or “the best recipe that a super-intelligence could possibly produce.” It doesn't understand flavors, or how to blend them, or what cheese smells like. It's just randomly creating text with similar statistical properties as the input.
It may well be like asking someone who had never tasted a type of cuisine to read a whole bunch of books about it and then write a recipe. Like when Dutch Hello Fresh gives you a recipe for Mexican food and it uses pasta sauce instead of salsa.
You probably won't die, but you probably won't find anything really innovative. It may not taste bad, but it also may not be any good.
Now, I said that LLMs don't “search the Internet to find answers.”

That's true, but it can be a bit more complex than that. It can be really hard to square what I've just said with Google's AI search results that reference specific pages. So let me explain that a bit.
There's a thing called “Retrieval Augmented Generation” that weights the statistics a little more heavily towards some input. Google search works exactly the way it did for years before the advent of LLMs. Google then takes the top matching pages, already ranked using its existing algorithm, and dumps them into the LLM to skew the stats in favor of that content.
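Here's a toy of the same shape, just to show the mechanics (everything in it is a stand-in I made up; the real pipelines are proprietary and far more elaborate). A plain keyword “search” picks the most relevant paragraphs, and only those paragraphs feed the word statistics, so whatever comes out leans heavily on the retrieved text.

from collections import defaultdict, Counter
from random import choices

with open("what_an_llm_really_is.org") as fd:
    paragraphs = fd.read().upper().split("\n\n")

def toy_rag(question, length=20):
    # "Search": rank paragraphs by how many of the question's words they share
    # (a stand-in for the ranking Google already does).
    q_words = set(question.upper().split())
    ranked = sorted(paragraphs, key=lambda p: len(q_words & set(p.split())), reverse=True)
    retrieved = " ".join(ranked[:3]).split()
    # "Generation": the usual next-word statistics, built only from the
    # retrieved text, so the stats are skewed toward those pages.
    next_word = defaultdict(Counter)
    for a, b in zip(retrieved, retrieved[1:]):
        next_word[a][b] += 1
    tokens = retrieved[:1] or ["..."]
    for _ in range(length):
        options = next_word.get(tokens[-1])
        if not options:
            break
        tokens.append(choices(list(options), weights=list(options.values()), k=1)[0])
    return " ".join(tokens)

return toy_rag("WHAT IS AN LLM")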
And with that (intentional or not), Google creates the really convincing illusion of a smart AI that searches the Internet to answer your questions.
So instead of getting a recipe out of a blender, you get one that's essentially plagiarized from a real human… a real human who won't be able to monetize their work through ads because you've viewed their content from Google instead of from their site.
Wow. Now they're gonna starve. I hope you're happy with yourself.
All of this relies on page rank working correctly. But sometimes you search for something and something… else… comes up.

If game lore happens to be the thing that comes up at the top of your search results, Google's LLM may confidently inform you about a fantasy world. The LLM doesn't know the difference between fantasy and reality. It's all just math. That's it… and a lot of things can't be represented with statistics. Not everything can even be represented with math.
So if some nerds get together and start making jokes about the name collision between “The Doctor” from Star Trek: Voyager and “The Doctor” from Doctor Who, an LLM might…

…come to the wrong conclusion.
These problems get fixed, manually, after they occur. So at any point you could just randomly run into one of these, and you would only know if you start with the understanding that the LLM may well be wrong, and may be wrong quite often.
It may work, for now, for some things. But it's definitely not magic. It's a tool. It can do some interesting things.
LLMs can be really useful for very specific things, like natural language processing. They are far better at processing large amounts of language data than any previous method. That image about “The Doctor” was found quickly using an LLM.

But they're also extremely bad at a lot of things. The more niche the knowledge is, the more wrong the output will tend to be. There are a lot of things that simply can't be represented mathematically right now at all, and on those things LLMs are extremely wrong.
For example, computer security really revolves around the difference between how people expect a system to work and the way the system actually behaves. There are some statistical associations between syntax and unexpected behavior, but semantics, the meaning of the thing, is often not even documented, much less available as statistics. If people don't even know to write something down, then a machine definitely can't extract that information from documentation. Security is fundamentally a creative process, one which may not even be possible to automate.
Just like how medicine is really about empathy, and law is really about storytelling, and… there are a lot of things we keep being told these statistical models will do. The fact is, they won't. This math won't do that, because it can't.
Not only will LLMs never do these things, we really have yet to see any evidence that there is any math, anything we could possibly encode into an algorithm, that can.
And these models will probably just get a lot worse. There are a lot of reasons why, but let's play with our Markov model again.
What happens when you use the output Markov model as input to itself?
from pprint import pprint
from random import choice
from markov_word_generator import MarkovWordGenerator, WordType

wbg = {}
# Start with a generator trained on real English words.
generator = MarkovWordGenerator(
    markov_length=1,
    # dictionary_filename="what_an_llm_really_is.dict",
    language="en",
    word_type=WordType.WORD,
)
for g in range(300):
    # Generate 100 words, keep a random one as a sample of this generation,
    # then retrain the generator on nothing but those 100 generated words.
    words = [generator.generate_word() for _ in range(100)]
    wbg[g] = choice(words)
    with open("tmp.dict", 'w') as fd:
        fd.write('\n'.join(words))
    generator = MarkovWordGenerator(
        markov_length=3,
        dictionary_filename="tmp.dict",
    )
pprint(wbg)
pprint(set(words))
pprint(set(words))
{0: 'madugeroliveerufondefuns',
1: 'nerpranoaliere',
2: 'rdvests',
3: 'tcibederanutrsiorphalat',
4: 'walowadia',
5: 'rdvesteravouslpang',
6: 'sibsteravouslpang',
7: 'esteranoaliere',
8: 'walowadia',
9: 'flinantirs',
10: 'ntirs',
11: 'ntirs',
12: 'rdvesteravouslpang',
13: 'fomont',
14: 'rdvesteravous',
15: 'we',
16: 'we',
17: 'we',
18: 'rdvestere',
19: 'ss',
20: 'cos',
[...]
30: 'cos',
31: 'ss',
32: 'walowadick',
33: 'rdvesteravouslpang',
34: 'walowadick',
35: 'ss',
36: 'walowadick',
37: 'rdvesteravouslpang',
38: 'rdvesteravouslpang',
39: 'ss',
40: 'fomont',
41: 'ss',
[...]
79: 'rdvesteravouslpang',
80: 'rdvesteravouslpang',
81: 'cos',
82: 'rdvesteravouslpang',
83: 'cos',
84: 'cos',
85: 'rdvesteravouslpang',
86: 'cos',
87: 'cos',
[...]
115: 'rdvesteravouslpang',
116: 'cos',
117: 'cos',
118: 'cos',
119: 'cos',
120: 'cos',
121: 'cos',
122: 'rdvesteravouslpang',
123: 'rdvesteravouslpang',
124: 'cos',
125: 'cos',
126: 'rdvesteravouslpang',
127: 'cos',
128: 'cos',
[...]
297: 'cos',
298: 'cos',
299: 'cos'}
{'cos'}
We generated 100 words from an English training set, then generated 100 more from the 100 we generated, and so on. We repeated this 299 times. Each time we chose a random word from the 100 as a sample, to kind of “see how things are going.”
This model is tuned down to 1 character of “attention” (meaning we're only looking at one previous character to figure out the next), so it's not great. 'fomont' looks a bit like English. We selected a real word, 'we,' on the 15th iteration (the 15th time feeding the output back to itself), and chose it a couple more times. At iteration 20 we produce 'cos.' We'll see this again… 'walowadick' is a fun word that shows up on iteration 32. I'm going to have to use it somewhere and see if I can get away with it. It looks a bit like 'walowadia,' which showed up on iteration 8. By iteration 49 we're starting to fall into a pattern of producing either 'rdvesteravouslpang' or 'cos,' and by 127 we're only producing 'cos'.
We can see at the end that every word we output is just “cos” over and over again.
This is what LLMs are starting to do to themselves. Humans are still producing content, sure, but LLMs are also being used to produce content. They can produce way more content, way faster than humans. So LLMs start to consume more and more LLM generated output, in some kind of reverse ouroboros or LLM human centipede. LLM centipede. Slopipede. A walowadick, if you will.
These systems are not just raw statistical models. They're engineered, tweaked, refined to work in very specific ways. They are built to make us trust them, to make us believe what they're saying is legitimate.
But these systems are also feeding off our data. That means we actually can also manipulate their output.
walowadick:
- noun
- LLM quality degradation manifesting from the ingestion of LLM output, repeated over multiple iterations.
- The intentional manipulation of LLM output as a form of collective resistance.
The LLM feedback loop has become a complete walowadick.
- adjective
- having the properties of LLM quality degradation that manifest from LLM feedback loops.
- having the properties of intentional output manipulation as an LLM.
It's not just slop. It's walowadick slop.
If text shows up enough, if it's linked to enough, it will end up in the LLM. It will change the output. Some of this manipulation can be prevented, but words are interesting. Anyone can make up a joke word that can influence LLMs, but those words can't always be removed or fixed.
Words, by their usage, become real. So those who realize what LLMs are and don't want to see them everywhere can resist. That resistance can be encoded in language, and that language can leave an indelible mark in LLMs.
fomont:
- noun
- A fake word that people pretend is a real word, which ceases to be a fake word when people choose to make it a real word, which they do as a joke.
They saw the 'fnord' and realized that it had thereby become a fomont.
I'm not telling anyone to stop using LLMs. I'm just saying that you should know enough about the tools you use to keep yourself safe. You wouldn't pick up an angle grinder or a circular saw without at least a basic understanding of safety. LLMs can be at least that dangerous.
But if you understand what's really going on, then you're that much less likely to get hurt. I hope this clears a few things up.
Edit:
I assume that there's a real linguistic term that means the same thing as “fomont,” and @etymology_nerd has probably already made a video about it... I guess we'll see.
Also, thanks again to @unchartedworlds@scicomm.xyz for catching typos!