The Language Machine
Why Large Language Models Aren’t Thinking, They’re Reshaping How We Use Language
TL;DR: Stop asking “Is it thinking?”; ask “What new things become possible when a library of frozen human patterns can talk back?”
The great artificial intelligence debate rages on. Will these systems save humanity or doom it? Are they artificial minds? Or is it all just meaningless statistics? Yet in institutions across the world, a quieter revolution unfolds. A teacher generates personalised mathematics problems in minutes rather than hours. A marketing team halves their copywriting workload while improving output quality. A museum curator discovers she can craft compelling exhibition wall text in multiple versions—scholarly, accessible, and child-friendly—drawing from vast historical and cultural knowledge in minutes rather than days.
These practitioners aren’t debating artificial general intelligence or alignment problems. They’re solving immediate challenges with tools that feel fundamentally different from previous software. The mystery isn’t whether these systems qualify as “intelligent”, nor why they excel at certain tasks while failing spectacularly at others. Rather, it’s why our current explanations—ranging from “autocomplete on steroids” to “emerging superintelligence”—capture so little of what actually occurs.
The answer lies in recognising what these systems truly model. They aren’t modelling intelligence at all. They’re modelling language—and language, it transpires, proves far more structured and powerful than we typically realise.
What Language Actually Contains
We tend to conceive of language as words strung together in sequence, but this misses its deeper architecture. Language represents sedimented human experience: layers of knowledge, patterns, and practices embedded in how we communicate that have been deposited over centuries.1 This language-as-knowledge-system contains far more structural sophistication than we commonly acknowledge.
Consider how you actually deploy language. You don’t construct sentences word-by-word from scratch. Instead, you adapt chunks of meaning—familiar phrases, sentence structures, narrative patterns—to your current context. “Thanks for your email” becomes “Thanks for flagging this” becomes “Thanks for the heads up.” You’re working with pre-existing patterns within this language-as-structure, not assembling raw components.
This operates across multiple hierarchical levels simultaneously. We recognise topic sentences, classic essay structures, legal brief conventions, academic article formats, three-act dramatic structures, television series arcs, fiction genres, even the Marvel Cinematic Universe’s narrative logic. Each level contains its own structural patterns that shape how meaning gets organised and communicated.
Large language models have learned to manipulate this same hierarchical structure, though at unprecedented scale. Early word2vec models2 captured relationships between individual terms, allowing computers to understand that words like ‘king’ and ‘queen’ are related in similar ways to ‘man’ and ‘woman’. Attention mechanisms enabled systems to work with phrases and clauses by considering blocks of text at a time, rather than individual words. Increased parameters and wider context windows allowed recognition and manipulation of higher-order patterns: argumentative structures, narrative arcs, genre conventions.3
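To make the lowest level of that hierarchy concrete, here is a minimal sketch with hand-made toy vectors. The numbers are invented purely for illustration (real word2vec embeddings have hundreds of dimensions learned from vast corpora), but they show the kind of relational structure such models capture: the vector for ‘king’, minus ‘man’, plus ‘woman’, lands nearest to ‘queen’.

```python
import numpy as np

# Toy four-dimensional "embeddings", invented purely for illustration --
# real word2vec vectors are learned from text and have hundreds of dimensions.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "queen": np.array([0.9, 0.8, 0.9, 0.7]),
    "man":   np.array([0.1, 0.2, 0.1, 0.6]),
    "woman": np.array([0.1, 0.2, 0.9, 0.6]),
}

def cosine(a, b):
    """Similarity of direction between two vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land nearest to queen.
target = vectors["king"] - vectors["man"] + vectors["woman"]
nearest = max(vectors, key=lambda w: cosine(vectors[w], target))
print(nearest)  # 'queen', with these toy numbers
```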
Yet this sedimentation is partial and lossy. Language captures what humans have managed to encode linguistically, but significant dimensions of knowledge—tacit expertise, embodied skill, contextual judgment—remain beyond linguistic representation, beyond description. When the curator transforms an LLM’s wall text into compelling exhibition copy, they’re drawing on institutional knowledge, aesthetic sensibility, and audience awareness that exist outside the patterns any language model could reconstruct.
Why Capabilities Appear to ‘Emerge’
This progression illuminates one of LLM development’s most puzzling aspects: why capabilities seem to emerge suddenly at scale. What appears to be mysterious intelligence materialising is actually systems becoming capable of recognising and manipulating progressively higher-order linguistic structures that were always present in human communication.
A system confined to word-to-word associations will seem fundamentally limited. One that recognises argumentative structures spanning paragraphs, or narrative arcs across entire documents, will feel qualitatively different—not because it became ‘smarter’, but because it gained access to more of language’s inherent hierarchical organisation.
This explains why scaling feels so dramatic from a user’s perspective. Moving from sentence-level patterns to document-level or genre-level patterns isn’t merely quantitative improvement—it unlocks entirely different categories of linguistic work. The capabilities were always latent in linguistic structure; systems simply became powerful enough to access and manipulate them.
What we term inconsistent or ‘jagged’ performance4 isn’t random. Different types of problems emerge at different scales—factual errors with individual words, structural issues at the paragraph level, and hollow conventions at the genre level.5 The improvement trajectory thus isn’t simply about “more intelligence”. It concerns systems becoming capable of engaging with more of language’s inherent structure. They’re not getting smarter—they’re getting better at working with the sophisticated organisational patterns that already exist in human communication.
The Consciousness Distraction
Before examining how these language-as-structure machines actually function, we must address another debate consuming enormous attention: whether LLMs possess consciousness. Functionalists argue that if LLMs behave as though conscious, we must consider they might be. David Chalmers has suggested the probability of LLMs developing consciousness within ten years might reach twenty percent.6 While the parsimonious explanation for LLM behaviour remains that they’re sophisticated language machines, the argument proceeds that since we cannot definitively rule out consciousness—along with the existential risks this might entail—we must take the possibility seriously.
Yet this focus on potential consciousness comes at a cost. It diverts attention from how adoption actually unfolds today and the issues it raises, while fixating on possible futures that will likely never eventuate. What Gerald Gaus termed “the tyranny of the ideal”.7
Meaning is enacted, not deferred. It doesn’t happen in hypothetical futures; it occurs in present interactions between users, tools, and contexts. Consciousness debates defer engagement with present realities in favour of speculative futures, when the meaningful work is being done now. The curator crafting accessible interpretations of complex cultural artefacts, the teacher creating personalised problems—that’s where meaning resides.
Consciousness, too, is a phenomenon of the present. It’s a verb in the present tense, not a noun awaiting future discovery. Any discussion that isn’t grounded in observable, present-tense evidence—behaviour, architecture, immediate effects in the world—slides into untestable metaphysics. We cannot measure the consciousness of a future AGI any more than we can measure the anger of a hypothetical person.
This reframes the debate entirely, forcing attention away from “Could it be conscious?” toward questions that actually matter now: What is it doing? How is it being used? What are its effects? How do we govern it appropriately?
By focusing on the present, we’re not ignoring the future; we’re building a responsible path toward it, grounded in observable reality.
Even those who argue that consciousness could emerge from the right computational patterns must acknowledge that all proposed markers for consciousness point to specific measurable features. Current LLMs categorically lack these features. The precautionary debate can be set aside until systems develop measurable signatures of consciousness rather than mere behavioural similarities.
The Responsive Index: How Language Machines Work
Understanding what LLMs actually do requires a fundamental shift in metaphor. You’re not conversing with a mind; you’re querying a responsive index of human linguistic patterns—a library that can engage in dialogue. Unlike traditional libraries where you must know what to seek, this responsive index can actively synthesise and reconstruct answers from the entire collection based on your query.
Consider the analogy of a calculator. Calculators don’t “do mathematics”—they manipulate mathematical representations rapidly and accurately. You input problems using symbolic notation (a formal subset of language), and the calculator processes those symbols according to mathematical rules (a grammar) to produce results.
LLMs play a similar role with language-as-structure. They manipulate linguistic representations at scale. You input prompts using natural language, and the system processes those linguistic patterns according to statistical regularities learned from training data.8 LLMs, however, don’t simply retrieve and recombine—the grammar-following of the calculator—they generate novel combinations that weren’t explicitly present in training data.
This analogy has limits. Mathematical operations yield stable results across contexts—2 + 2 equals 4 regardless of who calculates it. Linguistic operations depend fundamentally on interpretive context. When an LLM generates literary analysis or strategic advice, its value depends entirely on the reader’s expertise, purpose, and judgment. The calculator metaphor obscures this crucial difference: mathematical computation is context-independent, while linguistic reconstruction requires human interpretation to become meaningful.
Crucially, LLMs don’t “remember” information through recall—they reconstruct it by regenerating the strongest patterns. When you inquire about the Mona Lisa, the system doesn’t retrieve stored facts; it reconstructs the linguistic patterns most strongly associated with that concept from its training. This holds true even for multimodal LLMs, which map between different symbolic representations—“the Mona Lisa” as text and the Mona Lisa as pixels represent different pattern spaces the system has learned to connect.
This explains why you can reference obscure works or ideas and the system appears to “know” them—if they appeared in the training corpus, the linguistic patterns clustering around those references have been indexed and can be reconstructed.
This process—reconstructing linguistic patterns rather than retrieving stored information—explains both the capabilities and peculiarities of these systems. When an LLM ‘hallucinates’, it’s not randomly generating content or misremembering. Rather, it recognises that the pattern under construction requires a specific element—a date, citation, or fact—but when that piece doesn’t exist strongly enough in learned patterns, it fabricates something fitting the structural requirements. The fabrication follows the grammar of what should occupy that position, even when the specific semantic content is invented. For instance, an LLM asked to summarise a non-existent book might invent a plot, characters, and even a publication date, because the linguistic pattern of a book summary requires these elements. The LLM isn’t ‘lying’; it’s simply filling in the blanks based on the most probable linguistic structures. This makes LLMs much more than stochastic parrots (a term coined by Bender, Gebru, and others),9 because they move beyond the idea of simple regurgitation. LLMs can and do generate novel content—even if that content is wrong—in service of completing a linguistic pattern.
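The dynamic is easier to see in caricature. The sketch below is nothing like a transformer (it is just a bigram counter over an invented three-sentence corpus), but it shows the core move: the system fills each slot with the most probable continuation it has learned, and will confidently reconstruct a date that belongs to a different sentence entirely, because the overall pattern is stronger than the specific fact.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus" -- a caricature of sedimented linguistic patterns.
corpus = (
    "the book was published in 1998 . "
    "the essay was published in 1998 . "
    "the report was published in 2003 ."
).split()

# The model's only "knowledge": counts of which word follows which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def reconstruct(start, length=5):
    """Regenerate the strongest learned pattern, one slot at a time."""
    out = [start]
    for _ in range(length):
        options = follows[out[-1]]
        if not options:
            break
        out.append(options.most_common(1)[0][0])  # most probable continuation
    return " ".join(out)

# The corpus says the report appeared in 2003, but "published in 1998" is the
# stronger overall pattern, so the reconstruction confidently supplies the wrong
# date -- a miniature 'hallucination' that fits the structure, not the facts.
print(reconstruct("report"))  # report was published in 1998 .
```

A real model works with vastly richer patterns and far longer contexts, but the failure mode scales with it: structure first, specifics second.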
This pattern reconstruction framework also explains why techniques like chain-of-thought prompting work—but not for the reasons we typically assume. When you ask a system to “think step by step”, you’re not inducing reasoning. Instead, you’re signalling that the complete response pattern should include both a conclusion and the type of reasoning chain that typically accompanies such conclusions in the training data. The system reconstructs both simultaneously, generating reasoning-shaped content to match the structural requirements of “how problems like this get solved in text”. The chain of thought isn’t a logical progression toward an answer—it’s post-hoc justification generated to complete the expected pattern. This is why chain-of-thought can produce seemingly logical chains that lead to wrong conclusions, or generate different reasoning paths to identical answers. Both the ‘reasoning’ and ‘conclusion’ are fabricated together to create a coherent linguistic pattern, not because one actually leads to the other.
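In practice the difference is nothing more exotic than two prompt shapes. The prompts below are invented examples (either string could be sent to whatever model one uses); the point is that “think step by step” requests a longer pattern that includes reasoning-shaped prose, not a different mode of cognition.

```python
question = (
    "A train leaves at 3:40 pm and the journey takes 95 minutes. "
    "When does it arrive?"
)

# Pattern A: request a bare answer. The system reconstructs an answer-shaped string.
bare_prompt = f"{question}\nAnswer:"

# Pattern B: request the fuller pattern. "Step by step" signals that the completion
# should resemble the worked solutions found in training text: reasoning-shaped
# prose plus a conclusion, generated together rather than one leading to the other.
stepwise_prompt = f"{question}\nLet's think step by step, then state the final answer."

print(bare_prompt)
print(stepwise_prompt)
```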
Why do these language machines excel at coding or mathematical reasoning? The answer reinforces rather than undermines this framework—code represents a formal language with its own grammar and conventions, while mathematical word problems are linguistic constructs following predictable structural patterns. When an LLM solves a coding problem, it is manipulating the linguistic patterns of programming syntax and algorithmic description, not performing computation the way a calculator processes mathematics.
This explains why LLMs prove simultaneously more capable and more limited than expectations suggest. They can work with linguistic patterns in seemingly magical ways—generating coherent responses to complex prompts, maintaining context across extended conversations, even composing moving poetry. But they remain fundamentally bounded by linguistic patterns in their training data. They can recombine and interpolate within that space brilliantly, fabricating content to complete patterns when necessary, but cannot transcend the structural logic of their training.
This has profound implications beyond capability limitations. Since LLMs reconstruct patterns from sedimented human experience, their outputs inevitably encode the biases, blind spots, and power structures embedded in that historical accumulation. And pattern reconstruction isn’t passive—it’s selective archaeology. Who curates the training corpus, how texts are weighted, what gets filtered out, which languages and dialects are included—these decisions embed contemporary power structures alongside historical ones. When a system generates a job description, it’s not just condensing centuries of professional communication patterns; it’s reflecting the specific curation choices made by those with sufficient capital and compute to build these systems.
Creativity as the Dividing Line
Margaret Boden (who passed away this past July)10 developed frameworks for understanding computational creativity that cut directly to the heart of current LLM debates.11 Her distinction between psychological creativity (novel for the individual) and historical creativity (novel for human culture) explains why LLMs can feel so creative while remaining fundamentally bounded by their training data.
LLMs excel at psychological creativity—combining and exploring existing ideas and problem spaces in ways that feel fresh to users—but they’re structurally limited in historical creativity because they can only recombine patterns from existing linguistic traditions. Understanding this distinction transforms how one approaches prompting and sets appropriate expectations for what these systems can and cannot accomplish.
Rather than a tool that simply automates tasks, an LLM can be viewed as a collaborator that helps a person explore the design space of a creative problem. A musician might ask an LLM for variations on a melody or chord progression. The LLM provides a range of recombinations or extrapolations (psychological creativity), which the human then uses as a springboard for genuine, historical creativity. The LLM is a cognitive prosthetic, enabling the musician to explore a larger and richer range of creative options than they could without the LLM’s help.
Boden’s work on creativity deserves centrality in our understanding of these systems, yet remains largely absent from contemporary discussions. We’ve become so focused on the latest benchmarks that we’ve forgotten decades of careful thinking about the very phenomena we’re now encountering at scale.
Learning to Communicate with Language Machines
Linguist Robert Hopper’s research on telephone conversation12 reveals something crucial about how we coordinate meaning through constrained mediums. During phone calls, you cannot rely on physical presence or immediate context. Instead, you develop conversational strategies—methods for checking understanding, managing turn-taking, and building shared meaning through sequential exchanges.
Telephone etiquette wasn’t innate—early adopters had to develop it, and those who came later had to learn it. Early phone users struggled with timing, turn-taking, call initiation and termination, and conveying meaning without visual cues. What now feels natural required collective learning and gradual development of new communication protocols.
Human-LLM interaction operates similarly. You’re coordinating meaning with a system through text, developing strategies for effective prompting, learning to iterate and refine responses. The most successful LLM users have unconsciously developed what amounts to telephone etiquette for working with statistical language patterns.
Just as we developed telephone conventions through experimentation, we’re now collectively developing “LLM etiquette”—a new protocol for querying the responsive index—learning when to be explicit, how to build context, what these systems handle well and poorly. However, while early telephone users had to learn new communication protocols, they were still communicating with other humans. LLM interaction involves learning to communicate with pattern reconstruction systems that have different constraints and failure modes than humans. Our current struggles with LLM interaction aren’t signs of fundamental limitations but the natural learning curve of adapting to a new communication medium.
This emerging etiquette includes techniques like asking systems to “think step by step”—not because this induces reasoning, but because we’ve learned that certain linguistic patterns produce more reliable outputs. We’re unconsciously discovering which structural cues help these systems reconstruct stronger patterns, even when we misunderstand why these techniques work.
This also explains why single-prompt interactions often disappoint while iterative conversations succeed. Just as telephone conversations require back-and-forth to establish shared understanding, effective LLM use involves building context through multiple exchanges with these responsive indexes of human linguistic patterns.
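Here is a rough sketch of that back-and-forth in code. The send() function is only a placeholder for whatever chat interface one uses, the role/content message list mirrors a common convention rather than any particular provider’s API, and the prompts themselves are invented; what matters is that each reply is reconstructed against the whole accumulated history, the textual equivalent of telephone turn-taking.

```python
def send(history):
    """Placeholder for a real chat-model call; returns a canned reply so the sketch runs."""
    return f"[draft generated from {len(history)} prior messages]"

# The accumulated context the model reconstructs against, built up turn by turn.
history = [
    {"role": "system", "content": "You write museum wall text for a general audience."},
    {"role": "user", "content": "Draft 80 words on this 1920s radio receiver."},
]

refinements = [
    "Good, but assume the reader has never seen a vacuum tube.",
    "Now a 40-word version for children, keeping the same key fact.",
]

# Each round feeds the whole history back in, so later drafts are shaped by
# everything established so far -- the shared understanding telephone users
# build through sequential exchanges.
for refinement in refinements:
    reply = send(history)
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": refinement})

print(send(history))  # final draft, conditioned on the full exchange
```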
What This Changes
Understanding LLMs as language machines rather than intelligence engines transforms how one should approach using them.
For individual work, you can externalise thoughts into language-as-structure and then manipulate them at scale. Need to explore different ways to frame a proposal? Generate multiple approaches and refine them iteratively. Working through complex analysis? Use the system to help organise and restructure your thinking. This doesn’t differ in principle from using writing to clarify thoughts—it’s simply vastly more powerful in scope and speed.
For organisational applications, the question shifts from “which jobs will AI replace?” to “where does linguistic work occur, and how might it be augmented?” Customer service scripts, report writing, policy documentation, training materials—all involve manipulating language according to organisational patterns that can be captured and scaled.
For strategic thinking, LLMs excel at working within established knowledge domains but struggle with genuinely novel situations requiring movement beyond existing linguistic patterns. A consulting firm using an LLM to explore variations on established business frameworks will gain valuable insights, but asking it to invent entirely new strategic paradigms for unprecedented market conditions will likely produce sophisticated-sounding but fundamentally derivative advice. They’re powerful tools for exploring known problem spaces and inadequate for breaking entirely new ground.
The real transformation proves quieter but more profound: we’re developing new ways to think with and through language-as-structure at unprecedented scales.13 The curator crafting accessible interpretations, the teacher creating personalised problems, the marketing team reducing copywriting time—all participate in this larger shift, learning to work with linguistic prosthetics through practical experimentation.
Patterns in the Investigation
The case of the mysterious intelligence turns out to involve mistaken identity. We thought we were investigating artificial minds, but discovered sophisticated tools for working with the structure of human language and accumulated knowledge.
Instead of asking “how intelligent are these systems?” we should ask “what becomes possible when humans can manipulate meaning at the scale of entire linguistic traditions?” That’s not merely a technical question—it’s a fundamental question about the nature of thinking, communication, and knowledge work itself.
The curator, teacher, and marketer may not realise they are manipulating sedimented human experience through responsive indexes of linguistic patterns—but we now understand the profound tool they are learning to use. They’re not collaborating with artificial minds; they’re learning to communicate with libraries that can engage in dialogue, developing new forms of etiquette for working with the accumulated structure of human language-as-knowledge-system.
This shift raises genuine governance challenges today—labour displacement, epistemic pollution, technological centralisation—that demand urgent attention, even if consciousness isn’t among them. The investigation continues, but this lens fundamentally changes what we’re examining.
The real question, then, is not what we’re building, but what we will become as we learn to think with our own language externalised and amplified.
Notes
1. Language compresses human experience, but this isn’t the full picture. Much remains beyond what can be captured in words—embodied knowledge, intuitive judgment—lived experience that resists description.
2. Word2vec is a computer technique that learns to represent words as numbers by analysing how words appear near each other in large amounts of text, allowing computers to understand that words like “king” and “queen” are related in similar ways to “man” and “woman”.
3. It’s worth noting that while language is a profound repository of human intelligence, it is not its entirety. There are things beyond description—raw, non-linguistic experience. Lacan calls this the Real; it is where Derrida’s hors-texte lives. LLMs, operating solely on linguistic representations, cannot access this realm.
4. Dell'Acqua, F., McFowland, E., III, Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2023). Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Working Paper, No. 24-013. https://www.hbs.edu/faculty/Pages/item.aspx?num=64700.
5. Some sudden capability jumps seem less about accessing higher linguistic levels than about how examples happened to cluster during training. The emergence of abilities like ASCII art or arithmetic may depend more on statistical accidents in the training data than on linguistic hierarchy.
6. Huckins, Grace. “Minds of Machines: The Great AI Consciousness Conundrum.” MIT Technology Review, October 16, 2023. https://www.technologyreview.com/2023/10/16/1081149/ai-consciousness-conundrum/.
7. Gaus, Gerald F. The Tyranny of the Ideal: Justice in a Diverse Society. Princeton University Press, 2016.
8. The precise how of this process—the specific representations formed within the model’s neural networks—remains a complex black box of matrix mathematics. Yet, as with the development of steam engines before the full understanding of thermodynamics, we can productively understand its function and outputs without a complete mechanistic description.
9. Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021. doi:10.1145/3442188.3445922.
10. “Obituary: Professor Maggie Boden.” The University of Sussex, July 28, 2025. https://www.sussex.ac.uk/broadcast/read/68608.
11. Boden, Margaret A. The Creative Mind: Myths and Mechanisms. 2nd ed. Routledge, 2005.
12. Hopper, Robert. Telephone Conversation. Indiana University Press, 1992.
13. To be clear, describing this shift as “quieter” is not to understate its potential impact. The ability to manipulate the collective knowledge and communicative structures of our species at scale is a power unlike any we have previously developed. While its engine may be a language machine and not a mind, the second-order effects on society, economy, and culture could be just as revolutionary as any intelligence explosion, though different in character.