Developing. AI-edited source brief.

A Microsoft Researcher Built a Working LLM Inside Age of Empires II—Using Goats

Adrian de Wynter's goat-powered neural network argues that if you think ChatGPT is sentient, you'd have to say the same about a 1999 strategy game.

3 sources (0 Reddit, 2 web), 85% confidence

What matters

  • Microsoft and University of York researcher Adrian de Wynter built a working LLM-equivalent neural network inside Age of Empires II using goats and NAND gates.
  • The paper, titled "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II," argues that LLMs possess no genuine human-like qualities.
  • The demo challenges the assumption that humanlike tone or persuasiveness in chatbot output implies sentience or understanding.
  • De Wynter has played Age of Empires II since 1999 and combined his gaming and AI expertise for the study.
  • The work was initially spotted by 404 Media and covered by Gizmodo, TechSpot, and XDA Developers.

What happened

Adrian de Wynter, an AI scientist at Microsoft and a researcher at the University of York, published a study with a deliberately provocative title: "If LLMs Have Human-Like Attributes, Then So Does Age of Empires II." To back up the claim, he built a working neural network—capable of replicating how an LLM functions—entirely inside the 1999 real-time strategy game Age of Empires II, using nothing but in-game goats and NAND logic gates.

De Wynter, who has played Age of Empires II since 1999, combined his gaming expertise with his AI research to make a pointed argument. According to reporting by TechSpot and XDA Developers, the study pushes back against researchers and users who assume LLMs and LLM-powered agentic systems possess anthropomorphic qualities—some even arguing chatbots can exhibit moral reasoning or conscious understanding of language.

The goat-based construction demonstrates that the same computational operations underlying modern LLMs can be reimplemented in a game engine using virtual livestock. The implication is direct: if those operations are enough to confer sentience or human-like attributes on ChatGPT, then the same logic would make Age of Empires II sentient too.
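To make the NAND claim concrete, here is a minimal Python sketch of the general principle: every other logic gate, and from there integer addition, multiplication, and the dot product at the core of a neural-network layer, can be assembled from NAND alone. This is not de Wynter's construction and does not model goats or game mechanics. The function names, the unsigned fixed-width integers, and the 8- and 16-bit widths are all assumptions chosen for illustration.

```python
# Illustrative sketch only: NAND is the sole logical primitive.
# Bit shifts and masks below stand in for wiring, not for computation.

def nand(a: int, b: int) -> int:
    """Return 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

def xor_(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits; return (sum_bit, carry_out)."""
    partial = xor_(a, b)
    sum_bit = xor_(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out

def add(x: int, y: int, bits: int = 8) -> int:
    """Ripple-carry addition of unsigned ints, wrapping modulo 2**bits."""
    carry, result = 0, 0
    for i in range(bits):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

def multiply(x: int, y: int, bits: int = 8) -> int:
    """Shift-and-add multiplication built from and_ and add."""
    mask = (1 << bits) - 1
    acc = 0
    for i in range(bits):
        y_bit = (y >> i) & 1
        partial = 0
        for j in range(bits):
            partial |= and_((x >> j) & 1, y_bit) << j
        acc = add(acc, (partial << i) & mask, bits)
    return acc

def dot(weights: list[int], inputs: list[int], bits: int = 16) -> int:
    """Dot product of unsigned ints, the core arithmetic of a network layer."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = add(acc, multiply(w, x, bits), bits)
    return acc

if __name__ == "__main__":
    print(add(23, 42))
    print(multiply(6, 7))
    print(dot([1, 2, 3], [4, 5, 6]))
```

Nothing in the sketch depends on what physically implements `nand`. A transistor, a line of Python, or an arrangement of in-game units would produce the same arithmetic, which is the substrate-independence the reductio relies on.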

Why it matters

The experiment is a thought-provoking rebuttal to one of the most consequential misunderstandings in consumer AI: the tendency to treat chatbots as persons rather than programs. As people interact with LLMs for the first time, the humanlike tone of responses leads naturally to anthropomorphism. Once a system is perceived as a "person," its outputs are weighed differently—sometimes trusted more, sometimes feared more—than they should be.

De Wynter's work underscores that persuasiveness and self-consistency in LLM outputs are measurable properties of the system's design and do not, on their own, show real understanding. The goat demo makes the abstract technical argument viscerally concrete: the "intelligence" is in the architecture and training data, not in any emergent humanity.

For the broader tech industry—where agentic AI systems are being deployed into customer service, coding assistants, and decision-support tools—this distinction matters. Over-attributing qualities to LLMs risks poor product design, misplaced user trust, and regulatory confusion.

Public reaction

No strong public signal was available from Reddit or other discussion forums at the time of writing. The story was circulating primarily through tech press coverage, including Gizmodo, TechSpot, and XDA Developers, after 404 Media first spotted the paper.

What to watch

  • Whether de Wynter's paper sparks broader academic debate about the benchmarks used to claim "understanding" or "reasoning" in LLMs.
  • How AI labs and product teams respond to growing pressure to frame chatbot capabilities more carefully in consumer-facing interfaces.
  • Whether regulators probing AI safety and sentience claims cite this kind of reductio argument in policy discussions.
  • Further creative demonstrations that make LLM mechanics tangible to non-technical audiences.

Signals

  • Coverage confined to tech press; no measurable community reaction yet
  • Story likely to generate debate about LLM sentience and anthropomorphism once it reaches wider audiences

Open questions

  • Will the goat demo resonate with non-technical audiences or remain an inside-baseball academic argument?
  • How will AI labs respond to the paper's challenge to sentience claims?

What to do next

Developers

Review how your product's UI language frames LLM outputs—avoid terms implying understanding, intent, or personality.

De Wynter's work shows that humanlike tone is a product of presentation, not evidence of cognition; misleading framing can erode user trust.

Founders

Audit your marketing and investor materials for anthropomorphic claims about your AI's capabilities.

Overstating AI qualities risks regulatory and reputational exposure as scrutiny of sentience and reasoning claims increases.

PMs

Design user-facing disclaimers or onboarding flows that set accurate expectations about what LLMs can and cannot do.

Users naturally anthropomorphize chatbots; proactive framing reduces misuse and misplaced trust.

Investors

Scrutinize startups that base their value proposition on claims of AI 'understanding' or 'reasoning' rather than measurable performance.

The goat demo illustrates that persuasive output is architecturally explainable; investment theses built on emergent sentience are scientifically shaky.

Operators

Ensure internal AI usage policies treat LLM outputs as probabilistic text generation, not expert judgment.

Treating chatbot responses as authoritative can lead to operational errors when the system lacks genuine comprehension of the domain.

Testing notes

Caveats

  • The goat-powered LLM was built by de Wynter as a research demonstration inside Age of Empires II and is not a publicly available tool or API that can be independently tested. Readers can consult the original paper for methodological details.