Infinite Zork: when procedural generation meets large language models

I've always found procedural generation in games fascinating. Popularized by roguelikes, procgen generates game levels programmatically, creating indefinitely replayable experiences through combinatorial variety. While procgen is good at creating interesting spatial structures and mechanical systems, it traditionally struggles (despite template systems and generative grammars) to generate the rich, natural-language content that makes text adventures compelling.

As LLMs improved from GPT-3 onward, I recognized their potential to complement procedural generation. LLMs can produce fluent natural language but have their own limitations: when asked to generate creative content without guidance, they tend to produce repetitive, predictable results. They need constraints and variety to work with.

This is where combining these technologies works. Procedural generation handles structure: map layouts, room types, semantic constraints. LLMs handle language: descriptions, dialogue, objects. The procgen provides variety through combinations, and gives the LLM boundaries that improve its output. The LLM adapts to those constraints in ways that feel natural. Each approach covers what the other lacks.

For this project, I'm focusing on the classical text adventure: parser-based, dungeon-crawl-minus-monsters, with puzzles, treasures, and scores. The medium has evolved quite a bit since the original Zork and Adventure, but for exploring procgen, the classic formula provides the clearest framework.

If you'd like to skip directly to the game, click here.

The Approach

The entire world is generated upfront before the player starts exploring. This keeps generation tractable: it allows me to control API costs and generate as many worlds as needed. The end result is a game engine that can load different procedurally generated worlds, giving each playthrough its own unique map and content. The game is "infinite" in the sense that there's a program that can generate countless distinct worlds on demand, not that any single world is unbounded.

The architecture is a hybrid. Procgen creates the map structure and defines semantic constraints for each room, such as room type, tags, and theme. The LLM takes over from there, filling in natural language descriptions and generating items that fit those constraints. This way, procgen provides variety through sheer combinatorial explosion (e.g., a large number of possible room configurations), while the LLM provides the creative writing and later parses natural language into structured output.
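To make the "combinatorial explosion" concrete, here is a minimal sketch of what a procgen room configuration could look like. It is not the actual generator (which stays private): the `RoomConfig` class, the vocabularies, and both helper functions are hypothetical names invented for illustration.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical vocabularies; the real generator's lists are not published.
ROOM_TYPES = ["workshop", "antechamber", "vault", "corridor"]
TAGS = ["flooded", "abandoned", "ornate", "foggy", "cramped"]


@dataclass(frozen=True)
class RoomConfig:
    """Semantic constraints handed to the LLM for one room."""
    room_type: str
    tags: tuple
    theme: str


def sample_config(rng: random.Random, theme: str, n_tags: int = 2) -> RoomConfig:
    """Pick a room type and a set of distinct tags at random."""
    tags = tuple(sorted(rng.sample(TAGS, n_tags)))
    return RoomConfig(rng.choice(ROOM_TYPES), tags, theme)


def count_configs(n_tags: int = 2) -> int:
    """Number of distinct configurations for a single theme."""
    return len(ROOM_TYPES) * math.comb(len(TAGS), n_tags)
```

Even these tiny lists give 4 × C(5, 2) = 40 distinct configurations per theme, and the count grows quickly as the vocabularies get longer.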

I've made some pragmatic choices to keep this first phase tractable. The game map is set on a perfect 2D grid with movement restricted to cardinal directions only, without diagonals, up or down. Although the map is technically a directed graph, the edges are symmetric: going east and then west will lead you back to where you started. There are also no NPCs in this implementation, just rooms, items, and objects.
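The symmetric-edge constraint can be sketched as follows. This is an illustrative data structure under my own assumptions, not the engine's actual representation: rooms are `(x, y)` grid cells, and the `connect` helper is a hypothetical name.

```python
# Grid offsets for the four cardinal directions (y grows southward here,
# which is an arbitrary convention chosen for this sketch).
DIRECTIONS = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def connect(exits: dict, room: tuple, direction: str) -> tuple:
    """Link `room` to its grid neighbor and add the reverse edge as well."""
    dx, dy = DIRECTIONS[direction]
    neighbor = (room[0] + dx, room[1] + dy)
    exits.setdefault(room, {})[direction] = neighbor
    exits.setdefault(neighbor, {})[OPPOSITE[direction]] = room
    return neighbor
```

Because every edge is written together with its reverse, going east and then west always returns the player to the same cell.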

The hardest technical problem is the actual content: room generation. Rooms don't exist in a vacuum: every room leads to at least one other room, and the exits that lead to those rooms are part of the room's description. How do you describe what might lie beyond a doorway when that room hasn't been generated yet? I'll dive into this in detail shortly.

This post covers the first phase: building a foundational world that players can explore. The result will be a text adventure where players can walk around, examine their surroundings, look at things, and pick up items. Think of it as a world ready for adventure, but without the actual adventure, meaning no puzzles to solve yet. The second phase, which comes after, will transform this explorable world into a proper game with puzzle generation and polish.

Parser and UI

To make this playable, I needed a parser and UI, though these aren't the focus of this post. The parser is deliberately simple: it supports <action> or <action> <target> commands with common shortcuts for movement and examination, plus basic disambiguation when needed. For the UI, I built a web-based interface using PyScript to run the game engine directly in the browser without a separate backend. If the project matures, I might write a separate post about running Python 3.13 in browsers and the quirks I encountered, but for now, it's enough to know that the game is playable.
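A parser of this shape can be sketched in a few lines. The shortcut table and the `parse` function below are assumptions made for illustration; the actual parser's vocabulary and its disambiguation step are not shown in this post.

```python
from typing import Optional, Tuple

# Hypothetical shortcut table: a leading shortcut expands to a full command.
SHORTCUTS = {
    "n": "go north", "s": "go south", "e": "go east", "w": "go west",
    "north": "go north", "south": "go south", "east": "go east", "west": "go west",
    "l": "look", "x": "examine", "i": "inventory",
}


def parse(command: str) -> Tuple[str, Optional[str]]:
    """Split a command into (action, target); target is None if absent."""
    words = command.lower().split()
    if not words:
        return "", None
    # Expand the first word if it is a shortcut, then re-split.
    words = SHORTCUTS.get(words[0], words[0]).split() + words[1:]
    action = words[0]
    target = " ".join(words[1:]) or None
    return action, target
```

With this table, `parse("n")` yields `("go", "north")` and `parse("x brass lamp")` yields `("examine", "brass lamp")`.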

With the plumbing out of the way, let's focus on the interesting part: generating the world itself.

Map generation

There are plenty of dungeon map generators out there, of course, but quite a few are made for roguelikes where each location is a tile rather than a "room," so the structures aren't directly comparable. I've made my own simple generator instead, which typically creates "corridors" of size 3 to 5 with occasional branching.
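For a rough idea of how such a generator can work, here is a minimal sketch. It is not the actual generator: the `generate_map` function, the `branch_chance` parameter, and the choice to cut a corridor short when it hits an existing room are all assumptions of this example.

```python
import random

DIRECTIONS = [(0, -1), (0, 1), (1, 0), (-1, 0)]  # north, south, east, west


def generate_map(n_rooms: int, rng: random.Random, branch_chance: float = 0.3):
    """Grow straight corridors of up to 3 to 5 rooms from random branch points.

    Returns (rooms, edges): a set of grid cells and a set of undirected
    edges, each stored as a frozenset of two adjacent cells.
    """
    start = (0, 0)
    rooms = {start}
    edges = set()
    branch_points = [start]  # cells from which a new corridor may start
    while len(rooms) < n_rooms:
        pos = rng.choice(branch_points)
        dx, dy = rng.choice(DIRECTIONS)
        for _ in range(rng.randint(3, 5)):
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in rooms or len(rooms) >= n_rooms:
                break  # corridor is cut short by an existing room or the cap
            rooms.add(nxt)
            edges.add(frozenset((pos, nxt)))
            pos = nxt
            if rng.random() < branch_chance:
                branch_points.append(pos)  # occasional mid-corridor branch
        branch_points.append(pos)  # corridor ends can always be extended
    return rooms, edges
```

Since every new room is attached to an existing one, the result is always connected, and in this sketch it is a tree: there are exactly `n_rooms - 1` edges and no loops.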

Room and object generation

The LLM takes some inputs such as tags, theme, room type, etc. and generates a room description first. Afterward, it starts building a map of examinable and interactable objects in the room. Each object gets aliases if needed, so a "greatsword" can be referred to as "blade" or "sword" or "weapon."
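The object map with aliases could be represented roughly like this. The `GameObject` class and the `resolve` helper are hypothetical names for this sketch, not the engine's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class GameObject:
    """An examinable object plus the alternative names a player may type."""
    name: str
    description: str
    aliases: set = field(default_factory=set)

    def matches(self, target: str) -> bool:
        return target == self.name or target in self.aliases


def resolve(target: str, objects: list) -> list:
    """Return every object the target could refer to.

    An empty list means nothing matched; more than one result means the
    parser has to ask the player which object was meant.
    """
    target = target.lower()
    return [obj for obj in objects if obj.matches(target)]
```

If a greatsword and a dagger both carry the alias "blade", `resolve("blade", ...)` returns both of them, which is the point where disambiguation kicks in.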

Here, I had to ensure that base room descriptions do not contain objects that can be picked up. Otherwise, I'd face a combinatorial explosion of descriptions: I'd have to generate slight variations of the same room depending on whether each item is still on display, which for $n$ items would result in $2^n$ combinations. Therefore, I explicitly prompted the LLM not to create "loose" objects in the description. Items are generated in a separate process and added only after the room itself has been generated.
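One way to avoid the exponential blow-up is to compose the visible text at play time from the fixed base description and whatever items are currently in the room. The `render_room` function and the "You can see here" phrasing below are assumptions of this sketch, not the engine's actual output format.

```python
def render_room(base_description: str, items_present: list) -> str:
    """Append a listing of the items still in the room to the base text."""
    if not items_present:
        return base_description
    listing = ", ".join(items_present)
    return f"{base_description}\n\nYou can see here: {listing}."
```

With this split, a room holding $n$ items needs one base description and $n$ short item names instead of $2^n$ pre-generated variants.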

Descriptions and interconnectedness

The rooms don't exist in a vacuum. Every room leads to at least one other room, and the exits that lead to those rooms are part of the room's description.

This creates a problem: if we generate a description while knowing nothing about the neighboring rooms, the description will be either incomplete or inconsistent.

There are many ways to tackle this. I considered a two-pass approach where I'd create room descriptions in isolation, explicitly instructing the LLM not to speculate about exits and where they lead, then do a second pass on descriptions after gaining full knowledge of each room's content. A variation on this would be a "diffusion" approach: allow the LLM to speculate, then do multiple iterations until a "judge" LLM decides that neighboring rooms are consistent in their descriptions.

I could side-step the issue by ignoring it or by claiming a surreal setting where the world is not entirely coherent. While tempting, this would cost the game its classic text adventure feel.

Finally, I could also do a single-pass, semantics-driven approach. As briefly mentioned earlier, procgen generates a semantic "room configuration" to be passed to the LLM later, in the form of room type, tags, and such. This means we're not completely clueless even during the first pass and can use the tags to decide how the exits may look. I decided to go with a variation of this: combining the semantic config of yet-to-be-generated neighbors with the explicit descriptions of already-generated ones. In the end, I got something that looks natural, in a way that aligns with the adventurer discovering the world.
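The single-pass idea can be sketched as follows. This is an illustration under stated assumptions, not the actual pipeline: I assume rooms are generated breadth-first from the starting room, and the function names, the dictionary layout, and the wording of the context lines are all hypothetical.

```python
from collections import deque


def generation_order(exits: dict, start) -> list:
    """Breadth-first order, so rooms nearer the start are described first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        room = queue.popleft()
        order.append(room)
        for neighbor in exits[room].values():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def exit_context(room, exits: dict, configs: dict, generated: dict) -> list:
    """Build one prompt line per exit of `room`.

    `generated` maps already-described rooms to their names; `configs`
    holds the procgen semantic config of every room.
    """
    lines = []
    for direction, neighbor in exits[room].items():
        if neighbor in generated:
            lines.append(f"{direction}: leads to the known room '{generated[neighbor]}'")
        else:
            cfg = configs[neighbor]
            tags = ", ".join(cfg["tags"])
            lines.append(f"{direction}: undiscovered; type={cfg['type']}, tags={tags}")
    return lines
```

The resulting lines would be appended to the room's prompt, so the LLM can name known neighbors explicitly and only hint at undiscovered ones.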

Here's an example from one of the room descriptions:

The workshop opens to the north through an ornate archway leading to the Marble Antechamber, while a fog-shrouded doorway to the south leads deeper into the underground city.

Here, the Marble Antechamber is a room the adventurer has likely already seen because it lies closer to the starting room, whereas the room to the south is yet to be discovered.

A current limitation is that, unlike in some modern interactive fiction, the description does not change once the room is discovered. That's a problem I've yet to tackle.

Breaking monotony

Getting back to the "LLMs are not creative" point from the intro: my early experiments took place in a steampunk setting, and for the LLM that meant a mandate to fill every room with brass pipes and clockwork, which understandably gets grating past the third room. Even the semantic cues supplied by the procedural generation did not help much there.

Giving examples of what was previously generated, prompting the LLM to avoid previous examples, adding even more semantic cues and playing with the temperature helped with that, but I believe there's more work to do here.
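The "avoid previous examples" trick amounts to feeding recent outputs back into the prompt. Below is a minimal sketch of such a prompt builder; the `build_prompt` function, the prompt wording, and the `max_examples` parameter are my own assumptions, and no particular LLM API is implied.

```python
def build_prompt(config: dict, previous: list, max_examples: int = 3) -> str:
    """Assemble a room prompt that lists recent descriptions to steer away from."""
    parts = [
        f"Describe a {config['type']} in a {config['theme']} setting.",
        f"Tags: {', '.join(config['tags'])}.",
    ]
    # Only the most recent descriptions are included, to keep the prompt short.
    recent = previous[-max_examples:] if max_examples > 0 else []
    if recent:
        parts.append("Avoid repeating imagery from these earlier rooms:")
        parts.extend(f"- {text}" for text in recent)
    return "\n".join(parts)
```

Capping the number of examples is a design choice in this sketch: it keeps the prompt size bounded as the world grows.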

LLM failures

The models never ceased to amaze with the occasional failures during object detection and alias generation, such as:

Skipping over the obvious aliases and going straight to the obscure ones: "a frozen greatsword" could get "ice blade", "frost weapon", "chilled claymore", and "icy great blade", but not "sword".
Generating multiple "objects" for what is semantically the same thing in the room: a "glass hexagonal display" in the main description, if referred to twice, could cause the creation of a "glass display" and a "hexagonal display" as separate objects, not just aliases.

Tweaking prompts and model choices was enough to address these issues.

Lore

Creating a game taking place in a fictional world also means world-building to a certain degree. This can be a bit of a problem, since objects in the world represent a certain lore, and in a coherent game, lore should be consistent across rooms.

Imagine the game taking place at a fictional company's headquarters. If two different rooms were to contain documents talking about history, since they are nearly independently generated, it's possible that they would give contradicting information and break immersion.

For now, I've (mostly) side-stepped this issue by focusing on physical descriptions without heavy lore. Contradicting lore is still possible, however, in the "visual" elements like the company logo. In the future, I might consider tracking lore across room generations.

Final output

Here's a sample game from my last generation, taking place in a cyberpunk setting. While I've decided to keep the generation code itself private for now, I'm sharing this playable output to demonstrate what the system can create.

As described earlier, the game lets the player walk around, examine objects, and collect items (either lying in the room itself or hidden in containers), and it congratulates the player upon reaching the maximum score.

What's Next

Right now, the engine generates explorable worlds: players can navigate interconnected rooms, examine objects, and pick up items. The descriptions feel more natural than what I started with, and the rooms connect in ways that make sense as you discover them.

It's not a game yet, but it is an interesting place to wander. Phase 2 will change that by tackling the hardest problem: puzzle generation. How does one procedurally create puzzles that are solvable, satisfying, and fit naturally into the world? That's the challenge I'm excited to dig into next.