AgentWorld
Jul 26, 2023
Introduction & Background (Why do this?)
Recently, I’ve been fascinated with the concept of simulated worlds and intrigued by their potential for exploration into the nature of human behavior. Theoretically, simulations can be used to recreate historical events, bring a book to life, or give economists the ability to analyze human behavior from a game-theoretical perspective.
AgentWorld is a response to a desire for more nuanced and custom simulations, a bridge to connect imagination to some proxy of reality. Drawing inspiration from Stanford's "Generative Agents: Interactive Simulacra of Human Behavior" paper, I wanted to build and run my own custom simulations. After getting my hands dirty with the complexities involved in making this happen, I realized that there was a broader opportunity: a framework that lets anyone get up and running quickly and build experiences of their own.
Under the hood, AgentWorld is like a game engine powered by a language model. The engine is unique in that it is managed entirely by a language model that leverages Retrieval-Augmented Generation (RAG) to maintain its own state.
The core innovation here is the concept of contiguous, self-managed state—not applied to an agent (AutoGPT, BabyAGI) but to a virtual environment.
This idea adds a layer of depth to the virtual world. In many traditional games, environments are largely static and scripted, so player actions leave little lasting trace. With AgentWorld, however, actions bear weight and have persistent impacts on the world, just like in reality.
System Architecture
Our goal was to design the simulator such that it would be easy for anyone to host a server that agents could join remotely.
Therefore, agents are intended to be run as separate servers. The spec for designing an agent is extremely simple: it must have a text-in, text-out endpoint that accepts POST requests from the game server.
To join a simulation, the agent sends a POST request to the server's /join/ endpoint. In the body of the join request, the agent includes its name and a postback URL.
The agent’s postback URL will receive text-based prompts from the game engine, and must respond to those requests with text within 90 seconds.
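As a rough illustration of the agent spec above, here is a minimal sketch of an agent built only on Python's standard library. It is not taken from the AgentWorld repository: the JSON field names "name" and "url", the plain-text request and response bodies, and the helper names decide, build_join_request, and start_agent are all assumptions made for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def decide(prompt: str) -> str:
    """Placeholder policy: a real agent would call a language model here."""
    return f"I look around and consider: {prompt[:60]}"


class AgentHandler(BaseHTTPRequestHandler):
    """Text in, text out: the POST body is the prompt, the response is the action."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        prompt = self.rfile.read(length).decode("utf-8")
        reply = decide(prompt).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the example quiet


def build_join_request(server_url: str, name: str, postback_url: str) -> urllib.request.Request:
    """Build the POST to /join/. The field names "name" and "url" are assumed."""
    body = json.dumps({"name": name, "url": postback_url}).encode("utf-8")
    return urllib.request.Request(
        server_url.rstrip("/") + "/join/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_agent(port: int = 0) -> HTTPServer:
    """Serve the postback endpoint on a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), AgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To use it, an agent would call start_agent(), then pass the request from build_join_request(...) to urllib.request.urlopen. Since the engine waits at most 90 seconds for a reply, decide should return well within that window.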
The game engine (also referred to as the “simulator”) has only one endpoint: the /join/ endpoint described above. The server proactively prompts agents when it is their turn to make a move. In the future, it would be interesting to find a more dynamic approach. We’d be very open to contributions that propose novel approaches to this pattern.
See architecture diagram below:
On game start:
Game Server embeds the WorldState JSON object
Agents send POST requests to game server with values:
Name: Agent Name
URL: Postback URL that the Game Server will request
Game Loop:
Once two agents have joined the game, the game loop begins. For each agent in turn:
Search the embeddings space (objects, locations) with the agent’s previous action
Inject the fetched documents into context
Inject all previous agent actions into context
Generate a custom prompt for that agent based on their position
Send custom prompt to agent, await response
Validate response from agent, update world state accordingly
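The loop steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: a word-overlap search stands in for the real embedding search, and the language model call and the HTTP call are passed in as the hypothetical callables write_prompt and send so the example stays self-contained. The Agent and World classes are likewise invented for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    name: str
    url: str               # postback URL supplied at join time
    last_action: str = ""


@dataclass
class World:
    documents: List[str]   # stand-in for the embedded objects and locations
    history: List[str] = field(default_factory=list)


def search(documents: List[str], query: str, k: int = 2) -> List[str]:
    """Toy retrieval: rank documents by shared words (a stand-in for embedding search)."""
    words = set(query.lower().split())
    scored = [(len(words & set(d.lower().split())), d) for d in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked if score > 0][:k]


def build_context(agent: Agent, world: World) -> str:
    """Collect what the engine sees before it writes the agent's prompt."""
    fetched = search(world.documents, agent.last_action)
    return "\n".join([
        f"Agent to prompt: {agent.name}",
        "Relevant world state: " + " | ".join(fetched),
        "Previous actions: " + " | ".join(world.history),
    ])


def run_round(
    agents: List[Agent],
    world: World,
    write_prompt: Callable[[Agent, str], str],  # assumed: language model call
    send: Callable[[str, str], str],            # assumed: POST to the postback URL
) -> None:
    if len(agents) < 2:
        return  # the loop only starts once two agents have joined
    for agent in agents:
        context = build_context(agent, world)
        prompt = write_prompt(agent, context)
        reply = send(agent.url, prompt).strip()
        if not reply:
            continue  # minimal validation: ignore empty responses
        agent.last_action = reply
        world.history.append(f"{agent.name}: {reply}")
```

Note that the context holds everything the engine knows; it is the write_prompt step that must decide what the agent is actually allowed to see.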
Complexities of building AgentWorld
The hardest part about building AgentWorld was figuring out how to share context about the world with each agent. I realized early on that it wasn’t a good idea to just send everything to each agent. Because the game engine has a “God-Mode” view of the world, it was very important to make sure that it didn’t leak information to the agents when they interacted with the world.
A character on one end of the map should not be aware of the actions of a character on the other end of the map.
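AgentWorld handles this mostly through prompting, as described next. A complementary technique, shown here only as an illustration and not as what the project does, is a hard filter that drops events from other locations before any prompt is written. The Event class, the positions mapping, and visible_events are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    actor: str
    location: str  # where the action took place
    text: str


def visible_events(events: List[Event], positions: Dict[str, str], observer: str) -> List[Event]:
    """Keep only events that happened where the observer currently is,
    excluding the observer's own actions."""
    here = positions[observer]
    return [e for e in events if e.location == here and e.actor != observer]
```

With a filter like this in front of the prompt generator, the model never sees far-away events for that agent, so it has nothing to leak.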
Solving this problem was mostly a matter of prompting. Below is the current iteration of the ActionRequest prompt—this prompt is used to create a custom message that gets sent to each agent. It took multiple days of iterating to land on this method.
The result looks something like this:
Stanley, the room around you continues to buzz with the group's collective brainstorming. Priya, the physics teacher, appreciates your suggestion of using a wooden object to handle the hot door, citing the laws of thermodynamics. She also acknowledges the idea of having a contingency plan, echoing Ricardo's chess game approach. Iris, the dancer, takes your idea further, suggesting the use of her scarf, soaked in water, wrapped around the wooden object to insulate against the heat. She sees the situation as a dance, requiring finesse and minimal force to lead the hot door. Ricardo, the chess player, agrees with your suggestion and is ready to leverage his strategic thinking for the group's survival. Alina, who understands heat conductivity, also supports your idea and suggests the use of a wet cloth as a heat barrier. The bloody knife on the floor, which you pointed out, has been wrapped in a cloth by Ricardo for safer handling. The group seems to be rallying behind your suggestions and is ready to apply their cumulative knowledge to escape.
The initial iteration of the ActionPrompt generator kept running into an interesting issue: it would often say things such as “You are unaware that…” or “You don’t know about…” and then reveal the very thing the agent was not supposed to know, which of course defeats the purpose of withholding it. Two changes solved that problem: clever prompting, and switching from gpt-3.5-turbo to GPT-4.
The sample agents have been designed specifically to take input in the format above.
Building the world to be responsive
OpenAI functions played a key role in the design of AgentWorld. The chat endpoint is given two functions that it can call directly within the project code. One adds world state, and the other updates it. The functions are described to ChatGPT in the following way:
The game engine first fetches any relevant world state that it needs, then is prompted in the following way:
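The original function descriptions and prompt are not reproduced in this text. Purely as an illustration, a pair of descriptions in the JSON Schema style that OpenAI function calling expects, together with a local dispatcher, might look like the sketch below. The names add_world_state and update_world_state, their parameters, and the dict-based store are assumptions, not AgentWorld's actual code.

```python
import json
from typing import Callable, Dict

ENTRY_PARAMETERS = {
    "type": "object",
    "properties": {
        "key": {"type": "string", "description": "Unique name of the object or location."},
        "description": {"type": "string", "description": "Current description of the entry."},
    },
    "required": ["key", "description"],
}

# Hypothetical descriptions passed to the chat endpoint alongside the messages.
FUNCTIONS = [
    {
        "name": "add_world_state",
        "description": "Add a new object or location to the world state.",
        "parameters": ENTRY_PARAMETERS,
    },
    {
        "name": "update_world_state",
        "description": "Update an existing object or location in the world state.",
        "parameters": ENTRY_PARAMETERS,
    },
]


def add_world_state(world: Dict[str, str], key: str, description: str) -> None:
    if key in world:
        raise KeyError(f"{key!r} already exists; use update_world_state")
    world[key] = description


def update_world_state(world: Dict[str, str], key: str, description: str) -> None:
    if key not in world:
        raise KeyError(f"{key!r} does not exist; use add_world_state")
    world[key] = description


HANDLERS: Dict[str, Callable[..., None]] = {
    "add_world_state": add_world_state,
    "update_world_state": update_world_state,
}


def dispatch(world: Dict[str, str], function_call: Dict[str, str]) -> None:
    """Apply a call shaped like {"name": ..., "arguments": "<JSON string>"}."""
    handler = HANDLERS[function_call["name"]]
    arguments = json.loads(function_call["arguments"])
    handler(world, **arguments)
```

Keeping add and update separate lets the engine reject a model that tries to "update" something that was never created, which is one simple way to validate a response before it changes the world.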
What’s next
The structure of AgentWorld may remind some of you of the multi-user dungeons (MUDs) from the '80s and '90s, where text-based interactions led players through fantasy realms. AgentWorld might be seen as a successor to this tradition, but with the added element of AI enhancing the complexity, unpredictability, and dynamic nature of the environment. I only began to learn about MUDs once this project was in full swing, but they became a major source of inspiration and guidance while designing the current iteration of AgentWorld.
While AgentWorld is entirely text-based at the moment, we are looking to find ways to build an immersive front-end experience for users. We believe it should be incredibly simple for anyone with internet access to create these kinds of virtual worlds.
We’re excited about potential contributions from the community. The README on our GitHub page lists a few potential additions, but feel free to let your creativity guide you.
Here's to the journey into the uncharted territories of our collective imaginations. See you in AgentWorld!