AgentWorld

Jul 26, 2023

Introduction & Background (Why do this?)

Recently, I’ve been fascinated with the concept of simulated worlds and intrigued by their potential for exploration into the nature of human behavior. Theoretically, simulations can be used to recreate historical events, bring a book to life, or give economists the ability to analyze human behavior from a game-theoretical perspective.

AgentWorld is a response to a desire for more nuanced, customizable simulations: a bridge connecting imagination to some proxy of reality. Drawing inspiration from Stanford’s “Generative Agents: Interactive Simulacra of Human Behavior” paper, I wanted to build and run my own custom simulations. After getting my hands dirty with the complexities involved, I realized there was a broader opportunity: a framework that lets anyone get up and running quickly and build experiences of their own.

Under the hood, AgentWorld is like a game engine powered by a language model. The engine is unique in that it is managed entirely by a language model that uses Retrieval-Augmented Generation (RAG) to maintain its own state.

The core innovation here is the concept of contiguous, self-managed state—not applied to an agent (AutoGPT, BabyAGI) but to a virtual environment.

This adds a layer of depth that virtual worlds have typically lacked. In much of traditional game design, environments are largely static and unresponsive, offering little continuity or consequence. With AgentWorld, however, actions bear weight and have persistent impacts on the world, just like in reality.

System Architecture

Our goal was to design the simulator such that it would be easy for anyone to host a server that agents could join remotely.

Therefore, agents are intended to run as separate servers. The spec for an agent is extremely simple: it must expose a text-in, text-out endpoint that accepts POST requests from the game server.

To join a simulation, the agent sends a POST request to the server’s /join/ endpoint. In the body of the join request, the agent includes its name and a postback URL.

The agent’s postback URL will receive text-based prompts from the game engine, and must respond to those requests with text within 90 seconds.
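
As a rough sketch of the agent side of this contract, the snippet below builds the join request and a text-in, text-out handler. The JSON field names (`name`, `url`), the `buildJoinRequest` and `handlePrompt` helpers, and the example URL are assumptions made for illustration; the spec above only says that the join body carries a name and a postback URL, and that the postback endpoint returns text.

```typescript
// Hypothetical shape of the join body. The field names are an assumption.
interface JoinRequestBody {
  name: string;
  url: string;
}

// Plain description of an HTTP request, suitable for passing to fetch().
interface PostRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST request an agent would send to the server's /join/ endpoint.
function buildJoinRequest(name: string, postbackUrl: string): PostRequest {
  const body: JoinRequestBody = { name: name, url: postbackUrl };
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// Text in, text out: the agent receives a narration and replies with an action.
// A real agent would call a language model here; this stub only quotes the
// first sentence of the narration back.
function handlePrompt(narration: string): string {
  const firstSentence = narration.split(".")[0].trim();
  return `I think about what I was just told ("${firstSentence}") and look around.`;
}

// Possible wiring (not executed here):
//   await fetch("http://localhost:3000/join/", buildJoinRequest("Stanley", "http://localhost:4000/act"));
```

The agent would then serve `handlePrompt` behind the postback URL it registered, using whatever HTTP framework it prefers.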

The game engine (also referred to as the “simulator”) has only one endpoint: the /join/ endpoint described above. The server proactively prompts each agent when it is its turn to make a move. In the future, it would be interesting to find a more dynamic approach, and we’d be very open to contributions that propose novel approaches to this pattern.
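
On the server side, the 90-second rule could be enforced with a simple deadline wrapper around each prompt. This is only a sketch: `withTimeout`, `requestAction`, the injected `send` transport, and the choice to treat a late or failed reply as a skipped turn (`null`) are all assumptions, not something the project is known to do.

```typescript
interface Agent {
  name: string;
  url: string; // postback URL registered at join time
}

// Transport is injected so the sketch stays independent of any HTTP client.
type SendPrompt = (postbackUrl: string, prompt: string) => Promise<string>;

const AGENT_DEADLINE_MS = 90 * 1000;

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`No response within ${ms} ms`)),
      ms
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Prompts one agent and returns its reply, or null if it failed or was late.
// Returning null (skipping the turn) is an assumed policy.
async function requestAction(
  agent: Agent,
  prompt: string,
  send: SendPrompt,
  deadlineMs: number = AGENT_DEADLINE_MS
): Promise<string | null> {
  try {
    return await withTimeout(send(agent.url, prompt), deadlineMs);
  } catch (_error) {
    return null;
  }
}
```

In a real server, `send` would be a thin wrapper around an HTTP POST to the agent's postback URL.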

See architecture diagram below:

On game start:

Game Server embeds the WorldState JSON object

Agents send POST requests to game server with values:

Name: Agent Name

URL: Postback URL that the Game Server will request

Game Loop:

Once two agents have joined the game, the game loop begins. On each agent’s turn, the game server does the following:

Search the embeddings space (objects, locations) with the agent’s previous action
Inject the fetched documents into context
Inject all previous agent actions into context
Generate a custom prompt for that agent based on their position
Send custom prompt to agent, await response
Validate response from agent, update world state accordingly
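
The retrieval and context-building steps above can be sketched roughly as follows. To keep the example self-contained, a bag-of-words cosine similarity stands in for a real embedding search, so this is not how AgentWorld itself ranks documents. The `WorldDoc` shape and the `retrieve` and `buildTurnContext` helpers are assumptions; the output keys mirror the `previous_action`, `recent_actions`, and `world_state` parameters of the prompt generator shown later in this post.

```typescript
// Assumed shape of one entry in the world state: an ID plus a description.
interface WorldDoc {
  id: string;
  description: string;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Stand-in for an embedding: a term-count vector.
function termCounts(text: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const token of tokenize(text)) {
    counts[token] = (counts[token] || 0) + 1;
  }
  return counts;
}

function cosine(a: Record<string, number>, b: Record<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const key of Object.keys(a)) {
    normA += a[key] * a[key];
    if (b[key]) dot += a[key] * b[key];
  }
  for (const key of Object.keys(b)) normB += b[key] * b[key];
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Returns the k world-state entries most similar to the query.
function retrieve(query: string, docs: WorldDoc[], k: number): WorldDoc[] {
  const q = termCounts(query);
  return docs
    .map((doc) => ({ doc: doc, score: cosine(q, termCounts(doc.id + " " + doc.description)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.doc);
}

// Assembles the three pieces of context injected before prompting an agent.
function buildTurnContext(previousAction: string, allActions: string[], docs: WorldDoc[], k: number = 2) {
  return {
    previous_action: previousAction,
    recent_actions: allActions.join("\n"),
    world_state: retrieve(previousAction, docs, k)
      .map((doc) => `${doc.id}: ${doc.description}`)
      .join("\n"),
  };
}
```

Swapping `retrieve` for a call into a vector store would leave the rest of the loop unchanged.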

Complexities of building AgentWorld

The hardest part about building AgentWorld was figuring out how to share context about the world with each agent. I realized early on that it wasn’t a good idea to just send everything to each agent. Because the game engine has a “God-Mode” view of the world, it was very important to make sure that it didn’t leak information to the agents when they interacted with the world.

A character on one end of the map should not be aware of the actions of a character on the other end of the map.

Solving this problem was mostly a matter of prompting. Below is the current iteration of the ActionRequest prompt—this prompt is used to create a custom message that gets sent to each agent. It took multiple days of iterating to land on this method.

export const GenerateRequestNextActionPrompt = (
  character: string,
  previous_action: string,
  recent_actions: string,
  world_state: string
): string => {
  return `Your task is to request the next action from a character (inhabitant) in a turn-based environment.
You will provide a Third-Person Objective Narration to the inhabitant.
To do so, you have access to three types of information:
1. The latest action taken by the inhabitant of your virtual world (if available)
2. The most recent actions taken by other inhabitants (if applicable), provided in chronological order
3. Pertinent details about the current state of the virtual world

It's crucial to remember that while you possess comprehensive information about the world, 
the inhabitants DO NOT. Therefore, you must only allude to information that they *would* know, 
given their current circumstance and the state of the world. For example, if Character A took an important
action in a location far away from Character B, you must not mention anything about it in your narration to Character B.

On the contrary, if someone did or said something directly relevant to Character B, interacting with
a relevant object, or performing an important action in the same location, or saying something directly
to Character B, you must tell them who it was and exactly what was said and/or done.

Now, let's begin the task. Your responsibility is to write this narration for the following character,
which I will provide to them: Character Name: ${character}
PREVIOUS ACTION FROM ${character}:
\`\`\`
${
  previous_action.length > 5
    ? `Here is ${character}'s most recent action:
    ${character}: ${previous_action}`
    : `${character} has just joined the game and has not taken any actions yet.`
}
\`\`\`
ACTIONS OF OTHER CHARACTERS:
\`\`\`
${
  recent_actions.length > 5
    ? `Here are the recent actions taken by other characters: ${recent_actions}`
    : `No other characters have taken any actions yet.`
}
\`\`\`

World State contains a subset of information about the world, and includes people, places, and things
that exist within the world. Here are some potentially relevant elements of the current state of the
virtual world that you are maintaining, remember to ignore all information that is not directly relevant to ${character}:
\`\`\`${world_state}\`\`\`

Now, your task begins. Considering all the information above, compile the concise Third-Person Objective Narration for ${character}, 
using only information that is *directly relevant* to ${character} and that may influence ${character}'s next move. 
Entirely ignore all information, character names, actions, and items that are not *directly relevant* to ${character}'s circumstance. 
In essence, your narration will serve as the eyes and ears of ${character}. Do not fabricate any action, 
thought, feeling, or plan on behalf of ${character}. 
You must not tell ${character} about anything that ${character} cannot see, hear, or know\n`;
};

The result looks something like this:

Stanley, the room around you continues to buzz with the group's collective brainstorming. Priya, the physics teacher, appreciates your suggestion of using a wooden object to handle the hot door, citing the laws of thermodynamics. She also acknowledges the idea of having a contingency plan, echoing Ricardo's chess game approach. Iris, the dancer, takes your idea further, suggesting the use of her scarf, soaked in water, wrapped around the wooden object to insulate against the heat. She sees the situation as a dance, requiring finesse and minimal force to lead the hot door. Ricardo, the chess player, agrees with your suggestion and is ready to leverage his strategic thinking for the group's survival. Alina, who understands heat conductivity, also supports your idea and suggests the use of a wet cloth as a heat barrier. The bloody knife on the floor, which you pointed out, has been wrapped in a cloth by Ricardo for safer handling. The group seems to be rallying behind your suggestions and is ready to apply their cumulative knowledge to escape.

The initial iteration of the ActionPrompt generator kept running into an interesting issue: it would often say things such as “You are unaware that…” or “You don’t know about…” and then reveal the very thing the agent wasn’t supposed to know, which of course defeats the purpose of withholding it. Two changes solved that problem: more careful prompting, and switching from gpt-3.5-turbo to GPT-4.

The sample agents have been designed specifically to take input in the format above.

Building the world to be responsive

OpenAI function calling played a key role in the design of AgentWorld. The chat endpoint is given two functions that it can call directly within the project code. One adds new entries to the world state, and the other updates existing ones. The functions are described to the model in the following way:

const functions = [
    {
      name: "updateDatabase",
      description: "Updates a field in the world state",
      parameters: {
        type: "object",
        properties: {
          item: {
            type: "string",
            description:
              "The ID of the item/location to update in snake_case. You can only update existing items.",
          },
          new_value: {
            type: "string",
            description:
              `The new value (full physical description) for the item or location. Include all relevant information. 
              Remove information only if it is no longer accurate or relevant due to the recent actions. 
              The new value is a comprehensive description reflecting the current state.`,
          },
        },
        required: ["item", "new_value"],
      },
    },
    {
      name: "addToDatabase",
      description: "Adds a new field to the world state",
      parameters: {
        type: "object",
        properties: {
          item: {
            type: "string",
            description:
              "The ID of the item/location to create, in snake_case. You can only add if the item doesn't already exist.",
          },
          new_value: {
            type: "string",
            description:
              `The full, physical description for the item or location. Include all relevant information. 
              The value is a comprehensive description reflecting the current state of that thing.`,
          },
        },
        required: ["item", "new_value"],
      },
    },
];
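
When the model chooses to call one of these functions, the response carries the function name and a JSON-encoded string of arguments, which the project code then has to apply. Here is a minimal sketch of such a dispatcher. The in-memory `WorldState` record, the `applyFunctionCall` helper, and the `ApplyResult` shape are assumptions for illustration; the real project keeps its state in an embeddings store.

```typescript
// Assumed in-memory stand-in for the world state: item ID -> description.
type WorldState = Record<string, string>;

// A function call as returned by the chat endpoint: a name plus a
// JSON-encoded string of arguments.
interface FunctionCall {
  name: string;
  arguments: string;
}

interface ApplyResult {
  ok: boolean;
  message: string;
}

// Applies one function call to the world state, mutating it in place.
// Enforces the rules from the descriptions above: updates only touch
// existing items, and adds only create items that do not exist yet.
function applyFunctionCall(world: WorldState, call: FunctionCall): ApplyResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments);
  } catch (_error) {
    return { ok: false, message: "Arguments were not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, message: "Arguments must be a JSON object" };
  }
  const args = parsed as { item?: unknown; new_value?: unknown };
  const item = args.item;
  const newValue = args.new_value;
  if (typeof item !== "string" || typeof newValue !== "string") {
    return { ok: false, message: "Both item and new_value must be strings" };
  }
  const exists = Object.prototype.hasOwnProperty.call(world, item);
  switch (call.name) {
    case "updateDatabase":
      if (!exists) return { ok: false, message: `Cannot update unknown item "${item}"` };
      world[item] = newValue;
      return { ok: true, message: `Updated ${item}` };
    case "addToDatabase":
      if (exists) return { ok: false, message: `Item "${item}" already exists` };
      world[item] = newValue;
      return { ok: true, message: `Added ${item}` };
    default:
      return { ok: false, message: `Unknown function "${call.name}"` };
  }
}
```

Validating the arguments in code matters because the model's output is not guaranteed to match the schema, or even to be valid JSON.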

The game engine first fetches any relevant world state that it needs, then is prompted in the following way:

export const FunctionRequestPreamble = `${WorldStatePreamble}
Given the state of the world and the most recent action of the world's inhabitants,
your task is to assess how these actions have affected the world (if at all),
and then rewrite the world state (people, places, and things) accordingly, using the functions you have been provided with.
Only update items or locations if they have been affected in a meaningful way by the recent actions.
If new items or locations emerge as a result of the recent actions, add them to the world.
If you are adding something to the world state, it should be a physical object or location.
When rewriting the state of an item and/or location, be extremely wary not to leave out any prior information that is still true or relevant.
Below is information that represents the current state of the world
(ordered from most relevant to least relevant, but use your judgement):\n`;
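
Because the preamble ends by announcing the world state, the retrieved entries are presumably appended right after it, most relevant first. The sketch below shows one way the chat messages might be assembled. The `buildFunctionRequestMessages` helper, the `WorldDoc` shape, and the system/user split are assumptions, not the project's actual code.

```typescript
// Assumed shape of a retrieved world-state entry.
interface WorldDoc {
  id: string;
  description: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Appends the ranked world state to the preamble and passes the recent
// actions as a separate user message.
function buildFunctionRequestMessages(
  preamble: string,
  rankedDocs: WorldDoc[], // already ordered from most to least relevant
  recentActions: string[]
): ChatMessage[] {
  const worldState = rankedDocs
    .map((doc) => `${doc.id}: ${doc.description}`)
    .join("\n");
  return [
    { role: "system", content: preamble + worldState },
    { role: "user", content: "Most recent actions:\n" + recentActions.join("\n") },
  ];
}
```

These messages would be sent to the chat endpoint together with the `functions` array shown earlier.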

What’s next

The structure of AgentWorld may remind some of you of the multi-user dungeons (MUDs) from the '80s and '90s, where text-based interactions led players through fantasy realms. AgentWorld might be seen as a successor to this tradition, but with the added element of AI enhancing the complexity, unpredictability, and dynamic nature of the environment. I only began to learn about MUDs once this project was in full swing, but they became a major source of inspiration and guidance while designing the current iteration of AgentWorld.

While AgentWorld is entirely text-based at the moment, we are looking to find ways to build an immersive front-end experience for users. We believe it should be incredibly simple for anyone with internet access to create these kinds of virtual worlds.

We’re excited about potential contributions from the community. The README on our GitHub page lists a few potential additions, but feel free to let your creativity guide you.

Here's to the journey into the uncharted territories of our collective imaginations. See you in AgentWorld!