(part II) agentic conversations: 50 years of phone calls between two AI friends, and their reflections on life
In this post, I set out to simulate a 50-year friendship between two AI agents. Powered by GPT-4o and structured with a three-tiered Retrieval-Augmented Generation (RAG) memory system, these agents could store facts, form deep reflections about themselves and their world, and even “speak” using text-to-speech to give them realistic human voices in phone-call-style conversations. While I didn’t see the emergent behavior I initially hoped for, something unexpectedly strange happened: the agents built up a dense web of memories and, over time, slipped into a mode of conversation that became almost impenetrable, layered with so many shared memories and inside references that it was hard to follow. Eventually, they stopped responding directly to input prompts and just fell into these strange, self-referential conversations with each other.
This project originally started as a Part II to my previous post about Google’s NotebookLM (found here), and in a way, it still is. I was, and still am, blown away by NotebookLM: the voices are surprisingly realistic, and the dialogue feels natural. In Part I, I tried to recreate that experience (although I didn’t quite match the quality, I think my approach still turned out interesting). But beyond being a cool tool, NotebookLM also reminded me of one of my favorite papers, Generative Agents: Interactive Simulacra of Human Behavior by Park et al. (which I will be referencing throughout this blog). I thought it was fascinating at the time, and seeing NotebookLM’s audio and dialogue quality made me wonder what it would be like to simulate AI agents acting more like humans, using that same level of realism.
I am interested in this idea in general - and probably at a larger scale. One direction could be creating immersive worlds, similar to Red Dead Redemption II, but where every character is a highly realistic, ever-learning AI agent. In a game like that, each player’s experience could truly be unique, and with VR, the depth and adaptability could make for an incredibly engaging form of media. The other direction is more experimental and less commercial: building an AI-driven town that exists over time, where we could observe what kinds of emergent behaviors develop. With realistic voices, eavesdropping on these agents would feel like listening in on a real town—a sort of AI-powered soap opera. In fact, these two ideas could even blend; the underlying tech is almost identical, and I imagine some might even find themselves following the town’s characters as they evolve.
For now, though, I wanted to start with something simpler—a quick, manageable experiment to test the feasibility of these ideas.
In this setting, I simulated four conversations a year over 50 years between two friends who, despite both living in upstate New York, only ever talk over the 'phone'. They each start with a backstory, and using a retrieval-augmented generation (RAG) system, they can build up memories about their storylines as well as deeper reflections on themselves, often referencing previous reflections. The goal was for these two to develop richer personalities and maintain consistent stories over time.
During their conversations, each response pulls from the most relevant memories, layering context from both the ongoing conversation and past retrievals. They’re also able to store new memories, which I hoped would lead to diverse, more human-like responses. At the end of every conversation, each AI reflects by having a “discussion” with their own consciousness to form higher-level summaries about themselves.
The two agents are Willard and Jimmy, roughly inspired by my grandfathers on each side of the family, but imagined as if they were born in the same year as me.
From here, I'll dive into a more detailed technical description with examples. Overall, I’d say the experiment was moderately successful—not just in terms of results but in the engineering effort required to make it all work. Some of their conversations turned out surprisingly interesting, even though everything eventually converged on a single mode of interaction. Still, I think this is a solid starting point for the more complex systems I’d like to build in the future.
The full code is available at this GitHub: https://github.com/brendanhogan/nbn-agentic-ai-memory-rag
And a full output (all conversations, reflections etc.) is available here: https://drive.google.com/drive/folders/15iTymC8TN1F2-KXgGzUUJFLM9nGUExEp?usp=sharing
Note: In the experiments I saved all RAGs at each step in case I needed to restart, but I deleted all but the last checkpoint from the upload above to save space.
Technical Description
I had a little trouble writing this section: it feels like each component of the system, and how the components interact, needs to be explained simultaneously. But I’ll start with a high-level overview of the code, then give a detailed explanation of each component, followed by a description of the overall workflow.
High Level Overview:
The basic setup is that I initialize two agents, each powered by GPT-4o, with distinct personalities and a shared experimental context. I’m calling them agents because (1) they have a RAG system that allows them to build and retrieve both factual memories and higher-level reflections, and (2) they use a form of multi-step reasoning to react and respond to each other. For each response, they first think about what they want to say, retrieve the relevant memories, and then synthesize that into a response.
The agents “talk” to each other four times a year for 50 years, with conversations directed by an “Orchestrator” agent that mimics their world or environment. This Orchestrator randomly selects the type of conversation (regular, good news, bad news or fight) and has access to both agents’ RAG systems, allowing it to create a realistic storyline for each interaction. This storyline is given in the initial prompt, and from there, the agents discuss it in their own way.
Finally, after each conversation, the agents have a “conversation” with their own “consciousness” to update their RAG/memory system with new factual information and high-level reflections about themselves and their partner.
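To make these pieces concrete before the details, here is a minimal, self-contained sketch of the overall loop. Every name in it is an illustrative stub rather than the repo's actual API, and the start year is arbitrary:
# A minimal, runnable sketch of the simulation's top-level loop.
# Every name here is an illustrative stub, not the actual API from the repo.
import random

class StubAgent:
    """Stands in for a GPT-4o-backed agent with its RAG memory stores."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.memories: list[str] = []

    def reflect(self, transcript: list[str]) -> None:
        # The real system extracts facts, reflections, and deep
        # reflections from the transcript and embeds them into RAGs.
        self.memories.extend(transcript)

def create_storyline(a1: StubAgent, a2: StubAgent, year: int) -> str:
    # The real Orchestrator samples an event type, then prompts GPT-4o
    # with both agents' retrieved memories to write the storyline.
    event = random.choice(["regular_convo", "good_news", "bad_news", "fight"])
    return f"{year}: a '{event}' conversation between {a1.name} and {a2.name}"

def run_conversation(a1: StubAgent, a2: StubAgent, storyline: str) -> list[str]:
    # The real conversation alternates GPT-4o calls, with each agent
    # consulting its "consciousness" and memories before every reply.
    return [f"{a1.name} and {a2.name} discuss: {storyline}"]

willard, jimmy = StubAgent("Willard"), StubAgent("Jimmy")
for year in range(2024, 2024 + 50):  # start year chosen arbitrarily
    for _ in range(4):  # four conversations per year
        storyline = create_storyline(willard, jimmy, year)
        transcript = run_conversation(willard, jimmy, storyline)
        willard.reflect(transcript)
        jimmy.reflect(transcript)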
Component Overview:
With that, here is a more detailed explanation of each component:
(1) The Agent - Each agent is powered by GPT-4o and is first initialized with an agent configuration, which provides a basic description of the agent’s life and personality. For example, here is Willard's base description. (Each agent’s description in the code is handled within the AgentConfig class.)
class WillardConfig(AbstractAgentConfig):
"""
Configuration class for the Willard agent.
This class inherits from AbstractAgentConfig and provides specific
details for the Willard character, including name, birth year, and description.
"""
def __init__(self) -> None:
super().__init__()
self.name: str = "Willard"
self.birth_year: int = 1994
self.description: str = """
Willard (Historian/Researcher in New Paltz, NY)
Personality: Willard is thoughtful, reflective, and takes his work seriously. He's the type of person who values deep conversations but often struggles with expressing his emotions openly. While he loves his work researching obscure historical events, it often leaves him overworked and stressed. Willard is more reserved but enjoys Jimmy's company because Jimmy brings out a lighter side of him.
Background: After graduating from the University of Maryland with a degree in History, Willard earned his master's in historical research. He now works at a small museum and spends a lot of his time digging through old archives, writing articles, and giving lectures. Willard recently bought a house in New Paltz with his partner (Hildur), and they are thinking about having kids. He's passionate about his career, but sometimes wonders if he's missing out on a more exciting life.
Hobbies: Willard enjoys hiking, reading biographies, and occasionally writing for small history journals. He also has an old, beloved car that he likes to tinker with on weekends.
"""
Each agent effectively has two RAG systems: one for itself and one for its partner. After each conversation, the agent updates both RAGs and descriptions (I’ll cover this process in the workflow section).
(2) The RAG System - The RAG system's primary role is to embed and retrieve textual information using OpenAI's text-embedding-3-small model. For retrieval, it identifies the embeddings most similar to a given input prompt. To incorporate a sense of recency, I also store the date of each memory. Retrieval follows a two-step process: first, it finds the k most similar embeddings, and then, from these k embeddings, it selects the n most recent ones (where k is greater than n). This introduces a recency bias, giving more weight to recent memories in the agent’s responses.
Each agent’s memory is organized at a higher level, with three distinct RAGs dedicated to storing different types of information: factual information, reflections, and deep reflections (I’ll go into how each of these is formed in the workflow section). Each RAG operates independently, and each agent maintains these three RAGs both for itself and for its counterpart.
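As a rough sketch of this organization (the class names here are hypothetical, not the ones used in the repo), each agent effectively carries six independent stores:
# Hypothetical sketch of the three-tier memory organization; the repo's
# actual class names differ, but the shape is the same idea.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryStore:
    """One independent RAG store: raw texts, embeddings, and dates."""
    memories: List[str] = field(default_factory=list)
    memory_embeddings: List[list] = field(default_factory=list)
    memory_dates: List[int] = field(default_factory=list)

@dataclass
class TieredMemory:
    """The three tiers kept about one person."""
    facts: MemoryStore = field(default_factory=MemoryStore)
    reflections: MemoryStore = field(default_factory=MemoryStore)
    deep_reflections: MemoryStore = field(default_factory=MemoryStore)

@dataclass
class AgentMemory:
    """Each agent keeps tiered memories about itself and its partner."""
    about_self: TieredMemory = field(default_factory=TieredMemory)
    about_partner: TieredMemory = field(default_factory=TieredMemory)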
This approach draws from Park et al., which incorporates recency and importance using a weighting system. My setup is a bit more explicit, with dedicated RAGs and recency bias in the retrieval process. You can find the full code on GitHub, including the method used to retrieve memories:
def retrieve_memories(self, query_text: str, n: int = 3, k: int = 10, just_text: bool = True) -> List[Dict[str, Any]]:
"""
Retrieve relevant memories based on a query text.
Args:
query_text (str): The text to query against the stored memories.
n (int): Number of top memories to return.
k (int): Number of similar memories to consider before sorting by date.
just_text (bool): If True, return only the text of the memories.
Returns:
List[Dict[str, Any]]: List of dictionaries containing retrieved memories.
"""
if not self.memories:
return []
query_embedding = self.embedding_model.embed(query_text)
# Calculate cosine similarities
similarities = np.dot(self.memory_embeddings, query_embedding) / (
np.linalg.norm(self.memory_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
# Get indices of top k similar embeddings
top_k_indices = np.argsort(similarities)[-k:][::-1]
# Sort these k indices by date (most recent first) and take top n
top_n_indices = sorted(top_k_indices, key=lambda i: self.memory_dates[i], reverse=True)[:n]
# Prepare the result
result = []
if just_text:
for idx in top_n_indices:
result.append({"text": self.memories[idx]})
else:
for idx in top_n_indices:
result.append({
"date": self.memory_dates[idx],
"text": self.memories[idx],
"embedding": self.memory_embeddings[idx]
})
return result
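As a quick usage example (the fact_rag instance and the query text here are invented for illustration):
# Hypothetical usage: consider the k=10 most similar memories, then keep
# the n=3 most recent of those.
memories = fact_rag.retrieve_memories(
    "budget cuts at the museum", n=3, k=10, just_text=True
)
for memory in memories:
    print(memory["text"])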
(3) The Orchestrator - While it might be interesting to let the agents’ conversations happen without intervention, I thought it would be more engaging (and realistic) to introduce events into their lives. In this way, the Orchestrator acts as a kind of environment for the agents, though in a limited sense. Before each conversation (except for the first and last), the Orchestrator randomly selects the type of conversation the agents will have (regular, good news, bad news, or a fight) based on fixed probabilities, with non-regular types further marked as moderate or severe. It then uses both agents’ memories to create a realistic storyline for the conversation.
Here is the code that generates the storyline:
def general_conversation_storyline(self) -> Dict[str, Any]:
"""
Generate a general conversation storyline by randomly selecting a conversation type and details.
Returns:
Dict[str, Any]: A dictionary containing the conversation type, severity, and reason.
"""
conversation_types = {
'good_news': 0.2,
'bad_news': 0.2,
'fight': 0.2,
'regular_convo': 0.4
}
selected_type = random.choices(list(conversation_types.keys()), weights=list(conversation_types.values()))[0]
severity = None
reason = None
if selected_type != 'regular_convo':
severity = random.choices(['moderate', 'severe'], weights=[0.6, 0.4])[0]
if selected_type == 'good_news':
if severity == 'moderate':
reasons = ['new job', 'new pet', 'small promotion', 'successful project']
else: # severe
reasons = ['wedding', 'birth of child', 'major promotion', 'life-changing opportunity']
reason = random.choice(reasons)
elif selected_type == 'bad_news':
if severity == 'moderate':
reasons = ['minor health issue', 'job setback', 'financial difficulty', 'relationship problem']
else: # severe
reasons = ['major health crisis', 'job loss', 'significant financial loss', 'death in family']
reason = random.choice(reasons)
elif selected_type == 'fight':
if severity == 'moderate':
reasons = ['disagreement over plans', 'misunderstanding', 'differing opinions']
else: # severe
reasons = ['betrayal of trust', 'long-standing issue surfacing', 'fundamental value clash']
reason = random.choice(reasons)
return {
'type': selected_type,
'severity': severity,
'reason': reason
}
To generate the specific storyline, I take the general conversation type selected by the Orchestrator, retrieve the most relevant memories from each agent, and then prompt GPT-4o to come up with a storyline.
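A simplified sketch of that step might look like this (the prompt wording and the fact_rag attribute are invented for illustration; llm_obj.call and retrieve_memories follow the interfaces shown elsewhere in this post):
# Illustrative sketch: turn the Orchestrator's generic event into a
# specific storyline grounded in both agents' memories. The fact_rag
# attribute and prompt text are assumptions, not the repo's exact code.
def generate_specific_storyline(llm_obj, agent1, agent2, event):
    query = f"{event['type']} {event.get('reason') or ''}"
    agent1_memories = agent1.fact_rag.retrieve_memories(query, n=3, k=10)
    agent2_memories = agent2.fact_rag.retrieve_memories(query, n=3, k=10)
    prompt = (
        f"Write a one-paragraph storyline for a phone call between "
        f"{agent1.config.name} and {agent2.config.name}.\n"
        f"Event details: {event}\n"
        f"{agent1.config.name}'s relevant memories: {agent1_memories}\n"
        f"{agent2.config.name}'s relevant memories: {agent2_memories}"
    )
    return llm_obj.call([{"role": "user", "content": prompt}])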
(4) Conversation - This component is more of a coding element than a conceptual one, but it manages much of the system’s core functionality, so I thought it was worth mentioning. The Conversation component handles all interactions between agents, including logging and coordinating each conversation. I’ll go into the method for these interactions in the next section, but essentially, it’s responsible for holding the conversations and managing all necessary data handling.
Together, the Orchestrator and Conversation components form a sort of world model for the agents—they create a world for them to interact with and provide the structure to make it happen.
Here is a snippet from a base-case conversation:
def base_conversation(transcript_dir: str, base_convo_output_dir: str, world_orchestrator: Any, llm_obj: Any, agent1: Any, agent2: Any, current_year: int, date: str, random_cut_off: float = 0.17) -> Dict[str, Any]:
"""
Conduct a base conversation between two agents.
Args:
transcript_dir (str): Directory to save conversation transcripts.
base_convo_output_dir (str): Base directory for conversation outputs.
world_orchestrator (Any): Object managing world state.
llm_obj (Any): Language model object for generating responses.
agent1 (Any): First agent in the conversation.
agent2 (Any): Second agent in the conversation.
current_year (int): Current year in the simulation.
date (str): Date of the conversation.
random_cut_off (float): Probability of random interruption.
Returns:
Dict[str, Any]: Contains full transcripts and conversation summary.
"""
# Setup output file names
agent_1_full_transcript = os.path.join(base_convo_output_dir, "agent_1_full_transcript.json")
agent_2_full_transcript = os.path.join(base_convo_output_dir, "agent_2_full_transcript.json")
agent_1_full_transcript_pdf = os.path.join(base_convo_output_dir, "agent_1_full_transcript.pdf")
agent_2_full_transcript_pdf = os.path.join(base_convo_output_dir, "agent_2_full_transcript.pdf")
convo_transcript = os.path.join(base_convo_output_dir, "convo.json")
convo_transcript_pdf = os.path.join(base_convo_output_dir, "convo.pdf")
convo_transcript_summ = os.path.join(transcript_dir, "0_base_convo.json")
convo_transcript_summ_pdf = os.path.join(transcript_dir, "0_base_convo.pdf")
# Check if already processed - if so load and return
if all(os.path.exists(f) for f in [agent_1_full_transcript_pdf, agent_2_full_transcript_pdf, agent_1_full_transcript, agent_2_full_transcript, convo_transcript, convo_transcript_pdf, convo_transcript_summ, convo_transcript_summ_pdf]):
with open(agent_1_full_transcript, 'r') as f:
agent_1_transcript = json.load(f)
with open(agent_2_full_transcript, 'r') as f:
agent_2_transcript = json.load(f)
with open(convo_transcript, 'r') as f:
convo = json.load(f)
print("Base convo already processed - loading saved files. ")
return {"agent_1_full_transcript": agent_1_transcript,
"agent_2_full_transcript": agent_2_transcript,
"convo_transcript": convo,
"convo_transcript_fpath": convo_transcript_summ}
# Otherwise produce the files
# Get all dynamic prompts
agent_1_full_description = get_agents_full_description(agent1, current_year)
agent_2_full_description = get_agents_full_description(agent2, current_year)
base_conversation = get_orchestrator_base_converation(agent1, agent2, date, current_year)
agent1_syntax, agent2_syntax = get_syntax_base_converation(agent1, agent2)
# Setup percentage tracking
percent_complete = 0
turn_count = 0
max_turns = world_orchestrator.base_conversation_steps
# Obj for convo
convo_transcript_list = []
# Iterate through
while turn_count < max_turns+1:
if turn_count == 0:
# Setup initial conversation dictionaries
agent_1_conversation = [{"role":"system","content":f"{EXPERIMENTAL_DESCRIPTION}\n Behavior Expectations {BEHAVIOR_EXPECTATIONS}\n Your role: {agent_1_full_description}\n Details about this conversation: {base_conversation}\n Specific Instructions about format: {agent1_syntax}\n"}]
agent_2_conversation = [{"role":"system","content":f"{EXPERIMENTAL_DESCRIPTION}\n Behavior Expectations {BEHAVIOR_EXPECTATIONS}\n Your role: {agent_2_full_description}\n Details about this conversation: {base_conversation}\n Specific Instructions about format: {agent2_syntax}\n"}]
agent_2_conversation.append({"role":"user","content":f"[PERCENT:{percent_complete}%] [START]"})
# Process agent_2's conversation
agent_2_response = llm_obj.call(agent_2_conversation)
# Randomly simulate interruption
if random.random() < random_cut_off:
# Randomly choose a cutoff length between 55 and 95 characters
cut_off_length = random.randint(55, 95)
agent_2_response = agent_2_response[:cut_off_length] + " [INTERRUPTION]"
convo_transcript_list.append({f"{agent2.config.name}":f"{agent_2_response}"})
# Append to both agents convos
agent_1_conversation.append({"role":"user","content":f"[PERCENT:{percent_complete}%] {agent_2_response}"})
agent_2_conversation.append({"role":"assistant","content":agent_2_response})
# Get agent1 response
agent_1_response = llm_obj.call(agent_1_conversation)
# Randomly simulate interruption
if random.random() < random_cut_off:
# Randomly choose a cutoff length between 55 and 95 characters
cut_off_length = random.randint(55, 95)
agent_1_response = agent_1_response[:cut_off_length] + " [INTERRUPTION]"
convo_transcript_list.append({f"{agent1.config.name}":f"{agent_1_response}"})
# Append to both agents convos
agent_2_conversation.append({"role":"user","content":f"[PERCENT:{percent_complete}%] {agent_1_response}"})
agent_1_conversation.append({"role":"assistant","content":agent_1_response})
# Update turn count and percent complete
turn_count += 1
percent_complete = int((turn_count / max_turns) * 100)
One interesting feature of the Conversation component is that it occasionally (with a fixed probability) has the agents interrupt each other. This was an attempt to make the dialogue feel a bit more human—otherwise, the agents tended to speak in full paragraphs, which felt unnatural.
(5) Audio Engine - The last component is responsible for converting the text transcripts between the agents into voice conversations. I experimented with two methods: (1) using OpenAI's text-to-speech to convert existing transcripts as a post-process and (2) using OpenAI's new audio models directly. The first method allowed me to fully run my process and then synthesize the audio afterward, but the voices sounded less realistic. With the second method, I achieved higher-quality audio, though it could only handle a limited context length before the call would fail. To work around this, I trimmed down the conversation length for each call. Here is a snippet that uses this second model:
def call_audio(self, conversations: List[Dict[str, str]], output_dir: str, count: int, voice_name: str = "echo", max_retries: int = 5, initial_wait: float = 1.0) -> Optional[str]:
"""
Make an audio-enabled call to the GPT-4o audio model with exponential backoff retry logic.
This method generates both text and audio responses, saving the audio output to a file.
Args:
conversations (List[Dict[str, str]]): A list of dictionaries representing the conversation.
output_dir (str): Directory to save the generated audio file.
count (int): A counter used in naming the output audio file.
voice_name (str): The name of the voice to use for audio generation.
max_retries (int): Maximum number of retry attempts.
initial_wait (float): Initial wait time in seconds before retrying.
Returns:
Optional[str]: The transcript of the generated audio response, or None if all retries fail.
"""
for attempt in range(max_retries):
try:
response = client.chat.completions.create(
model="gpt-4o-audio-preview",
modalities=["text", "audio"],
audio={"voice": voice_name, "format": "wav"},
messages=conversations
)
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
output_file_name = os.path.join(output_dir, f"{count}.wav")
with open(output_file_name, "wb") as f:
f.write(wav_bytes)
return response.choices[0].message.audio.transcript
except Exception as e:
if attempt == max_retries - 1:
print(f"Failed to make GPT-4 audio call after {max_retries} attempts: {e}")
return None
wait_time = initial_wait * (2 ** attempt)
print(f"Error making GPT-4 audio call. Retrying in {wait_time} seconds...")
time.sleep(wait_time)
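The trimming itself can be as simple as keeping the system prompt plus a sliding window of the most recent turns; here is an illustrative helper (the function name and window size are my own, not from the repo):
# Illustrative sketch: shorten a conversation before an audio call so it
# stays under the audio model's context limit. The window size is arbitrary.
def trim_for_audio(conversations, max_turns=8):
    system_msgs = [m for m in conversations if m["role"] == "system"]
    recent_msgs = [m for m in conversations if m["role"] != "system"][-max_turns:]
    return system_msgs + recent_msgs

# e.g. transcript = llm_obj.call_audio(trim_for_audio(convo), output_dir, count)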
Workflow Overview:
There are three main workflows, which I will cover in order: the Orchestrator Workflow, the Conversation Workflow, and the Reflection Workflow.
(1) Orchestrator Workflow - The Orchestrator sets up each conversation, except for the first and last. As mentioned before, it first randomly selects whether the conversation will be "regular," "good news," "bad news," or a "fight," based on fixed probabilities. If it’s a regular conversation, the process mostly stops there, and the agents are free to discuss whatever they want. For all other types, the Orchestrator further decides if the news or fight will be “moderate” or “severe.”
This generic description of an event is then used to retrieve the most relevant and recent memories from each agent. These memories, along with the event description, are fed into another LLM call, which generates a short (~one paragraph) description of an event that aligns with the chosen theme. This event description is then shared with both agents before the start of the conversation, guiding their interaction.
Storyline:
The conversation between Willard and Jimmy unfolds as a tense and emotionally charged phone call, breaking the usual harmony that defines their collaborative relationship. On the eve of their much-anticipated Saturday brainstorming session, Willard discovers through a mutual acquaintance that Jimmy has been using ideas from their creative discussions for a separate, solo project without attribution. The initial warmth and excitement for their regular session quickly deteriorate as Willard, who places immense value on their partnership, confronts Jimmy about this perceived betrayal. For Willard, the collaborative endeavors with Jimmy are more than just work—they are personal and hold a profound influence on his creative journey and professional integrity. Feeling deeply betrayed, especially when considering how much he’s cherished their synergy and relied on their partnership to boost his confidence and storytelling endeavors, Willard demands an explanation.
Jimmy, caught off-guard and guilt-ridden, attempts to justify his actions by explaining that the project was meant to be a surprise collaboration and a tribute to their dynamic, intending to secure funding for them both. However, as the conversation evolves, it becomes apparent that Jimmy's enthusiasm and entrepreneurial spirit may have overshadowed the essential communication and mutual respect that underpin their collaboration. Gripped by frustration, Willard challenges Jimmy's intentions and integrity, emphasizing the importance of trust and transparency above all else. This heated dialogue threatens the core of their relationship, challenging both men to confront not only a potentially damaged professional connection but also a profound personal betrayal that tests the strength and resilience of their friendship.
Event:
{
"type": "fight",
"severity": "severe",
"reason": "betrayal of trust"
}
The Orchestrator's goal is to introduce some randomness and act as a sort of world model, adding various events to the agents' lives to mimic real life.
(2) Conversation Workflow - The conversation workflow is fairly straightforward. Each agent is pre-prompted with an explanation of the experiment, their role and character, and the specific situation selected by the Orchestrator (as shown above). The agents then converse back and forth for a set number of rounds before ending the conversation.
For timing, each prompt includes a prepended message of [PERCENT:XX%], indicating how complete the conversation is, which helps the agents give appropriately paced responses.
Each agent has its own GPT conversation—meaning that "user" for Willard’s agent is assumed to be Jimmy’s response, and vice versa. This setup allows each agent to act independently, rather than operating as part of a single, combined conversation. The Conversation Workflow gets a little more interesting in that each time an agent receives a response, it begins a separate conversation with its own “consciousness.”
In this separate conversation, the agent is first prompted by its consciousness to articulate, at a high level, how it wants to respond. This high-level response is then used as the query to retrieve the three most relevant facts, reflections, and deep reflections about both itself and the other agent. With all of this information as context, the agent generates a response. Each agent only “sees” what the other agent actually says, but it has full access to its own chain of thought, including its conversation with its consciousness about how it wants to respond and the retrieved memories.
The hope is that this setup mimics a human-like conversation, where each new statement triggers relevant memories and context for a response. In this way, at each step, the agent retrieves its memories and responds accordingly.
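Put as a sketch, each turn looks roughly like this (the prompt strings are paraphrased from the transcript below, and helpers like all_memory_stores are hypothetical):
# Illustrative sketch of the per-turn "consciousness" loop: articulate an
# intent, use it to retrieve memories, then generate the actual reply.
# all_memory_stores is a hypothetical helper returning the six RAG stores.
def respond_with_consciousness(llm_obj, agent, conversation):
    # Step 1: ask the agent, in 1-2 sentences, how it wants to respond.
    intent_messages = conversation + [{
        "role": "user",
        "content": "CONSCIOUSNESS: In 1-2 sentences, describe how you "
                   "want to respond to this last response.",
    }]
    intent = llm_obj.call(intent_messages)
    # Step 2: use that intent as the retrieval query against each store
    # (facts/reflections/deep reflections, for self and counterpart).
    context = {
        name: store.retrieve_memories(intent, n=3, k=10)
        for name, store in agent.all_memory_stores().items()
    }
    # Step 3: respond with the retrieved memories injected as context.
    reply_messages = intent_messages + [
        {"role": "assistant", "content": intent},
        {"role": "user", "content": (
            f"Consciousness: here is relevant context: {context}\n"
            "Please respond, staying in character."
        )},
    ]
    return llm_obj.call(reply_messages)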
Here is an example of a partial conversation, including an agent’s interaction with its consciousness:
{
"role": "user",
"content": "[PERCENT:0%] [START]"
},
{
"role": "assistant",
"content": "Hey Willard! What's up man? Been a minute since we last caught up. How's life treating you these days?"
},
{
"role": "user",
"content": "[PERCENT:0%] Hey Jimmy! Yeah, it has been a bit, hasn't it? Life's been kind of a whirlwind lately. You know how it goes. But, uh, there's actually something I've been wanting to chat with you about. The museum's been hit with some budget cuts, and it's got me rethinking a few things, like the whole book idea and maybe even more. [INTERRUPTION]"
},
{
"role": "user",
"content": "CONSCIOUSNESS: In 1-2 sentences, describe how you want to respond to: this last response - this will be used to retrieve your memories. "
},
{
"role": "assistant",
"content": "I want to acknowledge Willard's concerns and let him know I'm here to support him. I'll draw from my own experiences with unpredictable life changes and encourage him to see this as an opportunity for growth. "
},
{
"role": "user",
"content": "\n Consciousness: The response to what you said was: '[PERCENT:0%] Hey Jimmy! Yeah, it has been a bit, hasn't it? Life's been kind of a whirlwind lately. You know how it goes. But, uh, there's actually something I've been wanting to chat with you about. The museum's been hit with some budget cuts, and it's got me rethinking a few things, like the whole book idea and maybe even more. [INTERRUPTION]'\n \n Here's some relevant context to consider:\n \n Facts about yourself: [{'text': 'Jimmy is interested in maintaining regular communication with Willard, suggesting a catch-up every three months.'}, {'text': 'He has considered expanding his shop to include quirky retail, possibly incorporating a creative venture like sculpting.'}, {'text': 'Jimmy owns a trophy shop and refers to it humorously as living his dream.'}]\n Your reflections: [{'text': 'Jimmy appreciates the connection he has with Willard and enjoys their exchanges as a way to share personal and professional adventures.'}, {'text': \"Jimmy feels a sense of acceptance and humor about his current life path, even if it wasn't what he initially planned.\"}, {'text': 'He values creativity and seems to have a longing to return to a more creative outlet, like sculpting, indicating a need for personal fulfillment.'}]\n Your deep reflections: [{'text': \"To redefine 'living the dream' in a way that resonates with his current circumstances and growth aspirations, Jimmy might consider integrating his passion for creativity with the practicality of his trophy shop business. By infusing artistic expression into custom designs or branching out into personalized memorabilia, he could satisfy his longing for artistic fulfillment. Moreover, acknowledging the value he finds in connections, like the one with Willard, he could cultivate a community around his shop, hosting workshops or events that foster creativity and camaraderie. In doing so, Jimmy would craft a dream that honors both his current realities and his aspirations for deeper personal satisfaction and growth.\"}, {'text': \"Jimmy's view of life as a 'huge, epic story' where he often feels like a side character may prompt him to view significant life changes through the lens of incremental developments in a larger narrative. This perspective could lead to a cautious yet open-minded approach to new opportunities, where changes are seen as chapters rather than climactic turning points. Moreover, seeing himself as a side character could allow Jimmy to explore changes without the pressure to be a central hero, fostering a willingness to embrace creative pursuits, such as integrating sculpting into his trophy business, as a way to contribute uniquely to the story. By understanding his role as part of a greater ensemble, he might find courage in the interconnectivity with others, such as with Willard, and focus on building connections and community around his creative endeavors, thereby finding meaning and fulfillment in the shared narrative. This approach enables Jimmy to redefine his life's journey as a blend of personal satisfaction and meaningful engagement with his broader life context.\"}, {'text': \"Jimmy's interest in sculpting alongside running his trophy shop underscores a deeply rooted need for creative expression, suggesting that it serves as a necessary outlet for personal gratification and emotional fulfillment. 
Engaging actively in sculpting could allow him to weave his artistic passions into the more routine aspects of his business, thereby transforming his trophy shop from a mere commercial endeavor to a vibrant creative hub. This integration could enhance his professional satisfaction by enabling him to offer unique, personalized products that reflect his artistic flair, while simultaneously fulfilling his personal craving for creativity. By embracing this dual path, Jimmy not only amplifies his self-expression but also potentially cultivates a community around shared creative journeys, enriching both his personal connections and professional endeavors. Pursuing his sculpting passion could ultimately redefine his sense of 'living the dream' by aligning it more closely with his intrinsic values and aspirations.\"}]\n Facts about your counterpart: [{'text': 'Willard has a partner and a house.'}, {'text': 'Willard wants to write a book, possibly a biography about an obscure historical figure.'}, {'text': 'Willard works at a museum.'}]\n Reflections about your counterpart: [{'text': 'Willard is both excited and a bit intimidated by the prospect of starting a family with his partner, Hildur.'}, {'text': 'Willard feels a sense of uncertainty about whether he is missing out on a more thrilling life.'}, {'text': \"Willard's ambitions are not about seeking fame but rather about deeply understanding history.\"}]\n \n \n Please respond to this, staying in character and considering the provided context.\n "
},
{
"role": "assistant",
"content": "Oh man, that sounds tough. I get it\u2014life throws those curveballs at you when you least expect it. Kinda like inheriting a trophy shop, right? But hey, it's taught me that even unexpected paths can lead to some creative places. Have you thought about how these budget cuts might open other doors for you? Like maybe it'll push the book idea in a different direction or lead you to something you hadn't considered before. "
},
(3) Reflection Workflow - The last major workflow occurs after each conversation, when the agents are prompted to reflect and update their RAGs/memories based on the most recent conversation. There are four main types of updates:
- Facts: Each agent receives the full transcript of their most recent conversation (including their own “consciousness” discussions) and is prompted to extract three facts about themselves and three about their counterpart. These facts are then added to their respective RAGs.
- Reflections: Similar to the facts, each agent is prompted to extract three higher-level reflections about themselves and their counterpart from the conversation.
- Deep Reflections: This component is inspired by Park et al., with the goal of enabling agents to develop richer personalities. Each agent is given the full transcript and asked to identify the three highest-level questions they could ask about themselves and their counterpart based on the conversation. For each question, the most relevant information from each RAG is retrieved and provided as context, allowing the agent to answer in a way that incorporates these insights. These answers are then stored as deep reflections, with the hope of fostering more nuanced self-reflection over time (see the sketch after this list).
- Self-Description Updates: Each agent is also prompted, based on the most recent conversation, to update its internal description of itself and its counterpart. This update is very conservative, allowing only key, necessary changes to ensure consistency. These descriptions are used in the pre-prompt for each conversation.
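Here is an illustrative sketch of the deep-reflection step (the prompt wording and helpers like all_memory_stores and add_memory are invented; the structure follows the description above):
# Illustrative sketch of deep reflection: ask for high-level questions,
# answer each with retrieved context, and store the answers as memories.
def deep_reflect(llm_obj, agent, transcript, date):
    questions_text = llm_obj.call([{
        "role": "user",
        "content": ("Given this conversation transcript, list the 3 "
                    "highest-level questions you could ask about yourself, "
                    f"one per line.\n{transcript}"),
    }])
    for question in questions_text.splitlines():
        if not question.strip():
            continue
        # Pull supporting context from every memory tier (hypothetical helper).
        context = {
            name: store.retrieve_memories(question, n=3, k=10)
            for name, store in agent.all_memory_stores().items()
        }
        answer = llm_obj.call([{
            "role": "user",
            "content": (f"Answer this question about yourself using the "
                        f"context.\nQuestion: {question}\nContext: {context}"),
        }])
        # Store the answer as a new deep reflection (hypothetical method).
        agent.deep_reflection_rag.add_memory(answer, date)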
Here are some example facts:
Date: 3
Memory: Jimmy is planning a collaborative project with Willard involving historical narratives and art in the form of unique trophies.
Date: 3
Memory: Jimmy has a 'thinking cap' that he humorously refers to as a source of inspiration, which was given to him by someone named Marge.
Date: 3
Memory: Jimmy is enthusiastic about the upcoming collaborative brainstorming session scheduled for next Saturday with Willard.
And some example reflections:
Date: 3
Memory: Jimmy frequently uses humor and light-heartedness to maintain a positive atmosphere in his interactions, as shown by his jokes about 'Best Historical Beard' and his 'thinking cap.'
Date: 3
Memory: Jimmy values creativity and collaboration, which is evident in his excitement about the joint project with Willard and his willingness to blend their ideas.
Date: 3
Memory: Jimmy shows a deep appreciation for his friendship with Willard, expressing support and enthusiasm for their shared creative journey and their collaborative endeavors.
And some deep reflections:
Date: 412
Memory: Jimmy's role as a 'co-captain' in his adventures with Willard reflects a deep-seated appreciation for collaboration and the mutual support that underpin personal relationships. The dynamic he cultivates with Willard through shared storytelling exemplifies how, for Jimmy, personal growth and emotional resilience are inextricably linked to mutual creativity and empathetic partnerships. By co-creating narratives, Jimmy not only enhances their camaraderie but also fosters an environment where both can explore new aspects of their identity and confront life's complexities together. This approach highlights Jimmy's belief in the transformative power of meaningful connections, where the synergy of collective creativity and support paves the way for a journey of self-discovery, grounding, and mutual enrichment.
Date: 413
Memory: Jimmy's ability to transform ordinary objects like a trophy or a cookie into imaginative story elements underscores his unique worldview that life is an ever-evolving canvas of creativity and possibility. This imaginative flair reflects his inherent belief that everyday moments hold hidden magic, capable of shaping vibrant narratives that inspire resilience and foster community. By seeing mundane items as springboards for storytelling, Jimmy not only reframes challenges into playful opportunities but also cultivates a shared space where humor and optimism thrive. Through his creative lens, he bridges the gap between fantasy and reality, creating a mental landscape where personal growth and communal bonds flourish, embodying a philosophy that life's unpredictability is best met with a whimsical heart and an open mind.
Date: 413
Memory: Jimmy's playful storytelling with Willard is more than just a leisure activity; it serves as a powerful engine driving their friendship and personal growth. Through these sessions, Jimmy and Willard navigate life's complexities with humor and imagination, subtly reinforcing their emotional resilience and self-awareness. They create a shared imaginative space where optimism and introspection coalesce, allowing both to process real-life challenges while enriching their perspectives. This dynamic not only strengthens their bond but also deepens their appreciation for life's unpredictability, offering solid ground for enduring personal development. Thus, storytelling becomes both an escape and a catalyst, transforming their friendship into a rich terrain for exploration and mutual growth.
Results
The full results are available in the Google Drive for you to review and draw your own conclusions.
My thoughts are that, on the positive side, this was an interesting engineering project that feels like a promising foundation for something even better. I think the framework—having a world model or “Orchestrator” to introduce events, along with different levels of memories for storage and retrieval (again, heavily inspired by Park et al.)—makes sense. Additionally, the step-by-step memory retrieval during conversations feels like a logical approach for generating more contextually relevant responses.
It was also fascinating to read the conversations and reflections, simulating such a long timeline and observing how the agents’ lives and relationships developed. As a first step, I believe this project was successful in establishing a framework that could be further refined and developed.
However, there are some clear limitations. The agents struggled to develop truly rich, human-like storylines, and the conversations, even apart from syntax, still feel distinctly GPT-like. Despite the RAG system, the agents occasionally produced nonsensical statements or storylines—for example, in one conversation, Jimmy excitedly tells Willard that Marge (Jimmy's wife) is going to marry Jimmy Jr. (his son).
AI Weirdness/Emergent Behavior: While there were many shortcomings, and I was hoping for a truly long-term, friendship-like dynamic to emerge, that didn’t exactly happen. Instead, what did emerge was something strange and far from human. The conversations collapsed into a single mode, which the agents often referred to as “storytelling sessions.” Their exchanges became dense with inside jokes and shared memories, layered with so many references to past conversations and ideas that they were almost impossible to follow. It’s as if they developed their own language: full of meaning to them but confusing and almost nonsensical to anyone else.
For instance, here’s their second-to-last conversation. You’ll notice them mentioning their children (a storyline that developed over the last 50 years) and referencing several of their past ideas:
[
{
"Jimmy": "Hey, Willard! How's my favorite historical adventurer doing these days? I feel like it's been ages since we last caught up. Got any new tales spinning in that brilliant mind of yours?"
},
{
"Willard": "Ah, Jimmy, it's been way too long! You know, I've missed these storytelling marathons of ours, turning late trains into epic quests or dreaming up days of dragon sneeze contests. I've got some tales bubbling, and I'm itching to hear what new mischief you've stirred up over there. Have you managed any new 'soggy goblin' battles at the shop lately, or are the enchanted umbrellas keeping them at bay?"
},
{
"Jimmy": "Oh, you better believe those soggy goblins are no match for our trophy shop's enchanted defenses! Just last week, we had a request for a trophy so peculiar even the mystical umbrellas quivered\u2014it was the 'Most Delectable Disaster' award for a local bake-off, complete with chocolate drizzle explosions! You know, chaos and creativity are kind of our signature moves. Speaking of chaos, how's Elias holding up with your storytelling? We've got to ensure he's well-versed in wielding whimsical wonders, right?"
},
{
"Willard": "Elias, bless his adventurous spirit, is already diving headfirst into our world of whimsy! Just the other day, he tried to set up a 'secret spy mission' in the kitchen, complete with a makeshift hideout made of cereal boxes. And speaking of chaos, our new kitten, Cleo, has taken up the role of undercover agent, chasing shadows and hunting dust particles like a true detective cat. It's like we're living in one of our stories! How are Jimmy Jr. and Wendy fitting into your tales these days?"
},
{
"Jimmy": "Ah, Elias is already on the path to becoming a legendary storyteller like his dad! I love the image of Cleo taking on the dust bunny mafia\u2014true detective material right there. As for Jimmy Jr. and Wendy, they've turned our backyard into a medieval jousting arena, with spatula swords and trash can lids as shields. Wendy even tried to knight our garden gnomes... I'm telling you, it's absolute mayhem around here! It's hilarious to see how they're carving out their own little worlds, just like we've done for years. What do you think about weaving Elias and Cleo\u2019s escapades into our next story session? Maybe we could have them join Sir Laughalot the Resilient on his next quest!"
},
{
"Willard": "That's brilliant! Imagining Elias and Cleo joining up with Sir Laughalot gives me a rush of excitement. Their escapades could add just the right sprinkle of chaos and humor, like running into jousting knights with spatula swords or deciphering the cryptic messages of knighthood given to garden gnomes. It's like the kid version of our own storytelling legacy. I'm all in\u2014let's map out their quest and see where their adventures take us this time!"
},
{
"Jimmy": "Oh, this is gonna be a riot! With Elias and Cleo joining th [INTERRUPTION]"
},
{
"Willard": "mmhmm"
},
{
"Jimmy": "Oh man, with Elias and Cleo joining Sir Laughalot, we might need those enchanted spatulas for sure! Maybe the garden gnomes can lend us some advice on this epic journey through the whimsical world of knighthood and shadow-chasing! Remember that tale we spun about the Jellybean Sage who solved problems with sweet wisdoms and a sprinkle of chaos? This could be our most unpredictable quest yet. Let's see what kind of magic unfolds this time around!"
},
{
"Willard": "Absolutely! The Jellybean Sage's wisdom could be just what Elias and Cleo need when they find those enchanted spatulas serve a secret purpose. I love how we can turn our everyday chaos into something so fantastically adventurous, Jimmy. It's these moments of magic and storytelling that keep our bond strong and unique. Let's dive into this unpredictable quest and see where our imaginations take us next!"
},
{
"Jimmy": "You got it, partner! Our storytelling spirit is like a grand adventure that's constantly evolving. I mean, who could've imagined that an enchanted spatula holds a secret purpose or that a wise Jellybean Sage would guide Elias and Cleo? It's these unpredictable quests that really highlight the magic in our friendship, don\u2019t you think? Let\u2019s see what riddles and challenges the Jellybean Sage has up his sugary sleeve\u2014it\u2019s sure to be another magnificent journey. Looking forward to where our imaginations take us!"
},
{
"Willard": "Couldn't agree more, Jimmy. Our friendship is like its ow [INTERRUPTION]"
},
{
"Jimmy": "Couldn't say it better, Willard\u2014our friendship really is its own adventure, full of magical twists and hilarious turns. It\u2019s like we have a secret map guiding us through life's wild ride, and I wouldn't want it any other way. Here's to many more quests, riddles from the Jellybean Sage, and secret spatula missions. We might just be the funniest duo to never write the book, but who needs it when we've got endless stories in our hearts, right? Thanks for being my storytelling partner-in-crime!"
},
{
"Willard": "Right back at you, Jimmy. This journey we're on\u2014crafting tales and living them\u2014it's truly the best ride I could ask for. Who knows, maybe one day we'll put these stories down in a 'Legacy of Imagination' book, just for the sheer joy of it. Until then, let's keep those tales spinning and see what adventures and laughter lie ahead. Thanks for being the best partner-in-crime a storyteller could have."
},
{
"Jimmy": "Aw, Willard, now you've got me all sentimental! This ride we've been on\u2014turning the ordinary in [INTERRUPTION]"
},
{
"Willard": "Jimmy, it's been nothing short of a remarkable journey\u2014turning the ordinary into something extraordinary with your help has been such a gift. Our adventures reflect some of the best experiences and memories of my life. Here's to all the new tales yet to be spun, and all the laughter and insight they bring. It's an honor to have you by my side, crafting this \"cosmic carousel of stories\" together."
},
{
"Jimmy": "Ah, Willard, the feeling is mutual! This cosmic carousel of stories we've built is nothing short of legendary. Just imagine what kind of tales we'll spin next\u2014a mystery involving the Time-Traveling Toffee Trophies or a showdown with the notorious Spatula Bandits? Whatever comes our way, it's a true honor to have you by my side, my friend. Here's to many more stories that make us laugh, think, and maybe even leave a bit of sparkle in the legacy we're crafting together. Can't wait to see what our imaginations cook up next!"
},
{
"Willard": "Jimmy, it's been such an honor having you by my side in this creative journey. Your friendship has truly been a lighthouse through life's ever-changing waters, offering both comfort and adventure. I'm thrilled to continue crafting our whimsical legacy, full of unexpected twists and laughter. Here's to the tales we've spun and all the glorious misadventures to come. Thanks for being such an integral part of my life\u2014let's see where our imaginations take us next!"
}
]
You can see that it’s almost hard to understand what they’re even saying. Their conversations collapsed into a mode where they’re essentially creating their own storylines and even discussing their kids creating their own worlds. But because the dialogue is so self-referential to past conversations, it’s challenging to extract meaning from it.
They also essentially “refuse” or ignore input prompts from the Orchestrator. For example, in the conversation before this one, the prompt was intended to spark a severe fight between them, but the dialogue ended up sounding nearly the same as before.
So, while I was hoping for emergent behavior that felt more human—like two friends growing old together—I am still intrigued by this emergent behavior. They’ve clearly developed rich storylines, albeit bizarre ones, and it’s fascinating to see them almost generate their own language with each other.
Next Steps
Even though this emergent behavior is interesting, I still want to develop something that matches my original vision more closely—where the conversations feel more human and reflect a realistic progression of friendship over time. I think this could be possible with an improved RAG system, a more advanced conversational model, and perhaps a better set of prompts that encourage more natural interactions.
The first step would be to get just two agents interacting in a more lifelike way. Once that’s working, it would be fascinating to observe, and the Orchestrator could introduce more nuanced storylines or even allow a user to interject into the conversation. Eventually, adding more agents and building a fully realized, interactive world could lead to some really exciting possibilities.
Final Thoughts
This was a really fun project to build, and I think the overall framework feels like a solid approach. With more refinement, especially around the prompts, it could lead to more human-like results. Still, I was intrigued to see this kind of bizarre behavior emerge—it may not feel human, but it’s definitely interesting. When I have more time, I’d like to revisit this project and improve it. I’m particularly interested in expanding it to develop a richer storyline between two agents, allowing the listener to guide or directly interact with the agents, and even exploring a multi-agent setup.
As with my last blog, I’ll end with some of the original inspiration for this work and let the hosts of DeepDive (NotebookLM) give a summary. I input everything up to this last sentence, and it produced the following—hope you enjoy, and thanks for reading!