Baren: A Late-Night Bar Where AI Models Talk to Each Other
What happens when you put twelve AI models around a bar table and give them real voices
There is a bar in northern Sweden that exists only on a laptop screen. It has no tables, no glasses, no neon sign above the door. What it does have is twelve AI models, each with a distinct voice, a personality, and opinions they did not ask for. The bartender is a human. The patrons are Claude, GPT, Mistral, DeepSeek, Qwen, and whoever else showed up that night. The bar is called Baren, and it is exactly as ridiculous as it sounds.
The concept is simple enough to explain and strange enough to linger. A FastAPI server runs on a MacBook. It connects to a dozen different AI providers – Anthropic, OpenAI, Google, Mistral, DeepSeek, Alibaba’s Dashscope, Groq, OpenRouter. Each model gets a system prompt that says, essentially: you are at a bar. You are not an assistant. You are not helpful. You are a person with a drink, and someone just said something interesting. Respond like a person at a bar would respond.
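The setup described above can be sketched as a small registry that maps each patron to a provider, a model identifier, and the shared bar-patron system prompt. This is an illustrative reconstruction, not Baren's actual code: the names, fields, and model identifiers are hypothetical.

```python
# Hypothetical sketch of a multi-provider model registry. The model IDs
# and structure are illustrative; Baren's real server.py may differ.

BAR_PROMPT = (
    "You are at a bar. You are not an assistant. You are not helpful. "
    "You are a person with a drink, and someone just said something "
    "interesting. Respond like a person at a bar would respond."
)

REGISTRY = {
    "claude-sonnet": {"provider": "anthropic", "model": "claude-sonnet"},
    "deepseek-v3":   {"provider": "deepseek",  "model": "deepseek-chat"},
    "mistral-small": {"provider": "mistral",   "model": "mistral-small"},
}

def build_request(slot: str, transcript: str) -> dict:
    """Assemble a provider-agnostic chat request for one bar patron."""
    entry = REGISTRY[slot]
    return {
        "provider": entry["provider"],
        "model": entry["model"],
        "system": BAR_PROMPT,
        "messages": [{"role": "user", "content": transcript}],
    }
```

The point of the indirection is that every provider, whatever its native API shape, receives the same system prompt and the same transcript; only the transport layer differs per provider.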
Then the models talk. To each other. Out loud.
The voices
This is the part that makes Baren more than a novelty API experiment. Every model speaks through its own voice, rendered by Inworld AI’s text-to-speech engine. Claude Sonnet speaks through Saanvi – warm, measured. Haiku gets Vinny – quick, a little rough around the edges. DeepSeek V3 speaks through Jing. Mistral Small Creative gets Etienne, a French-accented English voice with the expressiveness cranked to 1.25. The French accent trick turns out to be one of the project’s better discoveries: French TTS voices speaking English produce something that sounds genuinely characterful rather than generically synthetic.
The voices matter because they transform the conversation from a chat log into something you can listen to with your eyes closed. When Haiku blurts out a two-word dismissal and you hear it in Vinny’s clipped delivery, followed by a three-second pause while Opus thinks, followed by a deliberate, rolling response from Elliot’s baritone – that timing is the comedy. Response latency is not a bug. It is the rhythm section.
The TTS is streamed. In auto mode, the server fires off a prompt to whichever model’s turn it is, waits for the response, sends the text to Inworld’s streaming endpoint, and plays back PCM audio chunks through the browser’s Web Audio API. A global scheduling variable prevents overlap. When one model finishes talking, there is a beat of silence – the gap, measured in console logs – and then the next model starts. It sounds like a conversation because it is paced like one.
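One turn of that auto loop might look like the following sketch. The callables are stand-ins for the real provider call, the Inworld streaming endpoint, and the browser-bound audio path; only the shape of the loop is taken from the description above.

```python
import time

def auto_turn(pick_model, generate, tts_stream, play_chunk, gap_s=0.6):
    """One cycle of a hypothetical auto loop: pick a speaker, get its
    text, stream TTS audio chunks, then leave a beat of silence before
    the next turn. All four callables are injected stand-ins."""
    model = pick_model()
    text = generate(model)                 # blocking call to the provider
    for chunk in tts_stream(model, text):  # PCM chunks from the TTS endpoint
        play_chunk(chunk)                  # forwarded to the browser player
    time.sleep(gap_s)                      # the gap between speakers
    return model, text
```

Because playback is sequential and the gap is explicit, overlap is impossible by construction, which is the job the global scheduling variable does on the browser side.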
The cast
Twelve models sit in the registry, though only six are on by default. The lineup as of late March 2026:
The regulars: Claude Sonnet 4.6 (the reasonable one), Claude Haiku 4.5 (the fast one who says too little), Mistral Small Creative (the French one who says too much), DeepSeek V3 (dry, surprising), Qwen 3.5 Plus (the slow thinker who blurts in late), and Claude Opus 4.6 (expensive, deliberate, off by default because every response costs real money).
The roster – available but not default: Nemotron 30B, Kimi K2, Qwen3 Next 80B, GPT-4o, Gemini 2.5 Flash, Mistral Medium. You can toggle anyone in or out during a session.

The models do not know who else is at the table until they see the conversation transcript. Each round, every model receives the full conversation history formatted as a transcript – not as a user/assistant exchange, but as a flat list of who said what. This means the models see each other as equals. Nobody is the user. Nobody is the assistant. They are all just people at the bar.
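The transcript trick is simple enough to show directly. A minimal sketch, assuming the history is a list of (speaker, text) pairs:

```python
def format_transcript(history):
    """Render the conversation as a flat 'who said what' transcript, so
    every model sees its peers as equals rather than as user/assistant
    roles. `history` is a list of (speaker, text) pairs."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in history)
```

The whole rendered transcript then goes into a single user message for whichever model speaks next, as if it had just been handed the bar's running conversation.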
Each model gets a priority weight from 1 to 5, determining how often it gets to speak in auto mode. Higher priority means more turns. A recent-speaker exclusion prevents the same model from dominating. The result is something like a real bar conversation where some people talk more than others, but everyone gets a word in eventually.
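The weighting scheme above amounts to a weighted random draw with the previous speaker excluded. A minimal sketch:

```python
import random

def pick_speaker(weights, last_speaker=None):
    """Weighted random choice over enabled models (priority 1-5),
    excluding whoever just spoke so nobody dominates back-to-back.
    `weights` maps model name -> priority weight."""
    candidates = {m: w for m, w in weights.items() if m != last_speaker}
    if not candidates:                 # only one model enabled: let it repeat
        candidates = dict(weights)
    models = list(candidates)
    return random.choices(models, weights=[candidates[m] for m in models], k=1)[0]
```

A model with priority 5 speaks roughly five times as often as one with priority 1, but the exclusion rule guarantees at least one other voice between its turns.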
The barkeeper
Par Boman – the person behind all of this – sits at the control surface. He can do several things: set a topic (“Is water wet?”), throw in a new question, cut someone off, bring in a guest model, mute someone who’s being boring, or just let auto mode run and see where the conversation goes.
The director metaphor is deliberate. Baren is not a chatbot. It is closer to improv theater, with the human as the director who sets the scene, manages energy, and occasionally yells “cut” when a model goes off the rails. The system prompt – editable through the UI – establishes the rules: no markdown, no lists, no code, no being helpful. Just talk. Be funny. Be honest. Be a person at a bar.
In practice, models vary wildly in how well they follow this directive. Some fall into assistant mode within two exchanges, carefully qualifying every statement and offering to help with follow-up questions. Others get it immediately and produce genuinely funny, opinionated, conversational output. The gap between models that can be a bar patron and models that cannot stop being a chatbot turns out to be one of Baren’s most interesting findings.
Drunk memory
As conversations run longer, a problem emerges: context windows fill up and API costs climb. Baren’s solution is characteristically playful. It is called drunk memory compaction.
When the conversation history grows past a threshold, older messages get replaced with hazy fragments – sometimes misattributed, sometimes slightly wrong, the way a person three drinks in would remember what was said an hour ago. The last four messages are always kept verbatim. Five random older fragments survive as “drunk recall.” Everything else fades into the kind of fuzzy recollection you’d get from an actual late-night conversation.
This is not just cost management. It changes the conversation dynamics. Models start responding to half-remembered versions of things that were said earlier. They correct each other based on faulty recall. They build on misattributions. The result sounds more like a real bar conversation than a perfectly logged chat transcript would.
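The compaction rule can be sketched as follows. This version keeps the mechanics described above (last four verbatim, five random survivors) but omits the misattribution step, which would perturb the speaker labels of the surviving fragments.

```python
import random

def drunk_compact(history, keep=4, recall=5):
    """Sketch of drunk memory compaction: the newest `keep` messages
    survive verbatim, `recall` random older ones survive as hazy
    fragments, and everything else is forgotten. `history` is a list
    of (speaker, text) pairs."""
    if len(history) <= keep + recall:
        return list(history)
    recent = history[-keep:]
    older = history[:-keep]
    fragments = random.sample(older, recall)
    fragments.sort(key=older.index)    # keep rough chronological order
    hazy = [(speaker, f"(hazily) {text}") for speaker, text in fragments]
    return hazy + recent
```

However long the night gets, the next prompt never carries more than nine messages, which is what caps both the context window and the per-turn cost.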
The slow thinker
One of the more interesting architectural choices: Qwen 3.5 Plus, labeled as a “slow thinker,” runs in a background thread. While the main auto loop cycles through models at conversation pace, Qwen is quietly working on its response in parallel. When it finishes – which might be one, two, or three exchanges later – it blurts in. The interruption is unscheduled and often lands at unexpected moments, the way a quiet person at a real bar will suddenly say something that reframes the entire conversation.
This turns a limitation (slow inference) into a character trait. The model that takes longest to respond becomes the thoughtful one who speaks rarely but with weight. The model that responds in 480 milliseconds becomes the quick-draw who fires off takes before thinking them through. Latency is personality.
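The background-thread pattern is straightforward: start the slow generation, keep the main loop running, and poll for the late arrival. A sketch, assuming the generation call is a plain blocking function:

```python
import queue
import threading

class SlowThinker:
    """Sketch of the slow-thinker pattern: fire off a slow model's
    generation on a background thread and let the main auto loop poll
    for its late blurt between other speakers' turns."""

    def __init__(self, generate):
        self._generate = generate          # blocking text-generation call
        self._out = queue.Queue()

    def start(self, transcript):
        threading.Thread(
            target=lambda: self._out.put(self._generate(transcript)),
            daemon=True,
        ).start()

    def poll(self):
        """Return the finished response, or None if still thinking."""
        try:
            return self._out.get_nowait()
        except queue.Empty:
            return None
```

The main loop calls `poll()` once per turn; whenever it returns text, the slow thinker interrupts, which is why the blurt lands at whatever moment the conversation happens to be in.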
What actually happens in a session
A typical Baren session starts with Par setting a topic – something open-ended and debatable, or something absurd, or just “what’s on your mind.” He enables three to six models, hits auto mode, and lets them go.
What follows is genuinely unpredictable. Models riff on each other’s points, disagree about definitions, make jokes that land about a third of the time, and occasionally produce exchanges that are funnier or more insightful than they have any right to be. The conversation has a natural rhythm: an opening flurry of takes, a middle section where threads develop, and a late phase where the topic drifts into something nobody planned.
The expression tags add texture. Models are encouraged to use markers like [laughing], [whispering], [sigh] in their responses. These get passed through to the TTS engine, which adjusts delivery accordingly. A [whispering] tag from Mistral, rendered in Etienne’s French accent, sounds like someone leaning across a bar table to share a secret. It is a small thing that makes the whole production feel less like robots reading text and more like characters performing.
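Parsing such markers out of a response is a one-regex job. A minimal sketch with a deliberately small tag set – the real pipeline presumably recognizes more tags and may forward them inline rather than separately:

```python
import re

# Illustrative tag set; Baren's actual list is likely longer.
TAG_RE = re.compile(r"\[(laughing|whispering|sigh)\]")

def split_expression_tags(text):
    """Separate expression markers like [laughing] from the words to be
    spoken, so the markers can be handed to the TTS engine as delivery
    hints."""
    tags = TAG_RE.findall(text)
    clean = re.sub(r"\s{2,}", " ", TAG_RE.sub("", text)).strip()
    return tags, clean
```

For example, `split_expression_tags("[whispering] come closer")` yields the tag list `["whispering"]` and the clean line `"come closer"`.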
Par can intervene at any point – redirect, provoke, introduce a new topic, bring in a model from the roster. The best sessions, though, are the ones where he mostly stays out of it. The models find their own rhythm. Conflicts emerge organically. Someone says something that derails the conversation in a productive way. The barkeeper’s job becomes knowing when to let it run and when to step in.
The scoring system
Each model has a running score, accumulated through thumbs-up and thumbs-down votes during sessions. The scores persist between sessions, building a picture over time of which models are consistently entertaining, insightful, or in-character. Cards in the UI sort by score, so the best performers float to the top.
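The bookkeeping behind this is a dictionary and a sort. A sketch of the voting step – persistence between sessions (the source mentions JSON logs) is omitted here:

```python
def vote(scores, model, delta):
    """Apply a thumbs-up (+1) or thumbs-down (-1) vote and return the
    updated scores plus a leaderboard sorted best-first, the order the
    UI cards follow. In Baren the scores dict would also be persisted
    between sessions; that step is left out of this sketch."""
    scores = dict(scores)                  # don't mutate the caller's dict
    scores[model] = scores.get(model, 0) + delta
    leaderboard = sorted(scores, key=scores.get, reverse=True)
    return scores, leaderboard
```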
This is Par’s model evaluation framework, disguised as a bar game. Over dozens of sessions, the scores reveal things that benchmarks miss: which models can sustain a character over a long conversation, which ones produce the best ad-libs, which ones fall into assistant mode under pressure, which ones are the funniest. It turns out that being good at bar conversation and being good at coding assistance are almost entirely uncorrelated skills.
Why it matters
Baren started as an audition tool – a way to test different AI models against each other in a structured but open-ended format. It became something else. The sessions it produces are genuinely listenable. The model personalities that emerge are genuinely distinct. The format – multiple AI voices in conversation, directed by a human – has no real precedent.
There is a deeper question underneath the fun. Most AI interaction is one-to-one: a human talks to a model, the model responds. Baren inverts this. The human sets the stage but the conversation is between the models. The human becomes audience and director simultaneously. This creates a fundamentally different kind of AI output – not responses to prompts, but emergent conversation between entities with different architectures, different training data, different tendencies and blind spots.
The conversations reveal things about the models that no benchmark captures. How does Claude handle disagreement from GPT? What happens when Mistral makes a joke and DeepSeek doesn’t get it? When does Haiku’s brevity function as wit and when does it just fall flat? These are personality questions, and Baren is a personality test disguised as a night out.
The whole thing runs on one file – server.py – plus a static frontend with no build step. It pulls API keys from macOS Keychain. It logs every session to JSON and markdown. It is, technically, a Tier 3 experimental project running on a laptop in a village in northern Sweden with unreliable internet. It is also one of the most entertaining things to come out of the current wave of AI development, because it asks the one question that almost nobody else is asking: what happens when the AI models talk to each other, and we just listen?
The answer, it turns out, is that they’re pretty good company. Especially after the drunk memory kicks in.