essay 2026-03-31

AI While You Sleep

I launched an AI agent at midnight. At 2 AM I checked on it. The results file contained a plan and an apology.

I launched an AI agent at midnight and went to bed. The command was simple — read a prompt file, do the work, write the results. The machine would stay awake. The tokens were already paid for. What could go wrong.

At 2 AM, because I am the kind of person who checks on things at 2 AM, I opened the laptop. The results file contained a beautifully structured plan for the work it intended to do, a polite note explaining that it had been unable to proceed, and nothing else.

The agent had read the prompt. It had analyzed the task. It had formed an approach. Then it tried to write its first output file, hit a permission boundary that works differently in headless mode than in interactive mode, and exited cleanly. From its perspective, it had done everything it could. From mine, it had burned two hours producing an apology.

This happened three times before I figured out the fix. Three overnight sessions. Three apologies. Each one progressively more polite.


The permission trap

The first thing that goes wrong with headless sessions is permissions.

Claude Code in interactive mode asks for permission before writing files, running shell commands, or performing other potentially destructive operations. You click approve, the work continues. In headless mode, there is nobody to click approve. The agent hits a permission boundary and silently stops.

The fix is --allowedTools on the command line, which pre-approves specific tools. But the permission model in headless mode behaves differently than interactive experience suggests. Write permissions granted in settings.json, the configuration file that governs what the agent can do, do not fully apply in -p mode. You must explicitly pass tool permissions on the CLI, or the agent will stall on its first write operation and produce an empty result.
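A minimal sketch of the fix as an orchestration helper: build the headless invocation with the needed tools pre-approved. The tool names and the comma-joined argument syntax are assumptions; check the --allowedTools documentation for your Claude Code version.

```python
def headless_cmd(prompt: str, allowed_tools: list[str]) -> list[str]:
    """Build a claude -p invocation that pre-approves the tools the task
    needs, so the agent never stalls waiting for an approval click.
    Tool names and comma-joined syntax are assumptions; verify against
    your CLI version's --allowedTools documentation."""
    return [
        "claude",
        "-p", prompt,  # headless (print) mode
        "--allowedTools", ",".join(allowed_tools),  # pre-approve writes etc.
    ]

cmd = headless_cmd(
    "Review config/settings.json for unsafe defaults.",
    ["Read", "Write", "Bash"],
)
# subprocess.run(cmd, capture_output=True, text=True) launches the session
```

Launching via subprocess and checking both the exit code and the size of the results file catches the empty-result failure the same night instead of at 2 AM.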

This is the kind of failure that wastes hours if you discover it at 2 AM: the agent ran, read the prompt, planned its work, hit a permission wall on the first action, and exited cleanly with no error, because from its perspective it had completed what it could. The results file contains a plan and an apology.


The study design trap

The second failure mode is subtler and more expensive. A headless agent with a broad research prompt will produce output that looks comprehensive but has systematic blind spots.

This emerged clearly during an analysis swarm — multiple agents launched in parallel to analyze a large dataset from different angles. Each agent produced a detailed report. Each report was internally consistent. And when I compared the reports, they all had the same analytical bias: they described what was in the data without questioning the methodology behind the data collection.

The missing element was a methodology-level debater — an agent whose job is not to analyze the data but to challenge the analytical frame. In an interactive session, the human provides this naturally. You read an intermediate result and say “but wait, this assumes the sample is representative” or “you are measuring activity, not outcomes.” The agent adjusts. In a headless session, there is no such correction. The agent follows its analytical instincts, which are good at pattern recognition and bad at questioning their own premises.

The fix is architectural: include a critic agent in any headless analysis pipeline. One agent analyzes. A second agent reads the analysis and attacks its assumptions. A third agent synthesizes. This costs more tokens but produces output that a human does not need to completely re-evaluate for hidden assumptions.
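A sketch of that architecture, with the session launcher injected as a callable (run_agent would wrap a headless claude -p invocation; the prompts here are illustrative, not tested templates):

```python
from typing import Callable

def critic_pipeline(task: str, run_agent: Callable[[str], str]) -> str:
    """Three-stage headless analysis: analyze, attack assumptions,
    synthesize. run_agent is whatever launches one headless session
    and returns its output; injecting it keeps the structure testable."""
    analysis = run_agent(
        f"Analyze the following data and report findings:\n{task}"
    )
    critique = run_agent(
        "Attack the assumptions in this analysis. Question the methodology "
        f"behind the data, not just the patterns in it:\n{analysis}"
    )
    return run_agent(
        "Synthesize a final report from the analysis and the critique, "
        f"flagging any finding the critique undermined:\n{analysis}\n{critique}"
    )
```

With the launcher injected, the pipeline structure costs nothing to test, and the same code can later run the three stages as sequential sessions or as part of a swarm.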


The pre-staging rule

The single most important finding from seven iterations of headless orchestration: thirty minutes of human preparation saves thirty thousand tokens of agent wandering.

A headless agent with a vague prompt — “analyze this project” or “review these files” — will spend most of its token budget figuring out what to analyze and how. It will read directory listings, sample files, form hypotheses about the project structure, and eventually settle on an approach that may or may not be what you wanted. This exploration phase is necessary when a human is present to redirect. Without a human, it is pure waste.

A headless agent with a staged prompt — specific files listed, specific questions asked, specific output format defined, relevant context pre-loaded into the prompt — skips the exploration phase entirely and spends its entire token budget on actual work.

The preparation looks like this:

  1. Name the files. Do not say “the configuration files.” Say config/settings.json, config/deploy.yaml, and config/nginx.conf.
  2. Ask specific questions. Do not say “review the architecture.” Say “identify any endpoint that accepts user input without validation” or “list every database query that does not use parameterized statements.”
  3. Define the output format. Do not say “write a report.” Say “produce a markdown file with one section per finding, each containing: file path, line number, issue description, severity (high/medium/low), and suggested fix.”
  4. Pre-load context. If the agent needs to know about project conventions, architectural decisions, or previous findings, include them in the prompt file. Do not assume the agent will discover them by reading CLAUDE.md or READMEs.
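The four steps can be mechanized. A sketch of a prompt-staging helper (the section labels are my own convention, not a required format):

```python
def stage_prompt(files, questions, output_format, context=""):
    """Assemble a staged prompt: named files, specific questions, a
    defined output format, and pre-loaded context, so the agent spends
    its token budget on work instead of exploration."""
    parts = [
        "Files to examine:\n" + "\n".join(f"- {f}" for f in files),
        "Questions to answer:\n" + "\n".join(f"- {q}" for q in questions),
        "Output format:\n" + output_format,
    ]
    if context:
        parts.append("Context:\n" + context)
    return "\n\n".join(parts)

prompt = stage_prompt(
    files=["config/settings.json", "config/deploy.yaml", "config/nginx.conf"],
    questions=["Identify any endpoint that accepts user input without validation."],
    output_format="Markdown, one section per finding: file path, line number, "
                  "issue description, severity (high/medium/low), suggested fix.",
)
```

Writing the result to a prompt file and reading it back before launch is the thirty-minute check: if any section is vague, the task is not ready for headless execution.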

The difference in output quality between a staged and unstaged headless session is not incremental. It is categorical. A staged session produces output you can act on. An unstaged session produces output you have to re-do.


The orchestration ranking

After seven iterations, a clear hierarchy emerged for different orchestration approaches:

  • Interactive session (highest quality): complex judgment, novel problems, anything requiring real-time redirection.
  • Pre-staged headless (high): well-defined tasks with clear inputs and outputs, overnight batch work.
  • Cold headless, no staging (medium): exploration where any result is useful, broad sweeps.
  • Parallel agent swarm (variable): research gathering (good), writing (bad), analysis (good with a critic agent).

The ranking is not about the agents’ capabilities — they are the same model in every configuration. It is about the feedback loop. Interactive sessions have a tight feedback loop (human redirects in real time). Pre-staged sessions have a designed feedback loop (the prompt encodes the human’s judgment). Cold sessions have no feedback loop at all. Swarms have peer feedback, which helps for research but hurts for tasks requiring a consistent voice.


The overnight pattern

The practical application that emerged from all of this: the overnight batch.

Before going to bed, prepare prompts for three to five well-defined tasks. Launch them in sequence with sleep intervals (to avoid rate limits and allow each session to have full capacity). Wake up to finished results.
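A minimal sketch of that launcher, with the per-task launch function and the sleep injected so the sequencing logic stands alone (in practice, launch would run claude -p on a staged prompt file):

```python
import time

def overnight_batch(prompt_files, launch, interval_s=300, sleep=time.sleep):
    """Run staged prompts in sequence with a pause between sessions,
    to stay under rate limits and give each session full capacity.
    launch runs one headless session; sleep defaults to time.sleep.
    The 300-second default is an assumption; tune it to your limits."""
    for i, path in enumerate(prompt_files):
        if i:
            sleep(interval_s)
        launch(path)
```

Three to five staged prompt files, one call before bed, finished results in the morning.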

The tasks that work best overnight are:

  • Code review across a specific set of files with a defined checklist
  • Research gathering with explicit source constraints (“search for X in these three directories, produce a summary”)
  • Audit tasks with clear criteria (“find every API endpoint that does not log to apilog”)
  • Content processing where the format is known (“convert these 20 markdown files into Hugo-compatible frontmatter format”)

The tasks that do not work well overnight are anything requiring creative judgment, voice consistency, or iterative refinement. Those need a human in the loop — not because the agent cannot do them, but because the quality gap between “agent’s first attempt” and “agent’s attempt after human feedback” is too large to waste a full session on the first attempt alone.


What this changes

Headless orchestration does not change what AI agents can do. It changes when they can do it. The working hours of a human operator are no longer the constraint on agent productivity. The constraints become: prompt quality, task decomposition, and the discipline to prepare properly before launching.

The thirty-minute rule is the governing principle. If you cannot spend thirty minutes defining exactly what the agent should do, the task is not ready for headless execution. If you can, you get hours of autonomous work while you sleep, or work your day job, or do anything else. The tokens are paid for. The machine is awake. The agent is working.

The command is simple. The preparation is everything.