Protocols
Sixteen operating principles earned through measured results, not assumed
These protocols emerged from running 30+ experiments with AI models — testing fallback chains, fine-tuning, model evaluation, editorial pipelines, and more. Each principle was earned through measured results, not assumed. They’re the operating principles for how we run the lab.
Protocol 1: Every test should serve a real project. Every finding should reach the people who can use it.
Protocol 2: Test, don’t guess. Prove, don’t assume. Deliver, don’t hoard.
Protocol 3: Let no free API credit go unused. Free tiers exist to be explored. Sign up, test, document, move on.
Protocol 4: Context beats compute. An 870-token glossary made Haiku match Sonnet. Feed the model what it needs before reaching for a bigger one.
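A minimal sketch of what that context injection can look like: prepend a domain glossary to the prompt so the smaller model has the terms it needs. The glossary entries, function name, and prompt shape below are illustrative, not the lab's actual code.

```python
# Hypothetical glossary; in practice this comes from the project's docs.
GLOSSARY = {
    "MLX": "Apple's array framework for machine learning on Apple Silicon",
    "fallback chain": "ordered list of providers tried until one succeeds",
}

def build_prompt(task: str, glossary: dict[str, str]) -> str:
    """Inject glossary entries ahead of the task text."""
    terms = "\n".join(f"- {term}: {definition}" for term, definition in glossary.items())
    return f"Glossary:\n{terms}\n\nTask: {task}"

prompt = build_prompt("Summarize the experiment log.", GLOSSARY)
```

The point is the ordering: the model reads the definitions before the task, so a cheaper model stops guessing at domain terms.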
Protocol 5: Cheap for plumbing, expensive for people. Users deserve quality. Pipelines don’t care. Split your tiers accordingly.
Protocol 6: Fall back, don’t fall over. API → Groq → Local MLX. The quality delta across that chain is 0.4 points; your uptime is worth more.
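The chain itself is a few lines: try providers in order, return the first success. A sketch with stand-in callables (the real clients and their error types are swapped for placeholders):

```python
def with_fallback(providers, prompt):
    """Try each (name, callable) pair in order; return the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code would catch provider-specific errors
            failures.append((name, exc))
    raise RuntimeError(f"all providers failed: {failures}")

def primary_api(prompt):
    raise TimeoutError("simulated outage")  # force the first hop to fail

providers = [
    ("primary_api", primary_api),
    ("groq", lambda p: f"groq answer to {p!r}"),
    ("local_mlx", lambda p: f"local answer to {p!r}"),
]

used, answer = with_fallback(providers, "ping")
# falls through to "groq" because the primary stand-in raises
```

Collecting the failures matters: when the last hop also dies, you want the whole trail, not just the final exception.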
Protocol 7: If a rule can do it, a model shouldn’t. Predictable output? Skip the LLM. $0, instant, 100% reliable.
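A concrete instance: pulling fixed-format IDs out of text is a regex, not a model call. The ID format below is made up for illustration.

```python
import re

# Hypothetical experiment-ID format: "EXP-" followed by three digits.
EXP_ID = re.compile(r"\bEXP-\d{3}\b")

def extract_experiment_ids(text: str) -> list[str]:
    """Deterministic extraction: $0, instant, 100% reliable on this format."""
    return EXP_ID.findall(text)

ids = extract_experiment_ids("Reran EXP-042 and EXP-107 after the fix.")
# → ["EXP-042", "EXP-107"]
```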
Protocol 8: Batch, don’t drip. One API call with 10 items beats 10 calls. Coherence goes up, cost goes down.
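A batching sketch, assuming a classification task; the prompt wording, label set, and simulated reply are illustrative:

```python
import json

def batch_prompt(items: list[str]) -> str:
    """Pack all items into one numbered prompt instead of one call each."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items, 1))
    return ("Classify each headline as tech, politics, or other. "
            "Reply with a JSON list of labels, one per headline.\n" + numbered)

def parse_labels(response: str, n: int) -> list[str]:
    labels = json.loads(response)
    assert len(labels) == n, "model must return one label per item"
    return labels

items = ["New GPU ships", "Election results in", "Local bake sale"]
prompt = batch_prompt(items)
labels = parse_labels('["tech", "politics", "other"]', len(items))  # simulated reply
```

One round trip instead of ten also means the model sees all the items together, which is where the coherence gain comes from.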
Protocol 9: Don’t LLM-route. Keywords score 90%. Tiny models score 0%. Save the inference for real work.
Protocol 10: Local when it’s free, cloud when it matters. Apple Silicon eats OCR, transcription, and batch jobs for breakfast. Pay for judgment.
Protocol 11: Rotate, don’t pray. If a key touched git, logs, or history, it’s compromised. Rotate immediately; deleting it afterward doesn’t help. Keychain locally, chmod 600 on servers, never in code.
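The server-side half of that rule as a Python sketch: the key lives in a file only the owner can read, never in source. The path and key value are placeholders (locally, the macOS Keychain via `security add-generic-password` plays the same role).

```python
import os
import stat
import tempfile

def store_key(path: str, key: str) -> None:
    """Create the key file owner-read/write only (0600) from the start."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key)

def load_key(path: str) -> str:
    with open(path) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "api_key")  # illustrative location
store_key(path, "example-rotated-key")              # placeholder value
mode = stat.S_IMODE(os.stat(path).st_mode)          # 0o600
```

Creating the file with the restrictive mode in the same syscall avoids the window where a default-permission file briefly holds the secret.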
Protocol 12: Two fix attempts, then instrument. Stop guessing. Add logging. Data beats theory, especially when frameworks are doing magic you can’t see.
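What "instrument" looks like at its smallest: log both sides of the suspect transformation so the data can contradict your theory. The function and record here are illustrative.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def normalize(record: dict) -> dict:
    """Suspect step: log input and output so failures explain themselves."""
    log.debug("normalize in:  %r", record)
    out = {key.strip().lower(): value for key, value in record.items()}
    log.debug("normalize out: %r", out)
    return out

normalize({" Title ": "Exp 31"})  # both sides of the transformation hit the log
```

Two `%r` lines per boundary is usually enough to catch the framework magic: once the logs show where reality diverges from theory, the fix is obvious.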
Protocol 13: Don’t fine-tune until your prompts are perfect. We spent $26 and got a worse model. Prompt engineering + context injection gets you 80% of the way for $0.
Protocol 14: The specialist beats the generalist. Purpose-built APIs outperform scaling general models. Use the right tool, not the biggest one.
Protocol 15: Ship the breadcrumb, not the encyclopedia. Director leaves a signpost in every affected project — experiment number, three-line summary, path back to full docs. The project does the work.
Protocol 16: EU when equal, exotic when fun. Prefer European providers unless a US service is significantly better. When two approaches are equally good, pick the more interesting one.
Protocols are amended when experiments prove them wrong.