The Variability Thesis: Building Tools That Fit Irregular Minds
Every productivity tool on the market is designed for a brain that doesn’t exist. A brain that wakes at the same time, focuses on demand, responds to notifications within minutes, and follows through on plans made yesterday. The neurotypical ideal. A statistical ghost.
For the 5-10% of adults with ADHD – and the far larger population whose executive function doesn’t match the template – these tools don’t just fail to help. They generate shame. Missed streaks become moral failures. Overdue tasks become character judgments. The tool designed to assist becomes another voice confirming that you are broken.
This is a synthesis of 21 research papers spanning cognitive science, human-computer interaction, organizational psychology, and clinical neuroscience. It took hundreds of hours to read, cross-reference, and pressure-test. What follows is not a literature review. It is an argument: that AI has unlocked a new paradigm for ADHD tooling – one the research community hasn’t named yet, but that people are already living.
The Paradigm Shift: Tool Sovereignty
Two recent studies frame the state of ADHD technology research. Mew (2025) surveys how AI assists ADHD programmers within existing tools. Deshmukh et al. (2025) propose on-device behavioral sensing to nudge ADHD users toward “better” habits. Both are useful. Neither addresses the fundamental shift already underway.
The shift is this: non-programmers are using AI to build bespoke tools shaped to their specific neurodivergent patterns. Not adapting off-the-shelf software. Not configuring settings. Building from scratch – task managers, review pipelines, session-resumption hooks, notification systems – each one shaped to a single brain.
No product manager would design for these exact workflows. That’s the point.
Spiel et al. (2022) named the problem this solves. Their critical review of 100 ADHD technology papers – conducted by four neurodivergent HCI researchers – found that the field is built on deficit assumptions. Only 12% of papers interviewed ADHD participants. Only five projects genuinely included ADHD people as co-designers. The technologies exist, overwhelmingly, to make ADHD people behave more “normally.”
When participants resisted – hiding devices, subverting interventions, triggering feedback systems for amusement rather than compliance – the papers framed this as failure. Spiel’s team recognized it for what it was: users telling you your design is wrong.
Tool sovereignty is the practical answer to this critique. When someone with ADHD builds their own tools with AI assistance, the power dynamic inverts. The tool serves the brain, not the other way around. The builder’s irregular patterns aren’t bugs to be patched – they’re the specification.
Four research questions emerge from this shift that nobody in the field is asking yet:
- Do bespoke AI-built tools reduce “tool fatigue” compared to adapted off-the-shelf alternatives?
- Does the build-use-rebuild loop itself scaffold executive function?
- What patterns emerge in tools built by ADHD users that commercial tools consistently miss?
- How does tool sovereignty affect the shame and avoidance cycle?
Lauder et al. (2022) make these questions urgent. Their systematic review searched ten databases for every study ever published on interventions that help adults with ADHD at work. They found 143 studies. Zero were conducted in an actual workplace. Two used simulated workplaces (participants solving math problems in a lab for 14 hours). The 2024 meta-analysis follow-up confirmed the gap persists. Decades of research, and nobody has tested anything where people actually work.
Any n-of-1 tool testing in real daily work contexts – building, deploying, measuring, iterating – is filling a void the entire field has documented and failed to close.
Variability Is the Signal
The most robust finding across every paper with empirical data: ADHD manifests as higher variability, not worse averages.
Sankesara et al. (2025) and Denyer et al. (2025) studied the same 40 participants over the same 10 weeks at King’s College London – one team measuring digital behavior through smartphones and wearables, the other measuring sleep through actigraphy. Different behavioral channels. Same result.
| Study | Domain | Metric | ADHD | Control | Effect |
|---|---|---|---|---|---|
| Sankesara | Digital | Notification response SD | 15.9h | 9.0h | d=0.84 |
| Sankesara | Digital | Task interval SD | 2.0h | 0.35h | d=1.13 |
| Sankesara | Physical | Ambient light SD (lux) | 84 | 44 | d=0.86 |
| Denyer | Sleep | Duration SD | 1h33m | 1h10m | p<0.001 |
| Denyer | Sleep | Onset SD | 2h02m | 1h43m | p<0.001 |
| Denyer | Sleep | Offset SD | 1h50m | 1h37m | p<0.001 |
| Denyer | Sleep | Efficiency SD | 4.23% | 3.67% | p<0.001 |
Seven variability measures. Four behavioral channels – notification response, task switching, ambient light exposure, sleep. All significant. All from the same cohort.
Note what wasn’t significant: total phone use time. ADHD participants didn’t use their phones more than controls. They used them differently – sometimes responding to a notification in seconds, sometimes ignoring it for days. The standard deviation of their response time was nearly double.
This finding has a direct design consequence: Don’t normalize variability. Accommodate it. Variable does not mean broken. Tools should expect irregular patterns and make re-entry cheap, not punish lateness or inconsistency. Show people their variability patterns, not their averages. The mean is where nobody lives.
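A minimal sketch of that design consequence, using Python's `statistics` module on hypothetical response-time data: two users can share the same average while only the spread carries the signal.

```python
from statistics import mean, stdev

# Hypothetical notification response times in hours for two users.
# Both average 9.0 hours; only the spread distinguishes them.
steady = [8.5, 9.0, 9.5, 8.0, 9.0, 10.0]
variable = [0.1, 26.0, 0.5, 27.0, 0.2, 0.2]

def describe(times):
    """Report the spread alongside the mean: the pattern, not the average."""
    return {"mean_h": round(mean(times), 1), "sd_h": round(stdev(times), 1)}

print(describe(steady))    # same mean, small spread
print(describe(variable))  # same mean, large spread
```

A dashboard built on `mean_h` would call these two users identical; one built on `sd_h` would not.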
Core Principles: What Tools Should Actually Provide
Tools should provide presence, recall, and gentle re-orientation – not tracking, reminding, and completing.
The distinction is philosophical but its consequences are concrete. Tracking generates data about you for judgment. Presence shows you where you are without evaluation. Reminders interrupt. Recall surfaces context when you reach for it. Completing steals the metacognitive act that is itself therapeutic. Gentle re-orientation helps you find your thread.
Three bodies of research converge on this framework.
Altmann & Trafton (2002) provide the physics engine. Goals are not stored in a special cognitive stack. They are ordinary memory traces: they strengthen when attended, decay when ignored, and compete with distractors for retrieval. When you are interrupted, your current goal decays. When you try to resume, you must retrieve it from memory in competition with every other recently encoded goal. Environmental cues prime retrieval through spreading activation – but only if they are blatant. Twenty years of validation from Trafton’s Naval Research Laboratory program confirmed every prediction: 8-second warnings help, blatant cues work while subtle ones don’t, recovery follows a 15-second curvilinear pattern across 13,000 measured interruptions.
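The model's shape can be sketched in a few lines. This is a loose illustration, not Trafton's implementation: `base_activation` follows the ACT-R base-level equation (log of summed power-law decay over past retrievals), and the cue strengths are made-up numbers.

```python
import math

def base_activation(ages_s, decay=0.5):
    """ACT-R base-level activation: ln(sum of t^-d) over past retrievals.
    Goals attended recently and often stay active; neglected ones decay."""
    return math.log(sum(t ** -decay for t in ages_s))

def with_priming(base, cue_strength):
    """Environmental cues add associative activation via spreading activation.
    A blatant cue contributes far more strength than a subtle one."""
    return base + cue_strength

# A goal rehearsed three times recently vs. one last attended ten minutes ago
fresh = base_activation([10, 30, 60])
stale = base_activation([600])
assert fresh > stale  # decay: the neglected goal loses the retrieval race

# A blatant cue (hypothetical strength 3.0) lets the stale goal win it back
assert with_priming(stale, 3.0) > fresh
```

The two inequalities are the whole theory in miniature: interruption lets the suspended goal decay below its competitors, and only a strong external cue restores it.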
Leroy (2009, 2018) translates this into intervention. Attention residue – the persistence of cognitive activity about a previous task while working on a new one – actively impairs performance. It doesn’t just cost time; it degrades the quality of what you do next. The intervention is dead simple: before switching, spend less than a minute writing down where you are and what you will do when you return. This “Ready-to-Resume Plan” provided enough cognitive closure that participants were 79% more likely to make the optimal decision on the interrupting task.
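A Ready-to-Resume capture could be as small as the following sketch. The names and in-memory storage are hypothetical; Leroy's intervention is the act of writing, not any particular tool.

```python
import time

_notes = {}  # in-memory store; a real tool would persist this

def ready_to_resume(task, where_i_am, next_step):
    """Before switching: under a minute to note where you are and what
    you'll do on return. The writing itself provides cognitive closure."""
    _notes[task] = {"where": where_i_am, "next": next_step,
                    "saved_at": time.time()}
    return _notes[task]

def resume(task):
    """On return, surface the note instead of forcing a cold rebuild."""
    return _notes.get(task)

ready_to_resume("report", "drafting section 2", "find the Q3 figures")
```

Later, `resume("report")` hands back the thread that would otherwise have to be reconstructed from memory.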
Parnin & Rugaber (2009/2011) show what happens without these cues. Across 10,000 instrumented programming sessions from 85 developers, only 10% saw a first edit within one minute of resuming. Only 7.5% involved no navigation to other code locations before editing. The typical programmer spent several minutes wandering through their codebase – visiting 2-12 locations, performing 15-150 selection events – just to rebuild enough context to make a single edit. The coping strategies programmers invented independently (TODO comments, intentional compile errors, sticky notes, source diffs) are all ad-hoc prosthetic memory devices.
The implication for tool design is clear: context at the point of resumption is not a convenience feature. It is the mechanism that makes productive work possible after an interruption. For ADHD brains, where interruptions are more frequent and working memory is the bottleneck (Kofler et al. 2019), this is doubly true.
The Switching Reframe
The conventional wisdom – that ADHD makes context switching hard because ADHD brains “can’t shift” – is probably wrong for the majority of people with ADHD.
Irwin & Kofler (2019) tested 77 children on a carefully controlled shifting paradigm and found that ADHD participants showed no set-shifting impairment relative to controls. The heterogeneity data from Kofler et al. (2019) put a number on it: only 10-38% of people with ADHD have genuine set-shifting deficits. The remaining 62-90% can shift between tasks just fine.
What they struggle with is the preparation for the shift and the maintenance of what they were doing. Working memory failures (can’t hold competing rule sets) and inhibitory control failures (can’t suppress the active rule) masquerade as inflexible cognition. The 2024 synthesis in Nature Reviews Psychology confirmed this pattern across both ADHD and autism spectrum populations.
This reframes tool design from “minimize switches” to “make switches cheaper.” Not fewer contexts, but richer context preservation. Not protecting focus at all costs, but ensuring that when focus breaks – and it will – the cost of returning is as low as possible.
What the Research Says Not to Build
No gamification. Diefenbach & Müssig (2019) studied Habitica, the most popular gamified task manager, across two studies. Every single participant – all 45 in the field study – experienced counterproductive effects. Not “some.” Not “most.” All of them. The effects weren’t subtle: participants gamed the system to avoid punishment, were punished during their most productive periods, and shifted their focus from the actual work to the game layer. Only 49% rated the rewards as even somewhat appropriate. The prevalence of counterproductive effects was the strongest predictor of motivation erosion over time.
Combined with what Knouse (2025) found about ADHD emotional regulation – avoidant thoughts are positively valenced (“I’ll do it later”), present 45% of the time, and amplified by ADHD symptom severity – gamification is a particularly cruel choice. Streaks punish the variability that is the defining feature. Points create a parallel reward system that competes with intrinsic motivation. Penalties trigger shame in brains already vulnerable to rejection sensitivity.
No tracking for tracking’s sake. Olinic et al. (2025) reviewed the entire wearables-for-ADHD landscape and found sensors that can measure everything but tools that do nothing useful with the data. Quantification without actionable response is surveillance, not support.
No rigid time blocking. The variability data alone should end this conversation. A brain whose sleep onset varies by two hours and whose notification response varies by sixteen hours is not going to follow a fixed schedule. Tools should ride the variability, not fight it.
No auto-generated task summaries. Zhu et al. (2026) – a co-design study with 20 ADHD students and 5 clinical experts – identified the critical guardrail for AI-assisted task management: GenAI can scaffold metacognition or destroy it, and the line between the two is whether the tool prompts reflection or automates decisions. When AI generates a task summary for you, it steals the metacognitive act of figuring out what you were doing. The prompting is the therapy. Automate it away and you create dependency while undermining the executive function you’re trying to support.
A Research Agenda: Eight Testable Ideas
The following ideas emerge from cross-referencing the 21 papers. They are organized by complexity, not importance. Each is grounded in specific research findings, and each is testable in real working conditions – the kind of testing Lauder (2022) proved has never been done.
Tier 1: Low Effort, High Evidence
Session-resumption context. When returning to work after an interruption, surface the last few active contexts – recent activity, open threads, unfinished work – as a brief orientation. Not a wall of information. Three bullet points. The theoretical backing runs deep: Altmann & Trafton’s activation-based model explains why cues work (priming overcomes goal decay), Leroy’s Ready-to-Resume Plan proves that they work (79% better decisions), and Parnin’s data shows how badly they’re needed (only 10% of sessions resume within one minute without them).
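A sketch of the idea, with a hypothetical event format: sort recent activity, keep three lines, timestamp them relatively.

```python
from datetime import datetime

def resumption_digest(events, now, limit=3):
    """At most `limit` orientation lines from recent activity:
    blatant cues, not a wall of information."""
    recent = sorted(events, key=lambda e: e["at"], reverse=True)[:limit]
    return [f"- {e['what']} ({_ago(e['at'], now)})" for e in recent]

def _ago(then, now):
    mins = int((now - then).total_seconds() // 60)
    return f"{mins} min ago" if mins < 60 else f"{mins // 60} h ago"

now = datetime(2025, 1, 1, 12, 0)
events = [
    {"what": "editing parser.py", "at": datetime(2025, 1, 1, 11, 50)},
    {"what": "left a TODO in tests", "at": datetime(2025, 1, 1, 10, 0)},
    {"what": "open thread: code review", "at": datetime(2025, 1, 1, 9, 0)},
    {"what": "older item", "at": datetime(2025, 1, 1, 8, 0)},
]
```

Where the events come from (shell history, editor state, a manual note) is an implementation choice; the constraint is the three-line cap.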
Single-next-action mode. When facing a full task list, the option to see only the single most relevant next action. Decision paralysis is well-documented in ADHD (Newman et al. 2025 found time management during design was 4.4x harder for ADHD programmers), and reducing options is a known countermeasure. Gilbert et al. (2022) add precision: external tools are most valuable not as content stores but as timing triggers. The brain dissociates “when to act” (always active in lateral prefrontal cortex) from “what to do” (offloadable to the environment). Show one thing. Show it at the right time.
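A minimal sketch; the ranking heuristic (earliest deadline, then smallest effort as tiebreaker) is an assumption of this example, not something the cited papers prescribe.

```python
def next_action(tasks):
    """Collapse a full task list to a single next step.
    Hypothetical ranking: earliest due date, then smallest estimated effort."""
    if not tasks:
        return None
    # ISO date strings compare correctly as plain strings
    return min(tasks, key=lambda t: (t["due"], t.get("effort_min", 0)))

tasks = [
    {"name": "expense report", "due": "2025-03-02", "effort_min": 20},
    {"name": "book dentist", "due": "2025-03-01", "effort_min": 5},
    {"name": "draft proposal", "due": "2025-03-01", "effort_min": 90},
]
```

The interface point is what the function returns: one task, not a sorted list.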
Kinder language in task management. Replace “overdue” with “waiting for you.” Replace “X tasks remaining” with showing just the next one. Small copy changes, large emotional difference. Knouse’s data on avoidant automatic thoughts (2025) provides the mechanism: if 45% of moments already contain thoughts like “I have plenty of time” or “I’ll feel like doing this later,” the last thing a tool should do is add shame to the pile.
Tier 2: Medium Effort, Strong Evidence
Weekly rhythm digest. A descriptive summary of what actually happened during the week – what moved forward, what’s waiting, what patterns emerged. Not evaluative. No productivity score. The ICSE Groove study (Newman et al. 2025) found 59% of surveyed developers wanted weekly summaries. The key design constraint from the variability thesis: show patterns over time, not comparisons to a fixed standard.
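One possible shape for the digest, with a hypothetical event format; the constraint it encodes is descriptive-only output, with no score and no target to fall short of.

```python
from collections import Counter

def weekly_digest(events):
    """Describe, don't evaluate: what moved forward, what's still open,
    which days saw activity. No productivity score, no fixed standard."""
    return {
        "moved": [e["what"] for e in events if e["kind"] == "progress"],
        "waiting": [e["what"] for e in events if e["kind"] == "open"],
        "active_days": dict(Counter(e["day"] for e in events)),
    }

events = [
    {"what": "outline finished", "kind": "progress", "day": "Tue"},
    {"what": "reply to reviewer", "kind": "open", "day": "Tue"},
    {"what": "tests passing", "kind": "progress", "day": "Fri"},
]
```

Note what is absent by design: no completion percentage, no comparison to last week, no streak.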
Context-aware notification batching. Kushlev et al. (2016, 2019) provide the definitive evidence. Notifications cause ADHD-like symptoms even in neurotypical populations (d=0.44, n=221). But total silence causes anxiety (d=0.56). The solution: predictable batches at natural breakpoints, three times per day. Batching beats both constant interruption and total suppression. During deep work, suppress non-urgent interruptions; when focus breaks naturally, deliver the batch.
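The batching policy follows directly from those constraints: hold non-urgent items, deliver at predictable hours, defer while focus is unbroken. In this sketch the batch hours and the urgency bypass are assumptions.

```python
class NotificationBatcher:
    """Hold non-urgent notifications and release them in predictable
    batches, rather than constant pings or total silence."""

    def __init__(self, batch_hours=(9, 13, 17)):  # three deliveries per day
        self.batch_hours = set(batch_hours)
        self.queue = []

    def receive(self, note, urgent=False):
        if urgent:
            return [note]        # urgent items bypass the batch
        self.queue.append(note)
        return []                # everything else waits

    def tick(self, hour, in_deep_work=False):
        """Deliver the queue at a batch hour, unless focus is unbroken."""
        if hour in self.batch_hours and not in_deep_work:
            batch, self.queue = self.queue, []
            return batch
        return []
```

The `in_deep_work` flag is where this connects to the focus-sensing ideas below: delivery waits for a natural breakpoint rather than forcing one.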
Periodic check-in during work. Not a timer. Not a reminder. A presence – like a co-worker glancing over, offering a soft “still rolling?” every 20-30 minutes. The body-doubling literature in ADHD (referenced in Newman et al. 2025’s Reddit analysis) documents this: many ADHD people work better with another person simply present, not supervising. AI can provide this without the social overhead.
Tier 3: Exploratory, Might Not Work
Application focus sensing. Kasatskii et al. (2023) tested clean vs. cluttered IDE interfaces with 36 developers and found clean layouts produced 35% faster first keystrokes and 29% faster coding, with ADHD symptom profiles predicting which mode helped most. Extend the principle: detect rapid application switching (possible drift), sustained focus (leave alone), or extended idle (gentle re-orientation). Use the signal, don’t surveil.
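A sketch of the signal-to-response mapping; both thresholds are invented placeholders, not values from Kasatskii et al.

```python
def suggest_response(switches_per_min, idle_min):
    """Map passive activity signals to a gentle response.
    Thresholds are hypothetical; the point is the mapping, not the numbers."""
    if idle_min >= 15:
        return "offer re-orientation"    # extended idle
    if switches_per_min >= 4:
        return "surface last intent"     # rapid switching: possible drift
    return "leave alone"                 # sustained focus
```

The output is a suggestion to the tool, never a log shown back to the user: use the signal, don't surveil.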
Emotional tone in task ranking. Use local language models to classify tasks by emotional weight – dread, excitement, neutral – and surface low-dread tasks when motivation is low. Knouse (2025) found work tasks are the most avoided category and screen time is the primary replacement activity. If the avoidant thought is “I’ll feel like doing this later,” one response is to offer a task you might actually feel like doing now.
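A keyword lexicon stands in for the local language model to keep this sketch self-contained; the word lists and category names are placeholders.

```python
# Hypothetical lexicon standing in for a local language model
DREAD_WORDS = {"taxes", "dispute", "apologize", "paperwork"}
SPARK_WORDS = {"design", "prototype", "sketch", "explore"}

def emotional_weight(task_text):
    """Classify a task by emotional weight: dread, spark, or neutral."""
    words = set(task_text.lower().split())
    if words & DREAD_WORDS:
        return "dread"
    if words & SPARK_WORDS:
        return "spark"
    return "neutral"

def low_dread_first(tasks):
    """When motivation is low, surface tasks you might actually start."""
    order = {"spark": 0, "neutral": 1, "dread": 2}
    return sorted(tasks, key=lambda t: order[emotional_weight(t)])
```

A real implementation would swap the lexicon for a local model's classification while keeping the same ordering logic, so no task text ever leaves the device.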
The Evidence Standard
One paper in this collection deserves separate mention for the discipline it imposes. Westwood et al. (2025) published the definitive meta-analysis of neurofeedback for ADHD in JAMA Psychiatry: 38 randomized controlled trials, 2,472 participants. When the symptom rater was blinded, the treatment effect was zero (SMD = 0.04). Decades of positive results were unblinded parent ratings – pure placebo by expectancy. A $1.4-billion-per-year industry selling families a non-treatment.
This is a warning. Any claim that a tool “helps with ADHD” needs evidence that survives scrutiny. Self-report is necessary but insufficient. Behavioral measurement matters. Blinding matters when possible. The neurofeedback story shows how easily we fool ourselves when we want something to work.
The bar should be: does the tool change what people do, not just what they say? The variability data from Sankesara and Denyer shows that behavioral change is measurable. Effect sizes of 0.7-1.13 are large. If a tool genuinely accommodates ADHD variability, the signal should be visible in the same kind of passive measurement that detected the variability in the first place.
The Curb-Cut Effect
One finding from the ICSE Groove study (Newman et al. 2025) reframes the entire enterprise. Among 493 surveyed professional programmers, 80% of neurotypical developers also used task-chunking strategies, 53% also took structured breaks, and 49% also blocked distractions. The strategies that ADHD developers depend on, neurotypical developers also want.
This is the curb-cut effect: design for the people who need it most, and you design for everyone. Tools built to accommodate ADHD variability – cheap re-entry, gentle language, single-next-action focus, rhythm-aware scheduling – are not niche accommodations. They are better design.
The question is not whether to build these tools. People already are. The question is whether the research community will study what’s happening in the wild – people building their own tools, shaped to their own brains, using AI as the fabrication layer – or whether it will keep running lab studies on undergraduates who don’t have ADHD, testing interventions that nobody uses, in environments where nobody works.
The tools are already being built. The research should catch up.
Bibliography
- Altmann, E. M. & Trafton, J. G. (2002). Memory for Goals: An Activation-Based Model. Cognitive Science, 26(1), 39-83.
- Denyer, S. et al. (2025). Sleep variability in ADHD: prospective observational study. BMC Psychiatry. King’s College London.
- Deshmukh, A. et al. (2025). On-device behavioral sensing and nudges for ADHD. arXiv:2507.06864v1.
- Diefenbach, S. & Müssig, A. (2019). Counterproductive effects of gamification: An analysis on the example of the gamified task manager Habitica. International Journal of Human-Computer Studies, 127, 190-210.
- Gilbert, S. J. et al. (2022). Outsourcing Memory to External Tools: A Review of “Intention Offloading.” Psychonomic Bulletin & Review, 30(1), 60-76.
- Irwin, L. N., Kofler, M. J., et al. (2019). Do children with ADHD have set shifting deficits? Neuropsychology, 33(4), 470-481.
- Kasatskii, N. et al. (2023). Clean UI vs default IDE: ADHD symptom profiles and coding performance. HCII 2023, Springer LNCS. JetBrains Research.
- Knouse, L. E. et al. (2025). Avoidant Automatic Thoughts Are Associated With Task Avoidance and Inattention in the Moment. Journal of Attention Disorders, 29(7), 529-540.
- Kofler, M. J. et al. (2019). Executive functioning heterogeneity in pediatric ADHD. Journal of Abnormal Child Psychology, 47(2), 273-286.
- Kofler, M. J. et al. (2024). Executive function deficits in ADHD and autism spectrum disorder. Nature Reviews Psychology, 3(10), 701-719.
- Kushlev, K. et al. (2016). Does receiving notifications cause ADHD-like symptoms? CHI 2016. (n=221, d=0.44.)
- Kushlev, K. et al. (2019). Notification batching effects on well-being. Computers in Human Behavior. (n=237.)
- Lauder, K. et al. (2022). A systematic review of interventions to support adults with ADHD at work. Frontiers in Psychology, 13, 893469.
- Leroy, S. (2009). Why Is It So Hard to Do My Work? The Challenge of Attention Residue When Switching Between Work Tasks. Organizational Behavior and Human Decision Processes, 109(2), 168-181.
- Leroy, S. & Glomb, T. M. (2018). Tasks Interrupted: How Anticipating Time Pressure on Resumption of an Interrupted Task Causes Attention Residue. Organization Science, 29(3), 380-397.
- Mew, E. (2025). AI and ADHD programming productivity. ISCAP 2025.
- Newman, K. et al. (2025). Get Me In The Groove: A Mixed Methods Study on Supporting ADHD Professional Programmers. ICSE 2025, 1217-1229.
- Olinic, T. et al. (2025). Wearables for ADHD: comprehensive review. Diagnostics.
- Parnin, C. & Rugaber, S. (2011). Resumption Strategies for Interrupted Programming Tasks. Software Quality Journal, 19(1), 5-34.
- Sankesara, H. et al. (2025). Identifying Digital Markers of ADHD in a Remote Monitoring Setting. JMIR Formative Research, 9:e54531.
- Spiel, K. et al. (2022). ADHD and Technology Research – Investigated by Neurodivergent Readers. CHI ’22, Article 547.
- Westwood, S. J. et al. (2025). Neurofeedback for ADHD: A Systematic Review and Meta-Analysis. JAMA Psychiatry, 82(2), 118-129.
- Zhu, Z. et al. (2026). Scaffolding Metacognition with GenAI: Design Opportunities for ADHD Task Management. CHI ’26. arXiv:2602.09381.