HAIL Essay

Oversight has a capacity

June 23, 2026 · by Sarah Smith-Barry

Every AI governance plan assumes someone is watching. Few of them ask how much watching a person can actually do.

Opening

Most AI deployments ship with a sentence that is supposed to make everyone comfortable: a human stays in the loop. It appears in the risk register, the board deck, and the vendor's reassurances. It reads like a safeguard. It is actually a capacity claim, and capacity is a finite resource that almost no one is budgeting.

The question a leader should ask is not whether there is a human in the loop. It is how many decisions that human is expected to review, how often, and at what error rate, before the loop stops working. Phrased that way, a lot of governance starts to look like a promise written against an account whose balance no one has checked.

What the evidence says

Human factors research has been clear on this for decades, long before the current wave of AI. People monitoring for rare events lose sensitivity over time. The phenomenon has a name, vigilance decrement, and it is one of the most replicated findings in applied psychology. We are also prone to automation complacency: when a system is usually right, we stop scrutinizing it, and so we miss precisely the moment it is wrong.

A recent line of work on calibrating machine guardrails to what it calls a subjective, fatiguing human puts a finer point on it. Oversight is not a switch that is either on or off. It is a budget that depletes, and the rate of depletion depends on volume, on how alike the cases look, and on how long the person has been at it. Most organizations are spending that budget far faster than they think.

Operational analysis

Consider the common pattern. A team deploys ten agents and assigns one reviewer to approve their output. On the org chart this looks like oversight. In practice it is dilution. The reviewer cannot give each decision the attention a single decision used to receive, so they shift, without deciding to, from judging to glancing. Plausible passes. The output that looks like all the others gets waved through.
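The dilution is easy to put a number on. What follows is a rough sketch, not a measurement: the shift length and the per-agent decision count are hypothetical figures chosen only to show the arithmetic, and the function name is ours.

```python
def seconds_per_decision(shift_hours: float, decisions_per_shift: int) -> float:
    """Average review time available for each decision, in seconds."""
    if decisions_per_shift <= 0:
        raise ValueError("decisions_per_shift must be positive")
    return shift_hours * 3600 / decisions_per_shift


# Assumed, illustrative numbers: an 8-hour shift, 40 decisions per agent per shift.
SHIFT_HOURS = 8
DECISIONS_PER_AGENT = 40

one_agent = seconds_per_decision(SHIFT_HOURS, DECISIONS_PER_AGENT)
ten_agents = seconds_per_decision(SHIFT_HOURS, DECISIONS_PER_AGENT * 10)

print(f"one agent:  {one_agent:.0f} s per decision")
print(f"ten agents: {ten_agents:.0f} s per decision")
```

Under these assumptions the same reviewer goes from twelve minutes per decision to a little over one. The average also flatters the reviewer, because it assumes no breaks and no decline in attention across the shift.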

The cruelty of it is the timing. The rare genuine error, the one the entire oversight function exists to catch, does not announce itself. It arrives looking ordinary, and it arrives after attention has already worn thin. You did not remove the human. You placed them exactly where they are least able to help and then called it a control.

Through the HAIL lens

This is a human-factors problem wearing a technology costume, which is why buying more tooling rarely fixes it. The HAIL-ETHIC view treats oversight as a designed human system with a measurable load, not as a checkbox that is satisfied the moment a name is assigned to it.

The practical move is to make oversight legible. Name the accountable reviewer, then state the load they are actually carrying: decisions per shift, similarity of cases, expected base rate of error. A control you cannot measure is a control you cannot trust, and oversight you have not measured is usually oversight you have already overdrawn.
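One way to make the load legible is to write it down as a record rather than a sentence. The sketch below is an assumption about what such a record could hold, not a HAIL-ETHIC artifact: the class name, the field names, the 0-to-1 similarity score, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewLoad:
    """A stated oversight load for one accountable reviewer (illustrative fields)."""
    reviewer: str
    decisions_per_shift: int
    case_similarity: float   # hypothetical score: 0 = all cases distinct, 1 = all alike
    error_base_rate: float   # expected share of decisions that are genuinely wrong

    def expected_errors_per_shift(self) -> float:
        """How many real errors the reviewer should expect to meet in one shift."""
        return self.decisions_per_shift * self.error_base_rate

    def shifts_between_errors(self) -> float:
        """Average number of shifts between genuine errors."""
        expected = self.expected_errors_per_shift()
        return float("inf") if expected == 0 else 1 / expected


# Assumed example values, for illustration only.
load = ReviewLoad("named reviewer", decisions_per_shift=400,
                  case_similarity=0.9, error_base_rate=0.001)
print(f"expected errors per shift: {load.expected_errors_per_shift():.2f}")
print(f"shifts between errors:     {load.shifts_between_errors():.1f}")
```

With these illustrative numbers the reviewer meets a genuine error about once every two and a half shifts, buried among hundreds of near-identical cases. That is the rare-event monitoring condition under which vigilance is known to decay.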

Leadership implications

For a CIO or CISO, the shift is from asking "is there a human in the loop" to asking "what is that human's capacity, and are we inside it." Those are different questions with different answers. The first is satisfied by an org chart. The second requires you to know your volumes and to set a ceiling on how much judgment you ask one person to produce before the quality of that judgment falls.

Set the ceiling explicitly. An oversight ratio that looks responsible on a slide can be indefensible the moment volume triples, and volume always triples.
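A ceiling is only explicit once it is a number that something can be checked against. The following is a minimal sketch under stated assumptions: the ceiling of 150 decisions per reviewer per shift and the volumes are hypothetical, and the function names are ours.

```python
def within_ceiling(decisions_per_reviewer: int, ceiling: int) -> bool:
    """True if one reviewer's load is at or below the stated ceiling."""
    return decisions_per_reviewer <= ceiling


def reviewers_needed(total_decisions_per_shift: int, ceiling: int) -> int:
    """Smallest number of reviewers that keeps each one at or below the ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_decisions_per_shift // ceiling)


# Assumed, illustrative numbers.
CEILING = 150            # decisions per reviewer per shift
launch_volume = 400      # decisions per shift at launch
tripled_volume = launch_volume * 3

print(reviewers_needed(launch_volume, CEILING))   # reviewers required at launch
print(reviewers_needed(tripled_volume, CEILING))  # reviewers required after volume triples
```

In this example three reviewers cover launch volume and eight are needed once volume triples. The point of the check is less the arithmetic than the trigger: crossing the ceiling becomes a visible event that someone has to decide about.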

Workforce implications

The reviewer role, as commonly designed, is built to fail. It concentrates monotonous, high-stakes monitoring in one fatigued person and then treats the inevitable lapse as individual negligence. That is a design choice, and it can be designed differently. Rotation, hard load limits, decision support that surfaces the cases most worth a second look, and genuine authority to stop the line all turn an impossible job into a survivable one. Oversight that depends on heroics is not a plan. It is a postmortem waiting for a date.

Governance considerations

The standing routines worth building are unglamorous and durable. A stated oversight ratio with a defined ceiling. Sampling, so that not every decision rests on one tiring set of eyes. Escalation thresholds that trigger on load, not only on outcome. And a periodic, honest look at whether the volume the business is now pushing through the system still fits inside the human capacity the governance assumed. When it stops fitting, the governance is out of date, whatever the document says.
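Two of these routines, sampling and load-based escalation, are simple enough to sketch. The code below is illustrative only: the sampling rate, the limits, and the function names are assumptions, and a real deployment would tune them to its own volumes.

```python
import random
from typing import Optional, Sequence


def pick_second_look(case_ids: Sequence[str], rate: float,
                     seed: Optional[int] = None) -> list:
    """Randomly select a share of already-reviewed cases for an independent second look."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible for audit
    return [case for case in case_ids if rng.random() < rate]


def should_escalate(decisions_so_far: int, minutes_on_watch: int,
                    max_decisions: int, max_minutes: int) -> bool:
    """Escalate on load alone, before any bad outcome has been observed."""
    return decisions_so_far > max_decisions or minutes_on_watch > max_minutes


# Assumed, illustrative values.
cases = [f"case-{i}" for i in range(200)]
second_look = pick_second_look(cases, rate=0.05, seed=7)
print(f"{len(second_look)} of {len(cases)} cases routed to a second reviewer")

if should_escalate(decisions_so_far=90, minutes_on_watch=130,
                   max_decisions=150, max_minutes=120):
    print("load threshold crossed: rotate the reviewer or pause intake")
```

The escalation check deliberately looks only at load. The reviewer in the example has made no known mistake and is still under the decision limit, yet the rule fires because continuous watch time has run past its cap.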

Implementation lessons

The cost of getting this wrong is rarely a dramatic failure. It is a slow erosion you do not notice because the metric you were watching, throughput, looks excellent right up to the incident. By the time the missed case surfaces, the explanation writes itself: a human was supposed to be reviewing this. They were. They were simply over capacity, and no one had decided what capacity was.

Practical recommendations

  1. For each AI system with a human reviewer, write down the real review load this quarter: decisions per person per shift, and the expected error base rate.
  2. Set an explicit oversight ceiling, and a rule for what happens when volume pushes you past it. Adding volume without adding capacity is a decision; make it on purpose.
  3. Rotate monitoring roles and cap continuous watch time. Treat sustained vigilance as the consumable resource it is.
  4. Add sampling and second-look triggers so no single fatigued judgment is the only thing between an error and production.
  5. Review the oversight assumption whenever volume changes materially. The ratio that was safe at launch is an artifact of launch-day volume, not a law.

Closing

A human in the loop is worth having. It is just not worth pretending about. Oversight is a real capability with a real limit, and the organizations that govern AI well are the ones that measure the limit instead of assuming past it. The loop holds only as long as the person inside it can. Decide, before the volume does, how much you are asking of them.
