Sensemaking for a plural world

Essay

Why "fairness" is the wrong frame

March 2026

In 2013, Eric Loomis stood before a Wisconsin judge after pleading guilty to fleeing from police and operating a vehicle without the owner's consent. The judge had in front of him a COMPAS risk assessment — a proprietary algorithm sold by a company called Northpointe that scored defendants on their likelihood of future offending. Loomis received a high-risk score. The judge cited it in sentencing him to six years in prison. Loomis appealed, arguing that he had a right to know how the score was calculated so he could challenge it. In 2016, Wisconsin's Supreme Court upheld the sentence. The algorithm's proprietary nature, it ruled, did not violate due process because the score was only one factor among several, and because the defendant could see the score itself even if not the mechanism producing it.

The same year, ProPublica published an investigation of COMPAS that became the founding document of the modern algorithmic fairness movement. ProPublica found that Black defendants were nearly twice as likely as white defendants to be falsely flagged as high risk — labeled future criminals when they were not. White defendants were more likely to be falsely flagged as low risk — labeled safe when they were not. Northpointe responded that this was misleading: their algorithm was, in fact, well calibrated. Among defendants who scored high risk, roughly the same fraction of Black and white defendants reoffended. The score meant the same thing regardless of race. Both sides had the same data. Both sides were correct.
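
A small worked example makes the coexistence concrete. The counts below are invented for illustration (they are not ProPublica's or Northpointe's figures), but they reproduce the structure of the dispute: two groups with different base rates, a score that is equally calibrated for both, and error rates that diverge anyway.

```python
# Hypothetical confusion-matrix counts for two groups with different base
# rates of reoffending. Illustrative numbers only -- not the COMPAS data.
groups = {
    "A": {"tp": 300, "fp": 200, "fn": 100, "tn": 400},  # base rate 0.40
    "B": {"tp": 120, "fp": 80, "fn": 100, "tn": 700},   # base rate 0.22
}

for name, c in groups.items():
    ppv = c["tp"] / (c["tp"] + c["fp"])           # P(reoffends | flagged high risk)
    fpr = c["fp"] / (c["fp"] + c["tn"])           # P(flagged | does not reoffend)
    fnr = c["fn"] / (c["fn"] + c["tp"])           # P(not flagged | reoffends)
    base = (c["tp"] + c["fn"]) / sum(c.values())  # group base rate
    print(f"group {name}: base={base:.2f} PPV={ppv:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```

Both claims print from the same table. A high-risk flag means the same thing in both groups (PPV 0.60), which is the calibration defense; and group A's non-reoffenders are flagged at more than three times the rate of group B's (FPR 0.33 versus 0.10), which is the disparity finding.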

What they had stumbled into — though neither named it this way — was a mathematical impossibility result. The argument has not been resolved in the decade since. It cannot be resolved, because the resolution would require choosing between different conceptions of fairness that are provably incompatible with each other when base rates differ between groups. The debate about whether COMPAS is fair is, in this specific sense, unanswerable. But the debate about whether COMPAS should have been used to make this decision is a different question — and that one has never really been asked.

The impossibility result

In 2016, Alexandra Chouldechova, a statistician at Carnegie Mellon, published a short paper proving that three intuitive definitions of a fair classifier cannot all hold simultaneously when the groups being classified have different base rates of the outcome being predicted. The three definitions are: calibration (a score of 70% risk means approximately the same thing for everyone, regardless of group membership); equal false positive rates (people who will not reoffend are equally likely to be mislabeled high-risk regardless of race); and equal false negative rates (people who will reoffend are equally likely to be mislabeled low-risk regardless of race). When base rates differ — when one group has a higher actual recidivism rate than another — you can satisfy calibration or you can satisfy equal error rates, but you cannot satisfy both. The math does not leave room for compromise.
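
The core of the proof fits in a single identity linking the four quantities for a binary classifier; this is the relation Chouldechova's argument turns on, written here in her paper's terms:

```latex
% p   : group base rate, the fraction of the group that reoffends
% PPV : positive predictive value, P(reoffends | flagged high risk)
% FPR : false positive rate,       P(flagged | does not reoffend)
% FNR : false negative rate,       P(not flagged | reoffends)
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
```

If two groups shared the same PPV (calibration) and both error rates, the identity would force their base rates to be equal; when base rates differ, at least one error rate must therefore differ. The hypothetical counts in the example above satisfy this identity exactly.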

This is not a finding about any particular algorithm. It is a finding about the structure of prediction under inequality. The fairness literature in machine learning has, since 2016, produced dozens of additional impossibility results — proofs that various combinations of fairness criteria cannot be jointly satisfied except in degenerate cases where base rates are equal or the classifier is perfectly accurate. Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan showed in 2016 that calibration within groups and balance for the positive and negative classes are mutually incompatible except in those same trivial cases. Moritz Hardt, Eric Price, and Nati Srebro formalized equalized odds (equal true positive and false positive rates across groups) and its relaxation, equal opportunity (equal true positive rates only); equalized odds, in turn, cannot be reconciled with calibration when base rates differ.

The implication is not that algorithmic systems cannot be made fairer, but that "make it fair" is not a coherent instruction. The word "fair" names a cluster of incompatible concepts, and choosing between them is a political act — a decision about whose error costs are most acceptable to impose on whom. Calibration preserves the predictive meaning of the score at the cost of differential error rates across groups. Equal false positive rates protect the innocent from mislabeling at the cost of differential calibration. You can commit to one or the other. You cannot commit to both. The choice is not technical.
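
The same choice can be shown as an operational knob. The sketch below is a toy post-processing example, not anything COMPAS does: the bucketed scores are hypothetical and calibrated by construction (within each bucket, the fraction who reoffend equals the bucket score), and the thresholds are invented for illustration.

```python
# Hypothetical score buckets per group: (score, # reoffended, # did not).
# Within each bucket the reoffense fraction equals the score, so the score
# is calibrated by construction; only the base rates differ.
A = [(0.2, 20, 80), (0.5, 50, 50), (0.8, 80, 20)]    # base rate 0.50
B = [(0.2, 60, 240), (0.5, 15, 15), (0.8, 24, 6)]    # base rate ~0.28

def ppv_fpr(buckets, threshold):
    """PPV and FPR when everyone at or above `threshold` is flagged."""
    tp = sum(r for s, r, n in buckets if s >= threshold)
    fp = sum(n for s, r, n in buckets if s >= threshold)
    tn = sum(n for s, r, n in buckets if s < threshold)
    return round(tp / (tp + fp), 2), round(fp / (fp + tn), 2)

# One shared threshold: the flag means the same thing in both groups
# (PPV 0.65 each), but group A's non-reoffenders are flagged at nearly
# six times group B's rate.
print(ppv_fpr(A, 0.5), ppv_fpr(B, 0.5))   # (0.65, 0.47) (0.65, 0.08)

# Raising group A's threshold narrows the error gap (FPR 0.47 -> 0.13)
# but splits the flag's meaning (PPV 0.8 vs 0.65). No threshold pair
# equalizes calibration and both error rates at once.
print(ppv_fpr(A, 0.8), ppv_fpr(B, 0.5))   # (0.8, 0.13) (0.65, 0.08)
```

Whichever threshold is moved, someone's error costs rise. Choosing which is exactly the political act described above.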

What the frame accepts

The fairness debate, in its dominant form, does not ask whether COMPAS should have been making this decision. It asks whether COMPAS was making it fairly. The distinction matters because the first question puts the burden of justification on the deployer. The second question puts the burden of proof on the critic.

To demand fairness is to accept that the system has a legitimate claim to operate — that the question is calibration, not authorization. The deployer need not demonstrate that the algorithm produces better outcomes than a judge deciding without it. The deployer need not demonstrate that the outcomes under algorithmic assistance are better than no sentence enhancement at all. The deployer need not demonstrate that the communities subject to the system had any meaningful input into whether it would be used. All that is required, under the fairness frame, is that the scores not correlate with protected characteristics in a way that disadvantages one group relative to another. Demonstrating that is extremely difficult even under the best-defined metrics, and the impossibility results ensure that any critic can be met with a different, technically valid fairness criterion the system does satisfy. The practical effect of the fairness frame is therefore to make the deployment of these systems nearly immune to effective challenge.

This is not an accident. The fairness frame is extremely useful to the people who build and sell algorithmic decision systems, because it channels critique in a direction that does not threaten the system's existence. Northpointe could respond to ProPublica's analysis by producing its own technically valid fairness criterion and showing the algorithm satisfied it. The argument became one of competing metrics, and it continues indefinitely. The question of whether a private company should be in the business of predicting future criminality on behalf of state courts — and whether courts should have purchased that service without any democratic deliberation about whether they should — never entered the debate. The fairness frame made it irrelevant.

The authorization gap

Danielle Keats Citron's 2008 paper "Technological Due Process" identified the structural problem with unusual precision, a decade before the COMPAS debate made it legible to a broader audience. Automated government decision-making, she argued, collapses a constitutional distinction that procedural due process depends on: the distinction between adjudication and rulemaking. Adjudication determines the rights of specific parties in specific circumstances and requires individualized procedures — notice, an opportunity to be heard, a reasoned decision that the affected party can understand and contest. Rulemaking sets general policy and requires different procedures — public notice, comment periods, legislative authorization. Automated systems do both simultaneously: they set a general rule (this pattern of attributes predicts this outcome) and apply it adjudicatively to specific individuals. Neither the adjudicative nor the rulemaking procedures apply cleanly, and the result is a system that evades both sets of constitutional constraints while performing both functions.

The COMPAS deployment illustrates this precisely. No Wisconsin legislature voted to authorize algorithmic risk scoring in sentencing. No administrative rulemaking procedure established the standard. A procurement decision — a sales contract between a court system and a vendor — substituted for democratic authorization. The people sentenced under the system had no right to know the mechanism, no right to challenge the training data, no right to have the score explained by someone who understood it, and no right to appeal to any institution that had authorized its use in the first place. The due process framework asks: was this person afforded the procedural protections that a constitutional system guarantees to someone whose liberty is being restricted by state action? The fairness framework never gets there, because it has already accepted that the system has the right to operate; it is only asking whether it is operating equitably.

Frank Pasquale's The Black Box Society (2015) names the power dimension that Citron's due process frame leaves in the background: opacity is not a design flaw; it is a competitive advantage. Vendors have commercial interests in keeping their scoring methods proprietary. The opacity that prevents defendants from contesting their scores also prevents competitors from replicating the product. When the fairness debate demands transparency — "we need to be able to audit the algorithm for bias" — it is demanding something the vendor has strong financial incentives to resist, and it is demanding it on grounds (fairness) that the vendor can partially satisfy without ever opening the black box. Vendors can disclose enough to demonstrate calibration while keeping the feature weights and training data that would reveal other problems protected under trade secrecy. Transparency demands that live inside the fairness frame can be satisfied in ways that preserve the core opacity the frame was supposed to challenge.

The structural absence problem

The predictive policing debate has a version of this same structure. The discussion centers almost entirely on accuracy and bias: do these systems predict crime locations or individual behavior correctly, and do they do so consistently across communities? These are real questions. But the communities where predictive systems are deployed — overwhelmingly lower-income communities and communities of color — did not vote to adopt them. The decision was made at the departmental level, often as part of a vendor relationship, with no public deliberation about whether a system designed to forecast risk so that it could be managed through patrol presence was appropriate for communities whose primary concern about policing is often over-policing rather than under-policing.

The RAND Shreveport study, published in 2014, found no statistically significant crime reduction from predictive policing over conventional hot-spots patrol. The finding received far less attention than the accuracy and bias critiques, even though it came from a randomized controlled trial that compared the technology against its actual alternative rather than against no policing — partly because the fairness frame had already structured which questions were worth asking. The efficacy result is potentially more damaging to the technology's justification than any bias finding, because it suggests the intervention does not accomplish its stated goal at all. But the fairness frame does not easily accommodate the question "does this work?" — it has already accepted that the question is how to make it work equitably.

The structural absence pattern runs through the algorithmic hiring case as well. An applicant tracking system that screens four hundred applications down to thirty before any human reviewer sees them is making substantive employment decisions. The people whose applications never reach a human reviewer have no right to know the criteria applied, no mechanism to explain circumstances the algorithm cannot accommodate — a caregiving gap, a nonlinear career path, a credential from an institution the training data underrepresents — and no appeal. The fairness debate focuses on whether the system produces disparate impact by gender or race. That is a real concern. But the prior question — whether employers should be permitted to make consequential employment decisions through opaque automated systems with no right of explanation or contestation — is not primarily a fairness question. It is a question about what procedural rights attach to high-stakes decisions, regardless of whether those decisions are made fairly.

What better frames look like

The due process frame asks: was this decision made through a process that provides the protections we require for state action affecting individual rights? It puts the burden on the deployer to demonstrate authorization, not on the critic to demonstrate bias. It is not satisfied by calibration; it is satisfied by notice, explanation, and contestability — things the COMPAS deployment did not provide.

The accountability frame asks: who made this decision, and who can they be held accountable to? A vendor relationship with a procurement office does not answer this question. Neither does a judge who says he considered the score as one factor among several while being unable to explain the score's provenance. Accountability requires a chain of authorization that runs from the decision back to some institution that is answerable to the people affected by it. Most algorithmic deployments in high-stakes domains do not have such a chain. The fairness debate does not notice this, because it has already accepted the system as a legitimate decision-maker.

The prior authorization frame asks the most fundamental question: should this system be making this decision at all? This is a question that precedes fairness. A system that correctly predicts recidivism rates with perfect calibration across all demographic groups might still be an inappropriate tool for sentencing, because sentencing is meant to be responsive to the individual circumstances of the specific person before the court, and a prediction about a statistical reference class is not evidence about an individual. It might still be impermissible to use for the same reason that polygraph evidence is excluded — not because polygraphs are biased, but because the legal system has decided that a certain kind of evidence does not belong in a certain kind of proceeding. The fairness frame cannot ask this question without dissolving itself, because if the system should not be making the decision at all, fairness in how it makes the decision is irrelevant.

Why the frame persists

The fairness frame persists partly because it is tractable. It yields measurable outputs — bias audits, disparity statistics, calibration reports — that can be published, compared, and responded to. The authorization frame does not yield outputs like these. It calls for legislative hearings, procurement reform, democratic deliberation, and public accountability mechanisms, which are much harder to generate and much less amenable to the research-to-policy pipeline that computer science departments and civil society organizations have built around algorithmic auditing.

The fairness frame also persists because it is less threatening. Demanding a fair algorithm is compatible with the continued existence of the algorithmic industry. Demanding democratic authorization for algorithmic decision-making in high-stakes domains is not compatible with the current procurement practices of most courts, benefits agencies, and public institutions that have adopted these systems. The former demand can be met with a research collaboration and a revised scoring rubric. The latter demand cannot be met without institutional change that the relevant parties have strong incentives to avoid.

And the fairness frame persists because it is morally serious. Bias in automated systems causes real harm, and the disparity findings that the fairness research program has produced are genuine contributions to understanding how these systems fail. The problem is not that the fairness frame identifies unreal harms. The problem is that it centers those harms in a way that forecloses the question of whether the system should exist — a question that is, in many cases, more fundamental than any bias the system might exhibit.

The question that isn't asked

Eric Loomis served his six years. The Wisconsin Supreme Court's ruling stands. COMPAS is still used in sentencing and parole decisions across multiple states. The debate about whether it is fair continues, with researchers producing competing metrics and vendors responding with calibration claims. The question of whether the Wisconsin court system should have purchased a proprietary recidivism prediction tool from a private company and incorporated its output into sentences affecting people's liberty — without legislative authorization, without public deliberation, and without any meaningful right of explanation or contestation for the people sentenced under it — has never been adequately adjudicated.

That question is not about fairness. It is about authorization, accountability, and the structure of legitimate decision-making. Those are different questions. They are more fundamental questions. And the fairness frame, for all its genuine contributions, has made it easier to avoid asking them — by giving everyone involved something more tractable to argue about instead.

The argument about what fairness requires will continue. The impossibility results ensure it will not resolve. Meanwhile, the systems making consequential decisions about parole, benefits eligibility, credit, employment, and educational opportunity continue to be deployed through procurement channels that substitute vendor contracts for democratic authorization. The right frame is not "is this fair?" The right frame is: "who decided this system should make this decision, and who can they be held accountable to?" That is the question that has been missing from the debate.

See also

  • Algorithmic Governance and Automated Decisions: What Each Position Is Protecting — the foundational map for this essay's terrain: transparency advocates, accountability reformers, market proponents, and structural critics each locate the problem differently — opacity, accountability gaps, market discipline, or concentrated power — and each implies a different standard for what adequate governance looks like. The map surfaces the frame divergence problem at its clearest: the positions are not disagreeing about calibration; they are disagreeing about which institution has legitimate authority to make algorithmic decisions, under what constraints, and answerable to whom.
  • Predictive Policing and Surveillance Technology: What Each Position Is Protecting — the site's most direct engagement with the baseline problem this essay describes: proponents compare algorithmic tools to unguided human discretion; civil libertarians compare the current surveillance architecture to the constitutional framework designed around targeted investigation; accountability critics compare claimed accuracy to what training data can actually support; community advocates compare the resource expenditure to what would be produced by alternative investment. The RAND Shreveport RCT result — no crime reduction over conventional hot-spots patrol — is the empirical counterweight to proponent efficacy claims that the fairness debate has systematically underweighted.
  • Algorithmic Hiring and Fairness: What Each Position Is Protecting — the employment context for the authorization question: applicant tracking systems make consequential employment decisions before any human reviewer is involved, and the people screened out have no right to explanation or appeal. The map covers the full range from system defenders (who compare algorithmic screening to documented human bias) to worker advocates (who point to the absence of contestability for high-stakes decisions) to structural critics (who argue the governance gap enables employer power concentration that disadvantages workers systemically). The fairness demand — make screening equitable across protected groups — is present but does not resolve the prior question of whether this architecture of decision-making should exist.
  • Surveillance Capitalism: What Each Position Is Protecting — the economic infrastructure that makes algorithmic decision systems possible and profitable: data collection at scale, behavioral modeling, and the use of predictive profiles to shape individual outcomes. The level-of-analysis problem appears here in its sharpest form: data rights advocates, market defenders, behavioral modification critics, and structural reformers are addressing genuinely different problems at genuinely different scales, and regulatory frameworks built on any single diagnosis leave the others intact. The surveillance capitalism map provides the economic context for understanding why "make it fair" is a more attractive demand to system deployers than "stop collecting the data."
  • Digital Privacy and Surveillance: What Each Position Is Protecting — the rights framework that the fairness debate operates within but rarely examines: what legal protections, if any, attach to data about persons, and what institutional interests are served by different privacy regimes. The governance gap pattern that runs through predictive policing and algorithmic sentencing appears here at the surveillance infrastructure level: the legal frameworks governing government data collection were built for targeted investigation of specific suspects, and the mass data collection architecture that feeds algorithmic systems was built in the gap between what the Fourth Amendment prohibits and what it affirmatively requires.

References and further reading

  • Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, "Machine Bias" (ProPublica, 2016) — the investigation that made the COMPAS dispute publicly legible. It matters here not because it settled the fairness debate, but because it exposed how quickly a question about public authority can be narrowed into a question about competing bias metrics.
  • Alexandra Chouldechova, "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments", Big Data 5, no. 2 (2017) — one of the foundational impossibility-result papers behind this essay. Chouldechova makes clear why the instruction "make it fair" hides unresolved political choices rather than resolving them.
  • Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan, "Inherent Trade-Offs in the Fair Determination of Risk Scores" (2016) — a canonical statement of the incompatibility problem. It is useful here because the essay's core claim depends on seeing that fairness metrics can be mathematically serious and still fail to answer the prior authorization question.
  • Moritz Hardt, Eric Price, and Nati Srebro, "Equality of Opportunity in Supervised Learning" (NeurIPS, 2016) — a key reference for the attempt to formalize error-parity protections. It helps show both why fairness work mattered and why it could become the whole conversation even when more basic governance questions remained unanswered.
  • Danielle Keats Citron, "Technological Due Process", Washington University Law Review 85 (2008) — the most important legal source for the essay's alternative frame. Citron identifies the procedural-rights problem that fairness discourse often leaves untouched: notice, explanation, and the ability to contest high-stakes automated decisions.
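  • Frank Pasquale, The Black Box Society: How Secret Algorithms Control Money and Information (Harvard University Press, 2015) — the source for this essay's account of opacity as a commercial strategy rather than a design flaw, and of why transparency demands made inside the fairness frame can be met without opening the black box.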
  • Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin's Press, 2018) — a strong bridge from fairness critique to institutional critique. Eubanks is especially useful for seeing how automated systems can be procedurally "rational" while still deepening abandonment and state coercion.
  • Priscillia Hunt, Jessica Saunders, and John S. Hollywood, Evaluation of the Shreveport Predictive Policing Experiment (RAND Corporation, 2014) — the efficacy counterweight this essay points to. It matters because a system that does not outperform its actual alternative has a justification problem that fairness metrics alone cannot answer.
  • Algorithmic Governance and Automated Decisions — Ripple's main map of the authorization, accountability, and legitimacy disputes that the fairness frame tends to compress into one technical question.
  • Predictive Policing — the clearest internal companion for the essay's efficacy and due-process critique. It tracks what different actors are actually comparing predictive systems against and why those baselines matter.
  • Algorithmic Hiring and Fairness — the employment case where the fairness frame is easiest to mistake for the whole argument. Useful here because it makes the contestability problem concrete: who gets screened out, by what criteria, and with what right to explanation.