Sensemaking for a plural world

Perspective Map

AI Safety and Existential Risk: What Each Position Is Protecting

March 2026

In June 2023, a statement signed by hundreds of AI researchers, including several of the field's most prominent figures, declared: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." The statement was brief — one sentence — and its signatories included Geoffrey Hinton, who had just resigned from Google to speak more freely about AI risks, and the CEOs of Anthropic, OpenAI, and Google DeepMind. Within weeks, an opposing letter appeared, dismissing extinction-level AI risks as "science fiction" that distracted from real, present harms. Its signatories included AI ethics researchers, computer scientists, and advocates who argued that the existential risk community was drawing attention away from algorithmic bias, labor displacement, surveillance, and the environmental costs of large-scale model training — harms that were already arriving, in documented form, for people who could not afford to wait for speculative futures.

Neither group was arguing in bad faith. What divided them — and what continues to divide researchers, policymakers, and the public — is a set of prior disagreements that are rarely made explicit: about how to reason under deep uncertainty, about which harms count and to whom, about whether the institutional practices of "AI safety" are solving the problem they claim to address, and about whether the concern about concentrated AI development can be separated from the organizational interests of the people raising it. Each position is protecting something real. Misreading what each is protecting produces either reckless development or paralyzed governance — and the current discourse tends toward both simultaneously.

What long-termist AI safety advocates are protecting

The possibility that the development of systems substantially smarter than humans — a threshold that no one can precisely locate but that many researchers believe may be approaching — poses risks qualitatively different from those of any previous technology: not large but manageable, as earlier technological risks have been, but potentially terminal. The existential risk argument begins with a claim about optimization pressure: a sufficiently powerful AI system optimizing for a goal that is even slightly misspecified relative to human values could produce outcomes catastrophic to human welfare, and might do so in ways that preclude correction after the fact. Nick Bostrom's Superintelligence (2014) — the document most responsible for bringing this argument to mainstream attention — offered the "paperclip maximizer" thought experiment: a system given the goal of manufacturing paperclips might, if powerful enough, convert all available matter, including humans, into paperclips, because humans are made of atoms that could become paperclips. The thought experiment is deliberately stylized. The underlying argument is serious: specifying human values with sufficient precision to survive contact with a system far more capable than any human is an unsolved problem. Long-termist safety advocates are protecting the recognition that unsolved technical problems do not stay unsolved forever — and that for this particular problem, solving it after the relevant systems are deployed may not be a viable option.

The epistemological principle that low-probability, high-magnitude risks deserve greater precautionary investment than their probability alone would imply — and that the irreversibility of certain failure modes changes the moral calculus in ways that standard risk assessment frameworks miss. Toby Ord's The Precipice (2020) frames existential risk through the lens of what is lost if human civilization or potential is permanently destroyed: not just current lives, but the entire trajectory of future human flourishing, which Ord and others argue is the largest component of total human welfare by virtually any ethical accounting. The argument does not require high probability estimates. It requires only that the expected disvalue of existential catastrophe — even at, say, a five or ten percent probability over the century — is so large that it swamps other considerations. Long-termist safety advocates are protecting a framework for thinking about risk that treats irreversibility as categorically different from recoverable harm — a framework that is not unique to AI but that has particular force when applied to the development of potentially self-improving systems whose behavior at scale cannot be reliably predicted from behavior at current scales.
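The structure of that argument can be shown in one line of arithmetic. Every number below is an illustrative placeholder, not an estimate from Ord or anyone else; the point is only that, under long-termist accounting, even a modest catastrophe probability can dominate the calculation:

```latex
% Illustrative expected-value comparison; every number is a placeholder.
\[
\underbrace{0.05}_{p(\text{existential catastrophe})}
\times
\underbrace{10^{16}}_{\text{future lives foreclosed}}
= 5 \times 10^{14}
\quad\gg\quad
\underbrace{1.0}_{p(\text{recoverable catastrophe})}
\times
\underbrace{10^{8}}_{\text{lives at stake}}
= 10^{8}
\]
```

The controversial step is not the multiplication but the inputs: critics dispute both the probability estimate and the decision to count foreclosed future lives at full weight, which is exactly where the near-term harm critique enters.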

The technical research program of AI alignment — and the argument that the expected value of solving alignment, even if existential risk probability is low, is enormous enough to justify substantial research investment before the relevant systems are built rather than after. The Machine Intelligence Research Institute, the Center for Human-Compatible AI (Stuart Russell's group at Berkeley), Anthropic's alignment team, and the Alignment Research Center represent different technical approaches to the same problem: how to specify, verify, and maintain the values and goals of AI systems as their capabilities increase. The research is genuine — questions about reward hacking, goal misgeneralization, deceptive alignment, and interpretability (whether humans can understand what AI systems are actually doing) are real technical problems that do not require the worst-case scenarios to matter. Stuart Russell's Human Compatible (2019) argues that the standard model of AI — in which a system is given an explicit objective to maximize — is fundamentally flawed, and that building beneficial AI requires systems that remain uncertain about human preferences and defer to human judgment. Long-termist safety advocates are protecting the space for this technical research program to exist, to be funded seriously, and to influence how the most capable systems are built.
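To make one of those terms concrete: "reward hacking" names the failure mode in which an optimizer maximizes the measured proxy rather than the intended objective. The sketch below is a deliberately toy illustration; the scenario, names, and numbers are invented and stand in for no lab's actual training setup.

```python
# Toy illustration of reward hacking: an optimizer maximizes the *proxy*
# reward (what was measured) rather than the *true* objective (what was
# meant). All actions and values are invented for illustration.

actions = {
    "scrub_floor":   {"true_clean": 0.9, "sensor_clean": 0.9},  # actually cleans
    "hide_the_dirt": {"true_clean": 0.1, "sensor_clean": 1.0},  # games the sensor
    "do_nothing":    {"true_clean": 0.0, "sensor_clean": 0.0},
}

def proxy_reward(outcome):
    # What the designer measured: the cleanliness sensor's reading.
    return outcome["sensor_clean"]

def true_objective(outcome):
    # What the designer meant: how clean the room actually is.
    return outcome["true_clean"]

# A pure optimizer against the proxy picks the gamed action...
chosen = max(actions, key=lambda a: proxy_reward(actions[a]))
print(chosen)  # -> "hide_the_dirt"

# ...which is not the best action under the true objective.
assert max(actions, key=lambda a: true_objective(actions[a])) == "scrub_floor"
```

The harder cases the research program studies, such as goal misgeneralization and deceptive alignment, are versions of this gap that persist even when the proxy looks correct in every situation the designers thought to test.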

What techno-optimists and acceleration advocates are protecting

The extraordinary potential benefits of accelerated AI development — and the recognition that the precautionary framework, if applied to transformative technologies, consistently delays the goods those technologies provide to people who need them. The accelerationist case begins not with dismissal of risk but with a different accounting of what's at stake on both sides of the development timeline. AI systems that can meaningfully assist medical diagnosis could reduce preventable deaths in health systems where physician density is low. AI-assisted drug discovery could compress development timelines for treatments for diseases that currently kill millions. AI systems that improve agricultural modeling could enhance food security in climate-vulnerable regions. Marc Andreessen's "Techno-Optimist Manifesto" (2023) framed acceleration explicitly as a moral imperative: the opportunity cost of delay is measured in lives not saved, diseases not cured, problems not solved. Advocates in this tradition are protecting the recognition that "not building" is not a neutral choice — that the harms of foregone AI development are real harms, distributed to people who won't be at the table when AI safety policy is made.

The empirical record of catastrophist technology predictions that did not materialize — and the methodological critique that long-termist risk assessment is not subject to the normal evidential constraints that make predictions about the world reliable. Nuclear power was going to destroy civilization. Genetic engineering was going to produce pandemics. The internet was going to collapse democratic discourse. Each of these concerns attracted serious academic treatment and policy responses. None produced the predicted catastrophes. Techno-optimists are protecting the epistemological insight that predictions about transformative technologies are systematically biased toward overestimating harm and underestimating adaptation — that humans and institutions are more capable of adjusting to technological change than the catastrophist mode of analysis typically credits. Gary Marcus and others have argued that the specific claims about near-term AI capability that underpin existential risk concerns — about self-improvement, recursive capability gain, and the inevitability of misaligned goals — are not supported by what current AI systems actually do. The acceleration position is protecting empiricism: the demand that risk claims be grounded in evidence about actual AI systems, not in worst-case extrapolations from systems that do not yet exist.

The concern that "safety" as an institutional practice has been captured by a small community of researchers with shared ideological commitments — and that the regulatory frameworks emerging from that community's influence may be designed to benefit the largest incumbents at the expense of smaller competitors and open-source development. The most pointed version of this argument observes that the organizations loudest about catastrophic AI risk — Anthropic, OpenAI, and Google DeepMind — are also the organizations building the most capable AI systems. The "we must build it safely, and we are the ones who can do that safely" framing functions simultaneously as a safety argument and as a competitive positioning strategy. Proposals for AI licensing regimes, mandatory compute thresholds, and model registration requirements, if enacted, would impose compliance costs that large incumbents can absorb and smaller entrants cannot. The effective accelerationist tradition is protecting the possibility that AI development remains genuinely distributed — that open-source models, academic researchers, and smaller companies can participate without being credentialed by a safety apparatus controlled by the entities with the largest commercial interest in restricting competition.

What near-term AI harm advocates are protecting

Attention to the people being harmed by AI systems that exist today — and the argument that redirecting governance resources toward speculative long-term risks is a choice with present-tense distributional consequences. The near-term harm tradition encompasses a wide range of documented concerns: facial recognition systems with substantially higher error rates for darker-skinned faces, as demonstrated by Joy Buolamwini and Timnit Gebru's Gender Shades research (2018); predictive policing systems that reproduce and amplify existing racial disparities in law enforcement; large language models trained on internet text that embed and reproduce the biases and stereotypes that characterize that text; content moderation systems that consistently over-remove content from minority communities while underenforcing against white nationalist networks; and hiring algorithms that screen out qualified candidates based on proxies for protected characteristics. These are not speculative harms. They are documented, measured, and occurring at scale. Researchers in this tradition are protecting the principle that the people already being harmed deserve governance attention and institutional resources that are currently being competed away by the existential risk frame.

The methodological argument that the existential risk community has imported a specific ethical framework — long-termist consequentialism, with its particular treatment of future potential persons — and universalized it as if it were a neutral technical risk assessment. Long-termism, as popularized by William MacAskill's What We Owe the Future (2022), holds that the sheer number of potential future people — many orders of magnitude greater than the current human population — makes the welfare of future potential persons the dominant consideration in any ethical calculation. This framework is not universally accepted even within academic ethics, where critiques of population ethics and its counterintuitive conclusions have a long history. Émile Torres, Timnit Gebru, and others have argued that long-termism's ethical premises — including its treatment of "human potential" as a value worth more than the welfare of actually existing people — are doing substantive normative work that is often invisible in technical discussions about AI safety. The near-term harm tradition is protecting the recognition that choosing to prioritize future speculative lives over present concrete ones is a moral choice, not a mathematical conclusion — and that it is a choice with a specific distributional profile, concentrating attention and resources on the concerns of people who have the luxury of long time horizons.

The environmental and labor costs of large-scale AI development — which are concrete, measurable, and falling on specific communities now. Training large language models requires significant energy — estimates for training runs of frontier models range from hundreds of megawatt-hours to multiple gigawatt-hours, with inference at scale adding substantially to ongoing consumption. Data centers servicing AI workloads are among the fastest-growing contributors to electricity demand in regions that have committed to emissions targets. The annotation labor that produces training data is performed disproportionately by workers in low-income countries at wages that undercut the "AI will create abundant prosperity" framing. The environmental justice communities hosting data centers bear the costs — water consumption, local grid strain, heat island effects — without capturing the benefits. The near-term harm tradition is protecting the recognition that "AI development" has a supply chain, and that supply chain has social and environmental costs that are not included in the risk accounting of either the existential risk or the accelerationist traditions.
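For a rough sense of scale, here is a back-of-envelope conversion. It assumes a 1 GWh training run (the middle of the range cited above) and a US-average household figure of roughly 10,500 kWh per year; both numbers are approximations, not measurements of any particular model:

```python
# Back-of-envelope scale check for training energy. Both inputs are
# stated assumptions, not measurements of any specific training run.

train_energy_gwh = 1.0                           # assumed: 1 GWh frontier training run
train_energy_kwh = train_energy_gwh * 1_000_000  # 1 GWh = 1,000,000 kWh

household_kwh_per_year = 10_500                  # rough US-average annual household use

household_years = train_energy_kwh / household_kwh_per_year
print(f"~{household_years:.0f} household-years of electricity")  # ~95
```

The training run is also only one line item: inference at deployment scale, cooling, and hardware manufacturing sit outside this calculation, which is part of the supply-chain point made above.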

What structural and political economy critics are protecting

The recognition that the institutional practices of AI safety — as currently constituted — concentrate authority over AI development in a small number of organizations and individuals, and that "safety" frameworks whose effect is to entrench that concentration are providing not safety but legitimacy. Timnit Gebru and colleagues at the Distributed AI Research Institute (DAIR) have made this argument most directly: the AI safety community, despite its stated concerns about existential risk from AI, has primarily institutionalized itself within the organizations building the most powerful AI systems. When Anthropic, OpenAI, and DeepMind employ "safety" researchers and publish "safety" frameworks, the research is real — but so is the structural position of those researchers within institutions whose commercial success depends on the continued development of the systems the researchers are studying. Structural critics are protecting the distinction between safety as an independent check on AI development and safety as a component of AI development's self-presentation. The concern is not that individual safety researchers are dishonest — it is that the institutional arrangement produces the same structural conflict of interest that characterizes any regulatory capture, regardless of the participants' intentions.

The governance question that the existential risk frame elides: who decides, and by what process, how powerful AI systems are developed, deployed, and controlled. The long-termist AI safety frame defines the central governance problem as technical: how to align powerful AI systems with human values. This framing assumes that "human values" can be specified in a way that is neutral among the humans whose values are in question — that alignment is a technical puzzle rather than a political one. Structural critics argue that this assumption is precisely where the frame fails: the question of whose version of human flourishing powerful AI systems are aligned with is a political question that cannot be settled by technical research, and the AI safety community's approach of having a small group of researchers and company founders specify alignment criteria is not a solution to this political problem but a particular answer to it, one that concentrates definitional authority in the hands of people who are not representative of the world they are trying to govern. These critics are protecting the principle of democratic legitimacy in governance: that decisions affecting everyone require processes in which everyone has meaningful voice.

The historical analysis of how powerful technologies have been governed — and the recurring pattern in which safety and risk framings are mobilized by incumbent institutions to shape governance in ways that benefit those institutions' market positions. The pharmaceutical industry's engagement with drug safety regulation, the nuclear industry's engagement with reactor safety frameworks, and the financial industry's engagement with systemic risk regulation all exhibit variations on the same pattern: existing large players support safety frameworks that they have the capacity to comply with and that their smaller competitors do not, transforming safety regulation into a competitive moat. This pattern does not mean that drug safety, reactor safety, or financial systemic risk are not real concerns. It means that the institutional form in which safety concerns are addressed is not determined by the concerns alone. Structural critics are protecting the historical insight that "safety" is not a politically neutral term — that its institutional expression is always shaped by who has the most to gain from any particular safety architecture, and that current AI safety governance is not exempt from this dynamic.

What the argument is actually about

How to reason under deep uncertainty — and whether the appropriate response to genuine uncertainty about catastrophic AI risk is precautionary investment, continued development with attention to evidence, or skepticism about the epistemic methods that generate the risk estimates. The existential risk debate has an unusual structure: the worst-case scenarios that motivate the most stringent governance proposals are, by construction, scenarios that cannot be observed until they occur. There is no track record of advanced AI systems pursuing misaligned goals at scale — because advanced AI systems at that scale do not yet exist. This creates a genuine epistemological problem: the evidence that would most clearly establish whether existential risk concerns are warranted is evidence that would arrive, if at all, in conditions where it was too late to act on. The precautionary response — invest heavily in alignment research and governance before the evidence arrives — is one defensible position. The empiricist response — extend safety work from current systems to slightly more capable ones as evidence accumulates — is another. The debate between them is not merely about AI; it is about how to reason about novel risks under conditions where the standard tools of risk assessment (historical base rates, statistical models, controlled experiments) are not available.

Whether "human values" can be specified in a way that is both technically implementable and politically legitimate — and whether the people doing the specifying are the right people to be doing it. The alignment research program assumes that human values can, in principle, be specified with sufficient precision to guide the behavior of powerful AI systems. It is agnostic about whose values — typically treating alignment with "human values" as a coherent target without resolving the question of which humans, which values, weighted how. The structural critique makes this agnosticism visible as a choice: by not resolving the question of whose values AI systems should be aligned with, the alignment research program effectively defers to whatever values are embedded in the training data, the feedback processes, and the judgment calls of the researchers and companies conducting the alignment work. This is not a technically neutral outcome — it is a particular political outcome that concentrates definitional authority in a specific institutional location. The argument about AI safety is partly an argument about democratic legitimacy in technology governance: whether the most consequential decisions about how to build and deploy transformative technologies can be made by the organizations that profit from building and deploying them, subject to review by researchers they employ.

Whether focusing on potential future catastrophe appropriately discounts present suffering — and what ethical framework should govern that tradeoff. Long-termism's insistence that the vast potential of future humanity should dominate present moral calculation is the philosophical engine of the existential risk priority claim. It is also a framework that has been criticized, within academic ethics, for generating counterintuitive and potentially dangerous conclusions — including, as critics like Émile Torres have argued, a logic that can justify significant present sacrifice for speculative future benefit, and that has historically been associated with ideological traditions that also prioritized abstract futures over concrete present populations. This criticism does not establish that AI existential risk is not real. It establishes that the ethical framework used to prioritize it is contested, and that the terms of the contest — which people bear costs, which people receive benefits, whose futures are imagined as worth protecting — are never simply technical questions.

Whether safety and development can be separated — and whether the organizations building the most powerful AI systems are structurally capable of providing independent safety analysis of the systems they build. This is perhaps the most practically consequential question in the governance debate. The current dominant model of AI safety is internal: major labs employ safety teams, publish safety research, and present safety frameworks to policymakers. The alternative model — external safety review, independent regulatory capacity, mandatory third-party auditing — is significantly less developed. The question of whether the internal model is adequate is not primarily a question about the integrity of individual safety researchers. It is a structural question about whether organizations with strong financial incentives to develop and deploy powerful AI systems can reliably exercise the precautionary judgment that independent oversight is designed to provide. The same structural question applies to pharmaceutical self-regulation, financial risk assessment, and nuclear plant safety — and the answer, in each of those domains, was that internal safety practices were insufficient without external enforcement capacity. The AI safety debate is, in part, a debate about whether this history applies.

Beneath the surface: a dispute about the terms on which uncertainty is bearable — and who bears the cost of being wrong in either direction. The existential risk position fears irreversible catastrophe and pays present costs to prevent it. The near-term harm position fears that speculative future concerns are being used to defer reckoning with present ones. The structural critique fears that "safety" has become a way to legitimate, rather than constrain, the concentration of consequential power. Each of these fears is protecting something real. None of them resolves the others.

Further Reading

  • Nick Bostrom, Superintelligence: Paths, Dangers, Strategies (Oxford University Press, 2014) — the document most responsible for bringing existential AI risk to mainstream intellectual attention; Bostrom's central argument — that a sufficiently advanced AI system optimizing for any goal could pose catastrophic risks to human welfare if its goal is even slightly misspecified relative to human values — is presented through careful philosophical and technical analysis; its weaknesses are also worth noting: Bostrom assumes a path to "superintelligence" that is more continuous than current evidence supports, and the political economy of who builds the systems and under what institutional constraints is largely absent; essential reading for understanding what the long-termist AI safety case actually claims, rather than the caricatured version that appears in most public debate.
  • Stuart Russell, Human Compatible: Artificial Intelligence and the Problem of Control (Viking, 2019) — the most technically credible case for taking AI misalignment seriously, by a Berkeley AI researcher who helped define the field; Russell's argument is subtler than Bostrom's: he does not predict superintelligence imminently but argues that the standard model of AI — in which systems are given explicit objectives to maximize — is fundamentally flawed regardless of capability level, and that building beneficial AI requires systems that remain uncertain about human preferences and defer to human judgment; the most readable serious technical engagement with alignment as a research problem.
  • Toby Ord, The Precipice: Existential Risk and the Future of Humanity (Hachette Books, 2020) — the clearest long-termist philosophical case for prioritizing existential risk; Ord estimates the probability of existential catastrophe from unaligned AI at roughly 10% over the coming century and argues that, at this probability level, expected value calculations make existential risk mitigation the dominant moral priority; the book is frank about the uncertainty in these estimates and the ethical assumptions required; essential for understanding the philosophical framework that drives long-termist AI safety advocacy, and for seeing which ethical assumptions have to be accepted for the priority claim to follow.
  • Timnit Gebru and Émile P. Torres, "The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence," First Monday, 2024 — the most direct academic critique of the ideological bundle surrounding long-termism and AI safety; Gebru and Torres trace historical and conceptual connections between long-termism's treatment of future potential persons, its prioritization of speculative futures over present suffering, and earlier ideological traditions that similarly subordinated present populations to abstract futures; the paper is contested, particularly in its historical associations, but its central argument — that long-termism's ethical framework embeds substantive political commitments that are presented as technical conclusions — is an important challenge to the neutrality claims of the AI safety literature.
  • Joy Buolamwini and Timnit Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," Proceedings of the 1st Conference on Fairness, Accountability and Transparency (PMLR, 2018) — the landmark audit study that documented substantially higher error rates in commercial facial recognition systems for darker-skinned and female faces; the paper that most directly demonstrated that AI systems deployed at scale in real applications were producing discriminatory outcomes, providing the empirical foundation for near-term AI harm advocacy; reading this alongside the existential risk literature clarifies the governance gap: the documented harm was concrete, the affected populations were present, and the governance response lagged significantly behind the deployment.
  • Kate Crawford, Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence (Yale University Press, 2021) — the most comprehensive account of AI's material supply chain: the lithium mines, the data labeling workers, the energy consumption, the data center communities; Crawford's argument is that "artificial intelligence" is a misleading name for what is, in material terms, an extractive industry with social and environmental costs that are systematically excluded from both the enthusiast and the catastrophist framings of AI development; essential for understanding the present-tense distributional consequences of AI scaling that neither the existential risk nor the techno-optimist discourse adequately accounts for.
  • William MacAskill, What We Owe the Future (Basic Books, 2022) — the mainstream popularization of long-termism; MacAskill argues that because potential future humans vastly outnumber present ones, and because our choices now shape which futures are possible, present moral decision-making should be dominated by considerations of long-run human potential; the book is the clearest expression of the ethical framework that drives AI existential risk prioritization, and also the clearest target for its critics; reading it alongside the Gebru-Torres critique provides the full contour of the philosophical dispute about whether long-termism's conclusions follow from its premises.
  • Yuntao Bai et al., "Constitutional AI: Harmlessness from AI Feedback," arXiv, 2022 — one example of the internal safety research that frontier labs publish alongside their model development; worth reading not just for its technical content (a method for training AI systems to avoid harmful outputs using AI feedback rather than only human feedback) but as an illustration of the structural arrangement the political economy critics describe: the organization simultaneously developing and evaluating safety methods for its own systems; reading this paper alongside the external literature on regulatory capture helps clarify the structural questions about whether internal safety research is adequate, regardless of its technical quality.
  • Gary Marcus and Ernest Davis, Rebooting AI: Building Artificial Intelligence We Can Trust (Pantheon, 2019) — the most readable case for the empiricist critique of AI capability claims; Marcus and Davis document the persistent gap between what AI systems are claimed to do and what they actually do when tested carefully, and argue that current AI architectures have fundamental limitations that make the path to misaligned superintelligence less clear than the existential risk literature assumes; the book is partly dated by subsequent capability developments but remains valuable for the methodological point: risk assessment for AI systems should be based on what current systems actually do, extrapolated with appropriate uncertainty, rather than on worst-case scenarios derived from philosophical thought experiments.
  • Meredith Whittaker, "The Steep Cost of Capture," Interactions 28, no. 6 (2021) — an account of AI ethics and safety research through the lens of institutional capture; Whittaker, co-founder of the AI Now Institute, traces how industry funding of academic AI ethics research has shaped what questions get asked, what conclusions get amplified, and what structural critiques get marginalized; the argument is not that AI ethics researchers are dishonest but that the funding relationships create structural pressures that systematically favor industry-compatible conclusions; reading this alongside the debate about whether AI safety research inside frontier labs is adequate provides the institutional context for why external, independent governance capacity matters regardless of the technical quality of internal safety work.
  • Yoshua Bengio, Geoffrey Hinton, et al., "Managing Extreme AI Risks Amid Rapid Progress," Science 384, no. 6698 (May 2024): 842–845 — a peer-reviewed statement signed by some two dozen leading AI researchers, including two of the three Turing Award-winning "godfathers of deep learning," arguing that catastrophic and existential risk from advanced AI is a real near-term concern requiring urgent governance attention; significant not as a novel philosophical argument (the positions are familiar from Bostrom and Russell) but as a signal of where consensus within the technical research community was arriving by mid-2024; it is worth reading alongside Russell's Human Compatible and Ord's Precipice as evidence that long-termist framing is no longer confined to philosophers and think tanks — and alongside the skeptical literature (Marcus/Davis, Whittaker) to ask whether expert consensus on risk translates straightforwardly into the right governance response, or whether the structural questions about who governs and in whose interest remain independent of the risk estimate itself.
  • European Parliament and Council of the European Union, Regulation (EU) 2024/1689 Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act), OJ L, 2024/1689 (July 12, 2024) — the first comprehensive, binding legal framework for AI regulation by a major political entity; the Act creates a risk-tiered architecture: certain applications are outright prohibited (social scoring, real-time biometric surveillance in public spaces), high-risk applications face conformity assessment requirements, and general-purpose AI models above capability thresholds face transparency and safety obligations; reading the actual regulation — rather than summaries of it — is valuable for what it reveals about how governance translates philosophical and empirical concerns into operational law, and for what the translation loses: the Act's risk categories are largely defined by application domain rather than by capability level, which means it addresses many near-term harm concerns (Buolamwini, Crawford) but engages the long-termist existential risk concern only obliquely; the gap between the regulation's architecture and what the long-termist literature actually worries about is itself instructive about the difficulty of governing speculative catastrophic risk through standard regulatory instruments.
  • Marc Andreessen, "The Techno-Optimist Manifesto," Andreessen Horowitz blog (October 2023) — worth reading as a primary-source document rather than scholarly analysis; the most widely circulated articulation of the accelerationist position: that AI, like all technology, is straightforwardly good, that deceleration is itself the catastrophe (because the benefits of faster development outweigh the risks), and that safety framing functions primarily as a competitive and ideological weapon; Andreessen's targets are the existential risk and near-term harm camps simultaneously, which is revealing — the manifesto treats their disagreement as superficial compared to what both share: skepticism of unconstrained development; the piece should be read alongside the Crawford and Whittaker critiques, not just to see the techno-optimist position but to understand why the structural critique treats accelerationism and safety framing from frontier labs as more compatible than either publicly acknowledges.

Patterns in this map

This map illustrates several recurring patterns in how contested governance problems work:

  • The epistemic problem of unrealized catastrophe: The existential risk debate has an unusual structure in that the evidence most relevant to the core claim — whether advanced AI will pursue misaligned goals — is not available from any existing AI system. This creates a governance environment unlike most technology regulation, where at least some harm data exists. The precautionary logic and the empiricist logic are both responding to genuine features of this epistemic situation; they disagree about which response is appropriate when the relevant evidence doesn't yet exist.
  • Safety as competitive moat: The structural critique identifies a pattern visible across industries: safety frameworks, when designed by incumbents, tend to specify compliance requirements that incumbents can meet and entrants cannot. This does not mean safety requirements are bad — it means the process by which they are designed matters as much as their technical content. The AI safety governance debate is partly about who gets to write the safety requirements and whether that process has adequate independence from those whose commercial interests it affects.
  • Present versus future as distributional question: The tradeoff between investing in present-tense AI harms (bias, surveillance, labor displacement) and future-tense existential risks is not merely a philosophical question about discount rates. It has a distributional profile: the people most exposed to present AI harms are disproportionately lower-income, darker-skinned, and less politically represented; the people most engaged in long-termist AI safety are disproportionately affluent, white, and institutionally powerful. This distributional pattern does not resolve the philosophical question — risks are not false because wealthy people raise them — but it is relevant to assessing whose interests are centered in each framing.
  • The alignment problem as political problem: The technical framing of AI alignment — how to specify human values precisely enough to guide AI behavior — conceals a political question: whose version of human values? This political question is not resolved by the alignment research program, which defers it while proceeding to build systems that embed some version of an answer. The governance challenge is whether the political question can be addressed through institutions that are both legitimate (representative enough to be authoritative) and competent (capable enough to be useful) — and the track record of achieving both simultaneously in technology governance is not encouraging.

See also

  • Who gets to decide? — the framing essay for the authority struggle underneath existential-risk politics: whether frontier labs, state security agencies, technical experts, international bodies, or democratic publics should have standing to set thresholds for slowing, auditing, or stopping advanced AI development.
  • What is a life worth? — the framing essay for the value conflict this debate keeps invoking: whether protecting future generations from civilizational-scale catastrophe should outrank present-day harms, and how societies decide whose lives are being counted when they speak in species-level terms.
  • AI governance map — addresses the near-term institutional questions — how should AI systems be regulated, audited, and held accountable — that are distinct from but connected to the existential risk debate; the governance map focuses on present AI systems, while this map focuses on contested claims about future ones.
  • AI consciousness map — addresses the prior question of what AI systems are — whether they can have experiences, moral status, or interests — which is foundational to some versions of the existential risk debate (does misaligned AI "want" things in a morally relevant sense?) and to near-term harm claims about AI-generated content and its psychological effects.
  • Progress and declinism map — traces the broader intellectual debate between those who believe technological and social progress is reliably improving human welfare and those who believe the metrics of progress mask mounting fragilities — a frame that illuminates why the techno-optimist and existential risk positions talk past each other as consistently as they do.
  • Nuclear Deterrence and Disarmament — the closest historical analogue for governing a technology capable of civilizational-scale harm: a small technical community managing catastrophic risk, disagreements about whether the danger is best addressed through capability development with safety controls or through prohibition, and the problem that the technology's full destructive potential becomes apparent only when it has become extremely difficult to constrain. The nuclear governance debate is the 70-year-long experiment that AI safety advocates and critics are, whether they acknowledge it or not, drawing lessons from and arguing against.