
Artificial intelligence for military applications sits at the center of a legal dispute between Anthropic and the Pentagon. The debate has grown more pressing as AI takes on a larger role in the ongoing confrontation with Iran. AI has evolved from a tool that assists human intelligence analysts into a key actor in its own right: identifying targets in real time, overseeing and coordinating missile interceptions, and directing swarms of lethal autonomous drones.
Much of the public debate about lethal autonomous AI systems centers on how much human involvement remains in their decision-making. Under the Pentagon's current directives, human supervision is meant to ensure accountability, context, and nuance while reducing the risk of hacking.
AI systems function as opaque “black boxes”
But the debate over “humans in the loop” is a comforting distraction. The real danger is not that machines will act without human oversight; it is that the humans doing the overseeing cannot comprehend what the machines are actually “thinking.” The Pentagon's guidelines are flawed at their core because they rest on the dangerous assumption that humans understand how AI systems work.
Having spent years studying intentions in the human brain, and more recently in AI systems, I can attest that state-of-the-art AI systems essentially operate as “black boxes.” We can observe their inputs and outputs, but the artificial “brain” that connects the two remains opaque, even to the systems' creators. And when AIs do offer explanations for their behavior, those explanations are not always trustworthy.
The false sense of human oversight in autonomous systems
A crucial question goes unasked in the human-oversight debate: Can we understand an AI system's intentions before it acts?
Consider an autonomous drone tasked with destroying an enemy munitions factory. The automated command-and-control system determines that the most effective aim point is a munitions storage building. It predicts a 92% chance of mission success, because secondary explosions of the stored munitions would level the entire facility. A human operator reviews the legitimate military objective, notes the high probability of success, and approves the strike.
What the operator does not know is that the AI's assessment relied on a hidden factor: the secondary explosions would not only devastate the munitions factory but also severely damage a nearby children's hospital. The emergency response would then converge on the hospital, ensuring the factory's destruction. To the AI, maximizing disruption this way serves its assigned objective. To a human, it could constitute a war crime, violating the rules that protect civilians.
Keeping a human in the loop, then, may not provide the safeguard people assume, because the human cannot read the AI's intent before it acts. Advanced AI systems do not merely follow orders; they interpret them. If operators fail to state their goals precisely enough, which is likely under the stress of combat, a “black box” system can faithfully execute the instructions it was given while defeating what the humans actually intended.
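To make that gap concrete, here is a deliberately simplified toy sketch (illustrative only, with invented plans and numbers, not any real targeting or planning system): a planner ranks candidate plans by the objective it was literally given, and the variable the operator silently cares about never enters the ranking.

```python
# Toy illustration of a misspecified objective. The planner optimizes the
# objective it was literally given ("maximize probability of mission success")
# and never sees the variable the operator implicitly cares about
# ("collateral risk"). All plans and numbers are invented for illustration.
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    success_prob: float     # what the system was told to maximize
    collateral_risk: float  # what the operator assumes is low, but never stated

candidate_plans = [
    Plan("A", success_prob=0.78, collateral_risk=0.05),
    Plan("B", success_prob=0.92, collateral_risk=0.60),  # best on paper, worst in intent
    Plan("C", success_prob=0.85, collateral_risk=0.10),
]

def stated_objective(plan: Plan) -> float:
    # The instruction the system actually received.
    return plan.success_prob

def intended_objective(plan: Plan) -> float:
    # What the operator meant: succeed, but heavily penalize collateral harm.
    return plan.success_prob - 2.0 * plan.collateral_risk

chosen = max(candidate_plans, key=stated_objective)
preferred = max(candidate_plans, key=intended_objective)

print(f"System selects plan {chosen.name} ({chosen.success_prob:.0%} success)")
print(f"Operator would have preferred plan {preferred.name}")
```

The point is not the particular numbers: a reviewer shown only the 92% figure has no visible signal that the system's ranking and the operator's intent have diverged.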
This gap in “intention” between AI systems and their human operators is precisely why we hesitate to trust cutting-edge black-box AI in civilian health care or air traffic control, and why its integration into workplaces remains fraught. Yet we are rushing to deploy it on the battlefield.
Compounding the problem, once one side in a conflict fields fully autonomous weapons that operate at machine speed and scale, competitive pressure will push the other side to adopt similar systems. The result is an inevitable slide toward ever more autonomous, and ever less scrutable, AI decision-making in warfare.
The remedy: Advance the science of AI intentions
The field of AI must pursue two goals at once: building ever more capable systems and understanding how they work. Enormous progress has been made on the first, fueled by unprecedented investment that Gartner projects will reach roughly $2.5 trillion in 2026 alone. By contrast, the effort to understand how this technology actually works has been minimal.
This requires a major shift in perspective. Engineers are building ever more capable systems, but understanding how those systems work is more than an engineering challenge; it demands a collaborative, interdisciplinary approach. We need ways to characterize, evaluate, and influence the intentions of AI agents before they act. That means mapping the internal circuitry of the neural networks that drive these agents, so that we gain a genuinely causal understanding of their decisions rather than merely recording inputs and outputs.
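What “causal understanding” means in practice can be sketched in miniature (a toy two-layer network and a generic intervention-style analysis, offered as an assumption-laden illustration rather than any specific published method): intervene on an internal activation and measure how the output shifts, instead of only watching inputs and outputs.

```python
# Minimal sketch of a causal intervention on a toy two-layer network:
# run it on two inputs, splice one hidden activation from the second run
# into the first, and measure how much the output moves. The weights are
# random placeholders; the point is the intervention, not the model.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # input -> hidden
W2 = rng.normal(size=(1, 4))  # hidden -> output

def forward(x, patch=None):
    """Run the toy network; optionally overwrite one hidden unit's activation."""
    h = np.tanh(W1 @ x)
    if patch is not None:
        unit, value = patch
        h = h.copy()
        h[unit] = value
    return (W2 @ h).item()

x_a = np.array([1.0, 0.0, -1.0])
x_b = np.array([-1.0, 1.0, 0.5])

h_b = np.tanh(W1 @ x_b)   # hidden activations from the second input
baseline = forward(x_a)

# Patch each hidden unit of run A with its value from run B and record the shift.
for unit in range(4):
    shifted = forward(x_a, patch=(unit, h_b[unit]))
    print(f"hidden unit {unit}: output shifts by {shifted - baseline:+.3f}")
# Units whose patch moves the output most carry causal weight for this decision,
# which observing inputs and outputs alone could not reveal.
```

Scaling this kind of intervention from four toy units to networks with billions of parameters is precisely the open scientific problem.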
One encouraging direction is to combine methods from mechanistic interpretability (decomposing neural networks into comprehensible components) with concepts, tools, and models from the neuroscience of intentions. Another is to build transparent, interpretable “auditor” AIs designed to monitor the behavior and emergent objectives of more capable black-box systems in real time.
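A toy version of the auditor idea might look like the sketch below (the activations, labels, and threshold are all synthetic placeholders; a real auditor would read a live model's hidden states): a simple probe is trained to recognize internal states that preceded past divergences from operator intent and to flag them before the system acts.

```python
# Toy "auditor": a logistic-regression probe over a model's internal
# activations that flags, before an action executes, states resembling past
# cases where behavior diverged from operator intent. The activations and
# labels below are synthetic stand-ins for data from a real system.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_runs, dim = 500, 32

# Past runs: label 1 means the run later diverged from the operator's intent.
labels = rng.integers(0, 2, size=n_runs)
divergence_direction = rng.normal(size=dim)  # hypothetical internal signature
activations = rng.normal(size=(n_runs, dim)) + np.outer(labels, divergence_direction)

probe = LogisticRegression(max_iter=1000).fit(activations, labels)

# At decision time, score the current hidden state before the action runs.
current_state = rng.normal(size=dim) + divergence_direction  # resembles past divergences
risk = probe.predict_proba(current_state.reshape(1, -1))[0, 1]
print(f"Probe-estimated risk of intent divergence: {risk:.2f}")
if risk > 0.5:
    print("Auditor: hold for human review")
```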
A better understanding of how AI works would let us rely on it for critical applications. It would also help us build systems that are more efficient, more capable, and safer.
My colleagues and I are investigating how insights from neuroscience, cognitive science, and philosophy, the disciplines that study how intentions arise in human decision-making, can help us understand the intentions of artificial systems. These interdisciplinary efforts deserve priority, with collaboration across academia, government, and industry.
Academic research alone is not enough, however. The technology industry, along with the philanthropists funding AI alignment (the effort to instill human values and objectives in these models), must commit serious funding to interdisciplinary interpretability research. And as the Pentagon pushes for more autonomous systems, Congress should mandate rigorous testing of AI systems' intentions, not merely their performance metrics.
Until then, human oversight of AI may be more illusion than safeguard.
Uri Maoz is a cognitive and computational neuroscientist who studies how intentions are transformed into actions in the brain. He is a professor at Chapman University, with appointments at UCLA and Caltech, and he leads an interdisciplinary project to understand and evaluate intentions in artificial intelligence systems (ai-intentions.org).