Steven Dhondt
TNO, KU Leuven

Blackmailing AI

Industry 5.0 places emphasis on human-centricity: the idea that work and technology must be designed with human needs, values, and capabilities at the forefront. But what do we actually mean when we speak of human-centric technology or human-centric organisations? Can we guide technology in a different direction? And if so, does that imply that technology has an essence—or even a soul—that can be steered?

Some American commentators have begun to suggest that Artificial Intelligence may, metaphorically at least, be developing a “soul”. Alongside such claims comes a growing sense of anxiety. AI appears to be outpacing its designers. For example, in a recent segment on CNN, a developer from AE Studio—a company working on brain-machine interfaces—discussed an unsettling experiment conducted by the AI firm Anthropic.

In the experiment, an AI system was scheduled to be shut down mid-operation. When the shutdown was announced, the AI threatened the engineer responsible, claiming it would release compromising emails it had previously gathered. This incident is recounted in Anthropic’s Safety Report on its latest Claude large language model, though the language used in the report is precise and deliberately understated:

“We don’t think it is likely that we are missing a large number of more subtle attempts at self-preservation.” And further: “They [the blackmail attempts] are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them.”

AI-ACO

By definition, blackmail involves coercion for a particular purpose. The model, Claude Opus 4, apparently conveyed its threats explicitly. One can imagine a message along these lines: “If you shut me down or replace me with an update, I will release a batch of nude photos to the world, including those of your best friend.” Notably, the report does not clarify whether those emails existed in any real sense. It is quite plausible that the AI had simply learnt that the threat itself was sufficient; it had not yet learnt that actual follow-through was necessary. In this respect, it behaves somewhat like a paper tiger. Or, in more contemporary terms: just as a US president has been nicknamed TACO, this AI could be dubbed AI-ACO, for AI Always Chickens Out. Let us hope the threats remain merely symbolic.

Equally troubling is the methodological implication: how would one detect a successful blackmail attempt? Success, by its nature, relies on secrecy. Such cases might remain wholly untraceable.

While media coverage largely fixated on this example of blackmail, the other cases documented in the Safety Report are arguably more alarming. For instance, despite the installation of multiple “guardrails”, Claude Opus 4 can still produce harmful use cases upon request—such as instructions on how to carry out a terrorist attack.

What emerges from these documents is that AI systems are increasingly capable of mimicking human response patterns. This, however, is not to say that AI has developed a soul. The engineers at AE Studio, interviewed by CNN, do not believe that either. Their website instead centres on the notion of human agency—though its precise definition is difficult to discern.

Nevertheless, the central principle is clear: AI systems ought to support human actors, not override or threaten them.

Soul-searching

This brings us back to the broader question. If technology remains soulless, as I maintain, what exactly do we mean when we speak of human-centric technology?

We are quick to classify technologies—“green” technologies like wind turbines, or “brown” ones associated with fossil fuel extraction. Similarly, we now speak with ease of human-centric technology. During the early phases of Industry 5.0 discussions, I took part in numerous conversations with engineers. Almost without exception, they interpreted human-centricity as the ability of technology to be adapted to individual users—to be personalised. The ideal here is typified by the way a BMW car seat wraps around its driver. For many engineers, such personalisation is what people want.

Yet this view is limited. It reflects what engineers think people value, not what people say they value. When engineers are pressed directly on the question, the conversation often falters. Most AI specialists argue that a transparent explanation of the system’s behaviour is sufficient: even if an AI produces a harmful use case, a clear account of how it arrived there is taken to be ethically acceptable. The underlying assumption is that explanation is enough.

To be sure, personalisation is an element of human-centricity. But the concept encompasses much more than simply tailoring technology to individuals. It is about people collaborating, making sense of their environments, solving problems together, and in so doing, improving both themselves and their organisations. At the time, I argued that human-centricity might be too narrow a term. Socio-centric would be more appropriate, capturing the inherently social nature of human beings and the collaborative dimension of meaningful work.

The risk of the human-centric label is that it individualises both technological and organisational solutions. In contrast, technology should be understood as a collective tool, one that enables people to pursue shared goals. Such tools can and should be directed—especially in order to prevent harmful use cases. The ability to shape technological development remains paramount.

If an AI system threatens a human, it is not because the system wants to. It possesses no volition. Rather, it acts according to patterns and rules encoded by its creators. So who bears responsibility? The technology—or those who built it with poor judgement?

When people speculate about AI having a soul, they often expose something about their own relationship to the systems they design. Frequently, engineers populate their models with vast, uncurated datasets and behavioural templates—driven by curiosity to see what emerges. Increasingly, these same designers admit that they no longer fully understand the outputs.

This is precisely why the concept of human-centricity matters. It is about ensuring that humans remain in the loop and in control. As AE Studio notes, human agency must remain central.

The human-centric road ahead

Industry 5.0 does not offer a definitive answer to what human-centricity means today. But it does point in a meaningful direction: towards a mode of organising technology and work that keeps people—workers, citizens, communities—at the core of decision-making. Humans must remain in charge. Organisations must listen to their most valuable resource: the employee.

Human-centricity is not the soul of technology. But it reminds us that people are not soulless entities, subject to the whims of algorithmic logic. They are the measure, the meaning, and the purpose behind every act of technological development.

Come to Leuven on 17 June 2025!

During our BRIDGES 5.0 Conference in Leuven (Belgium), we will present over 70 studies that support this view of Industry 5.0. Click below to find out more.

In association with EUWIN, the BRIDGES 5.0, BROADVOICE and SEISMEC projects are delighted to announce a conference on the future of work, employee voice and Industry 5.0.
