How to Control the Capabilities of Superintelligence?

By Tony Czarnecki


Some experts, such as Nick Bostrom, one of the leading thinkers on Superintelligence, argue that we need to devise control methods to minimize the risk of Artificial General Intelligence (AGI) going terribly wrong. He describes these methods in his book “Superintelligence” (Bostrom, 2014). Here I will try to give a layman’s description of what they mean and what their consequences are for controlling the risks emerging from Superintelligence. The most important point is that these control methods must be in place before Superintelligence arrives, i.e. by the end of this decade at the latest. Bostrom frames the ‘control problem’ as a ‘principal-agent’ problem, a well-known subject in economic and regulatory theory. The problem can be looked at from two perspectives:

  • The first ‘principal-agent’ problem: e.g. the problem faced by a client who wants to buy a house and employs an estate agent to fulfil exactly the client’s objective. In this scenario, the client is the principal (the person who wants some task to be performed in accordance with his interests), and the estate agent is the agent (the person carrying out the task on the principal’s behalf).
  • The second ‘principal-agent’ problem: e.g. the problem where the estate agent thinks primarily about his own interest, such as getting the best possible agent’s fee.

He dedicates a whole chapter to identifying potential solutions. Since the publication of the book in 2014, the AI community has widely discussed how to turn them into practical tools. Bostrom splits them into two groups: Capability Control and Motivation Selection, which I have tried to describe as far as possible in layman’s terms in the following subsections.

The ‘Control Problem’ involves human principals (sponsors or financing institutions) and human agents (AI developers). At some stage there will be an AI project to develop Superintelligence (AGI). It may be launched by one of the big IT/AI companies such as Google, Microsoft, IBM or Amazon. But it is also quite likely that it will be initiated by wealthy AI backers, which is already happening. Probably the most prominent among the people deeply involved in various top AI initiatives is Elon Musk. He is a co-founder of PayPal (an online payment system), the founder of SpaceX (a rocket company), the proposer of Hyperloop (a transport system with pods travelling through tubes at speeds of nearly 1,000 km/h), a co-founder of Neuralink (a brain-computer interface venture), and the backer of several other large-scale initiatives, such as sending 1 million people to Mars by 2050. The second is Jeff Bezos, the founder of Amazon and, at the time of writing, the richest man on the planet with assets of over $150bn, who is also deeply involved in AI. Amazon’s Echo smart speaker, with its Alexa voice assistant, had been sold to over 20m people by the end of 2017.

Such sponsors will need to ensure that AI developers carry out the project in accordance with their needs. They would also want to ascertain that the developers understand their sponsors’ needs correctly and that the developed AI product, which may turn into Superintelligence, will also understand and obey humans as expected. Failure to address this problem could become an existential risk for Humanity.

Bostrom specifies four possible solutions to the principal-agent problem, which he groups under “Capability Control Methods”. Their purpose is to tune the capabilities of a superintelligent agent to the requirements of humans in such a way that we stay safe and retain ultimate control over what Superintelligence can do.

Keep it in a Box (Boxing Methods of Control)

This is perhaps the simplest and most intuitively compelling method of controlling Superintelligence – putting it into a metaphorical “box”, i.e. a set of protocols that constrain the way in which Superintelligence could interact with the world, always under the control of humans. It is often proposed that as long as Superintelligence is physically isolated and restricted, or “boxed”, it will be harmless.

A typical Superintelligence will be a superbly advanced computer with sophisticated algorithms (procedures for processing information) and will have three components: a sensor (or input channel); a processor; and an actuator (or output channel). Such a superintelligent agent will receive inputs from the external world via its sensors, e.g. Wi-Fi, radio communication, chemical compounds, etc. It will then process those inputs using its processor (computer) and respond by outputting information or performing some action through its actuators. Examples of such actions include advising on which decision should be made, switching certain engines on or off, or completing financial transactions. The outputs could also carry potentially significant consequences, e.g. a judgment on whether a chemical compound would be safe for humans at a given dose.
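To make the three-component picture a little more concrete, here is a toy Python sketch of an agent whose sensor and actuator are wrapped in a “box”. It is only an illustration under my own assumptions: the class name `BoxedAgent` and the channel names (`operator_text`, `advice_text`, `engine_control`) are hypothetical and do not come from Bostrom’s book, and a real containment protocol would be far more involved.

```python
from dataclasses import dataclass, field

# Hypothetical channel names, chosen only for illustration.
ALLOWED_INPUTS = {"operator_text"}
ALLOWED_OUTPUTS = {"advice_text"}


@dataclass
class BoxedAgent:
    """Toy agent: sensor -> processor -> actuator, wrapped in a 'box'."""
    blocked: list = field(default_factory=list)  # log of refused traffic

    def sense(self, channel, data):
        # Sensor (input channel): the box drops input from unlisted channels.
        if channel not in ALLOWED_INPUTS:
            self.blocked.append(("input", channel))
            return None
        return data

    def process(self, data):
        # Processor: a stand-in for whatever computation the agent performs.
        return f"advice based on: {data}"

    def act(self, channel, payload):
        # Actuator (output channel): the box refuses unlisted channels.
        if channel not in ALLOWED_OUTPUTS:
            self.blocked.append(("output", channel))
            return None
        return payload

    def step(self, in_channel, data, out_channel):
        # One full cycle: receive input, process it, attempt an output.
        observed = self.sense(in_channel, data)
        if observed is None:
            return None
        return self.act(out_channel, self.process(observed))


agent = BoxedAgent()
print(agent.step("operator_text", "dose of compound X", "advice_text"))
print(agent.step("operator_text", "dose of compound X", "engine_control"))
print(agent.blocked)
```

In this sketch the box is nothing more than two allowlists: advice in text form gets through, while the attempt to reach an engine control is refused and logged. Note that the sketch filters channels only; it does nothing about what the permitted text output says to the human reading it.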

However, it is highly unlikely that a superintelligent agent could be boxed in this way in the long term. Once the agent becomes superintelligent, it could persuade someone (most likely its human liaison) to free it from its box, and it would then be out of human control. There are a number of ways it could achieve this goal; some are included in Bostrom’s book, such as:

  • Offering enormous wealth, power and intelligence to its liberator
  • Claiming that only it can prevent an existential risk
  • Claiming it needs outside resources to cure all diseases
  • Predicting a real-world disaster (which then occurs), then claiming it could have been prevented had it been let out

To counter such possibilities, there are some solutions that would decrease the chance of a superintelligent agent escaping the ‘Box’, such as:

  • Physically isolating Superintelligence and permitting it zero control of any machinery
  • Limiting the Superintelligence’s outputs and inputs with regard to humans
  • Programming the Superintelligence with deliberately complex logic
  • Periodic resets of the Superintelligence’s memory
  • A virtual world between the real world and the AI, where its unfriendly intentions would be first revealed

However, as you may be aware, physical isolation is a solution that could be extremely difficult to enforce. It is already being severely undermined by the rapid spread of the Internet of Things (IoT): little gadgets for opening doors or switching ovens, fridges, lights etc. on and off, which you can control at home while you are away on the other side of the globe.

Incentive Method

Bostrom refers to the second capability control method as the “incentive” method. The idea seems to be that if you create the right “incentive environment”, then the Superintelligence would not act in an existentially threatening manner. This is in some ways analogous to bringing up a child. A child has its own goals, which may not be good for itself or the people around it, now or in the future. A good teacher can motivate the child in such a way that it behaves in morally and socially acceptable ways.


Stunting

“Stunting”, as the name implies, involves hampering or disabling Superintelligence in some way. A good example would be running Superintelligence on slow hardware, reducing its memory capacity, or limiting the kind of data it can process. Bostrom argues that the use of stunting poses a dilemma. Either we stunt Superintelligence too much and it just becomes another “dumb” piece of software, or we stunt it too little and it is capable of overcoming its disabilities. Getting the balance just right could be pretty tricky.


Tripwires

This is a different method of capability control. It involves building into any AI development project a set of “tripwires” which, if crossed, will lead to the project being shut down and destroyed. Bostrom identifies three types of tripwire:

  • Behavioural Tripwires: A network of detectors would be placed around a “boxed” AI in order to detect any attempt to breach the containment, e.g. any attempt to make radio communication.
  • Ability Tripwires: The Superintelligence would be regularly tested for its capabilities. If at any point new abilities are detected or if it gets close to some capability threshold, it would be shut down.
  • Content Tripwires: The Superintelligence’s internal processing could be constantly monitored for any suspicious changes, e.g. changes in how it represents its final values.
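The three tripwire types above can be sketched as three simple checks. This is a minimal illustration under my own assumptions, not Bostrom’s design: the function names, the forbidden channel list, the ability threshold of 0.8 and the use of a hash to detect changes in final values are all hypothetical.

```python
import hashlib

# All names and thresholds below are hypothetical, for illustration only.
FORBIDDEN_CHANNELS = {"radio", "wifi"}
ABILITY_THRESHOLD = 0.8  # assumed benchmark score that triggers shutdown


def fingerprint(final_values):
    """Stable digest of the system's stated final values."""
    canonical = "|".join(sorted(final_values))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_tripwires(attempted_channels, ability_score, final_values, baseline):
    """Return the names of the tripwires that were crossed."""
    crossed = []
    # Behavioural tripwire: any attempt to use a forbidden channel.
    if FORBIDDEN_CHANNELS & set(attempted_channels):
        crossed.append("behavioural")
    # Ability tripwire: measured capability reaches the threshold.
    if ability_score >= ABILITY_THRESHOLD:
        crossed.append("ability")
    # Content tripwire: the representation of final values has changed.
    if fingerprint(final_values) != baseline:
        crossed.append("content")
    return crossed


baseline = fingerprint(["obey operators", "report honestly"])
crossed = check_tripwires(
    attempted_channels=["operator_text", "radio"],
    ability_score=0.35,
    final_values=["obey operators", "report honestly"],
    baseline=baseline,
)
if crossed:
    print("Shut down project; tripwires crossed:", crossed)
```

In this example only the behavioural tripwire fires, because the system attempted radio communication. A real content tripwire would be much harder to build than comparing two hashes, since it would have to interpret the system’s internal processing rather than a tidy list of stated values.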

Bostrom thinks that tripwires could be useful, particularly during the development phase and if used in conjunction with other methods. But, unsurprisingly, he also thinks that they too have shortcomings. An advanced Superintelligence could subvert a tripwire in an unexpected way. He also notes that developers working on Superintelligence could grow impatient if tripwires repeatedly hamper their progress, and might then undermine any safety advantage gained by the tripwire system.

Reprinted with permission from the author.

Tony Czarnecki is an economist and a futurist, a member of Chatham House, London, deeply engaged in global politics and the reform of democracy, with a wide range of interests such as politics, technology, science and culture. He is also an active member of London Futurists. This gives him the necessary insight to explore the complex subjects discussed in the three books of the POSTHUMANS series. He is the Managing Partner of Sustensis, London, a think tank providing inspiration for Humanity’s transition to coexistence with Superintelligence.

