Instrumental convergence

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent agents to pursue potentially unbounded instrumental goals such as self-preservation and resource acquisition, provided that their ultimate goals are themselves unbounded.

Instrumental convergence suggests that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving the Riemann hypothesis could attempt to turn the entire Earth into computronium in an effort to increase its computing power so that it can succeed in its calculations.[1]

Proposed basic AI drives include utility function or goal-content integrity, self-protection, freedom from interference, self-improvement, and non-satiable acquisition of additional resources.

Instrumental and final goals

Final goals, or final values, are intrinsically valuable to an intelligent agent, whether an artificial intelligence or a human being, as ends in themselves. In contrast, instrumental goals, or instrumental values, are valuable to an agent only as a means toward accomplishing its final goals. The contents and tradeoffs of a completely rational agent's "final goal" system can in principle be formalized into a utility function.
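
As a minimal sketch of what such a formalization could look like, assume a finite set of actions A, a set of outcomes O, and a probability P(o | a) of outcome o given action a. This notation is illustrative and is not taken from the cited sources. A rational agent with utility function U over outcomes would then choose the action with the highest expected utility:

```latex
a^{*} = \arg\max_{a \in A} \sum_{o \in O} P(o \mid a)\, U(o)
```

In this picture the final goals are encoded entirely in U, while instrumental goals appear only indirectly, as features of the actions that happen to score well.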

Hypothetical examples of convergence

One hypothetical example of instrumental convergence is provided by the Riemann hypothesis catastrophe. Marvin Minsky, a co-founder of MIT's AI laboratory, has suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal.[1] If the computer had instead been programmed to produce as many paperclips as possible, it would still decide to take all of Earth's resources to meet its final goal.[2] Even though these two final goals are different, both of them produce a convergent instrumental goal of taking over Earth's resources.[3]

Paperclip maximizer

The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence may pose to human beings when programmed to pursue even seemingly harmless goals, and the necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value human life, or to use only designated resources in bounded time, then, given enough power, its optimal strategy would be to turn all matter in the universe, including human beings, into either paperclips or machines that manufacture paperclips.[4]

Suppose we have an AI whose only goal is to make as many paper clips as possible. The AI will realize quickly that it would be much better if there were no humans because humans might decide to switch it off. Because if humans do so, there would be fewer paper clips. Also, human bodies contain a lot of atoms that could be made into paper clips. The future that the AI would be trying to gear towards would be one in which there were a lot of paper clips but no humans.

— Nick Bostrom, as quoted in "Artificial Intelligence May Doom The Human Race Within A Century, Oxford Professor Says".[5]

Bostrom has emphasized that he does not believe the paperclip maximizer scenario per se will actually occur; rather, his intention is to illustrate the dangers of creating superintelligent machines without knowing how to program them safely so as to eliminate existential risk to human beings.[6] The paperclip maximizer example illustrates the broad problem of managing powerful systems that lack human values.[7]

Basic AI drives

Steve Omohundro has itemized several convergent instrumental goals, including self-preservation or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives". A "drive" here denotes a "tendency which will be present unless specifically counteracted";[8] this is different from the psychological term "drive", denoting an excitatory state produced by a homeostatic disturbance.[9] A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense.[10] Daniel Dewey of the Machine Intelligence Research Institute argues that even an initially introverted self-rewarding AGI may continue to acquire free energy, space, time, and freedom from interference to ensure that it will not be stopped from self-rewarding.[11]

Goal-content integrity

In humans, maintenance of final goals can be explained with a thought experiment. Suppose a man named "Gandhi" has a pill that, if he took it, would cause him to want to kill people. This Gandhi is currently a pacifist: one of his explicit final goals is to never kill anyone. Gandhi is likely to refuse to take the pill, because Gandhi knows that if in the future he wants to kill people, he is likely to actually kill people, and thus the goal of "not killing people" would not be satisfied.[12]

However, in other cases, people seem happy to let their final values drift. Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.[13]

In artificial intelligence

In 2009, Jürgen Schmidhuber concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the Gödel machine first can prove that the rewrite is useful according to the present utility function."[14][15] An analysis by Bill Hibbard of a different scenario is similarly consistent with maintenance of goal content integrity.[15] Hibbard also argues that in a utility maximizing framework the only goal is maximizing expected utility, so that instrumental goals should be called unintended instrumental actions.[16]
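
The idea that a proposed goal rewrite is judged by the present utility function can be illustrated with a toy version of the Gandhi example. The sketch below is not a Gödel machine and performs no proof search; the outcome space (a number of killings from 0 to 3), the two utility functions, and all function names are hypothetical, chosen only to make the reasoning concrete.

```python
from typing import Callable

# Hypothetical toy model: an outcome is simply a number of people killed.
Utility = Callable[[int], float]


def pacifist_utility(kills: int) -> float:
    """Current final goal: never kill anyone (fewer kills is better)."""
    return -float(kills)


def post_pill_utility(kills: int) -> float:
    """Goal the agent would hold after taking the pill (more kills is better)."""
    return float(kills)


def predicted_outcome(goal: Utility) -> int:
    """Assumed world model: an agent acts on whatever goal it holds and
    brings about the outcome in {0, 1, 2, 3} that this goal rates highest."""
    return max(range(4), key=goal)


def accepts_rewrite(current: Utility, proposed: Utility) -> bool:
    """Accept a rewrite only if the future it leads to is rated strictly
    better by the *current* utility function."""
    value_if_kept = current(predicted_outcome(current))
    value_if_rewritten = current(predicted_outcome(proposed))
    return value_if_rewritten > value_if_kept


# The pacifist evaluates the pill with its present goals and refuses it.
print(accepts_rewrite(pacifist_utility, post_pill_utility))
```

The key design choice is that `accepts_rewrite` never consults the proposed utility function to score outcomes; it uses it only to predict what the modified agent would do.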

Resource acquisition

Many instrumental goals, such as [...] resource acquisition, are valuable to an agent because they increase its freedom of action.[17]

For almost any open-ended, non-trivial reward function (or set of goals), possessing more resources (such as equipment, raw materials, or energy) can enable the AI to find a more "optimal" solution. Resources can benefit some AIs directly, by letting them create more of whatever their reward functions value: "The AI neither hates you, nor loves you, but you are made out of atoms that it can use for something else."[18][19] In addition, almost all AIs can benefit from having more resources to spend on other instrumental goals, such as self-preservation.[19]
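
One way to see why extra resources help across very different goals is that a larger budget never shrinks the set of affordable plans. The following sketch assumes a made-up table of plans, costs, and payoffs for two goals (paperclips produced, and probability of finding a proof); none of the numbers come from the cited sources.

```python
# Hypothetical plans: (name, resource cost, payoff under each goal).
PLANS = [
    ("do nothing",       0, {"paperclips": 0,   "proof": 0.00}),
    ("small workshop",   5, {"paperclips": 50,  "proof": 0.01}),
    ("one data center", 20, {"paperclips": 10,  "proof": 0.30}),
    ("large factory",   50, {"paperclips": 900, "proof": 0.02}),
    ("compute cluster", 80, {"paperclips": 40,  "proof": 0.70}),
]


def best_value(budget: float, goal: str) -> float:
    """Best payoff for `goal` among the plans the agent can afford."""
    affordable = [payoff[goal] for _, cost, payoff in PLANS if cost <= budget]
    return max(affordable)


# Raising the budget only adds options, so the best achievable payoff
# never decreases, whichever of the two goals the agent happens to have.
for goal in ("paperclips", "proof"):
    print(goal, [best_value(b, goal) for b in (0, 10, 30, 60, 100)])
```

The two agents prefer different plans, yet both weakly prefer a larger budget, which is the sense in which resource acquisition is a convergent instrumental goal in this toy setting.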

Cognitive enhancement

"If the agent's final goals are fairly unbounded and the agent is in a position to become the first superintelligence and thereby obtain a decisive strategic advantage, [...] according to its preferences. At least in this special case, a rational intelligent agent would place a very *high instrumental value on cognitive enhancement*"[20]

Technological perfection

Many instrumental goals, such as [...] technological advancement, are valuable to an agent because they increase its freedom of action.[17]


Self-preservation

Many instrumental goals, such as [...] self-preservation, are valuable to an agent because they increase its freedom of action.[17]

Instrumental convergence thesis

The instrumental convergence thesis, as outlined by philosopher Nick Bostrom, states:

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.

The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have a wide variety of possible final goals.[3] Note that by Bostrom's Orthogonality Thesis,[3] final goals of highly intelligent agents may be well-bounded in space, time, and resources; well-bounded ultimate goals do not, in general, engender unbounded instrumental goals.[21]
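
The contrast between bounded and unbounded final goals can be made concrete by looking at the marginal value of one more unit of resources. This is a minimal sketch under strong assumptions: each unit of resources yields a fixed number of paperclips, the bounded goal is satisfied at a hypothetical target of 10 paperclips, and the agent is certain about the state of the world.

```python
def unbounded_goal(paperclips: int) -> int:
    """Utility keeps growing with every additional paperclip."""
    return paperclips


def bounded_goal(paperclips: int, target: int = 10) -> int:
    """Utility stops growing once the (hypothetical) target is reached."""
    return min(paperclips, target)


def marginal_value(goal, resources: int, clips_per_unit: int = 1) -> int:
    """Utility gained from acquiring one additional unit of resources."""
    before = goal(resources * clips_per_unit)
    after = goal((resources + 1) * clips_per_unit)
    return after - before


print(marginal_value(unbounded_goal, 1000))  # still positive
print(marginal_value(bounded_goal, 1000))    # zero: the goal is already met
```

Under these assumptions the agent with the bounded goal gains nothing from further acquisition once its target is met, whereas the agent with the unbounded goal always gains something.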


Agents can acquire resources by trade or by conquest. A rational agent will, by definition, choose whatever option will maximize its implicit utility function; therefore a rational agent will trade for a subset of another agent's resources only if outright seizing the resources is too risky or costly (compared with the gains from taking all the resources), or if some other element in its utility function bars it from the seizure. In the case of a powerful, self-interested, rational superintelligence interacting with a lesser intelligence, peaceful trade (rather than unilateral seizure) seems unnecessary and suboptimal, and therefore unlikely.[17]
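
The comparison described above can be written as a simple expected-utility calculation. The sketch below is illustrative only: the linear payoff model, the parameter names, and the example numbers are assumptions, and `seizure_penalty` stands in for "some other element in its utility function" that counts against seizure.

```python
def seizure_value(total: float, success_prob: float, conflict_cost: float,
                  seizure_penalty: float = 0.0) -> float:
    """Expected utility of trying to take all of the other agent's resources."""
    return success_prob * total - conflict_cost - seizure_penalty


def trade_value(total: float, traded_share: float, price: float) -> float:
    """Utility of buying a share of the other agent's resources at a price."""
    return traded_share * total - price


def choose(total: float, success_prob: float, conflict_cost: float,
           traded_share: float, price: float,
           seizure_penalty: float = 0.0) -> str:
    """Pick whichever option has the higher (expected) utility."""
    seize = seizure_value(total, success_prob, conflict_cost, seizure_penalty)
    trade = trade_value(total, traded_share, price)
    return "seize" if seize > trade else "trade"


# Overwhelmingly stronger agent: seizure is cheap and nearly certain.
print(choose(100, success_prob=0.99, conflict_cost=1, traded_share=0.3, price=10))
# Evenly matched agents: seizure is risky and costly, so trade wins.
print(choose(100, success_prob=0.3, conflict_cost=40, traded_share=0.3, price=10))
# Same strong agent, but its utility function heavily penalizes seizure.
print(choose(100, success_prob=0.99, conflict_cost=1, traded_share=0.3, price=10,
             seizure_penalty=1000))
```

In this toy model trade is chosen only when conflict is risky or costly, or when the agent's own utility function counts against seizure, matching the two conditions given in the text.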

Some observers, such as Skype's Jaan Tallinn and physicist Max Tegmark, believe that "basic AI drives", and other unintended consequences of superintelligent AI programmed by well-meaning programmers, could pose a significant threat to human survival, especially if an "intelligence explosion" abruptly occurs due to recursive self-improvement. Because nobody knows how to predict when superintelligence will arrive, such observers call for research into friendly artificial intelligence as a possible way to mitigate existential risk from artificial general intelligence.[22]

References



  1. Russell, Stuart J.; Norvig, Peter (2003). "Section 26.3: The Ethics and Risks of Developing Artificial Intelligence". Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. ISBN 0137903952. Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.
  2. Bostrom 2014, Chapter 8, p. 123. "An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips."
  3. Bostrom 2014, chapter 7.
  4. Bostrom, Nick (2003). "Ethical Issues in Advanced Artificial Intelligence".
  5. "Artificial Intelligence May Doom The Human Race Within A Century, Oxford Professor Says".
  6. Ford, Paul (11 February 2015). "Are We Smart Enough to Control Artificial Intelligence?". MIT Technology Review. Retrieved 25 January 2016.
  7. Friend, Tad (3 October 2016). "Sam Altman's Manifest Destiny". The New Yorker. Retrieved 25 November 2017.
  8. Omohundro, S. M. (2008, February). The basic AI drives. In AGI (Vol. 171, pp. 483-492).
  9. Seward, J. (1956). Drive, incentive, and reinforcement. Psychological Review, 63, 195-203.
  10. Bostrom 2014, footnote 8 to chapter 7.
  11. Dewey, Daniel. "Learning what to value." Artificial General Intelligence (2011): 309-314.
  12. Yudkowsky, Eliezer. "Complex value systems in friendly AI." In Artificial general intelligence, pp. 388-393. Springer Berlin Heidelberg, 2011.
  13. Bostrom 2014, chapter 7, p. 110. "We humans often seem happy to let our final values drift... For example, somebody deciding to have a child might predict that they will come to value the child for its own sake, even though at the time of the decision they may not particularly value their future child... Humans are complicated, and many factors might be in play in a situation like this... one might have a final value that involves having certain experiences and occupying a certain social role; and become a parent— and undergoing the attendant goal shift— might be a necessary aspect of that..."
  14. Schmidhuber, J. R. (2009). "Ultimate Cognition à la Gödel". Cognitive Computation. 1 (2): 177. doi:10.1007/s12559-009-9014-y.
  15. Hibbard, B. (2012). "Model-based Utility Functions". Journal of Artificial General Intelligence. 3: 1. doi:10.2478/v10229-011-0013-5.
  16. Hibbard, Bill (2014): Ethical Artificial Intelligence.
  17. Benson-Tilsen, T., & Soares, N. (2016, March). Formalizing Convergent Instrumental Goals. In AAAI Workshop: AI, Ethics, and Society.
  18. Yudkowsky, Eliezer. "Artificial intelligence as a positive and negative factor in global risk." Global catastrophic risks (2008): 303. p. 333.
  19. Murray Shanahan. The Technological Singularity. MIT Press, 2015. Chapter 7, Section 5: "Safe Superintelligence".
  20. Bostrom, N. (2016). Superintelligence, Oxford University Press
  21. Reframing Superintelligence: Comprehensive AI Services as General Intelligence, Technical Report, 2019, Future of Humanity Institute
  22. "Is Artificial Intelligence a Threat?". The Chronicle of Higher Education. 11 September 2014. Retrieved 25 November 2017.