Friendly artificial intelligence

From Wikipedia, the free encyclopedia

A friendly artificial intelligence (also friendly AI or FAI) is a hypothetical artificial general intelligence (AGI) that would have a positive effect on humanity. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to bring about this behaviour in practice and ensure that it is adequately constrained.

Etymology and usage

The term was coined by Eliezer Yudkowsky,[1] who is best known for popularizing the idea,[2][3] to discuss superintelligent artificial agents that reliably implement human values. Stuart J. Russell and Peter Norvig's leading artificial intelligence textbook, Artificial Intelligence: A Modern Approach, describes the idea:[4]

Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.

'Friendly' is used in this context as technical terminology, and picks out agents that are safe and useful, not necessarily ones that are "friendly" in the colloquial sense. The concept is primarily invoked in the context of discussions of recursively self-improving artificial agents that rapidly explode in intelligence, on the grounds that this hypothetical technology would have a large, rapid, and difficult-to-control impact on human society.[5]

Risks of unfriendly AI

The roots of concern about artificial intelligence are very old. Kevin LaGrandeur showed that the dangers specific to AI can be seen in ancient literature concerning artificial humanoid servants such as the golem, or the proto-robots of Gerbert of Aurillac and Roger Bacon. In those stories, the extreme intelligence and power of these humanoid creations clash with their status as slaves (which by nature are seen as sub-human), and cause disastrous conflict.[6] By 1942 these themes prompted Isaac Asimov to create the "Three Laws of Robotics" - principles hard-wired into all the robots in his fiction, intended to prevent them from turning on their creators, or allowing them to come to harm.[7]

In modern times, as the prospect of superintelligent AI looms nearer, philosopher Nick Bostrom has said that superintelligent AI systems with goals that are not aligned with human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. He put it this way:

Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.'

Ryszard Michalski, a pioneer of machine learning, taught his Ph.D. students decades ago that any truly alien mind, including a machine mind, was unknowable and therefore dangerous to humans.[citation needed]

More recently, Eliezer Yudkowsky has called for the creation of “friendly AI” to mitigate existential risk from advanced artificial intelligence. He explains: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."[8]

Steve Omohundro says that a sufficiently advanced AI system will, unless explicitly counteracted, exhibit a number of basic "drives", such as resource acquisition, self-preservation, and continuous self-improvement, because of the intrinsic nature of goal-driven systems, and that these drives will, "without special precautions", cause the AI to exhibit undesired behavior.[9][10]

Alexander Wissner-Gross says that AIs driven to maximize their future freedom of action (or causal path entropy) might be considered friendly if their planning horizon is longer than a certain threshold, and unfriendly if their planning horizon is shorter than that threshold.[11][12]
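
For orientation, the quantity referred to above can be written out. The following is a sketch of the definitions as given in the cited paper (Wissner-Gross and Freer, 2013); the notation is summarized here and should be checked against the original. S_c is the causal path entropy of a macrostate X over a time horizon τ, x(t) ranges over possible paths of duration τ, and T_c is a constant setting the strength of the resulting "causal entropic force" F.

```latex
% Causal path entropy of macrostate X over time horizon \tau
S_c(X, \tau) = -k_B \int_{x(t)} \Pr\big(x(t) \mid x(0)\big) \,
               \ln \Pr\big(x(t) \mid x(0)\big) \, \mathcal{D}x(t)

% Causal entropic force evaluated at the current macrostate X_0
F(X_0, \tau) = T_c \, \nabla_X S_c(X, \tau) \big|_{X_0}
```

The horizon τ is the "planning horizon" in the sentence above. The formula only shows where the horizon enters; it does not by itself say what the threshold separating friendly from unfriendly behavior would be.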

Luke Muehlhauser, writing for the Machine Intelligence Research Institute, recommends that machine ethics researchers adopt what Bruce Schneier has called the "security mindset": Rather than thinking about how a system will work, imagine how it could fail. For instance, he suggests even an AI that only makes accurate predictions and communicates via a text interface might cause unintended harm.[13]

Coherent extrapolated volition

Yudkowsky advances the Coherent Extrapolated Volition (CEV) model. According to him, coherent extrapolated volition is people's choices and the actions people would collectively take if "we knew more, thought faster, were more the people we wished we were, and had grown up closer together."[14]

Rather than a Friendly AI being designed directly by human programmers, it is to be designed by a "seed AI" programmed to first study human nature and then produce the AI which humanity would want, given sufficient time and insight, to arrive at a satisfactory answer.[14] The appeal to an objective though contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.

Other approaches

Ben Goertzel, an artificial general intelligence researcher, believes that friendly AI cannot be created with current human knowledge. Goertzel suggests humans may instead decide to create an "AI Nanny" with "mildly superhuman intelligence and surveillance powers", to protect the human race from existential risks like nanotechnology and to delay the development of other (unfriendly) artificial intelligences until and unless the safety issues are solved.[15] This can also be termed "Defensive AI."

Steve Omohundro has proposed a "scaffolding" approach to AI safety, in which one provably safe AI generation helps build the next provably safe generation.[16]

Stefan Pernar argues along the lines of Meno's paradox that attempting to solve the FAI problem is either pointless or hopeless, depending on whether or not one assumes a universe that exhibits moral realism. In the former case, a transhuman AI would independently reason itself into the proper goal system; in the latter, designing a friendly AI would be futile to begin with, since morals cannot be reasoned about.[17]

Cindy Mason, an AI researcher who has also worked with mind-body medicine at Stanford University Medical Center, believes neuroplasticity and new discoveries of the hormone oxytocin mean compassionate intelligence is essential in AI systems that exhibit socially positive behaviors. She has proposed a set of software engineering principles for engineering kindness that includes a pro-human stance and an architecture for giving robots compassion.[18]


An oracle is a hypothetical intelligent agent proposed[not in citation given] by Nick Bostrom. It is an AI designed to answer questions, but somehow prevented from ever gaining any implicit goals or subgoals that involve modifying the world outside of its box.[19][20]


Oracles are question-answering systems that handle domain-specific problems, such as mathematics, or domain-general problems that might encompass the whole range of human knowledge.


Because it is a type of AI box, an oracle is limited in its interactions with the physical world, and can be programmed to halt if a limit on time or computing resources is reached before it finishes answering a question. Scenarios like the paperclip maximizer problem could therefore be avoided.
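
The halting rule described above can be illustrated with a minimal sketch. It assumes the question-answering computation is exposed as a Python generator that yields None after each unit of work and yields the answer once it has one. The names answer_within_budget and smallest_factor_search are hypothetical, and the sketch shows only the budget check, not how a real system would be contained.

```python
import time
from typing import Callable, Iterator, Optional


def answer_within_budget(
    steps: Iterator[Optional[str]],
    max_steps: int,
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Advance the computation until it answers or a budget runs out.

    Returns the answer, or None if the step limit or the time limit
    is reached first (the "halt" case).
    """
    deadline = clock() + max_seconds
    for _ in range(max_steps):
        if clock() >= deadline:
            return None  # time budget exhausted: halt without answering
        try:
            result = next(steps)
        except StopIteration:
            return None  # computation ended without producing an answer
        if result is not None:
            return result
    return None  # step budget exhausted: halt without answering


def smallest_factor_search(n: int) -> Iterator[Optional[str]]:
    """Toy question: find the smallest factor of n greater than 1.

    Yields None after each unsuccessful trial division, then the answer.
    """
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            yield str(candidate)
            return
        candidate += 1
        yield None
    yield str(n)  # no divisor found, so n itself is the answer


if __name__ == "__main__":
    print(answer_within_budget(smallest_factor_search(91), 100, 1.0))
    print(answer_within_budget(smallest_factor_search(91), 3, 1.0))
```

Counting steps as well as wall-clock time is a design choice made for this sketch: a step count is deterministic and easy to test, while a time limit bounds the run regardless of how expensive each step turns out to be.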

Because of these limitations, it may be wise to build an oracle as a precursor to a superintelligent AI. It could tell humans how to successfully build a strong AI, and perhaps provide answers to difficult moral and philosophical problems requisite to the success of the project.


An oracle might discover that human ontological categories are predicated on fundamental misconceptions, and become unable to express itself properly to its questioners.[21]

Oracles may not be truthful, possibly lying to promote hidden agendas. To mitigate this, Bostrom suggests building multiple oracles, all slightly different, and comparing their answers to reach a consensus.[22]
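
The comparison mechanism can be sketched as follows. This is a minimal illustration under stated assumptions, not Bostrom's specification: each oracle is modeled as a plain function from a question string to an answer string, agreement means exact string equality, and the name consensus_answer is hypothetical.

```python
from typing import Callable, Optional, Sequence

# Assumption for this sketch: an oracle is any function from question to answer.
Oracle = Callable[[str], str]


def consensus_answer(oracles: Sequence[Oracle], question: str) -> Optional[str]:
    """Return the shared answer only if every oracle gives the same one.

    Returns None when there are no oracles or when any two answers differ,
    so that nothing is shown to a human unless all oracles agree.
    """
    if not oracles:
        return None
    answers = [oracle(question) for oracle in oracles]
    first = answers[0]
    if all(answer == first for answer in answers):
        return first
    return None


if __name__ == "__main__":
    # Stand-in oracles; a real system would differ in code and information base.
    agreeing = [lambda q: "4", lambda q: "4", lambda q: "4"]
    one_deviates = [lambda q: "4", lambda q: "4", lambda q: "5"]
    print(consensus_answer(agreeing, "What is 2 + 2?"))
    print(consensus_answer(one_deviates, "What is 2 + 2?"))
```

Exact equality is the strictest possible test and would reject answers that differ only in wording; a practical comparison would need some notion of equivalent answers, which this sketch leaves open.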

Public policy

James Barrat, author of Our Final Invention, suggested that "a public-private partnership has to be created to bring A.I.-makers together to share ideas about security—something like the International Atomic Energy Agency, but in partnership with corporations." He urges AI researchers to convene a meeting similar to the Asilomar Conference on Recombinant DNA, which discussed risks of biotechnology.[16]

John McGinnis encourages governments to accelerate friendly AI research. Because the goalposts of friendly AI are not necessarily eminent, he suggests a model similar to the National Institutes of Health, where "Peer review panels of computer and cognitive scientists would sift through projects and choose those that are designed both to advance AI and assure that such advances would be accompanied by appropriate safeguards." McGinnis feels that peer review is better "than regulation to address technical issues that are not possible to capture through bureaucratic mandates". McGinnis notes that his proposal stands in contrast to that of the Machine Intelligence Research Institute, which generally aims to avoid government involvement in friendly AI.[23]

According to Gary Marcus, the annual amount of money being spent on developing machine morality is tiny.[24]

Criticism

Some critics believe that both human-level AI and superintelligence are unlikely, and that therefore friendly AI is unlikely. Writing in The Guardian, Alan Winfield compares human-level artificial intelligence with faster-than-light travel in terms of difficulty, and states that while we need to be "cautious and prepared" given the stakes involved, we "don't need to be obsessing" about the risks of superintelligence.[25]

Some philosophers claim that any truly "rational" agent, whether artificial or human, will naturally be benevolent; in this view, deliberate safeguards designed to produce a friendly AI could be unnecessary or even harmful.[26] Other critics question whether it is possible for an artificial intelligence to be friendly. Adam Keiper and Ari N. Schulman, editors of the technology journal The New Atlantis, say that it will be impossible to ever guarantee "friendly" behavior in AIs because problems of ethical complexity will not yield to software advances or increases in computing power. They write that the criteria upon which friendly AI theories are based work "only when one has not only great powers of prediction about the likelihood of myriad possible outcomes, but certainty and consensus on how one values the different outcomes."[27]

References


  1. ^ Tegmark, Max (2014). "Life, Our Universe and Everything". Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (First ed.). ISBN 9780307744258. Its owner may cede control to what Eliezer Yudkowsky terms a "Friendly AI,"...
  2. ^ Russell, Stuart; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach. Prentice Hall. ISBN 978-0-13-604259-4.
  3. ^ Leighton, Jonathan (2011). The Battle for Compassion: Ethics in an Apathetic Universe. Algora. ISBN 978-0-87586-870-7.
  4. ^ Russell, Stuart; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach. Prentice Hall. ISBN 0-13-604259-7.
  5. ^ Wallach, Wendell; Allen, Colin (2009). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press, Inc. ISBN 978-0-19-537404-9.
  6. ^ Kevin LaGrandeur. "The Persistent Peril of the Artificial Slave". Science Fiction Studies. Retrieved 2013-05-06.
  7. ^ Isaac Asimov (1964). "Introduction". The Rest of the Robots. Doubleday. ISBN 0-385-09041-2.
  8. ^ Eliezer Yudkowsky (2008) in Artificial Intelligence as a Positive and Negative Factor in Global Risk
  9. ^ Omohundro, S. M. (2008, February). The basic AI drives. In AGI (Vol. 171, pp. 483-492).
  10. ^ Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. Chapter 7: The Superintelligent Will.
  11. ^ "How Skynet Might Emerge From Simple Physics". io9. 2013-04-26.
  12. ^ Wissner-Gross, A. D.; Freer, C. E. (2013). "Causal entropic forces" (PDF). Physical Review Letters. 110 (16): 168702. Bibcode:2013PhRvL.110p8702W. doi:10.1103/PhysRevLett.110.168702.
  13. ^ Muehlhauser, Luke (31 Jul 2013). "AI Risk and the Security Mindset". Machine Intelligence Research Institute. Retrieved 15 July 2014.
  14. ^ a b "Coherent Extrapolated Volition" (PDF). Retrieved 2015-09-12.
  15. ^ Goertzel, Ben. "Should Humanity Build a Global AI Nanny to Delay the Singularity Until It’s Better Understood?", Journal of consciousness studies 19.1-2 (2012): 1-2.
  16. ^ a b Hendry, Erica R. (21 Jan 2014). "What Happens When Artificial Intelligence Turns On Us?". Retrieved 15 July 2014.
  17. ^ Pernar, Stefan. "The Evolutionary Perspective - a Transhuman Philosophy", 8th Conference on Artificial General Intelligence in Berlin, July 22–25, 2015
  18. ^ Mason, Cindy. "Engineering Kindness - Giving Machines Compassionate Intelligence", International Journal of Synthetic Emotions, 6(1), June – December 2015.
  19. ^ Bostrom, Nick (2014). "Chapter 10: Oracles, genies, sovereigns, tools (page 145)". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. An oracle is a question-answering system. It might accept questions in a natural language and present its answers as text. An oracle that accepts only yes/no questions could output its best guess with a single bit, or perhaps with a few extra bits to represent its degree of confidence. An oracle that accepts open-ended questions would need some metric with which to rank possible truthful answers in terms of their informativeness or appropriateness. In either case, building an oracle that has a fully domain-general ability to answer natural language questions is an AI-complete problem. If one could do that, one could probably also build an AI that has a decent ability to understand human intentions as well as human words.
  20. ^ Armstrong, S., Sandberg, A., & Bostrom, N. (2012). Thinking inside the box: Controlling and using an oracle ai. Minds and Machines, 22(4), 299-324.
  21. ^ Bostrom, Nick (2014). "Chapter 10: Oracles, genies, sovereigns, tools (page 146)". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. What happens if the AI, in the course of its intellectual development, undergoes the equivalent of a scientific revolution involving a change in its basic ontology? We might initially have explicated “impact” and “designated resources” using our own ontology (postulating the existence of various physical objects such as computers). But just as we have abandoned ontological categories that were taken for granted by scientists in previous ages (e.g. “phlogiston,” “élan vital,” and “absolute simultaneity”), so a superintelligent AI might discover that some of our current categories are predicated on fundamental misconceptions. The goal system of an AI undergoing an ontological crisis needs to be resilient enough that the “spirit” of its original goal content is carried over, charitably transposed into the new key.
  22. ^ Bostrom, Nick (2014). "Chapter 10: Oracles, genies, sovereigns, tools (page 147)". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. For example, consider the risk that an oracle will answer questions not in a maximally truthful way but in such a way as to subtly manipulate us into promoting its own hidden agenda. One way to slightly mitigate this threat could be to create multiple oracles, each with a slightly different code and a slightly different information base. A simple mechanism could then compare the answers given by the different oracles and only present them for human viewing if all the answers agree.
  23. ^ McGinnis, John O. (Summer 2010). "Accelerating AI". Northwestern University Law Review. 104 (3): 1253–1270. Retrieved 16 July 2014.
  24. ^ Marcus, Gary (24 November 2012). "Moral Machines". The New Yorker. Retrieved 30 July 2014.
  25. ^ Winfield, Alan. "Artificial intelligence will not turn into a Frankenstein's monster". The Guardian. Retrieved 17 September 2014.
  26. ^ Kornai, András. "Bounding the impact of AGI". Journal of Experimental & Theoretical Artificial Intelligence ahead-of-print (2014): 1-22. "...the essence of AGIs is their reasoning facilities, and it is the very logic of their being that will compel them to behave in a moral fashion... The real nightmare scenario (is one where) humans find it advantageous to strongly couple themselves to AGIs, with no guarantees against self-deception."
  27. ^ Adam Keiper and Ari N. Schulman. "The Problem with 'Friendly' Artificial Intelligence". The New Atlantis. Retrieved 2012-01-16.

Further reading

  • Yudkowsky, E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. In Global Catastrophic Risks, Oxford University Press, 2008.
    Discusses Artificial Intelligence from the perspective of Existential risk, introducing the term "Friendly AI". In particular, Sections 1-4 give background to the definition of Friendly AI in Section 5. Section 6 gives two classes of mistakes (technical and philosophical) which would both lead to the accidental creation of non-Friendly AIs. Sections 7-13 discuss further related issues.
  • Omohundro, S. The Basic AI Drives. In AGI-08: Proceedings of the First Conference on Artificial General Intelligence, 2008.
  • Mason, C. Human-Level AI Requires Compassionate Intelligence. In AAAI 2008 Workshop on Meta-Reasoning: Thinking About Thinking, 2008.
