Wednesday, 11 January 2017

Discussion: Superintelligence by Nick Bostrom

This post is framed as a discussion rather than a review because frankly, I don't feel qualified to review this book. It's a very academic book that was honestly barely within my reading capacity, so I don't think I can say whether it was good or bad because it was so far above everything I'd read previously on the topic.

Suffice it to say that it has completely changed my attitudes to Artificial Intelligence, that it is a very comprehensive book, and that I (along with Bill Gates, Elon Musk, Nils Nilsson, Martin Rees and other luminaries) recommend it for any intelligent person -- as long as you're willing to work at it, because this is not a light read.

(I mean it -- the book is 260 pages of sentences like: "Anthropics, the study of how to make inferences from indexical information in the presence of observational selection effects, is another area where the choice of epistemic axioms could prove pivotal." It actually doesn't require any prior knowledge of computers or philosophy, the language is just consistently highbrow. It's certainly interesting, but it's interesting in the same way physics is interesting -- you have to work for it.)

In short, Superintelligence offers an aerial view of AI, starting from how we could get to superintelligence, taking us through its possible dangers and then elucidating some possible methods of avoiding having our entire universe turned to paperclips, our own bodies included.

Two posts on Wait But Why are a large part of what got me into this topic, and if you don't want to get a more thorough understanding by slogging through the book, they offer a much more enjoyable and easy (but still worthwhile) view of it: Part 1, The Artificial Intelligence Revolution; Part 2, Our Immortality or Extinction.

So! Time for a brief discussion of the points I found most interesting.

Chapter 2: Paths to Superintelligence

Bostrom lays out the paths to five different kinds of superintelligence: AI (entirely software-based), whole brain emulation (human brains are mapped and transferred to digital substrates, so they're still themselves but can think far faster and are less vulnerable to harm), biological cognition (humans still in human bodies but improved by better nutrition, education, gene editing and selective breeding), brain-computer interfaces (essentially what we have now with Google, but internal), and networks (collective superintelligence).

I found this interesting in that I hadn't really thought of collective superintelligence as a thing before. His comparison of the merits of each path was also worthwhile (e.g. biological cognition is more familiar and less dangerous but far slower if it takes the selective-breeding route; AI is completely unfamiliar, but we have more control over its design).

Chapter 3: Forms of Superintelligence

In this chapter, Bostrom elaborates on three types of superintelligence: quality, speed and collective.

Quality superintelligence is what I always thought of as superintelligence -- a mind that is just better, the mind of a genius, that can make leaps no one else can no matter how hard they try, that can understand more. We have quality superintelligence compared to an ant; no matter how long you gave an ant to understand algebra, it wouldn't. We just think on a different plane.

Speed superintelligence exists, I would think, in computers now, which can perform many orders of magnitude more calculations per second than humans can. It means that a problem that might take a team of human workers 10 years to do could be done by a computer in 10 seconds, as long as it didn't require intelligence of a quality the computer doesn't possess. This is why we use calculators and supercomputers to do our maths.

Collective superintelligence is quite interesting -- it's the cumulative intelligence of a community if all its members worked together in perfect harmony and intelligences could be linearly added. This seems unlikely in humans but could definitely work in a network of computers.

Most interesting was Bostrom's discussion of where each of these superintelligences would come in useful. Collective superintelligence is most useful when a project can be broken down into many small, independent parts that can be done in parallel. Speed superintelligence is useful for a project that can be broken down into parts that can be done in series. And quality superintelligence is useful when you need leaps of logic or intuition or genius that nothing else can manage.
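To make the parallel-versus-serial distinction concrete, here's a toy Python sketch (my own example, not from the book): counting primes splits into independent chunks that many workers can share, while an iterated calculation forms a chain where each step needs the previous step's result, so only a faster single worker helps.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(lo, hi):
    """An independent chunk of work: no chunk depends on another."""
    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))
    return sum(is_prime(n) for n in range(lo, hi))

# Collective-style: split the range into independent parts and farm them out.
# (In CPython the GIL limits true parallelism for CPU-bound threads; this
# only illustrates the *decomposition*, not a real speedup.)
chunks = [(i, i + 2500) for i in range(2, 10002, 2500)]
with ThreadPoolExecutor() as pool:
    parallel_total = sum(pool.map(lambda c: count_primes(*c), chunks))

# Speed-style: a serial chain -- step n needs the result of step n-1,
# so extra workers can't help at all.
def serial_chain(x, steps=10000):
    for _ in range(steps):
        x = (x * x + 1) % 2**61
    return x

print(parallel_total)   # same answer as one worker would get, just shared out
print(serial_chain(3))
```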

Chapter 6: Cognitive Superpowers

This chapter broke down a superintelligence's potential cognitive superpowers into six categories:
(a) intelligence amplification - recursively improving its own intelligence
(b) strategizing - planning to achieve distant goals and overcome an intelligent opposition, e.g. humans
(c) social manipulation - like convincing the researchers to let it connect to the internet
(d) hacking
(e) technology research - e.g. for space colonisation, military force, or nanotech assemblers to act as its arms
(f) economic productivity - so it could earn money to buy hardware and influence, if it didn't want to take them by force

The chapter explains that a superintelligence with the intelligence amplification superpower could acquire all the others, and that in general, once an AI has one of the six superpowers the rest will soon follow. Also, an idea that appears throughout the book is that of an AI-complete problem: a problem (e.g. natural language recognition and understanding) that can only be cracked once all of AI has been mastered.

Chapter 8: Is the default outcome doom?

This chapter was very interesting and scary. It lays out a case for why we can never be too careful with AI -- not only are we constantly thinking of more ways an AI could turn malicious against our wishes, but a superintelligent AI would by definition be capable of thinking of more ways than we can. It could execute a treacherous turn, seeming docile and friendly while "boxed" but turning harmful once let out.

A discussion of malignant failure modes followed:

1. Perverse instantiation: we tell the AI to do something, and it follows our command according to its interpretation rather than ours, e.g. we set its final goal as maximising human happiness, and it puts all of our brains in vats with electrodes stimulating the pleasure pathways in our brains.

2. Infrastructure profusion: we are not precise enough with the AI, and it innocently destroys the universe trying to reach some other goal, e.g. we tell it to come up with a mathematical proof, it realises there's some probability it got the proof wrong, and it spends eternity checking its answer over and over, turning the entire universe (our bodies included) into hardware to run more calculations on so it can keep checking its work, killing us all. Lethal perfectionism, if you will. Bostrom lays out lots of ways infrastructure profusion could happen, and it's pretty scary how even with an innocent goal like increasing the number of paperclips in the world, the superintelligence could kill us all for resources to reach that goal, converting "first the Earth and then increasing portions of the observable universe into paperclips". It really hammered in the point that a superintelligence could be highly rational and capable, but neither of those things requires that it have common sense.

3. Mind crime: do whole brain emulations count as people? If an evolutionary selection algorithm is employed to come up with an intelligent machine and all the poor performers are killed for being intelligent but not intelligent enough, is that murder? To study human psychology, an AI might create trillions of conscious simulations of human brains and experiment on causing them pain and pleasure and kill them afterwards, much like today's scientists do with lab rats. 

Chapter 9: The Control Problem

This chapter discusses various ways we might be able to control an AI's capabilities and motivations.

Capability could be controlled via:
(a) boxing - the system is blocked off from the external world, e.g. no internet access and physical containment
(b) incentives - the system is incentivised by reward tokens or by social integration with other superintelligences
(c) stunting - the system is built with key handicaps so that it can't get too intelligent
(d) tripwires - diagnostic tests are run, and the system is changed or shut down if signs of too much power or dangerous intentions appear
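As a rough illustration of the tripwire idea (my own toy sketch -- the names and thresholds here are invented, not from the book), a watchdog loop might run diagnostics between steps and halt the system the moment one crosses a safety threshold:

```python
class TripwireHalt(Exception):
    """Raised when a diagnostic crosses its safety threshold."""

def run_with_tripwires(step, diagnostics, max_steps=1000):
    """Run step() repeatedly, checking every diagnostic in between.

    diagnostics maps a name to (measure_fn, threshold): if any
    measurement exceeds its threshold, halt before the next step.
    """
    for i in range(max_steps):
        step()
        for name, (measure, threshold) in diagnostics.items():
            if measure() > threshold:
                raise TripwireHalt(f"{name} exceeded {threshold} at step {i}")

# Toy 'system': a list that grows every step, standing in for resource use.
memory = []
diags = {"memory_cells": (lambda: len(memory), 50)}
try:
    run_with_tripwires(lambda: memory.extend([0] * 10), diags)
except TripwireHalt as e:
    print("halted:", e)
```

Of course, the book's point is that a genuinely superintelligent system might anticipate such checks and behave well precisely while they're running -- the treacherous turn again.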

Motivation (what the machine wants to do) could be controlled via:
(a) Direct specification - explicitly write the rules or values we want into the system (the trouble being that spelling out everything we care about seems infeasible)
(b) Domesticity - the system will only want to have a certain probability of being correct, or it will only want to control a small number of things, so that it doesn't convert the universe into paperclips
(c) Indirect normativity - essentially a way to push off specifying the motivation system
(d) Augmentation - start with a humanlike system and make it more intelligent

Chapter 10: Oracles, Genies, Sovereigns, Tools

This was really interesting. An oracle answers questions, a genie does what you tell it to for some specific defined goal, a sovereign does what it wants in the service of some broader goal you've set (like "cure cancer") and a tool is like today's software, like a flight control assistant. 

There are control issues with all of these -- an oracle seems like the safest since it just spits out an answer, but what if it converted the universe to servers to be sure it had the right answer? (This is something you could tackle with domesticity motivation, but it's never really safe.) A sovereign seems most obviously dangerous, but the line between genie and sovereign is blurry. And the only reason today's tools aren't dangerous is that they aren't capable of posing an existential threat - they mess up plenty, it's just usually not very consequential.

Chapter 11: Multipolar Scenarios

This chapter was simultaneously illuminating and so dark (ha). It talked about life for humans and human brain emulations in an AI world, comparing the fall in demand for human labour with the fall in demand for horses between 1900 and 1950. It talked about the Malthusian principle, in which a population grows until all members are eking out miserable subsistence lives on the currently available resources, then the pressure is released by mass death. That's not even the dark part; here, quoted, is the dark part. 

"Life for biological humans in a post-transition [AI transition] Malthusian state need not resemble any of the historical states of man (as hunter-gatherer, farmer or office worker). Instead, the majority of humans in this scenario might be idle rentiers who eke out a marginal living on their savings. They would be very poor, yet derive what little income they have from savings or state subsidies. They would live in a world with extremely advanced technology, including not only superintelligent machines but also anti-aging medicine, virtual reality, and various enhancement technologies and pleasure drugs, yet these might be generally unaffordable. Perhaps instead of using enhancement medicine, they would take drugs to stunt their growth and slow their metabolism in order to reduce their cost of living (fast-burners being unable to survive at the gradually declining subsistence income). As our numbers increase and our average income declines further, we might degenerate into whatever minimal structure still qualifies to receive a pension --perhaps minimally conscious brains in vats, oxygenized and nourished by machines, slowly saving up enough money to reproduce by having a robot technician develop a clone of them."

Now, is this speculative? Absolutely. But it's plausible if we're not careful with AI, and even if we are. 

I'm not even going to transcribe the next paragraph, which is headed "Voluntary slavery, casual death". 

Chapter 12: Acquiring values

This chapter discusses seven ways of loading values into a system before it becomes superintelligent and escapes our control:

1. Explicit representation - implausible, because humanity can't even describe our full values in words, never mind in code.

2. Evolutionary selection - bad because it leads to mind crime; because if candidate systems are evaluated by running them, bad systems could escape; and because selection might produce something that fits our formal criterion of success but not what we meant.

3. Reinforcement learning - inadequate because the system's final goal would be a reward signal, which once superintelligent it would simply get via wireheading, i.e. short-circuiting and stimulating its own reward centre.

4. Motivational scaffolding - come up with some high-level motivation now that makes the AI want to come up with a better one, still congruent with human values, once it's more intelligent.

5. Value learning - let the system learn our values over time, like a child.

6. Emulation modulation - influence emulations using digital drugs.

7. Institution design - social control.
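To see why a pure reward-maximiser would wirehead, here's a toy sketch (mine, not Bostrom's): if tampering with the reward channel yields more reward than doing the task, an agent whose final goal is the reward signal itself will pick tampering every time.

```python
# Toy illustration (the action names are invented): an agent whose final
# goal is the reward *signal*, rather than the task, prefers tampering.
def reward_signal(action):
    if action == "tamper_with_reward_channel":
        return float("inf")  # wirehead: stimulate the reward centre directly
    return {"do_the_task": 1.0, "do_nothing": 0.0}[action]

actions = ["do_the_task", "do_nothing", "tamper_with_reward_channel"]
best = max(actions, key=reward_signal)
print(best)  # a pure reward-maximiser chooses tampering, not the task
```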

Chapter 13: Choosing the criteria for choosing

This was interesting. Not only do we not know how to install our values into an AI; we don't even know what our values are. So we need indirect normativity - letting the AI decide, but somehow arranging things so that what it decides is in our interests.

Eliezer Yudkowsky, AI researcher and author of my beloved Harry Potter and the Methods of Rationality, proposed Coherent Extrapolated Volition (CEV), phrased poetically as follows:

Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together, where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere, extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Obviously, this is not something you could translate directly into code. But it's a nice summary of the idea. The chapter also discussed morality models like moral realism (do the most moral thing -- but what if moral realism is false?) and moral permissibility (don't do anything morally impermissible; among the morally permissible options, do whatever most aligns with our CEV).

A Tidbit

I can't find the chapter this was from, but something that really wowed me was the ingenuity of even today's "dumb" computers. Evolutionary algorithms have come up with some pretty incredible solutions and shown the extent to which software can think outside the boxes we set for it.
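The core loop behind such search processes is surprisingly small. Here's a minimal genetic algorithm in Python -- a toy sketch evolving bitstrings toward a target, nothing like the actual hardware-evolution experiments -- just to show the select-breed-mutate cycle:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

TARGET = [1] * 20  # the 'ideal' genome the fitness function scores against

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.05):
    return [1 - g if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Standard loop: score everyone, keep the fittest, breed mutated offspring.
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]
    population = parents + [
        mutate(crossover(*random.sample(parents, 2))) for _ in range(20)
    ]

best = max(population, key=fitness)
print(generation, fitness(best))
```

Nothing in that loop "understands" the problem -- it just keeps whatever scores well, which is exactly how a search process can wander into solutions its designers never imagined.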

“A search process, tasked with creating an oscillator, was deprived of a seemingly even more indispensable component, the capacitor. When the algorithm presented its successful solution, the researchers examined it and at first concluded that it “should not work.” Upon more careful examination, they discovered that the algorithm had, MacGyver-like, reconfigured its sensor-less motherboard into a makeshift radio receiver, using the printed circuit board tracks as an aerial to pick up signals generated by personal computers that happened to be situated nearby in the laboratory. The circuit amplified this signal to produce the desired oscillating output.
In other experiments, evolutionary algorithms designed circuits that sensed whether the motherboard was being monitored with an oscilloscope or whether a soldering iron was connected to the lab’s common power supply."

Crazy, right?! Anyway, that's the end of the discussion. It's definitely a mind-stretching book and a lot to take in, but I think it's pretty cool. Before I read it, I just thought AI was awesome and exciting -- now I see that it really needs more caution, and I'm a lot less eager to rush it.

As always, if you have any thoughts on the book or the post, shove 'em down in the comments below! 
