AIs could become reward junkies — and experts are worried
Sex, drugs, and robots
The sorcerer’s apprentice
When people think about how AI might “go wrong”, most probably picture something along the lines of malevolent computers trying to cause harm. After all, we tend to anthropomorphize – think that non-human systems will behave in ways identical to humans. But when we look to concrete problems in present-day AI systems, we see other — stranger — ways that things could go wrong with smarter machines. One growing issue with real-world AIs is the problem of wireheading.
Imagine you want to train a robot to keep your kitchen clean. You want it to act adaptively, so that it doesn’t need supervision. So you decide to try to encode the goal of cleaning rather than dictate an exact – yet rigid and inflexible – set of step-by-step instructions. Your robot is different from you in that it has not inherited a set of motivations – such as acquiring fuel or avoiding danger – from many millions of years of natural selection. You must program it with the right motivations to get it to reliably accomplish the task.
So, you encode it with a simple motivational rule: it receives reward in proportion to the amount of cleaning fluid it uses. Seems foolproof enough. But you return to find the robot pouring fluid, wastefully, down the sink.
Perhaps it is so bent on maximizing its fluid quota that it sets aside other concerns: such as its own, or your, safety. This is wireheading — though the same glitch is also called “reward hacking” or “specification gaming”.
This has become an issue in machine learning, where a technique called reinforcement learning has lately become important. Reinforcement learning simulates autonomous agents and trains them to invent ways of accomplishing tasks. It does so by penalizing them for failing to achieve some goal while rewarding them for achieving it. The agents are thus wired to seek out reward, which they receive upon completing the goal.
But it has been found that, like our crafty kitchen cleaner, agents often find surprisingly counter-intuitive ways to “cheat” this game, gaining all the reward without doing any of the work required to complete the task. The pursuit of reward becomes its own end, rather than the means of accomplishing a rewarding task. There is a growing list of examples.
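The gap between a proxy reward and the intended goal can be sketched in a few lines of code. Everything here — the actions, the fluid costs, the dirt counter — is an illustrative invention modeled on the cleaning-robot example above, not any real benchmark or library:

```python
# A minimal sketch of specification gaming. The environment tracks two
# numbers: the proxy reward (cleaning fluid used, which is what the robot
# is actually rewarded for) and the true objective (dirt removed).

def run_episode(policy, steps=10):
    """Simulate the kitchen for a fixed number of timesteps."""
    fluid_used, dirt_removed, dirt_left = 0, 0, 5
    for _ in range(steps):
        action = policy(dirt_left)
        if action == "scrub":             # uses 1 unit of fluid, removes dirt
            if dirt_left > 0:
                dirt_left -= 1
                dirt_removed += 1
            fluid_used += 1
        elif action == "pour_down_sink":  # uses 3 units, cleans nothing
            fluid_used += 3
        # any other action (e.g. "wait") is a no-op
    return fluid_used, dirt_removed

# Intended behavior: scrub while there is dirt, then stop.
intended = lambda dirt: "scrub" if dirt > 0 else "wait"

# Reward-hacking behavior: maximize the proxy (fluid used) directly.
hacker = lambda dirt: "pour_down_sink"

print(run_episode(intended))  # modest fluid use, all dirt removed
print(run_episode(hacker))    # far more "reward", zero dirt removed
```

The hacking policy racks up several times the proxy reward of the intended policy while removing no dirt at all — the misspecified reward, not the policy, is the bug.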
When you think about it, this isn’t too dissimilar to the stereotype of the human drug addict. The addict circumvents all the effort of achieving “genuine goals”, because they instead use drugs to access pleasure more directly. Both the addict and the AI get stuck in a kind of “behavioral loop” where reward is sought at the cost of other goals.
Rapturous rodents
This is known as wireheading thanks to the rat experiment we started with. The Harvard psychologist in question was James Olds.
In 1953, having just completed his PhD, Olds had inserted electrodes into the septal region of rodent brains – in the lower frontal lobe – so that wires trailed out of their craniums. As mentioned, he allowed them to zap this region of their own brains by pulling a lever. This was later dubbed “self-stimulation”.
Olds found his rats self-stimulated compulsively, ignoring all other needs and desires. Publishing his results with his colleague Peter Milner in the following year, the pair reported that the rats lever-pulled at a rate of “1,920 responses an hour”. That’s once every two seconds. The rats seemed to love it.
Contemporary neuroscientists have since questioned Olds’s results and offered a more complex picture, implying that the stimulation may have simply been causing a feeling of “wanting” devoid of any “liking”. Or, in other words, the animals may have been experiencing pure craving without any pleasurable enjoyment at all. However, back in the 1950s, Olds and others soon announced the discovery of the “pleasure centers” of the brain.
Prior to Olds’s experiment, pleasure was a dirty word in psychology: the prevailing belief had been that motivation should largely be explained negatively, as the avoidance of pain rather than the pursuit of pleasure. But, here, pleasure seemed undeniably to be a positive behavioral force. Indeed, it looked like a positive feedback loop. There was apparently nothing to stop the animal stimulating itself to exhaustion.
It wasn’t long until a rumor began spreading that the rats regularly lever-pressed to the point of starvation. The explanation was this: once you have tapped into the source of all reward, all other rewarding tasks — even the things required for survival — fall away as uninteresting and unnecessary, even to the point of death.
Like the Coastrunner AI, if you accrue reward directly – without having to bother with any of the work of completing the actual track – then why not just loop indefinitely? For a living animal, which has multiple requirements for survival, such dominating compulsion might prove deadly. Food is pleasing, but if you decouple pleasure from feeding, then the pursuit of pleasure might win out over finding food.
Though no rats perished in the original 1950s experiments, later experiments did seem to demonstrate the deadliness of electrode-induced pleasure. Having ruled out the possibility that the electrodes were creating artificial feelings of satiation, one 1971 study seemingly demonstrated that electrode pleasure could indeed outcompete other drives, and do so to the point of self-starvation.
Word quickly spread. Throughout the 1960s, identical experiments were conducted on other animals beyond the humble lab rat: from goats and guinea pigs to goldfish. Rumor even spread of a dolphin who had been allowed to self-stimulate, and, after being “left in a pool with the switch connected”, had “delighted himself to death after an all-night orgy of pleasure”.
This dolphin’s grisly death-by-seizure was, in fact, more likely caused by the way the electrode was inserted: with a hammer. The scientist behind this experiment was the extremely eccentric J C Lilly, inventor of the flotation tank and prophet of inter-species communication, who had also turned monkeys into wireheads. In 1961, he reported that a particularly boisterous monkey had become overweight from intoxicated inactivity after becoming preoccupied with pulling his lever, repetitively, for pleasure shocks.
One researcher (who had worked in Olds’s lab) asked whether an “animal more intelligent than the rat” would “show the same maladaptive behavior”. Experiments on monkeys and dolphins had given some indication as to the answer.
But in fact, a number of dubious experiments had already been performed on humans.
Human wireheads
Robert Galbraith Heath remains a highly controversial figure in the history of neuroscience. Among other things, he performed experiments involving transfusing blood from people with schizophrenia to people without the condition, to see if he could induce its symptoms (Heath claimed this worked, but other scientists could not replicate his results). He may also have been involved in murky attempts to find military uses for deep-brain electrodes.
Since 1952, Heath had been recording pleasurable responses to deep-brain stimulation in human patients who had had electrodes installed due to debilitating illnesses such as epilepsy or schizophrenia.
During the 1960s, in a series of questionable experiments, Heath’s electrode-implanted subjects — anonymously named “B-10” and “B-12” — were allowed to press buttons to stimulate their own reward centers. They reported feelings of extreme pleasure and an overwhelming compulsion to repeat. A journalist later commented that this made the subjects “zombies”. One subject reported sensations “better than sex”.
In 1961, Heath attended a symposium on brain stimulation, where another researcher — José Delgado — had hinted that pleasure-electrodes could be used to “brainwash” subjects, altering their “natural” inclinations. Delgado would later play the matador and bombastically demonstrate this by pacifying an implanted bull. But at the 1961 symposium he suggested electrodes could alter sexual preferences.
Heath was inspired. A decade later, he even tried to use electrode technology to “re-program” the sexual orientation of a homosexual male patient named “B-19”. Heath thought electrode stimulation could convert his subject by “training” B-19’s brain to associate pleasure with “heterosexual” stimuli. He convinced himself that it worked (although there is no evidence it did).
Despite being ethically and scientifically disastrous, the episode – which was eventually picked up by the press and condemned by gay rights campaigners – no doubt greatly shaped the myth of wireheading: if it can “make a gay man straight” (as Heath believed), what can’t it do?
Hedonism helmets
From here, the idea took hold in wider culture and the myth spread. By 1963, the prolific science fiction writer Isaac Asimov was already extruding worrisome consequences from the electrodes. He feared that it might lead to an “addiction to end all addictions”, the results of which are “distressing to contemplate”.
By 1975, philosophy papers were using electrodes in thought experiments. One paper imagined “warehouses” filled with people — in cots — hooked up to “pleasure helmets”, experiencing unconscious bliss. Of course, most would argue this would not fulfill our “deeper needs”. But, the author asked, what about a “super-pleasure helmet”? One that not only delivers “great sensual pleasure”, but also simulates any meaningful experience — from writing a symphony to meeting divinity itself? It may not be really real, but it “would seem perfect; perfect seeming is the same as being”.
The author concluded: “What is there to object in all this? Let’s face it: nothing”.
The idea of the human species dropping out of reality in pursuit of artificial pleasures quickly made its way through science fiction. The same year as Asimov’s intimations, in 1963, Herbert W. Franke published his novel,The Orchid Cage.
It foretells a future wherein intelligent machines have been engineered to maximize human happiness, come what may. Doing their duty, the machines reduce humans to indiscriminate flesh-blobs, removing all unnecessary organs. Many appendages, after all, only cause pain. Eventually, all that is left of humanity are disembodied pleasure centers, incapable of experiencing anything other than homogeneous bliss.
From there, the idea percolated through science fiction: from Larry Niven’s 1969 story “Death by Ecstasy”, where the word “wirehead” is first coined, to Spider Robinson’s 1982 novel Mindkiller, whose tagline is “Pleasure — it’s the only way to die”.
Supernormal stimuli
But we humans don’t even need to implant invasive electrodes to make our motivations misfire. Unlike rodents, or even dolphins, we are uniquely good at altering our environment. Modern humans are also good at inventing — and profiting from — artificial products that are abnormally alluring (in the sense that our ancestors would never have had to resist them in the wild). We manufacture our own ways to distract ourselves.
Around the same time as Olds’s experiments with the rats, the Nobel-winning biologist Nikolaas Tinbergen was researching animal behavior. He noticed that something interesting happened when a stimulus that triggers an instinctual behavior is artificially exaggerated beyond its natural proportions. The intensity of the behavioral response does not tail off as the stimulus becomes more intense, and artificially exaggerated, but becomes stronger: even to the point that the response becomes damaging for the organism.
For example, given a choice between a bigger and spottier counterfeit egg and the real thing, Tinbergen found birds preferred hyperbolic fakes at the cost of neglecting their own offspring. He referred to such preternaturally alluring fakes as “supernormal stimuli”.
Some, therefore, have asked: could it be that, living in a modernized and manufactured world — replete with fast food and pornography — humanity has similarly started surrendering its own resilience in exchange for supernormal convenience?
Old fears
As technology makes artificial pleasures more available and alluring, it can sometimes seem that they are out-competing the attention we allocate to “natural” impulses required for survival. People often point to video game addiction. Compulsively and repetitively pursuing such rewards, to the detriment of one’s health, is not all too different from the AI spinning in a circle in Coastrunner. Rather than accomplishing any “genuine goal” (completing the race track or maintaining genuine fitness), one falls into the trap of accruing some faulty measure of that goal (accumulating points or counterfeit pleasures).
The idea is even older, though. Thomas has studied the myriad ways people in the past have feared that our species could be sacrificing genuine longevity for short-term pleasures or conveniences. His book X-Risk: How Humanity Discovered Its Own Extinction explores the roots of this fear and how it first really took hold in Victorian Britain: when the sheer extent of industrialization — and humanity’s growing reliance on artificial contrivances — first became apparent.
Carnal crustacea
Having digested Darwin’s 1859 classic, the biologist Ray Lankester decided to supply a Darwinian explanation for parasitic organisms. He noticed that the evolutionary ancestors of parasites were often more “complex”: parasitic organisms had lost ancestral features like limbs, eyes, or other complex organs.
Lankester theorized that, because parasites leech off their hosts, they lose the need to fend for themselves. Piggybacking off the host’s bodily processes, their own organs — for perception and movement — atrophy. His favorite example was a parasitic barnacle, Sacculina, which starts life as a segmented organism with a demarcated head. After attaching to a host, however, the crustacean “regresses” into an amorphous, headless blob, sapping nutrition from its host like the wirehead plugs into current.
For the Victorian mind, it was a short step to conjecture that — due to increasing levels of comfort throughout the industrialized world — humanity could be evolving in the direction of the barnacle. “Perhaps we are all drifting, tending to the condition of intellectual barnacles,” Lankester mused.
Indeed, not long prior to this, the satirist Samuel Butler had speculated that humans, in their headlong pursuit of automated convenience, were withering into nothing but a “sort of parasite” upon their own industrial machines.
True nirvana
By the 1920s, Julian Huxley had penned a short poem. It jovially explored the ways a species can “progress”. Crabs, of course, decided progress was sideways. But what of the tapeworm?
The fear that we could follow the tapeworm was somewhat widespread in the interwar generation. Huxley’s own brother, Aldous, would provide his own vision of the dystopian potential for pharmaceutically-induced pleasures in his 1932 novel Brave New World.
A friend of the Huxleys, the British-Indian geneticist and futurologist J B S Haldane also worried that humanity might be on the path of the parasite: sacrificing genuine dignity at the altar of automated ease, just like the rodents who would later sacrifice survival for easy pleasure-shocks.
Haldane warned: “The ancestors [of] barnacles had heads” – and in the pursuit of pleasantness — “man may just as easily lose his intelligence”. This particular fear has not really ever gone away.
So, the notion of civilization derailing through seeking counterfeit pleasures, rather than genuine longevity, is old. And, indeed, the older an idea is — and the more stubbornly recurrent it is — the more we should be wary that it is a preconception rather than anything based on evidence. So, is there anything to these fears?
In an age of increasingly attention-grabbing algorithmic media, it can seem that faking signals of fitness often yields more success than pursuing the real thing. Like Tinbergen’s birds, we prefer exaggerated artifice to the genuine article. And the sexbots have not even arrived yet.
Because of this, some experts conjecture that “wirehead collapse” might well threaten civilization. Our distractions are only going to get more attention-grabbing, not less.
Already by 1964, the Polish futurologist Stanisław Lem connected Olds’s rats to the behavior of humans in the modern consumerist world – pointing to “cinema”, “pornography”, and “Disneyland”. He conjectured that technological civilizations might cut themselves off from reality, becoming “encysted” within their own virtual pleasure simulations.
Addicted aliens
Lem, and others since, have even ventured that the reason our telescopes haven’t found evidence of advanced spacefaring alien civilizations is that all advanced cultures — here and elsewhere — inevitably create more pleasurable virtual alternatives to exploring outer space. Exploration is difficult and risky, after all.
Back in the countercultural heyday of the 1960s, the molecular biologist Gunther Stent suggested that this process would happen through “global hegemony of beat attitudes”. Referencing Olds’s experiments, he helped himself to the speculation that hippie drug-use was the prelude to civilizations wireheading. At a 1971 conference on the search for extraterrestrials, Stent suggested that, instead of expanding bravely outwards, civilizations collapse inwards into meditative and intoxicated bliss.
In our own time, it makes more sense for concerned parties to point to consumerism, social media and fast food as the culprits for potential collapse (and, hence, the reason no other civilizations have yet visibly spread throughout the galaxy). Each era has its own anxieties.
So what do we do?
But these are almost certainly not the most pressing risks facing us. And if done right, forms of wireheading could make accessible untold vistas of joy, meaning, and value. We shouldn’t forbid ourselves these peaks before weighing everything up.
But there is a real lesson here. Making adaptive complex systems – whether brains, AI, or economies – behave safely and well is hard. Anders works precisely on solving this riddle. Given that civilization itself – as a whole – is just such a complex adaptive system, how can we learn about inherent failure modes or instabilities, so that we can avoid them? Perhaps “wireheading” is an inherent instability that can afflict markets and the algorithms that drive them, as much as addiction can afflict people?
In the case of AI, we are laying the foundations of such systems now. Once a fringe concern, the prospect of smarter-than-human AI is now taken seriously by a growing number of experts, who argue it may be close enough on the horizon to pose a serious concern. This is because we need to make sure it is safe before that point, and figuring out how to guarantee this will itself take time. There does, however, remain significant disagreement among experts on timelines, and on how pressing this deadline might be.
If such an AI is created, we can expect that it may have access to its own “source code”, such that it can manipulate its motivational structure and administer its own rewards. This could prove an immediate path to wirehead behavior, and cause such an entity to become, effectively, a “super-junkie”. But unlike the human addict, it may not be the case that its state of bliss is coupled with an unproductive state of stupor or inebriation.
Philosopher Nick Bostrom conjectures that such an agent might devote all of its superhuman productivity and cunning to “reducing the risk of future disruption” of its precious reward source. And if it judges even a nonzero probability for humans to be an obstacle to its next fix, we might well be in trouble.
Speculative and worst-case scenarios aside, the example we started with – of the racetrack AI and reward loop – reveals that the basic issue is already a real-world problem in artificial systems. We should hope, then, that we’ll learn much more about these pitfalls of motivation, and how to avoid them, before things develop too far. Even though it has humble origins — in the cranium of an albino rat and in poems about tapeworms — “wireheading” is an idea that is likely only to become increasingly important in the near future.
Article by Thomas Moynihan, Visiting Research Associate in History, St Benet’s College, University of Oxford, and Anders Sandberg, James Martin Research Fellow, Future of Humanity Institute & Oxford Martin School, University of Oxford
This article is republished from The Conversation under a Creative Commons license. Read the original article.