How a game of draughts helped unravel the brain’s reward system

Hannah Devlin Science correspondent
The early computer program used a form of Pavlovian reinforcement to learn how to play draughts. Photograph: Teresa Short/Getty Images

In 1955, American computer pioneer Arthur Samuel unveiled a draughts-playing program that human opponents described as “tricky but beatable”. The achievement sounds quaint today given the subsequent decisive triumphs of machines over humans at chess, Jeopardy, Go and poker. But according to Prof Peter Dayan, a computational neuroscientist at University College London and one of the recipients of the 2017 Brain Prize, Samuel had hit on “one of the first good ideas in AI” and a concept that has transformed our understanding of the human brain.

Samuel’s program used a souped-up form of Pavlovian reinforcement to learn how to play draughts. Pavlov’s dogs learned the simple association between hearing a bell and the arrival of food, but in a game like draughts there are many steps on the path to victory or defeat. This raises the question of how we (or a computer) learn which moves contribute to victory and should be repeated in the future.

Samuel’s key innovation was to make his program calculate its chances of victory after each move, creating a running prediction tally. Rather than rewarding the program simply on whether the game was won or lost, it was rewarded based on how well its predictions had matched the final outcome, allowing it to work out over time which moves had contributed to success.
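In modern terms, Samuel’s running tally is what reinforcement-learning researchers call temporal-difference learning. The short Python sketch below illustrates the idea on a toy random walk rather than draughts; the states, rewards and learning rate are illustrative assumptions, not details of Samuel’s actual program.

```python
import random

# A minimal sketch of temporal-difference (TD) learning, the modern form of
# Samuel's "running prediction tally". States 1..5 form a random walk;
# stepping off the right end counts as a "win" (reward 1), stepping off
# the left end as a "loss" (reward 0). All numbers here are illustrative.

ALPHA = 0.1                              # learning rate
values = {s: 0.5 for s in range(1, 6)}   # predicted chance of winning from each state

def td_episode():
    state = 3                            # start in the middle of the walk
    while True:
        next_state = state + random.choice([-1, 1])
        if next_state == 0:              # fell off the left end: loss
            reward, next_value = 0.0, 0.0
        elif next_state == 6:            # fell off the right end: win
            reward, next_value = 1.0, 0.0
        else:
            reward, next_value = 0.0, values[next_state]
        # The TD update: nudge each prediction toward the reward received plus
        # the next prediction, so credit flows back to the moves along the way.
        values[state] += ALPHA * (reward + next_value - values[state])
        if next_state in (0, 6):
            break
        state = next_state

random.seed(0)
for _ in range(10000):
    td_episode()

# The true win probabilities are 1/6, 2/6, ..., 5/6 for states 1..5.
print({s: round(v, 2) for s, v in values.items()})
```

After enough games the predictions settle close to the true win probabilities, even though the program is only ever told the final outcome, which is the essence of the trick Samuel hit on.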

The strategy worked surprisingly well, but it was not until Cambridge neuroscientist Prof Wolfram Schultz recorded the electrical activity of neurons in monkeys that were learning a task that it became clear that this is precisely how the brain’s reward system works. Dayan, familiar with the early AI work, quickly made the connection.

Schultz’s observations showed that dopamine neurons do not simply respond based on the size of the reward (fruit juice in the monkeys’ case). Instead, they code for the discrepancy between the reward and its prediction (the so-called “prediction error”). An unanticipated reward produces a bigger response than an expected one, and there is a dip in activity when a reward is expected but withheld.
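To make that arithmetic concrete, here is a minimal sketch of the discrepancy itself; the reward and prediction values are invented for illustration and are not taken from Schultz’s recordings.

```python
# The "prediction error" described above is simply the reward received
# minus the reward predicted. The numbers below are made up to show the
# three cases: surprise, expectation met, and expectation violated.

def prediction_error(reward, predicted):
    return reward - predicted

print(prediction_error(reward=1.0, predicted=0.0))  #  1.0: unanticipated reward, big response
print(prediction_error(reward=1.0, predicted=1.0))  #  0.0: fully expected reward, little response
print(prediction_error(reward=0.0, predicted=1.0))  # -1.0: expected reward withheld, dip in activity
```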

“This is the biological process that makes us want to buy a bigger car or house, or be promoted at work,” said Schultz, describing the dopamine neurons as “little devils in our brain that drive us towards more rewards”.

Since then, Prof Ray Dolan, also of UCL, has used brain imaging to probe how the same circuitry in the human brain governs social interactions, learning and decision-making.

The brain’s reward system is a rare area of neuroscience in which theory has precisely predicted what is seen biologically. This solid theoretical grounding provides a powerful way to investigate how the reward system can go awry in conditions including compulsive gambling, drug addiction and depression.

In the past decade, the field has also come full circle, with the concept of reinforcement learning enjoying a renaissance in AI, where it is now bolstered by hugely increased processing power and giant datasets.

“Things like AlphaGo [an advanced computer program that plays the board game Go] are using a lot of these ideas,” said Dayan.

An open question is where the boundary lies on the types of tasks computers can learn through this approach: Go and poker, yes, but what about writing a sonnet or proving the Riemann hypothesis?

“It’s very hard to put limits on these things,” Dayan said. “We’re getting smarter and smarter about generating learning algorithms.”
