ENS, room Jaurès, 24 rue Lhomond, 75005 Paris
Modern theories of reinforcement learning posit two systems competing for control of behavior: a "model-free" or "habitual" system that learns cached state-action values, and a "model-based" or "goal-directed" system that learns a world model and uses it to plan actions. I will argue that humans can adaptively invoke model-based computation when its benefits outweigh its costs. A simple meta-control learning rule can capture the dynamics of this cost-benefit analysis. Neuroimaging evidence points to a role for cognitive control regions in this computation. The theory also resolves a number of puzzling observations about controller arbitration in the brain.
To meet Sam Gershman, please contact Srdjan Ostojic: srdjan.ostojic@ens.fr.