Neural and Computational Mechanisms of Cognitive Flexibility
Recent studies have used one-trial-back decision policies (Win-Stay/Lose-Shift, WSLS) and reinforcement learning (RL) measures such as learning rate to explain decision-making in two-armed bandit tasks. In many of these studies, outcomes reverse after one option has been selected repeatedly (e.g., 8 times in a row), and the primary measure of performance is the number of reversals completed. However, this design confounds the number of reversals with Win-Stay likelihood: because reversals are triggered by repeated selection of one option, a higher Win-Stay rate mechanically produces more reversals. An alternative design reverses cue- or action-outcome associations over fixed blocks of trials. We used this blocked design to test rats in a spatial two-armed bandit task and analyzed choice behavior as a function of reward certainty, using WSLS metrics and Q-learning, an RL algorithm. We found that WSLS policies remained stable with increasing reward uncertainty, while choice accuracy decreased. Within test sessions, learning rates increased as rats adapted their strategies over the first few reversals, but inverse temperature, a measure of choice randomness, remained stable. The orbitofrontal cortex (OFC) is commonly implicated in reversal learning, and we found that muscimol inactivation of the medial orbital cortex (mOFC) increased perseveration toward the previously best option and decreased sensitivity to negative feedback. We then examined the role of noradrenaline neurotransmission in bandit performance and found that yohimbine (2 mg/kg) decreased rats' sensitivity to positive feedback, leading to decreased accuracy and increased choice randomness. These effects were partially dependent on α2-noradrenergic receptors in the OFC. Finally, we demonstrate a correspondence between reward schedule, WSLS policies, and RL metrics in a task design free of the confound between wins and reversals, and show that the noradrenergic influence of the mOFC on WSLS policy is dissociable from the region's general role in cognitive flexibility.
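To make the model-based terms concrete, the sketch below simulates a Q-learning agent with a softmax choice rule on a static two-armed bandit and computes the one-trial-back WSLS probabilities. This is a minimal illustration of the standard algorithm, not the study's fitting procedure; the parameter values, reward probabilities, and trial count are arbitrary assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values, not estimates from the study)
alpha = 0.3                       # learning rate: weight on the prediction error
beta = 4.0                        # inverse temperature: higher -> less random choice
p_reward = np.array([0.8, 0.2])   # reward probabilities for the two arms
n_trials = 200

Q = np.zeros(2)                   # action values for the two options
choices, rewards = [], []

for t in range(n_trials):
    # Softmax choice rule: P(a) = exp(beta * Q[a]) / sum_a' exp(beta * Q[a'])
    p = np.exp(beta * Q) / np.exp(beta * Q).sum()
    a = rng.choice(2, p=p)
    r = float(rng.random() < p_reward[a])
    # Q-learning update: Q[a] <- Q[a] + alpha * (r - Q[a])
    Q[a] += alpha * (r - Q[a])
    choices.append(a)
    rewards.append(r)

choices, rewards = np.array(choices), np.array(rewards)

# One-trial-back WSLS metrics: probability of repeating a choice after a win
# (Win-Stay) and of switching after a non-rewarded choice (Lose-Shift)
stay = choices[1:] == choices[:-1]
won_prev = rewards[:-1] == 1
lost_prev = rewards[:-1] == 0
print("P(Win-Stay)   =", stay[won_prev].mean())
print("P(Lose-Shift) =", (~stay)[lost_prev].mean())
```

A higher learning rate makes Q-values track recent outcomes more closely, which is one route by which an RL model can mimic a strong Lose-Shift policy; the softmax inverse temperature independently controls how deterministically the higher-valued option is chosen, matching the abstract's dissociation between feedback sensitivity and choice randomness.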