A Spiking Network Model of Decision Making Employing Rewarded STDP. Skorheim S, Lonjers P, Bazhenov M. PLoS One. 2014 Mar 14;9(3):e90821.

Rewarded spike-timing-dependent plasticity (STDP) has been implicated as a possible learning mechanism in a variety of brain systems. This mechanism combines unsupervised STDP, which modifies synaptic strength depending on the relative timing of presynaptic input and postsynaptic spikes, with a reinforcement signal that modulates synaptic changes. In this study, rewarded STDP was implemented as part of a spiking network model of excitatory cells and inhibitory interneurons. The network was used to model basic foraging behavior in a simulated organism. The foraging behavior took place in a simulated environment of randomly distributed “food” particles. Input to the network corresponded to the locations of nearby “food” particles. At each time step, the direction of movement was controlled by the activity of a group of output cells. Reward was applied to the network when a movement led to acquisition of a “food” particle, and it was used to solidify recently created STDP event traces. Over the course of the training period, the network, which began with a set of synaptic connections of uniform strength, developed into a network capable of producing near-optimal foraging behavior. Changing the density of “food” particles led to an initial drop in performance, which then improved after a number of trials, suggesting that synaptic changes were correlated with the statistics of the environment.
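
The idea of reward "solidifying" recent STDP event traces can be illustrated with a minimal sketch. This is not the authors' implementation: the exponential timing window, the exponentially fading trace, the weight bounds, and every name and parameter value below (`A_PLUS`, `A_MINUS`, `TAU_STDP`, `TAU_TRACE`, `RewardedSynapse`) are assumptions chosen for illustration only. Each pre/post spike pair is stored as a candidate weight change, and the weight itself moves only when a reward arrives.

```python
import math

# Hypothetical parameters for illustration; none are taken from the paper.
A_PLUS = 0.010     # candidate increase when the presynaptic spike comes first
A_MINUS = 0.012    # candidate decrease when the postsynaptic spike comes first
TAU_STDP = 20.0    # ms, width of the assumed exponential STDP window
TAU_TRACE = 500.0  # ms, how quickly an unrewarded trace fades (assumed)


def stdp_event(dt_ms):
    """Candidate weight change for one spike pair, dt_ms = t_post - t_pre."""
    if dt_ms >= 0:
        return A_PLUS * math.exp(-dt_ms / TAU_STDP)
    return -A_MINUS * math.exp(dt_ms / TAU_STDP)


class RewardedSynapse:
    """A single synapse whose STDP events take effect only on reward."""

    def __init__(self, weight=0.5):
        self.weight = weight
        self.traces = []  # list of (event_time_ms, candidate_change)

    def record_pair(self, t_pre, t_post):
        # Store the STDP event as a trace; the weight is not changed yet.
        event_time = max(t_pre, t_post)
        self.traces.append((event_time, stdp_event(t_post - t_pre)))

    def apply_reward(self, t_now, reward=1.0):
        # Convert recent traces into a real weight change, discounting
        # older traces, then clip the weight to the assumed range [0, 1].
        total = 0.0
        for event_time, change in self.traces:
            age = t_now - event_time
            total += change * math.exp(-age / TAU_TRACE)
        self.weight = min(1.0, max(0.0, self.weight + reward * total))
        self.traces.clear()


# Usage: a causal pair (pre before post) followed by a reward
# strengthens the synapse; without the reward nothing changes.
syn = RewardedSynapse()
syn.record_pair(t_pre=100.0, t_post=105.0)
syn.apply_reward(t_now=200.0)
```

In this sketch a spike pair that is never followed by reward leaves the weight untouched, which is one simple way to tie synaptic change to successful movements toward "food".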

Movie of foraging behavior

 
