In this notebook, we will simulate a simple two-choice learning experiment, first with the Rescorla-Wagner model and then with a choice kernel model.

Rescorla-Wagner

The MATLAB code:

```matlab
function [a, r] = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta)

Q = [0.5 0.5];

for t = 1:T

    % compute choice probabilities
    p = exp(beta*Q) / sum(exp(beta*Q));

    % make choice according to choice probabilities
    a(t) = choose(p);

    % generate reward based on choice
    r(t) = rand < mu(a(t));

    % update values
    delta = r(t) - Q(a(t));
    Q(a(t)) = Q(a(t)) + alpha * delta;

end
```

Q-values

Q-values are the values we assign to the different choices (actions). In this function we start out with Q-values of 0.5 for both actions.

import numpy as np

Q = np.array([0.5, 0.5])

Choice probability (p)

p is the vector of choice probabilities, which we calculate from the Q-values with a softmax using the inverse temperature (beta), e.g.:

beta = 4 # inverse temperature
p = np.exp(Q*beta) / sum(np.exp(Q*beta))
p
array([0.5, 0.5])
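With equal Q-values the softmax gives equal probabilities no matter what beta is. To see how beta shapes the probabilities once the values differ, here is a small illustration (the unequal Q-values are assumed purely for demonstration):

Q_demo = np.array([0.3, 0.7]) # assumed unequal values, for illustration only
for b in [0.5, 4, 20]:
    p = np.exp(Q_demo*b) / sum(np.exp(Q_demo*b))
    print(b, p.round(3)) # higher beta pushes p towards always picking the higher-valued action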

Choose function

Chooses an action based on our choice probabilities.

choose[source]

choose(p)
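The actual implementation is linked above; a minimal sketch of what choose might look like, assuming it simply samples an action index according to the probability vector p:

def choose(p):
    # sample an action index (0 or 1 here) with probabilities given by p
    return np.random.choice(len(p), p=p)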

action = choose(p)
print("We chose action %d."%action)
We chose action 0.

Reward

mu = [.2,.8] # the reward probabilities for each action, defined by the game
reward = np.random.random() < mu[action]
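The reward is a single Bernoulli draw. As a quick sanity check (a sketch, not part of the original code), the average of many such draws should approach mu[action]:

draws = np.random.random(10000) < mu[action]
print(draws.mean()) # should be close to mu[action]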

Updating Q-values

Note how the Q-value of the chosen action changes after the update below.

alpha = .2
delta = reward - Q[action] # prediction error
Q[action] = Q[action] + alpha * delta
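For a concrete worked example, starting again from Q = [0.5, 0.5] (the chosen action and reward are assumed for illustration):

Q = np.array([0.5, 0.5])
action, reward = 1, 1 # assume action 1 was chosen and rewarded
delta = reward - Q[action] # prediction error: 1 - 0.5 = 0.5
Q[action] = Q[action] + alpha * delta # 0.5 + 0.2 * 0.5 = 0.6
print(Q) # [0.5 0.6]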

And everything together:

simulate_M3RescorlaWagner_v1[source]

simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values=[0.5, 0.5])
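The implementation is linked above; here is a rough Python sketch of what it might look like, following the MATLAB code and additionally recording the Q-values and prediction errors on each trial (whether Q is recorded before or after the update is an assumption made to match the output below):

def simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values=[0.5, 0.5]):
    Q = np.array(starting_q_values, dtype=float)
    actions, rewards, Qs, deltas = [], [], [], []
    for t in range(T):
        Qs.append(Q.copy()) # record the values going into this trial
        # compute choice probabilities (softmax)
        p = np.exp(beta*Q) / sum(np.exp(beta*Q))
        # make choice according to choice probabilities
        a = np.random.choice(len(p), p=p)
        # generate reward based on choice
        r = int(np.random.random() < mu[a])
        # update values
        delta = r - Q[a]
        Q[a] = Q[a] + alpha * delta
        actions.append(a)
        rewards.append(r)
        deltas.append(delta)
    return actions, rewards, Qs, deltas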

T = 10 # number of trials
mu = [.2,.8] # reward probabilities

## Participant parameters
alpha = .2 # learning rate
beta = 4 # inverse temperature

actions, rewards, Qs, deltas = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values = [0,0])
Qs
[array([0, 0]),
 array([0. , 0.2]),
 array([0.  , 0.36]),
 array([0.   , 0.488]),
 array([0.    , 0.5904]),
 array([0.     , 0.67232]),
 array([0.      , 0.537856]),
 array([0.       , 0.6302848]),
 array([0.        , 0.50422784]),
 array([0.        , 0.50422784])]
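As a check on these numbers: starting from Q = 0 and being rewarded on every trial, the value approaches 1 geometrically, Q after n rewards = 1 - (1 - alpha)**n, which reproduces the first few entries above:

print([round(1 - (1 - alpha)**n, 5) for n in range(1, 6)]) # [0.2, 0.36, 0.488, 0.5904, 0.67232]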

Simulating experiments

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

plot_rescorla_game[source]

plot_rescorla_game(T, mu, alpha, beta)
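The plotting helper is linked above; a rough sketch of what it might do, assuming it runs one simulated game with the simulator above and plots the evolving Q-values against the true reward probabilities (the library version may plot more, e.g. the choices and rewards):

import matplotlib.pyplot as plt

def plot_rescorla_game(T, mu, alpha, beta):
    actions, rewards, Qs, deltas = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values=[0, 0])
    Qs = np.array(Qs)
    for i, m in enumerate(mu):
        plt.plot(Qs[:, i], label="Q(action %d)"%i)
        plt.axhline(m, linestyle='--', color='grey') # true reward probability
    plt.xlabel('trial')
    plt.ylabel('Q-value')
    plt.legend()
    plt.show()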

T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)

Reducing the learning rate (the participant does not manage to learn the reward probabilities within 100 trials)

T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .01 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)

Let's give them more trials

T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .01 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)

An extremely high beta makes the participant stop exploring: they almost always pick the currently higher-valued option.

T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .1 # learning rate
beta = 1000 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)

If beta is very low, the participant still learns the values, but does not use that knowledge when choosing (something we cannot see in this plot yet).

T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .1 # learning rate
beta = .0000000001 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
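A quick numerical check of both extremes, using a small numerically stable softmax sketch and assumed Q-values:

def softmax(Q, beta):
    # subtract the max before exponentiating to avoid overflow for large beta
    z = beta * np.asarray(Q, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

Q_learned = [0.3, 0.7] # assumed learned values, for illustration
print(softmax(Q_learned, beta=1e-10)) # ~[0.5 0.5]: the knowledge is ignored
print(softmax(Q_learned, beta=1000)) # ~[0. 1.]: purely exploiting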

Choice Kernel

In the choice kernel model we remove the deltas: instead of learning from reward prediction errors, the model keeps a choice kernel that tracks how often each action has recently been chosen.

simulate_M4ChoiceKernel_v1[source]

simulate_M4ChoiceKernel_v1(T, mu, alpha_c, beta_c)

Note that the choice kernel decays by a factor of (1 - alpha_c) on each trial: choice_kernel = (1 - alpha_c) * choice_kernel.
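A minimal Python sketch of such a simulator (the bump of the chosen action's kernel by alpha_c is an assumption based on the standard choice kernel model; the library version linked above may differ):

def simulate_M4ChoiceKernel_v1(T, mu, alpha_c, beta_c):
    choice_kernel = np.zeros(2)
    actions, rewards = [], []
    for t in range(T):
        # softmax over the choice kernel instead of over Q-values
        p = np.exp(beta_c*choice_kernel) / sum(np.exp(beta_c*choice_kernel))
        a = np.random.choice(len(p), p=p)
        r = int(np.random.random() < mu[a])
        # decay all kernel values, then bump the chosen action (assumed update rule)
        choice_kernel = (1 - alpha_c) * choice_kernel
        choice_kernel[a] = choice_kernel[a] + alpha_c
        actions.append(a)
        rewards.append(r)
    return actions, rewards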

plot_choice_kernel_game[source]

plot_choice_kernel_game(T, mu, alpha, beta)

T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)

T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = .004 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)

T = 500 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)

With a reasonably high beta, the choice kernel for one of the options should always move towards 1.
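One way to check this numerically with the simulator sketched above (assumed behaviour, since the library function may differ):

actions, rewards = simulate_M4ChoiceKernel_v1(T=500, mu=[.2, .8], alpha_c=.2, beta_c=4)
late = np.array(actions[-100:])
# with a high beta the kernel is self-reinforcing, so late trials should be
# dominated by a single action (whichever happened to win early on)
print((late == late[0]).mean()) # should be close to 1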