In this notebook, we will simulate a simple two-armed bandit experiment with a Rescorla-Wagner learner.
The MATLAB code we will translate to Python:
```
function [a, r] = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta)

Q = [0.5 0.5];

for t = 1:T
    % compute choice probabilities
    p = exp(beta*Q) / sum(exp(beta*Q));
    % make choice according to choice probabilities
    a(t) = choose(p);
    % generate reward based on choice
    r(t) = rand < mu(a(t));
    % update values
    delta = r(t) - Q(a(t));
    Q(a(t)) = Q(a(t)) + alpha * delta;
end
```
import numpy as np

Q = np.array([0.5, 0.5])
beta = 4 # inverse temperature
p = np.exp(Q*beta) / sum(np.exp(Q*beta))
p
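`choose` is not a NumPy built-in; in this notebook it is a small helper that samples an action index according to the probabilities in `p`. A minimal sketch of such a helper (the notebook's actual definition may differ):
```
def choose(p):
    # sample an action index (0 or 1) with probabilities given by p
    return np.random.choice(len(p), p=p)
```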
action = choose(p)
print("We chose action %d."%action)
mu = [.2,.8] # mu holds the reward probabilities for each action, defined by the game
reward = np.random.random() < mu[action]
Now we update the Q value of the chosen action; note how the Q values change afterwards.
alpha = .2
delta = reward - Q[action] # prediction error
Q[action] = Q[action] + alpha * delta
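The cell below calls a Python translation of the MATLAB function above. Its definition isn't shown in this section; a minimal sketch, assuming the signature used in the call (a `starting_q_values` argument, and per-trial actions, rewards, Q values, and prediction errors as return values):
```
def simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values=(0.5, 0.5)):
    Q = np.array(starting_q_values, dtype=float)
    actions, rewards, Qs, deltas = [], [], [], []
    for t in range(T):
        # softmax choice probabilities
        p = np.exp(beta * Q) / np.sum(np.exp(beta * Q))
        # choose an action and generate a reward
        a = choose(p)
        r = np.random.random() < mu[a]
        # Rescorla-Wagner update
        delta = r - Q[a]
        Q[a] = Q[a] + alpha * delta
        actions.append(a)
        rewards.append(r)
        Qs.append(Q.copy())
        deltas.append(delta)
    return np.array(actions), np.array(rewards), np.array(Qs), np.array(deltas)
```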
And everything together:
T = 10 # number of trials
mu = [.2,.8] # reward probabilities
## Participant parameters
alpha = .2 # learning rate
beta = 4 # inverse temperature
actions, rewards, Qs, deltas = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta, starting_q_values = [0,0])
Qs
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
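`plot_rescorla_game` is another helper whose definition isn't shown here; presumably it runs one simulation and plots how the Q values evolve over trials against the true reward probabilities. A sketch of what it might look like, building on the simulate function above:
```
import matplotlib.pyplot as plt

def plot_rescorla_game(T, mu, alpha, beta):
    actions, rewards, Qs, deltas = simulate_M3RescorlaWagner_v1(T, mu, alpha, beta)
    plt.figure(figsize=(8, 4))
    plt.plot(Qs[:, 0], label="Q(action 0)")
    plt.plot(Qs[:, 1], label="Q(action 1)")
    # dashed lines mark the true reward probabilities
    plt.hlines(mu, 0, T, colors="grey", linestyles="dashed")
    plt.xlabel("trial")
    plt.ylabel("value")
    plt.legend()
    plt.show()
```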
T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
Reducing the learning rate: with only 100 trials, the participant does not manage to learn the reward probabilities.
T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .01 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
Let's give them more trials.
T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .01 # learning rate
beta = 4 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
An extremely high beta makes participants stop exploring.
T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .1 # learning rate
beta = 1000 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
If beta is very low, participants still learn the values, but they don't use that knowledge when choosing (something we cannot see in this plot yet).
T = 1000 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .1 # learning rate
beta = .0000000001 # inverse temperature
plot_rescorla_game(T, mu, alpha, beta)
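To see why a tiny beta means the learned values go unused, compare the softmax choice probabilities for the same (hypothetical) Q values under different betas:
```
Q_example = np.array([0.2, 0.8])  # hypothetical learned values
for b in [4, 1000, 1e-10]:
    # subtract the max before exponentiating to avoid overflow at large beta
    z = b * Q_example
    p = np.exp(z - np.max(z)) / np.sum(np.exp(z - np.max(z)))
    print(b, p)
# beta = 4     -> roughly [0.08, 0.92]: value-guided, but still some exploration
# beta = 1000  -> essentially [0, 1]: pure exploitation
# beta = 1e-10 -> essentially [0.5, 0.5]: choices ignore the learned values
```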
Note that the choice kernel decays by a factor of (1 - alpha_c) on each trial:
choice_kernel = (1 - alpha_c) * choice_kernel
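The helper `plot_choice_kernel_game` used below isn't defined in this section; a sketch of the underlying simulation, assuming a pure choice-kernel learner in which all kernel entries decay by (1 - alpha_c) and the chosen action's entry is then nudged towards 1, with choices drawn from a softmax over beta times the kernel:
```
def simulate_choice_kernel(T, mu, alpha_c, beta_c):
    choice_kernel = np.zeros(2)
    actions, rewards = [], []
    for t in range(T):
        # softmax over the choice kernel (rewards do not enter this pure model)
        p = np.exp(beta_c * choice_kernel) / np.sum(np.exp(beta_c * choice_kernel))
        a = choose(p)
        r = np.random.random() < mu[a]
        # all entries decay, then the chosen action's entry is nudged towards 1
        choice_kernel = (1 - alpha_c) * choice_kernel
        choice_kernel[a] += alpha_c
        actions.append(a)
        rewards.append(r)
    return np.array(actions), np.array(rewards)
```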
T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)
T = 100 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = .004 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)
T = 500 # number of trials
mu = [.2,.8] # reward probabilities
alpha = .2 # learning rate
beta = 4 # inverse temperature
actions = plot_choice_kernel_game(T, mu, alpha, beta)
With a reasonably high beta, one of the options should always move towards 1.