In this notebook, we will try to make the simulation a bit less abstract.
import numpy as np
import pandas as pd
from fastcore.foundation import patch

A blueprint for a (very) simple Person

We create a person using a Python class. Classes are like blueprints that let us create concrete objects (or, in this case, persons).

class Person():
    '''The blueprint of a very simple person'''
    def __init__(self, name):
        # The person has a name
        self.name = name
        
    def say_hi(self):
        '''And it can say hi'''
        print("Hi, I'm %s!"%self.name)
adam = Person("Adam")
adam.say_hi()
Hi, I'm Adam!

Our first "learning" model

It's nice that our persons can say their names. However, to make them psychologically more interesting, let's allow them to make choices:

class RandomModel():
    '''This model generates choice probabilities which a person can use to make choices.'''
    def __init__(self):
        self.person = None
    
    def get_choice_probabilities(self, actions):
        '''A function that gives each action the same probability.'''
        choice_probabilities = {}
        for action in actions:
            choice_probabilities[action] = 1/len(actions)
        
        return choice_probabilities

Let's make an "instance" of this model. Although this model is not yet associated with any person, we can already test how it behaves:

rando = RandomModel()
possible_actions = ['a','b']
rando.get_choice_probabilities(possible_actions)
{'a': 0.5, 'b': 0.5}

It looks like it behaves as expected.

Let's make our Person blueprint a bit more complex and allow people to have learning models.

@patch
def set_learning_model(self:Person, model):
    '''This function associates a person with a learning model. It also tells the learning model which person it is associated with.'''
    self.learning_model = model
    model.person = self
adam.set_learning_model(RandomModel())
type(adam.learning_model)
__main__.RandomModel
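
As a quick sanity check, the association runs both ways: the person knows its learning model, and the model knows which person it belongs to.

adam.learning_model.person is adam # this should return True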

We also allow them to make choices based on the model-calculated choice probabilities.

@patch
def choose_action(self:Person, possible_actions):
    '''This function chooses an action based on model-calculated choice probabilities.'''
    # Getting choice probabilities from learning model
    choice_probabilities = self.learning_model.get_choice_probabilities(possible_actions)
    # Choosing an action based on the choice probabilities (this if/else only handles two possible actions)
    random = np.random.random()
    if random < list(choice_probabilities.values())[0]:
        chosen_action = list(choice_probabilities.keys())[0]
    else:
        chosen_action = list(choice_probabilities.keys())[1]
    return chosen_action
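
As noted above, the if/else only handles exactly two possible actions, which is all we need in this notebook. For illustration, a more general version (a hypothetical helper, not used below) could sample directly from the choice probabilities with np.random.choice:

@patch
def choose_action_general(self:Person, possible_actions):
    '''A hypothetical, more general chooser that handles any number of actions (not used below).'''
    choice_probabilities = self.learning_model.get_choice_probabilities(possible_actions)
    # np.random.choice samples one of the keys, weighted by the corresponding probabilities
    return np.random.choice(list(choice_probabilities.keys()), p=list(choice_probabilities.values()))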

Time to check whether people can use their learning models:

Adam chooses an action.

adam.choose_action(['a','b'])
'b'

Does Adam choose randomly?

pd.Series([adam.choose_action(['a','b']) == 'a' for i in range(1000)]).mean() # Making 1,000 choices and computing the proportion of times Adam chooses 'a'
0.49

Yes! :)

A more complicated model

Now that we have created a person that can use a (pretty simple) learning model, we can try to create a more complicated one.

RescorlaWagnerModel

The Rescorla-Wagner Model can learn from experience.

class RescorlaWagnerModel():
    def __init__(self, alpha, beta):
        self.person = None
        self.alpha = alpha # A Rescorla-Wagner Model has a learning rate...
        self.beta = beta # ...and an inverse temperature parameter
        self.expected_reward_memory = {} # It can memorize expected rewards but starts with no knowledge of the world
resco = RescorlaWagnerModel(alpha = .2, beta = 4)
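
As a preview of the code below: the learning rate alpha controls how strongly a new reward updates an expected reward, Q_a = Q_a + alpha * (reward - Q_a), while the inverse temperature beta controls how sharply expected rewards are turned into choice probabilities, P(a) = exp(beta * Q_a) / sum over all actions b of exp(beta * Q_b).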

A Rescorla-Wagner Model associates each action with an expected reward. If it has not encountered a possible action, it assigns an expected reward of .5 to it.

@patch
def get_expected_reward_for_action(self:RescorlaWagnerModel, action):
    # If we haven't encountered the action, we set its expected reward to .5 and remember it
    if action not in self.expected_reward_memory:
        self.expected_reward_memory[action] = .5
    # We return the expected reward associated with the action
    return self.expected_reward_memory[action]

Let's test our model, so far:

The model starts with no knowledge of the world.

resco.expected_reward_memory
{}

It encounters an action and sets its expected reward to .5.

resco.get_expected_reward_for_action('a')
0.5

After encountering an action, the model remembers its expected reward.

resco.expected_reward_memory
{'a': 0.5}

So far the model only considers one action at a time. For convenience, we add a function that can process several actions at once.

@patch
def get_expected_rewards_for_possible_actions(self:RescorlaWagnerModel, actions):
    expected_rewards = {}
    for action in actions:
        expected_rewards[action] = self.get_expected_reward_for_action(action)
    return expected_rewards
resco.get_expected_rewards_for_possible_actions(['b','c'])
{'b': 0.5, 'c': 0.5}
resco.expected_reward_memory
{'a': 0.5, 'b': 0.5, 'c': 0.5}

Now that our model can associate expected rewards with actions, we write a function that translates these expected rewards to choice probabilities:

@patch
def get_choice_probabilities(self:RescorlaWagnerModel, actions):
    # Turning the expected reward dictionary into an array we can use in mathematical functions
    expected_rewards = self.get_expected_rewards_for_possible_actions(actions)
    expected_reward_values = np.array(list(expected_rewards.values()))
    # Calculating the choice probabilities (a softmax with inverse temperature beta)
    choice_probabilities = np.exp(expected_reward_values * self.beta) / sum(np.exp(expected_reward_values * self.beta))
    # Turning it back into a dictionary
    choice_probabilities = dict(zip(actions,choice_probabilities))
    return choice_probabilities
resco.get_choice_probabilities(['b','c'])
{'b': 0.5, 'c': 0.5}
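
The formula above is a softmax: each expected reward is scaled by beta, exponentiated, and then normalised so the probabilities sum to 1. With equal expected rewards it returns equal probabilities, as in the output above. To see how unequal rewards would be handled, here is a standalone sketch with hypothetical values (nothing here is stored in resco's memory):

hypothetical_rewards = np.array([.6, .5]) # hypothetical expected rewards for two actions
beta = 4
np.exp(hypothetical_rewards * beta) / sum(np.exp(hypothetical_rewards * beta)) # roughly [0.60, 0.40]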

Now that our model can generate choice probabilities, people can use it to make choices:

richard = Person('Richard') # a new person
richard.set_learning_model(RescorlaWagnerModel(alpha = .2, beta = 4))
richard.choose_action(['a','b'])
'a'

At this point, the Rescorla-Wagner Model acts the same as the random model. This is because we have not yet given it the ability to learn (i.e., update its expected reward values), so all values stay at their initial value of .5.

Let's give the model the chance to learn by associating actions with rewards.

@patch
def learn(self:RescorlaWagnerModel, action, reward):
    self.prediction_error = reward - self.get_expected_reward_for_action(action)
    # You might know the line below as Q_a = Q_a + alpha * delta
    self.expected_reward_memory[action] = self.expected_reward_memory[action] + self.alpha * self.prediction_error
resco.expected_reward_memory
{'a': 0.5, 'b': 0.5, 'c': 0.5}
resco.learn('a',1)
resco.expected_reward_memory
{'a': 0.6, 'b': 0.5, 'c': 0.5}
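
This matches the update rule done by hand: the prediction error is 1 - .5 = .5, and the new expected reward is .5 + .2 * .5 = .6.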

For learning to occur, of course, participants have to remember their last action (at least until receiving a reward):

@patch
def choose_and_remember_action(self:Person, possible_actions):
    # Choosing an action
    action = self.choose_action(possible_actions)
    # And remembering it 
    self.last_action = action
    return action
richard.choose_and_remember_action(['a','b'])
'b'
richard.last_action
'b'

Now, we write one more function that allows participants to receive rewards and consequently learn using their learning model:

@patch
def get_rewarded(self:Person, reward):
    self.learning_model.learn(self.last_action, reward)

Let's test it:

We give Richard a reward for his last action.

richard.get_rewarded(1)

His expected rewards get updated.

richard.learning_model.expected_reward_memory
{'a': 0.5, 'b': 0.6}
richard.choose_and_remember_action(['a','b'])
'a'

Let's reward him 100000 times for his latest action.

for i in range(100000):
    richard.get_rewarded(1) # Jackpot!!!
richard.learning_model.expected_reward_memory
{'a': 0.9999999999999998, 'b': 0.6}

Richard should be pretty fond of 'a' by now. Note that he still chooses 'a' in only about 80% of cases. Can you imagine why?

pd.Series([richard.choose_action(['a','b']) == 'a' for i in range(1000)]).mean()
0.82
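
The softmax from earlier explains it: with an inverse temperature of beta = 4, even a large difference in expected rewards does not make choices fully deterministic. Plugging in (approximately) Richard's learned values reproduces the roughly 80/20 split:

learned_rewards = np.array([1., .6]) # approximate expected rewards for 'a' and 'b'
beta = 4
np.exp(learned_rewards * beta) / sum(np.exp(learned_rewards * beta)) # roughly [0.83, 0.17]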