from fastcore.foundation import patch
import numpy as np
import pandas as pd

class Person():
    '''The blueprint of a very simple person'''
    def __init__(self, name):
        # The person has a name
        self.name = name

    def say_hi(self):
        '''And it can say hi'''
        print("Hi, I'm %s!" % self.name)
adam = Person("Adam")
adam.say_hi()
class RandomModel():
    '''This model generates choice probabilities which a person can use to make choices.'''
    def __init__(self):
        self.person = None

    def get_choice_probabilities(self, actions):
        '''A function that gives each action the same probability.'''
        choice_probabilities = {}
        for action in actions:
            choice_probabilities[action] = 1 / len(actions)
        return choice_probabilities
Let's make an "instance" of this model. Although this model right now is not associated to any person, we can already test how it behaves:
rando = RandomModel()
possible_actions = ['a','b']
rando.get_choice_probabilities(possible_actions)
Each action gets the same probability ({'a': 0.5, 'b': 0.5}), so the model behaves as expected.
Let's make our Person blueprint a bit more complex and allow people to have learning models.
@patch
def set_learning_model(self:Person, model):
    '''This function associates a person with a learning model. It also tells the learning model which person it is associated with.'''
    self.learning_model = model
    model.person = self
adam.set_learning_model(RandomModel())
type(adam.learning_model)
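The association is bidirectional: the model also got a reference back to its person. As a quick sanity check (just reading the attributes set above):

adam.learning_model.person.name # should return 'Adam'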
We also allow them to make choices based on the model-calculated choice probabilities.
@patch
def choose_action(self:Person, possible_actions):
    '''This function chooses an action based on model-calculated choice probabilities.'''
    # Getting choice probabilities from the learning model
    choice_probabilities = self.learning_model.get_choice_probabilities(possible_actions)
    # Choosing an action based on the choice probabilities
    # (this simple rule assumes exactly two possible actions)
    random = np.random.random()
    if random < list(choice_probabilities.values())[0]:
        chosen_action = list(choice_probabilities.keys())[0]
    else:
        chosen_action = list(choice_probabilities.keys())[1]
    return chosen_action
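As written, the sampling rule above only covers the two-action case, which is all this tutorial needs. For more than two actions, a sketch of a more general rule (not part of the original code, assuming numpy is imported as np) could sample directly from the probability dictionary:

choice_probabilities = {'a': .5, 'b': .3, 'c': .2}  # hypothetical probabilities, just for illustration
np.random.choice(list(choice_probabilities.keys()), p=list(choice_probabilities.values()))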
Time to check whether people can use their learning models:
Adam chooses an action.
adam.choose_action(['a','b'])
Does Adam choose randomly?
pd.Series([adam.choose_action(['a','b']) == 'a' for i in range(1000)]).mean() # Making 1000 choices and computing the fraction of times Adam chooses "a"
Yes! The fraction should hover around .5. :)
class RescorlaWagnerModel():
    def __init__(self, alpha, beta):
        self.person = None
        self.alpha = alpha  # A Rescorla-Wagner model has a learning rate...
        self.beta = beta  # ...and an inverse temperature parameter
        self.expected_reward_memory = {}  # It can memorize expected rewards but starts with no knowledge of the world
resco = RescorlaWagnerModel(alpha = .2, beta = 4)
A Rescorla-Wagner model associates each action with an expected reward. If it has not yet encountered a possible action, it assigns an expected reward of .5 to it.
@patch
def get_expected_reward_for_action(self:RescorlaWagnerModel, action):
    # If we haven't encountered the action, we set its expected reward to .5 and remember it
    if action not in self.expected_reward_memory:
        self.expected_reward_memory[action] = .5
    # We return the expected reward associated with the action
    return self.expected_reward_memory[action]
Let's test our model so far:
The model starts with no knowledge of the world.
resco.expected_reward_memory
It encounters an action and sets its expected reward to .5.
resco.get_expected_reward_for_action('a')
After encountering an action, the model remembers its expected reward.
resco.expected_reward_memory
So far the model only considers one action at a time. For convenience, we add a function that can process several actions at once.
@patch
def get_expected_rewards_for_possible_actions(self:RescorlaWagnerModel, actions):
    expected_rewards = {}
    for action in actions:
        expected_rewards[action] = self.get_expected_reward_for_action(action)
    return expected_rewards
resco.get_expected_rewards_for_possible_actions(['b','c'])
resco.expected_reward_memory
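Both new actions, 'b' and 'c', now appear in the memory alongside 'a', all with an expected reward of .5.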
Now that our model can associate expected rewards with actions, we write a function that translates these expected rewards into choice probabilities (a softmax):
@patch
def get_choice_probabilities(self:RescorlaWagnerModel, actions):
    # Turning the expected reward dictionary into an array we can use in mathematical functions
    expected_rewards = self.get_expected_rewards_for_possible_actions(actions)
    expected_reward_values = np.array(list(expected_rewards.values()))
    # Calculating the probabilities (a softmax with inverse temperature beta)
    choice_probabilities = np.exp(expected_reward_values * self.beta) / sum(np.exp(expected_reward_values * self.beta))
    # Turning it back into a dictionary
    choice_probabilities = dict(zip(actions, choice_probabilities))
    return choice_probabilities
resco.get_choice_probabilities(['b','c'])
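Because resco has only ever stored expected rewards of .5 so far, both actions come out at .5 here. To see what the softmax does once expected rewards differ, here is a standalone sketch with made-up values (.6 vs. .5) and beta = 4, assuming numpy is imported as np:

expected_reward_values = np.array([.6, .5])  # hypothetical expected rewards
beta = 4
np.exp(expected_reward_values * beta) / sum(np.exp(expected_reward_values * beta))  # roughly [0.6, 0.4]: the higher expected reward gets the higher probability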
Now that our model can generate choice probabilities, people can use it to make choices:
richard = Person('Richard') # a new person
richard.set_learning_model(RescorlaWagnerModel(alpha = .2, beta = 4))
richard.choose_action(['a','b'])
At this point the Rescorla-Wagner model acts just like the random model. This is because we have not yet given it the ability to learn (i.e. to update its expected rewards), so all values stay at their initial value of .5.
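We can verify this directly; both actions were initialized to .5 when Richard first chose between them, so the choice probabilities are still equal:

richard.learning_model.get_choice_probabilities(['a','b']) # should return {'a': 0.5, 'b': 0.5}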
Let's give the model the chance to learn by associating actions with rewards.
@patch
def learn(self:RescorlaWagnerModel, action, reward):
    self.prediction_error = reward - self.get_expected_reward_for_action(action)
    # You might know the line below as Q_a = Q_a + alpha * delta
    self.expected_reward_memory[action] = self.expected_reward_memory[action] + self.alpha * self.prediction_error
resco.expected_reward_memory
resco.learn('a',1)
resco.expected_reward_memory
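To check the numbers: the expected reward for 'a' was .5 and the reward was 1, so the prediction error is .5, and with alpha = .2 the new expected reward is .5 + .2 * .5 = .6.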
For learning to occur, of course, participants have to remember their last action (at least until receiving a reward):
@patch
def choose_and_remember_action(self:Person, possible_actions):
    # Choosing an action
    action = self.choose_action(possible_actions)
    # And remembering it
    self.last_action = action
    return action
richard.choose_and_remember_action(['a','b'])
richard.last_action
Now, we write one more function that allows participants to receive rewards and consequently learn using their learning model:
@patch
def get_rewarded(self:Person, reward):
    self.learning_model.learn(self.last_action, reward)
Let's test it:
We give Richard a reward for his last action.
richard.get_rewarded(1)
The expected reward for his last action gets updated, from .5 to .6.
richard.learning_model.expected_reward_memory
richard.choose_and_remember_action(['a','b'])
Let's reward him 100000 times for his latest action.
for i in range(100000):
    richard.get_rewarded(1)  # Jackpot!!!
richard.learning_model.expected_reward_memory
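As an aside, far fewer trials would have done the job: with a constant reward of 1, the gap between the expected reward and 1 shrinks by a factor of (1 - alpha) on every trial. A standalone sketch of the same update rule, outside the class:

q, alpha = .5, .2
for _ in range(30):
    q = q + alpha * (1 - q)
q # already very close to 1 after a few dozen rewarded trials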
Richard should be pretty fond of 'a' now. Note that he still chooses 'a' only in about 80% of the cases. Can you guess why?
pd.Series([richard.choose_action(['a','b']) == 'a' for i in range(1000)]).mean()
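Hint: the softmax with a finite inverse temperature (here beta = 4) never puts all of the probability on a single action. You can inspect the model's actual choice probabilities directly:

richard.learning_model.get_choice_probabilities(['a','b'])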