Policies
In CoopIHC, it is assumed that agents make decisions (take actions) based on their observations as well as their internal state. Selecting an action may also carry a cost, for example the physical cost of moving an item, or a benefit, for example if having a large diversity of actions is rewarding.
CoopIHC provides a generic object called a policy, which specifies how an agent takes actions. To specify a new policy, you will usually subclass BasePolicy, although several predefined policies also exist.
Subclassing BasePolicy
To define a policy, simply subclass :py:class:`BasePolicy <coopihc.policy.BasePolicy>` and redefine its sample() method. Below, we show how ExamplePolicy has been defined.
class ExamplePolicy(BasePolicy):
    """ExamplePolicy

    A simple policy which assumes that the agent using it has a 'goal' state and that the task has an 'x' state. x is compared to the goal and the appropriate action is taken to make sure x reaches the goal.
    """

    def __init__(self, *args, action_state=None, **kwargs):
        super().__init__(*args, action_state=action_state, **kwargs)

    @BasePolicy.default_value
    def sample(self, agent_observation=None, agent_state=None):
        """sample

        Compares 'x' to the goal and issues +/-1 accordingly.

        :return: action, reward
        :rtype: tuple(`StateElement<coopihc.base.StateElement.StateElement>`, float)
        """
        if (
            agent_observation["task_state"]["x"]
            < agent_observation["user_state"]["goal"]
        ):
            _action_value = 1
        elif (
            agent_observation["task_state"]["x"]
            > agent_observation["user_state"]["goal"]
        ):
            _action_value = -1
        else:
            _action_value = 0

        return _action_value, 0
Note
Don’t forget to return a reward with the action.
Note
You can put virtually anything inside this function, including the output of a neural network, of a complex simulation process, or even the output of another bundle (see Modularity for an example).
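Once defined, a policy is attached to an agent, usually by passing it to the agent's constructor. The sketch below is indicative only: it assumes BaseAgent's agent_policy keyword and the discrete_array_element helper, and the import paths and keywords may differ between CoopIHC versions (see the Agents section for the exact constructor):
from coopihc import BaseAgent, State, ExamplePolicy, discrete_array_element  # assumed top-level imports


class ExampleUser(BaseAgent):
    def __init__(self, *args, **kwargs):
        # Internal state holding the 'goal' that ExamplePolicy compares 'x' to
        state = State()
        state["goal"] = discrete_array_element(init=4, low=-4, high=4)

        # Action state: ExamplePolicy issues -1, 0 or +1
        action_state = State()
        action_state["action"] = discrete_array_element(init=0, low=-1, high=1)

        super().__init__(
            "user",
            *args,
            agent_state=state,
            agent_policy=ExamplePolicy(action_state=action_state),  # attach the policy (assumed keyword)
            **kwargs,
        )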
Besides BasePolicy, CoopIHC provides a few predefined policies which you may find useful.
Explicit Likelihood Discrete (ELLD) Policy
The Explicit Likelihood Discrete (ELLD) policy is used in cases where the agent's behavior is simple enough to be specified by an analytical (explicit) probabilistic model over a discrete set of actions.
Below, we define a simple probabilistic model which assigns different probabilities to each possible discrete action.
# Define the likelihood model
def likelihood_model(self, action, observation, *args, **kwargs):
    if action == 0:
        return 1 / 7
    elif action == 1:
        return 1 / 7 + 0.05
    elif action == 2:
        return 1 / 7 - 0.05
    elif action == 3:
        return 1 / 7 + 0.1
    elif action == 4:
        return 1 / 7 - 0.1
    elif action == 5:
        return 1 / 7 + 0.075
    elif action == 6:
        return 1 / 7 - 0.075
    else:
        raise RuntimeError(
            "warning, unable to compute likelihood. You may have not covered all cases in the likelihood definition"
        )
You can then define your policy and attach the model to it:
_seed = 123
se = cat_element(init=1, N=7)
action_state = State(**{"action": se})
policy = ELLDiscretePolicy(action_state, seed=_seed)
# Attach the model
policy.attach_likelihood_function(likelihood_model)
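Since ELLDiscretePolicy draws actions from this model, the values returned for the seven possible actions should form a proper probability distribution. A quick sanity check in plain Python (not part of the CoopIHC API), using the likelihood_model defined above:
# The model ignores the observation, so None stands in for both the policy
# instance (self) and the observation.
probabilities = [likelihood_model(None, action, None) for action in range(7)]
assert abs(sum(probabilities) - 1) < 1e-9  # probabilities sum to 1
assert all(p >= 0 for p in probabilities)  # and are non-negative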
BIGDiscretePolicy
The Bayesian Information Gain Policy is a reimplementation of BIGPoint, introduced by Liu et al. [1].
The main ideas/assumptions are:
A user wants the task to go to some goal state \(\Theta\)
The assistant can set the task in a number of states \(X\)
The user can perform actions from a given set \(Y\)
A model \(p(Y=y \mid X=x, \Theta=\theta)\) exists for user behavior. This model is exploited by the assistant, and is not necessarily the true model (see the criterion sketched below).
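Concretely, the assistant picks the task state that it expects to be most informative about the user's goal. A sketch of the underlying criterion, following the Bayesian information gain idea of Liu et al. [1] (the exact bookkeeping in the CoopIHC implementation may differ):
\[
IG(x) = H(\Theta) - \mathbb{E}_{y \sim p(Y \mid X=x)}\left[ H(\Theta \mid Y=y, X=x) \right], \qquad x^\ast = \arg\max_x IG(x)
\]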
After the policy has been defined, make sure to call:
attach_set_theta, to specify the potential goal states
attach_transition_function, to specify how the task state evolves after an assistant action.
You can find an example implementation in CoopIHC-Zoo’s pointing module. Below are the important steps:
Warning
Example outdated
TASK_SIZE = 30
TARGETS = [1,5,6,19]
action_state = State()
action_state["action"] = StateElement(
0,
autospace([i for i in range(TASK_SIZE)]),
out_of_bounds_mode="error",
)
# Define the user_policy_model that the assistant will use
user_policy_model = XXX
# Define Policy
agent_policy = BIGDiscretePolicy(action_state, user_policy_model)
# Specify the potential goal states of the user. Here, the potential goals are all cases where a target may be the user's goal
set_theta = [
{
("user_state", "goal"): StateElement(
t,
discrete_space(numpy.array(list(range(TASK_SIZE)))),
)
}
for t in TARGETS
]
# Attach this set to the policy
self.policy.attach_set_theta(set_theta)
# Define the predicted future observation of the user due to assistant action
def transition_function(assistant_action, observation):
"""What future observation will the user see due to assistant action"""
# always do this
observation["assistant_action"]["action"] = assistant_action
# specific to BIGpointer
observation["task_state"]["position"] = assistant_action
return observation
# Attach it to the policy
self.policy.attach_transition_function(transition_function)
LinearFeedback
RLPolicy
The RLPolicy is a wrapper for a neural network trained via Deep Reinforcement Learning. For an example, head over to Using Reinforcement Learning.
WrapAsPolicy
[1] Liu, Wanyu, et al. "BIGnav: Bayesian information gain for guiding multiscale navigation." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017.