Policies

In CoopIHC, agents make decisions (take actions) based on their observations and their internal state. Selecting an action may also carry a cost, for example a physical cost associated with moving an item, or a benefit, for example when having a large diversity of actions is rewarding.

CoopIHC provides a generic object called a policy which specifies how actions are taken by the agent. To specify a new policy, you will usually subclass BasePolicy, although several predefined policies also exist.

Subclassing BasePolicy

To define a policy, simply subclass BasePolicy (coopihc.policy.BasePolicy) and redefine its sample() method. Below, we show how ExamplePolicy has been defined.

class ExamplePolicy(BasePolicy):
    """ExamplePolicy

    A simple policy which assumes that the agent using it has a 'goal' state and that the task has an 'x' state. 'x' is compared to the goal and the appropriate action is taken to make sure 'x' reaches the goal.
    """

    def __init__(self, *args, action_state=None, **kwargs):
        super().__init__(*args, action_state=action_state, **kwargs)

    @BasePolicy.default_value
    def sample(self, agent_observation=None, agent_state=None):
        """sample

        Compares 'x' to the goal and issues +1/-1 accordingly.

        :return: action, reward
        :rtype: tuple(StateElement<coopihc.base.StateElement.StateElement>, float)
        """
        if (
            agent_observation["task_state"]["x"]
            < agent_observation["user_state"]["goal"]
        ):
            _action_value = 1
        elif (
            agent_observation["task_state"]["x"]
            > agent_observation["user_state"]["goal"]
        ):
            _action_value = -1
        else:
            _action_value = 0

        return _action_value, 0

Note

Don’t forget to return a reward with the action.

Note

You can put virtually anything inside this function: the output of a neural network, of a complex simulation process, or even the output of another bundle (see Modularity for an example).
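For instance, here is a minimal sketch of a policy whose sample() simply defers to an arbitrary external callable. The class name DelegatingPolicy and the decision_function argument are made up for illustration; they are not part of CoopIHC.

from coopihc import BasePolicy  # assumes CoopIHC's top-level re-export


class DelegatingPolicy(BasePolicy):
    """Hypothetical policy that defers its decision to any external callable,
    e.g. the prediction function of a trained model or another simulator."""

    def __init__(self, decision_function, *args, action_state=None, **kwargs):
        super().__init__(*args, action_state=action_state, **kwargs)
        # Any callable mapping an observation to an action value.
        self.decision_function = decision_function

    @BasePolicy.default_value
    def sample(self, agent_observation=None, agent_state=None):
        action_value = self.decision_function(agent_observation)
        # Don't forget to return a reward along with the action (0: no action cost).
        return action_value, 0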

Besides subclassing BasePolicy, CoopIHC provides a few predefined policies which you may find useful.

Explicit Likelihood Discrete (ELLD) Policy

The Explicit Likelihood Discrete (ELLD) policy is used in cases where the agent's behavior is simple enough to be specified by an analytical likelihood model over a discrete set of actions.

Below, we define a simple probabilistic model which assigns different probabilities to each possible discrete action.

# Define the likelihood model
def likelihood_model(self, action, observation, *args, **kwargs):
    if action == 0:
        return 1 / 7
    elif action == 1:
        return 1 / 7 + 0.05
    elif action == 2:
        return 1 / 7 - 0.05
    elif action == 3:
        return 1 / 7 + 0.1
    elif action == 4:
        return 1 / 7 - 0.1
    elif action == 5:
        return 1 / 7 + 0.075
    elif action == 6:
        return 1 / 7 - 0.075
    else:
        raise RuntimeError(
            "warning, unable to compute likelihood. You may not have covered all cases in the likelihood definition"
        )

You can then define your policy and attach the model to it:

_seed = 123
se = cat_element(init=1, N=7)
action_state = State(**{"action": se})
policy = ELLDiscretePolicy(action_state, seed=_seed)
# Attach the model
policy.attach_likelihood_function(likelihood_model)
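As a quick sanity check (plain Python, not a CoopIHC API), you can call the model directly and verify that it defines a proper probability distribution over the seven actions:

# self and observation are unused by this particular model, so pass None.
total = sum(likelihood_model(None, action, None) for action in range(7))
assert abs(total - 1) < 1e-9, "likelihoods should sum to 1"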

BIGDiscretePolicy

The Bayesian Information Gain (BIG) policy is a reimplementation of BIGPoint, introduced by Liu et al. [1].

The main ideas/assumptions are:

  • A user wants the task to go to some goal state \(\Theta\)

  • The assistant can set the task to a number of states \(X\)

  • The user can perform actions from a given set \(Y\)

  • A model \(p(Y=y | X=x, \Theta = \theta)\) exists for user behavior. This model is exploited by the assistant, and is not necessarily the true model (a small numeric sketch of the resulting information-gain computation is given below this list).
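The assistant then selects the action that maximizes the expected information gain about the user's goal. Below is a small, self-contained numeric sketch of that computation. It is illustrative only: the goals, the user_model, and the 10% noise level are made up, and this is not the BIGDiscretePolicy implementation.

import numpy as np

# Expected information gain about the user's goal for one candidate assistant
# action x, under a hypothetical user model p(Y=y | X=x, Theta=theta).
goals = [0, 1, 2]                        # candidate goal states (hypothetical)
prior = np.array([1 / 3, 1 / 3, 1 / 3])  # p(theta): uniform prior over goals


def user_model(y, x, theta):
    """Hypothetical p(Y=y | X=x, Theta=theta): the user answers 1 if the goal
    lies above x and 0 otherwise, with 10% noise."""
    correct = 1 if theta > x else 0
    return 0.9 if y == correct else 0.1


def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def expected_information_gain(x):
    # IG(x) = H(Theta) - E_y[ H(Theta | Y=y, X=x) ]
    ig = entropy(prior)
    for y in (0, 1):
        likelihood = np.array([user_model(y, x, theta) for theta in goals])
        p_y = np.sum(likelihood * prior)      # p(y | x), marginalized over goals
        posterior = likelihood * prior / p_y  # p(theta | y, x), Bayes' rule
        ig -= p_y * entropy(posterior)
    return ig


# The assistant would pick the action x with the largest expected information gain.
print({x: round(expected_information_gain(x), 3) for x in goals})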

After the policy has been defined, make sure to call:

  • attach_set_theta, to specify the potential goal states

  • attach_transition_function, to specify how the task state evolves after an assistant action.

You can find an example implementation in CoopIHC-Zoo’s pointing module. Below are the important steps:

Warning

Example outdated

TASK_SIZE = 30
TARGETS = [1, 5, 6, 19]

action_state = State()
action_state["action"] = StateElement(
    0,
    autospace([i for i in range(TASK_SIZE)]),
    out_of_bounds_mode="error",
)
# Define the user_policy_model that the assistant will use
user_policy_model = XXX

# Define the policy
agent_policy = BIGDiscretePolicy(action_state, user_policy_model)

# Specify the potential goal states of the user. Here, any of the targets may be the user's goal
set_theta = [
    {
        ("user_state", "goal"): StateElement(
            t,
            discrete_space(numpy.array(list(range(TASK_SIZE)))),
        )
    }
    for t in TARGETS
]
# Attach this set to the policy
agent_policy.attach_set_theta(set_theta)

# Define the predicted future observation of the user due to the assistant's action
def transition_function(assistant_action, observation):
    """What future observation will the user see due to the assistant's action"""
    # always do this
    observation["assistant_action"]["action"] = assistant_action
    # specific to BIGpointer
    observation["task_state"]["position"] = assistant_action
    return observation

# Attach it to the policy
agent_policy.attach_transition_function(transition_function)

LinearFeedback

RLPolicy

The RLPolicy is a wrapper for a neural network trained via Deep Reinforcement Learning. For an example, head over to Using Reinforcement Learning.

WrapAsPolicy

[1] Liu, Wanyu, et al. “BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017.