Policies
In CoopIHC, it is assumed that agents make decisions (take actions) based on their observations as well as their internal state. Selecting an action may also carry a cost, for example the physical cost of moving an item, or a benefit, for example if having a large diversity of actions is rewarding.
CoopIHC provides a generic object called a policy, which specifies how an agent takes actions. To specify a new policy, you will usually subclass BasePolicy, although several predefined policies also exist.
Subclassing BasePolicy
To define a policy, simply subclass :py:class:`BasePolicy <coopihc.policy.BasePolicy>` and redefine its sample() method. Below, we show how ExamplePolicy has been defined.
class ExamplePolicy(BasePolicy):
    """ExamplePolicy

    A simple policy which assumes that the agent using it has a 'goal' state and that the task has an 'x' state. x is compared to the goal and the appropriate action is taken to make sure x reaches the goal.
    """

    def __init__(self, *args, action_state=None, **kwargs):
        super().__init__(*args, action_state=action_state, **kwargs)

    @BasePolicy.default_value
    def sample(self, agent_observation=None, agent_state=None):
        """sample

        Compares 'x' to the goal and issues +/-1 accordingly.

        :return: action, reward
        :rtype: tuple(`StateElement<coopihc.base.StateElement.StateElement>`, float)
        """
        if (
            agent_observation["task_state"]["x"]
            < agent_observation["user_state"]["goal"]
        ):
            _action_value = 1
        elif (
            agent_observation["task_state"]["x"]
            > agent_observation["user_state"]["goal"]
        ):
            _action_value = -1
        else:
            _action_value = 0

        return _action_value, 0
Note
Don’t forget to return a reward with the action.
Note
You can put virtually anything inside this function, including the output of a neural network, of a complex simulation process, or even the output of another bundle (see Modularity for an example).
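Once defined, a policy is attached to an agent, usually by passing it to the agent's constructor. The sketch below is indicative only: it assumes BaseAgent's agent_policy keyword and the discrete_array_element helper, and the import paths and keywords may differ between CoopIHC versions (see the Agents section for the exact constructor):
from coopihc import BaseAgent, State, ExamplePolicy, discrete_array_element  # assumed top-level imports


class ExampleUser(BaseAgent):
    def __init__(self, *args, **kwargs):
        # Internal state holding the 'goal' that ExamplePolicy compares 'x' to
        state = State()
        state["goal"] = discrete_array_element(init=4, low=-4, high=4)

        # Action state: ExamplePolicy issues -1, 0 or +1
        action_state = State()
        action_state["action"] = discrete_array_element(init=0, low=-1, high=1)

        super().__init__(
            "user",
            *args,
            agent_state=state,
            agent_policy=ExamplePolicy(action_state=action_state),  # attach the policy (assumed keyword)
            **kwargs,
        )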
Besides BasePolicy, CoopIHC provides a few predefined policies which you may find useful.
Explicit Likelihood Discrete (ELLD) Policy
The Explicit Likelihood Discrete (ELLD) policy is used in cases where the agent's behavior is simple enough to be specified by an analytical (explicit) probabilistic model over a discrete set of actions.
Below, we define a simple probabilistic model which assigns different probabilities to each possible discrete action.
# Define the likelihood model
def likelihood_model(self, action, observation, *args, **kwargs):
    if action == 0:
        return 1 / 7
    elif action == 1:
        return 1 / 7 + 0.05
    elif action == 2:
        return 1 / 7 - 0.05
    elif action == 3:
        return 1 / 7 + 0.1
    elif action == 4:
        return 1 / 7 - 0.1
    elif action == 5:
        return 1 / 7 + 0.075
    elif action == 6:
        return 1 / 7 - 0.075
    else:
        raise RuntimeError(
            "warning, unable to compute likelihood. You may have not covered all cases in the likelihood definition"
        )
You can then define your policy and attach the model to it:
_seed = 123
se = cat_element(init=1, N=7)
action_state = State(**{"action": se})
policy = ELLDiscretePolicy(action_state, seed=_seed)
# Attach the model
policy.attach_likelihood_function(likelihood_model)
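Since ELLDiscretePolicy draws actions from this model, the values returned for the seven possible actions should form a proper probability distribution. A quick sanity check in plain Python (not part of the CoopIHC API), using the likelihood_model defined above:
# The model ignores the observation, so None stands in for both the policy
# instance (self) and the observation.
probabilities = [likelihood_model(None, action, None) for action in range(7)]
assert abs(sum(probabilities) - 1) < 1e-9  # probabilities sum to 1
assert all(p >= 0 for p in probabilities)  # and are non-negative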
BIGDiscretePolicy
The Bayesian Information Gain Policy is a reimplementation of BIGPoint, introduced by Liu et al. [1].
The main ideas/assumptions are:
A user wants the task to go to some goal state \(\Theta\)
The assistant can set the task in a number of states \(X\)
The user can perform actions from a given set \(Y\)
A model \(p(Y=y \mid X=x, \Theta=\theta)\) exists for user behavior. This model is exploited by the assistant, and is not necessarily the true model (see the criterion sketched below).
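Concretely, the assistant picks the task state that it expects to be most informative about the user's goal. A sketch of the underlying criterion, following the Bayesian information gain idea of Liu et al. [1] (the exact bookkeeping in the CoopIHC implementation may differ):
\[
IG(x) = H(\Theta) - \mathbb{E}_{y \sim p(Y \mid X=x)}\left[ H(\Theta \mid Y=y, X=x) \right], \qquad x^\ast = \arg\max_x IG(x)
\]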
After the policy has been defined, make sure to call:
attach_set_theta, to specify the potential goal states
attach_transition_function, to specify how the task state evolves after an assistant action.
You can find an example implementation in CoopIHC-Zoo’s pointing module. Below are the important steps:
Warning
Example outdated
TASK_SIZE = 30
TARGETS = [1,5,6,19]
action_state = State()
action_state["action"] = StateElement(
0,
autospace([i for i in range(TASK_SIZE)]),
out_of_bounds_mode="error",
)
# Define the user_policy_model that the assistant will use
user_policy_model = XXX
# Define Policy
agent_policy = BIGDiscretePolicy(action_state, user_policy_model)
# Specify the potential goal states of the user. Here, the potential goals are all cases where a target may be the user's goal
set_theta = [
{
("user_state", "goal"): StateElement(
t,
discrete_space(numpy.array(list(range(TASK_SIZE)))),
)
}
for t in TARGETS
]
# Attach this set to the policy
self.policy.attach_set_theta(set_theta)
# Define the predicted future observation of the user due to assistant action
def transition_function(assistant_action, observation):
"""What future observation will the user see due to assistant action"""
# always do this
observation["assistant_action"]["action"] = assistant_action
# specific to BIGpointer
observation["task_state"]["position"] = assistant_action
return observation
# Attach it to the policy
self.policy.attach_transition_function(transition_function)
LinearFeedback
RLPolicy
The RLPolicy is a wrapper for a neural network trained via Deep Reinforcement Learning. For an example, head over to Using Reinforcement Learning.
WrapAsPolicy
[1] Liu, Wanyu, et al. "BIGnav: Bayesian information gain for guiding multiscale navigation." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017.