More Complex Example

In this example, we build our first representation of an interactive system: a target-selection task, where the user’s objective is to select a particular target, known as the ‘goal’, by positioning a cursor on top of it. The assistant can help the user by positioning the cursor anywhere inside a grid.

Task

We first build a simple 1D gridworld (below is what it should look like).

[Image: simplepointingtask_render.png, a rendering of the 1D pointing task]

The user goal is the green ‘G’, the current cursor position is the blue ‘P’ and the other targets are the purple ‘T’s. The cursor can go anywhere within this space.

To define a new task, we subclass InteractionTask and override the needed methods.

import numpy

# numpy and the CoopIHC names used below are assumed to be importable from the
# package top level, as in the Bundle section at the end of this example.
from coopihc import InteractionTask, discrete_array_element


class SimplePointingTask(InteractionTask):
    """A 1D pointing task.

    A 1D grid of size 'gridsize'. The cursor is at a certain 'position' and there are several potential 'targets' on the grid. The user action is modulated by the assistant.

    The task features two modes, 'position' and 'gain'. In 'position' mode, the assistant directly specifies the position of the cursor. In 'gain' mode, the cursor position is modulated by the assistant: position[k+1] = position[k] + user_action * assistant_action

    """

    def __init__(self, gridsize=31, number_of_targets=10, mode="gain"):
        super().__init__()
        self.gridsize = gridsize
        self.number_of_targets = number_of_targets
        self.mode = mode
        self.dim = 1

        self.state["position"] = discrete_array_element(
            low=0, high=gridsize - 1, out_of_bounds_mode="clip"
        )  # position anywhere inside the grid

        self.state["targets"] = discrete_array_element(
            low=0, high=gridsize - 1, shape=(number_of_targets,)
        )  # targets anywhere inside the grid

    def reset(self, dic=None):
        """Reset the task.

        Reset the grid used for rendering, define new targets, and select a starting position.

        """

        # select random targets
        targets = sorted(
            numpy.random.choice(
                list(range(self.gridsize)),
                size=self.number_of_targets,
                replace=False,
            )
        )

        # Define the starting position, making sure not to start on a target
        copy = list(range(self.gridsize))
        for i in targets:
            copy.remove(i)
        position = int(numpy.random.choice(copy))

        # actually assign position and targets to the task
        self.state["position"] = position
        self.state["targets"] = targets

        # For the text render
        self.grid = [" " for i in range(self.gridsize)]
        for i in targets:
            self.grid[i] = "T"

    def on_user_action(self, *args, user_action=None, **kwargs):
        """End the task when the cursor sits on the goal and the user selects it (action 0).

        :meta public:
        """
        # Finish the task if the cursor is on top of the goal and the user selects it
        is_done = False
        if (
            self.state["position"] == self.bundle.user.state["goal"]
            and user_action == 0
        ):
            is_done = True

        return self.state, -1, is_done

    def on_assistant_action(self, *args, assistant_action=None, **kwargs):
        """Modulate the user's action.

        Multiply the user action with the assistant action and update the position.

        :param assistant_action: (list)

        :return: new task state, reward, is_done (True/False)

        :meta public:
        """
        is_done = False

        # Stopping condition if too many rounds have elapsed
        if self.round_number >= 50:
            return self.state, 0, True

        # two modes
        if self.mode == "position":
            self.state["position"] = self.assistant_action
        elif self.mode == "gain":
            # Completing the truncated listing, following the class docstring:
            # position[k+1] = position[k] + user_action * assistant_action.
            # This assumes the last user action is available via the task's
            # user_action property; the 'clip' out_of_bounds_mode keeps the
            # cursor inside the grid.
            self.state["position"] = (
                self.state["position"] + self.user_action * assistant_action
            )

        return self.state, 0, is_done
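
To sanity-check the task before wiring it into a bundle, you can instantiate it and call reset() directly. This is a minimal sketch, not part of the original tutorial; note that stepping the task requires a bundle, since on_user_action reads the user's goal from self.bundle.

# Hypothetical sanity check, assuming the class above is defined in the current session.
task = SimplePointingTask(gridsize=31, number_of_targets=10, mode="gain")
task.reset()

print(task.state["position"])  # a random position that is not on a target
print(task.state["targets"])   # 10 distinct target positions in [0, 30]
print("".join(task.grid))      # text-render grid, with 'T' marking the targets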

Synthetic User Model

We now define a user model, which we call CarefulPointer. To do so, we specify the four components of the CoopIHC BaseAgent: its state, observation engine, inference engine, and policy.

import numpy

# As before, these names are assumed to be importable from the CoopIHC package top level.
from coopihc import (
    BaseAgent,
    State,
    discrete_array_element,
    ELLDiscretePolicy,
    RuleObservationEngine,
    base_user_engine_specification,
)


class CarefulPointer(BaseAgent):
    """A user model that only indicates the correct direction, with a fixed amplitude and some error rate.

    .. warning::

        This agent only works with a task that has a 'targets' substate.


    * Reset: at each reset, it selects a new goal from the possible 'targets'.
    * Inference: none.
    * State: a 'goal' substate, defined during finit and set at each reset.
    * Policy: when sampled, the user issues an action that is either +1 or -1, in the direction of the goal (or 0 once the cursor is on the goal).
    * Observation: the user observes everything perfectly, except for the assistant state.


    :param error_rate: rate at which the user makes errors, defaults to 0.05
    :type error_rate: float, optional
    """

    def __init__(self, *args, error_rate=0.05, **kwargs):

        self._targets = None

        # Define the action state of the agent ([-1, 0, 1])
        action_state = State()
        action_state["action"] = discrete_array_element(low=-1, high=1)

        # Define the policy. Here we use a policy explicitly defined by a likelihood model (of the form p(action | observation) = x for each action). To do so, we use an ELLDiscretePolicy and simply define the likelihood model (compute_likelihood). See the ELLDiscretePolicy docs for more information.
        ELLD_dic = {"compute_likelihood_args": {"error_rate": error_rate}}
        ELLD_dic.update(kwargs.get("policy_kwargs", {}))

        agent_policy = ELLDiscretePolicy(
            action_state=action_state,
            **ELLD_dic,
        )

        def compute_likelihood(self, action, observation, *args, **kwargs):
            error_rate = kwargs.get("error_rate", 0)
            # convert actions and observations
            goal = observation["user_state"]["goal"]
            position = observation["task_state"]["position"]
            # Write down all possible cases (7)
            # (1) Goal to the right, positive action
            if goal > position and action > 0:
                return 1 - error_rate
            # (2) Goal to the right, negative action
            elif goal > position and action < 0:
                return error_rate
            # (3) Goal to the left, positive action
            elif goal < position and action > 0:
                return error_rate
            # (4) Goal to the left, negative action
            elif goal < position and action < 0:
                return 1 - error_rate
            # (5) Cursor on the goal, no action
            elif goal == position and action == 0:
                return 1
            # (6) Cursor on the goal, nonzero action
            elif goal == position and action != 0:
                return 0
            # (7) Cursor not on the goal, no action
            elif goal != position and action == 0:
                return 0
            else:
                raise RuntimeError(
                    "Unable to compute the likelihood. You may not have covered all cases in the likelihood definition."
                )

        # Attach the likelihood function to the policy
        agent_policy.attach_likelihood_function(compute_likelihood)

        # ---------- Observation engine ------------
        # Here, we use an engine that sees everything except the assistant state. See the RuleObservationEngine documentation.
        observation_engine = RuleObservationEngine(
            deterministic_specification=base_user_engine_specification,
        )

        # ---------- Calling the BaseAgent class -----------
        # Always call super().__init__() to attach all components to the agent.

        super().__init__(
            "user",
            *args,
            agent_policy=agent_policy,
            agent_observation_engine=observation_engine,
            **kwargs,
        )

    def finit(self):
        # Define the 'goal' substate. This has to be done during finit: at __init__ time the bundle, and hence the task's gridsize, is not yet available.
        self.state["goal"] = discrete_array_element(
            low=0, high=(self.bundle.task.gridsize - 1)
        )

    @property
    def targets(self):
        # shortcut to the task's targets
        return self.bundle.task.state["targets"]

    def reset(self, dic=None):
        # select a random target to be the goal
        index = numpy.random.randint(0, self.targets.size)
        self.state["goal"] = self.targets[index]

Notice that the code re-uses two existing classes, ELLDiscretePolicy and RuleObservationEngine. The hope is that, in many cases, you can re-use existing CoopIHC objects rather than coding everything from scratch.
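
For instance, here is a minimal, hypothetical sketch of re-using ELLDiscretePolicy for a different agent: only the likelihood model changes, while the action-sampling machinery is shared. The "lazy" behaviour below is made up purely for illustration.

# Hypothetical re-use of ELLDiscretePolicy with a different likelihood model.
action_state = State()
action_state["action"] = discrete_array_element(low=0, high=1)

lazy_policy = ELLDiscretePolicy(action_state=action_state)


def lazy_likelihood(self, action, observation, *args, **kwargs):
    # A made-up agent that acts (action == 1) only 20% of the time,
    # regardless of its observation.
    if action == 1:
        return 0.2
    return 0.8


lazy_policy.attach_likelihood_function(lazy_likelihood)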

Assistant

We are now going to couple this user model with an intelligent assistant that leverages Bayesian Information Gain (BIG) [Liu2017]. This assistant relies on two mechanisms:

  1. It holds a belief vector that assigns a probability to each target (namely, the probability that that particular target is the user’s goal). This belief is maintained by a dedicated inference engine called GoalInferenceWithUserPolicyGiven which, as the name suggests, updates the belief associated with each target by leveraging a user model (a sketch of this update is given just after this list).

  2. It maintains a policy that, at each step, places the cursor at the position expected to be maximally informative for the assistant. This policy is implemented as a BIGDiscretePolicy.
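
To make the first mechanism concrete, the sketch below (plain NumPy, not the actual GoalInferenceWithUserPolicyGiven code) shows the kind of Bayesian update such an inference engine performs: the belief over each candidate goal is reweighted by the likelihood that the user model assigns to the observed user action under that goal.

import numpy

def update_beliefs(beliefs, likelihoods):
    """One Bayesian update: posterior proportional to prior * p(observed user action | candidate goal)."""
    posterior = beliefs * likelihoods
    return posterior / posterior.sum()

# Hypothetical example: three candidate targets and a user action that points
# towards targets 1 and 2 but away from target 0 (error_rate = 0.05).
beliefs = numpy.full(3, 1 / 3)
likelihoods = numpy.array([0.05, 0.95, 0.95])
print(update_beliefs(beliefs, likelihoods))  # the belief shifts away from target 0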

import copy

import numpy

# As before, these names are assumed to be importable from the CoopIHC package top level.
from coopihc import (
    BaseAgent,
    array_element,
    discrete_array_element,
    BIGDiscretePolicy,
    GoalInferenceWithUserPolicyGiven,
)


class BIGGain(BaseAgent):
    def __init__(self):

        super().__init__(
            "assistant", agent_inference_engine=GoalInferenceWithUserPolicyGiven()
        )

    def finit(self):
        # Redefine the action state: the assistant can place the cursor anywhere on the grid
        del self.policy.action_state["action"]
        self.policy.action_state["action"] = discrete_array_element(
            init=0,
            low=0,
            high=self.bundle.task.gridsize - 1,
            out_of_bounds_mode="error",
        )

        # Give the assistant a copy of the user policy, used by both its policy and its inference engine
        user_policy_model = copy.deepcopy(self.bundle.user.policy)
        agent_policy = BIGDiscretePolicy(self.policy.action_state, user_policy_model)
        self._attach_policy(agent_policy)
        self.inference_engine._attach_policy(user_policy_model)

        # Uniform prior belief over the targets
        self.state["beliefs"] = array_element(
            init=1 / self.bundle.task.number_of_targets,
            low=numpy.zeros((self.bundle.task.number_of_targets,)),
            high=numpy.ones((self.bundle.task.number_of_targets,)),
            out_of_bounds_mode="error",
        )

    def reset(self, dic=None):
        # Reset the belief to a uniform distribution over the targets
        self.state["beliefs"] = numpy.array(
            [
                1 / self.bundle.task.number_of_targets
                for i in range(self.bundle.task.number_of_targets)
            ]
        )

        # change theta for the inference engine: one candidate user state ('goal') per target
        set_theta = [
            {
                ("user_state", "goal"): discrete_array_element(
                    init=t, low=0, high=self.bundle.task.gridsize
                )
            }
            for t in self.bundle.task.state["targets"]
        ]

        self.inference_engine.attach_set_theta(set_theta)
        self.policy.attach_set_theta(set_theta)

        def transition_function(assistant_action, observation):
            """What future observation will the user see due to the assistant action?"""
            # always do this
            observation["assistant_action"]["action"] = assistant_action
            # specific to this pointing task: the assistant action is the new cursor position
            observation["task_state"]["position"] = assistant_action

            return observation

        self.policy.attach_transition_function(transition_function)

    def render(self, mode="text", ax_user=None, ax_assistant=None, ax_task=None):
        # Delegate rendering of the beliefs to the inference engine
        try:
            self.inference_engine.render(mode="text", ax_assistant=ax_assistant)
        except ValueError:
            self.inference_engine.render(mode=mode)

Bundle

Now that all components are ready, we can bundle them together to evaluate this combination of user model and assistant.

from coopihc import SimplePointingTask, CarefulPointer, BIGGain
from coopihc import Bundle

import matplotlib.pyplot as plt

task = SimplePointingTask(gridsize=31, number_of_targets=8, mode="position")
binary_user = CarefulPointer(error_rate=0.05)
BIGpointer = BIGGain()

bundle = Bundle(task=task, user=binary_user, assistant=BIGpointer)
game_state = bundle.reset()
# bundle.render("plot")
# plt.tight_layout()
k = 0
# plt.savefig("/home/juliengori/Pictures/img_tmp/biggain_{}.png".format(k))
while True:
    game_state, rewards, is_done = bundle.step(user_action=None, assistant_action=None)
    # bundle.render("plot")
    k += 1
    # plt.savefig("/home/juliengori/Pictures/img_tmp/biggain_{}.png".format(k))
    if is_done:
        bundle.close()
        break

This assistant performs very well (as it should: it has been given the true user model, which is also very simple).

The figures below show a run that finished in 3 steps, with the task state as well as the assistant’s beliefs rendered.

[Images: biggain_0.png to biggain_3.png, the task state and assistant beliefs at successive rounds of the run]
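
If you want a quantitative estimate rather than a single rendered run, a simple evaluation loop like the hypothetical sketch below (re-using the classes and imports from the listing above) averages the number of rounds needed over several simulated episodes.

import statistics

def run_once():
    task = SimplePointingTask(gridsize=31, number_of_targets=8, mode="position")
    user = CarefulPointer(error_rate=0.05)
    assistant = BIGGain()
    bundle = Bundle(task=task, user=user, assistant=assistant)
    bundle.reset()
    rounds = 0
    while True:
        _, rewards, is_done = bundle.step(user_action=None, assistant_action=None)
        rounds += 1
        if is_done:
            return rounds

# average number of rounds to completion over 20 simulated runs
print(statistics.mean(run_once() for _ in range(20)))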

What next

The example that we have just seen is what you would expect from an early prototype. Several extensions and enhancements could follow:

  1. You could use a more complex user model to pair with the assistant. For example, a visual search model could determine how the user locates the cursor after a ‘jump’, penalizing frequent, high-amplitude jumps, while a motor control model could determine how the cursor moves (e.g. fast initially, and much slower towards the end). To see such examples, head over to Modularity.

  2. Alternatively, you could learn the user behavior for a given assistant policy, e.g. via Deep Reinforcement Learning. See Using Reinforcement Learning for an example.

  3. You could tune the BIGGain assistant to account for the extra cost associated with jumps in the cursor.

  4. You could look at the effect of a mismatch between the user model handled by the BIGGain assistant and the synthetic user model it is paired with.

  5. You could pair your assistant with a real user to evaluate its short term performance. See Interfacing CoopIHC with a real user for an example.

  6. You could jointly train the user model and the assistant to simulate co-adaptation between a user and a tool.

Liu2017

Liu, Wanyu, et al. “BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017.