Quick Start

Installation

CoopIHC is currently available on PyPI. You can install the package using pip with the following command:

python3 -m pip install coopihc

You can also build directly from the GitHub repository to get the latest version. To do so, install poetry, and run

poetry install

from within the folder. This will install CoopIHC in editable mode (basically equivalent to python3 -m pip install -e .), together with all its dependencies. You might need to download and install CoopIHC-Zoo as well for this to work.
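To check which version ended up installed after either route, a small standard-library sketch such as the following can help. The helper name installed_version is hypothetical (it is not part of CoopIHC); it only relies on importlib.metadata and assumes the distribution name is coopihc, as in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Hypothetical usage: report whether the coopihc distribution is visible
# to the current Python interpreter.
found = installed_version("coopihc")
print(f"coopihc version: {found}" if found else "coopihc is not installed")
```

If the message says the package is not installed, the most likely cause is that pip or poetry installed it into a different environment than the interpreter you are running.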

Interaction Model

CoopIHC builds on a sequential two-agent decision-making model. You should read through the model to get a grasp of what each component does.

High-level view of CoopIHC code

At a high level, your CoopIHC code will usually look like this:

# Define a task
example_task = ExampleTask()
# Define a user
example_user = ExampleUser()
# Define an assistant
example_assistant = ExampleAssistant()
# Bundle them together
bundle = Bundle(task=example_task, user=example_user, assistant=example_assistant)
# Reset the bundle (i.e. initialize it to a random or prescribed state)
bundle.reset(
    go_to=1
)  # Reset in a state where the user has already produced an observation and made an inference.

# Step through the bundle (i.e. play full rounds)
while True:
    state, rewards, is_done = bundle.step(user_action=1, assistant_action=None)
    # Do something with the state or the rewards
    if is_done:
        break

You will usually define a task, a user and an assistant, and bundle them together. You can then play rounds of interaction until the game ends, and use the states and rewards collected along the way for your own analysis.

Quick-States

The interaction model uses the concept of states, a collection of useful variables for the system. In CoopIHC you define them via a State object. The states are containers that hold elements called StateElement. A StateElement is a collection of a value and a Space, its associated domain. A State may be nested and contain another State.

In the example below, a super-state is defined using a State. This super-state is itself defined by two substates. Each of those two substates holds a StateElement, defined here via shortcuts such as array_element.

state = State()

substate = State()
substate["x1"] = discrete_array_element(init=1, low=1, high=3)
substate["x3"] = array_element(
    init=1.5 * numpy.ones((2, 2)), low=numpy.ones((2, 2)), high=2 * numpy.ones((2, 2))
)

substate2 = State()
substate2["y1"] = discrete_array_element(init=1, low=1, high=3)

state["sub1"] = substate
state["sub2"] = substate2

State and StateElement subclass Python's built-in dictionary and NumPy's array types, respectively. Interacting with these objects should therefore feel familiar to most Python users. To find out more, go to Space, StateElement and State.
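To make the idea of a nested, dictionary-like state concrete without depending on CoopIHC, here is a minimal pure-Python sketch. ToyElement and ToyState are hypothetical stand-ins, not the library's classes: they only mimic "a value plus its domain" and "a dict that may contain other states", and the clip/error behaviour is an assumption modelled on the out_of_bounds_mode argument used later in this page.

```python
from dataclasses import dataclass


@dataclass
class ToyElement:
    """A value together with its integer domain [low, high]."""

    value: int
    low: int
    high: int
    out_of_bounds_mode: str = "error"

    def set(self, new_value):
        """Assign a new value, clipping or raising if it leaves the domain."""
        if self.low <= new_value <= self.high:
            self.value = new_value
        elif self.out_of_bounds_mode == "clip":
            self.value = max(self.low, min(self.high, new_value))
        else:
            raise ValueError(f"{new_value} is outside [{self.low}, {self.high}]")


class ToyState(dict):
    """A dict whose values are ToyElement objects or nested ToyState objects."""

    def flat_values(self, prefix=""):
        """Return {dotted.path: value} for every element, however deeply nested."""
        flat = {}
        for key, item in self.items():
            name = f"{prefix}{key}"
            if isinstance(item, ToyState):
                flat.update(item.flat_values(prefix=f"{name}."))
            else:
                flat[name] = item.value
        return flat


state = ToyState()
state["sub1"] = ToyState(x1=ToyElement(value=1, low=1, high=3, out_of_bounds_mode="clip"))
state["sub2"] = ToyState(y1=ToyElement(value=1, low=1, high=3))

state["sub1"]["x1"].set(7)  # outside [1, 3], so it is clipped to 3
print(state.flat_values())  # {'sub1.x1': 3, 'sub2.y1': 1}
```

Because the toy state is an ordinary dict, nested access is plain indexing (state["sub1"]["x1"]), which is the same interaction style the paragraph above describes.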

Quick-Tasks

Tasks represent the agent’s environment. Usually in the CoopIHC context, the task will represent the part of an interface that the user can interact with and drive to a certain state.

Essentially, tasks are characterized by:

An internal state called the task state which holds all the task’s information; for example, the state of the interface.
An on_user_action() method, which is a transition function that describes how the task state changes on receiving a user action.
An on_assistant_action() method, which is a transition function that describes how the task state changes based on the assistant action.

As an example, let’s define a simple task where the goal of the user is to drive the substate called ‘x’ to a value of 4. Both the user and the assistant can provide three actions: -1, +0 and +1. We define a task by inheriting from InteractionTask and redefining a few methods.

class ExampleTask(InteractionTask):
    """ExampleTask

    An example algebraic task with a single task state 'x', which finishes when x = 4.

    """

    def __init__(self, *args, **kwargs):

        # Call super().__init__() before anything else, which initializes some useful attributes, including a State (self.state) for the task

        super().__init__(*args, **kwargs)

        # Describe the state. Here it is a single item which takes values in [-1, 0, ..., 3, 4]. The StateElement has out_of_bounds_mode = clip, which means that values outside the range will automatically be clipped to fit the space.
        self.state["x"] = discrete_array_element(
            init=0, low=-1, high=4, out_of_bounds_mode="clip"
        )

    def reset(self, dic=None):
        # Always start with state 'x' at 0
        self.state["x"] = 0
        return

    def on_user_action(self, *args, **kwargs):
        # Modify the state in place, adding the user action
        is_done = False
        # self.state["x"] = self.state["x"] + self.user_action
        self.state["x"] += self.user_action

        # Stopping condition, return is_done boolean
        if self.state["x"] == 4:
            is_done = True

        reward = -1
        return self.state, reward, is_done

    def on_assistant_action(self, *args, **kwargs):
        is_done = False
        # Modify the state in place, adding the assistant action
        self.state["x"] += self.assistant_action
        # Stopping condition, return is_done boolean
        if self.state["x"] == 4:
            is_done = True

        reward = -1
        return self.state, reward, is_done

Some comments on the code snippet above:

The task state 'x' is defined in the __init__ method. Remember to always call super()’s __init__ before anything else to ensure all necessary variables internal to CoopIHC are set.

The reset method resets the task to an initial state, in this case 'x'=0. Defining a reset method is optional: if you omit it, the task inherits the one from InteractionTask, which randomly picks a value for each state.

You must define both on_user_action() and on_assistant_action(); otherwise an error is raised. Both are expected to return the triple (task state, reward, is_done).

A render method is available if you want to render the task online; see InteractionTask.
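The transition-function contract described above (each call returns the task state, a reward and is_done) can be tried out in isolation with a pure-Python stand-in. ToyTask below is hypothetical and is not CoopIHC's InteractionTask: it takes the action as an explicit argument and reproduces only the arithmetic of ExampleTask, including clipping to the range [-1, 4].

```python
class ToyTask:
    """Stand-in for ExampleTask: one integer state x, clipped to [LOW, HIGH]."""

    LOW, HIGH, GOAL = -1, 4, 4

    def __init__(self):
        self.x = 0

    def _apply(self, action):
        # Add the action, clip to the allowed range, then test the stop condition.
        self.x = max(self.LOW, min(self.HIGH, self.x + action))
        reward = -1
        is_done = self.x == self.GOAL
        return self.x, reward, is_done

    def on_user_action(self, user_action):
        return self._apply(user_action)

    def on_assistant_action(self, assistant_action):
        return self._apply(assistant_action)


task = ToyTask()
print(task.on_user_action(-1))      # (-1, -1, False)
print(task.on_user_action(-1))      # (-1, -1, False): clipped at the lower bound
print(task.on_assistant_action(1))  # (0, -1, False)
```

Every transition costs a reward of -1, so reaching x = 4 in as few transitions as possible maximizes the total reward.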

Quick-Agents

Agents are defined by four components:

An internal state, which essentially gives memory to the agent;
An observation engine, which generates observations from the game state, giving the agent the ability to perceive;
An inference engine, with which the agent modifies its internal state, giving it the ability to learn;
A policy, used to take actions, giving the agent the ability to make decisions.

Fig. 1 Agent structure
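The four components listed above can be read as one turn of an agent: observe, infer, then act. The following pure-Python sketch shows that ordering. ToyAgent and its method names are hypothetical and do not correspond to CoopIHC's BaseAgent API; the convention that each component returns a (result, reward) pair is taken from the engine and policy examples in this page.

```python
class ToyAgent:
    """An agent with an internal state, an observation step, an inference step and a policy."""

    def __init__(self, goal):
        # Internal state: the agent's memory.
        self.state = {"goal": goal, "turns_taken": 0}

    def observe(self, game_state):
        # Observation engine: what the agent perceives of the game state.
        observation = {"x": game_state["x"], "goal": self.state["goal"]}
        return observation, 0

    def infer(self, observation):
        # Inference engine: update the internal state from the observation.
        self.state["turns_taken"] += 1
        return self.state, 0

    def act(self, observation):
        # Policy: move x one step towards the goal.
        if observation["x"] < observation["goal"]:
            action = 1
        elif observation["x"] > observation["goal"]:
            action = -1
        else:
            action = 0
        return action, 0

    def take_turn(self, game_state):
        observation, observation_reward = self.observe(game_state)
        _, inference_reward = self.infer(observation)
        action, policy_reward = self.act(observation)
        return action, observation_reward + inference_reward + policy_reward


agent = ToyAgent(goal=4)
game_state = {"x": 0}
while game_state["x"] != 4:
    action, _ = agent.take_turn(game_state)
    game_state["x"] += action

print(game_state["x"], agent.state["turns_taken"])  # 4 4
```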

You define a new agent by subclassing the BaseAgent class. As an example, we now create an agent which goes with the ExampleTask that we defined in Tasks. We give the agent a 'goal' state to indicate the value of 'x' that it wants to achieve, and make its available actions [-1, 0, 1]. These actions are chosen via the ExamplePolicy (see Policies).

class ExampleUser(BaseAgent):
    """An Example of a User.

    An agent that handles the ExamplePolicy, has a single 1d state, and has the default observation and inference engines.
    See the documentation of the :py:mod:`BaseAgent <coopihc.agents.BaseAgent.BaseAgent>` class for more details.

    :meta public:
    """

    def __init__(self, *args, **kwargs):

        # Define an internal state with a 'goal' substate
        state = State()
        state["goal"] = discrete_array_element(init=4, low=-4, high=4)

        # Define policy
        action_state = State()
        action_state["action"] = discrete_array_element(init=0, low=-1, high=1)
        agent_policy = ExamplePolicy(action_state=action_state)

        # Use default observation and inference engines
        observation_engine = None
        inference_engine = None

        super().__init__(
            "user",
            *args,
            agent_policy=agent_policy,
            agent_observation_engine=observation_engine,
            agent_inference_engine=inference_engine,
            agent_state=state,
            **kwargs
        )

    def reset(self, dic=None):
        """reset

        Override the default behaviour of BaseAgent, which would randomly sample a new goal value on each reset. Here, for the purpose of demonstration, we impose goal = 4.

        :meta public:
        """
        self.state["goal"] = 4

Note

All 4 components default to their corresponding base implementation if not provided.

You can verify that the user model works as intended by bundling it with the task. Since we have not provided an assistant yet, we slightly change the task, redefining its on_assistant_action().

class ExampleTaskWithoutAssistant(ExampleTask):
    def on_assistant_action(self, *args, **kwargs):
        return self.state, 0, False


example_task = ExampleTaskWithoutAssistant()
example_user = ExampleUser()
bundle = Bundle(task=example_task, user=example_user)
bundle.reset(go_to=1)

while True:
    state, rewards, is_done = bundle.step()
    if is_done:
        break

Quick-Policies

To define a policy, simply subclass :py:class:BasePolicy<coopihc.policy.BasePolicy> and redefine its sample() method. Below, we show how ExamplePolicy has been defined.

class ExamplePolicy(BasePolicy):
    """ExamplePolicy

    A simple policy which assumes that the agent using it has a 'goal' state and that the task has an 'x' state. x is compared to the goal and appropriate action is taken to make sure x reaches the goal.


    """

    def __init__(self, *args, action_state=None, **kwargs):
        super().__init__(*args, action_state=action_state, **kwargs)

    @BasePolicy.default_value
    def sample(self, agent_observation=None, agent_state=None):
        """sample

        Compares 'x' to goal and issues +-1 accordingly.

        :return: action, reward
        :rtype: tuple(`StateElement<coopihc.base.StateElement.StateElement>`, float)
        """

        if (
            agent_observation["task_state"]["x"]
            < agent_observation["user_state"]["goal"]
        ):
            _action_value = 1
        elif (
            agent_observation["task_state"]["x"]
            > agent_observation["user_state"]["goal"]
        ):
            _action_value = -1
        else:
            _action_value = 0

        return _action_value, 0

Note

Don’t forget to return a reward with the action.

Note

You can put virtually anything inside this function: the output of a neural network, of a complex simulation process, or even of another bundle (see Modularity for an example).
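As an illustration of how open-ended the decision logic can be, the sketch below wraps an arbitrary callable in a policy-like object that still returns an (action, reward) pair. ToyCallablePolicy, greedy and make_noisy are hypothetical names used for this example only; none of them belong to CoopIHC, and the allowed action set is assumed to be the [-1, 0, 1] of the example user.

```python
import random

ALLOWED_ACTIONS = (-1, 0, 1)


class ToyCallablePolicy:
    """Delegates the decision to any callable and returns (action, reward)."""

    def __init__(self, decision_function, action_reward=0.0):
        self.decision_function = decision_function
        self.action_reward = action_reward

    def sample(self, observation):
        action = self.decision_function(observation)
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"{action} is not one of {ALLOWED_ACTIONS}")
        return action, self.action_reward


def greedy(observation):
    """Step towards the goal: returns the sign of (goal - x)."""
    difference = observation["goal"] - observation["x"]
    return (difference > 0) - (difference < 0)


def make_noisy(decision_function, error_rate, seed):
    """Wrap a decision function so it picks a random action with probability error_rate."""
    rng = random.Random(seed)

    def noisy(observation):
        if rng.random() < error_rate:
            return rng.choice(ALLOWED_ACTIONS)
        return decision_function(observation)

    return noisy


policy = ToyCallablePolicy(greedy)
print(policy.sample({"x": 0, "goal": 4}))  # (1, 0.0)

noisy_policy = ToyCallablePolicy(make_noisy(greedy, error_rate=0.2, seed=0))
print(noisy_policy.sample({"x": 0, "goal": 4}))
```

Swapping greedy for a lookup table, a trained model or a simulation only changes the callable; the (action, reward) contract stays the same.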

Quick-Observation Engines

States are rarely perfectly observable by agents:

the other agent’s internal states are usually unknown,

the task’s state may be partially observable; for example, a human observer may produce noisy observations,

an agent’s own internal state may be partially observable; for example, a human observer might have poor retention capabilities.

Furthermore, there might be a cost associated with making observations:

There can be a tradeoff between the time needed to produce an observation and its quality; for example, precise observations may be costly (in terms of time).

A human observer may enjoy making observations which are very different (according to some criterion) to the previous ones, in which case it would be rewarded for differing observations, satisfying its curiosity.
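The time/quality tradeoff mentioned above can be sketched numerically. The function below is a hypothetical model, not something CoopIHC provides: it assumes Gaussian observation noise whose standard deviation shrinks as the observer dwells longer, and a negative reward proportional to the dwell time. The parameter names and default values are illustrative assumptions.

```python
import random


def observe_with_noise(true_value, dwell_time, rng, base_noise=2.0, cost_per_second=0.1):
    """Return (noisy observation, reward); longer dwell means less noise but more cost."""
    noise_std = base_noise / (1.0 + dwell_time)
    observation = true_value + rng.gauss(0.0, noise_std)
    reward = -cost_per_second * dwell_time
    return observation, reward


def mean_abs_error(samples, true_value):
    """Average absolute deviation of the observations from the true value."""
    return sum(abs(observation - true_value) for observation, _ in samples) / len(samples)


rng = random.Random(0)
quick = [observe_with_noise(10.0, 0.0, rng) for _ in range(2000)]
careful = [observe_with_noise(10.0, 9.0, rng) for _ in range(2000)]

print("quick:   error", mean_abs_error(quick, 10.0), "reward", quick[0][1])
print("careful: error", mean_abs_error(careful, 10.0), "reward", careful[0][1])
```

The careful observer gets a smaller average error but pays for it through a more negative observation reward, which is the kind of cost an observation engine can report alongside the observation.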

CoopIHC provides a generic object called an observation engine which specifies how an observation is created from the game state. To create a new observation engine, you can base it off an existing observation engine or subclass the BaseObservationEngine.

To create a new engine by subclassing the BaseObservationEngine class, you have to redefine the observe() method. You can put virtually anything inside this function: the output of a neural network, of a complex simulation process, or even of another bundle (see Modularity for an example). Below, we show a basic example where we define an engine that only looks at a particular substate.

class ExampleObservationEngine(BaseObservationEngine):
    """ExampleObservationEngine

    A simple example where the engine is only able to see a particular state (observable_state).

    :param observable_state: only state that can be observed
    :type observable_state: string
    """

    def __init__(self, observable_state, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.observable_state = observable_state

    # @BaseObservationEngine.get_params
    @BaseObservationEngine.default_value
    def observe(self, game_state=None):
        """observe

        Select only state observable_state.

        :param game_state: game state
        :type game_state: `State<coopihc.base.State.State>`
        :return: (observation, obs reward)
        :rtype: tuple(`State<coopihc.base.State.State>`, float)
        """
        return (
            State(**{self.observable_state: game_state[self.observable_state]}),
            0,
        )

Don’t forget to return a reward with the observation. The effect of this engine can be tested by plugging in a simple State:

obs_engine = ExampleObservationEngine("substate1")
# Game state before observation
# >>> print(S)
# ----------  ----------  -  ---------
# substate1   substate_x  0  Numeric()
#             substate_y  2  Numeric()
# substate_2  substate_a  0  CatSet(3)
# ----------  ----------  -  ---------

print(obs_engine.observe(game_state=S)[0])
# Produced Observation
# >>> print(obs_engine.observe(S)[0])
# ---------  ----------  -  ----------
# substate1  substate_x  0  Cont(1, 1)
#            substate_y  2  Discr(3)
# ---------  ----------  -  ----------

Note

The signature observe(self, game_state=None) is expected. When called with game_state = None, the engine fetches the current game state itself (from the bundle it is attached to). If a game state is passed explicitly, the engine uses that input as the basis for the observation instead. This is useful, for example, when testing your engine and you want to control the input.
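The "use a default when game_state is None" behaviour in the note above is a common decorator pattern, sketched here in plain Python. This is not CoopIHC's actual default_value implementation: default_game_state, ToyEngine and the host_game_state attribute are hypothetical, and the choice of a stored state as the fallback is an assumption made for the sake of the example.

```python
import functools


def default_game_state(observe_method):
    """Decorator: substitute the engine's stored state when game_state is None."""

    @functools.wraps(observe_method)
    def wrapper(self, game_state=None):
        if game_state is None:
            game_state = self.host_game_state
        return observe_method(self, game_state=game_state)

    return wrapper


class ToyEngine:
    """Observes a single key of the game state and returns (observation, reward)."""

    def __init__(self, observable_key, host_game_state):
        self.observable_key = observable_key
        self.host_game_state = host_game_state

    @default_game_state
    def observe(self, game_state=None):
        return {self.observable_key: game_state[self.observable_key]}, 0


engine = ToyEngine("task", host_game_state={"task": 1, "user": 2})
print(engine.observe())                                    # ({'task': 1}, 0)
print(engine.observe(game_state={"task": 9, "user": 0}))   # ({'task': 9}, 0)
```

Passing the state explicitly, as in the second call, is what makes an engine easy to test on hand-built inputs.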

Quick-Inference Engines

Agents should be able to learn (infer parameters from observed data) and to adapt (change policy parameters based on observed data). As in observation engines, there might be a cost associated with making inferences:

Making an inference can be time costly.

Inferring may be rewarding; for example, because it is enjoyable.

CoopIHC provides a generic object called an inference engine to update internal states from observations. Despite the name, these engines may update the internal state through any mechanism, not only statistical inference. To create a new inference engine, you can base it off an existing engine or subclass the BaseInferenceEngine.

Essentially, the BaseInferenceEngine provides a simple first-in-first-out (FIFO) buffer that stores observations. When subclassing BaseInferenceEngine, you simply have to redefine the infer method (by default, no inference is produced). An example is provided below, where the engine stores the last 5 observations.

class ExampleInferenceEngine(BaseInferenceEngine):
    """ExampleInferenceEngine

    Example class

    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def infer(self, agent_observation=None):
        """infer

        Do nothing. Same behavior as parent ``BaseInferenceEngine``

        :return: (new internal state, reward)
        :rtype: tuple(:py:class:`State<coopihc.base.State.State>`, float)
        """
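        # Start from the agent's current internal state so that agent_state is
        # always defined, whether or not an observation was passed in.
        agent_state = self.state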

        reward = 0
        # Do something
        # agent_state = ..
        # reward = ...

        return agent_state, reward


ExampleInferenceEngine(buffer_depth=5)
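The first-in-first-out buffer idea can be illustrated with collections.deque, which discards the oldest entry once maxlen is reached. ToyBufferedEngine below is a hypothetical stand-in and does not reproduce BaseInferenceEngine's internals; the running-mean inference and the "mean_x" key are illustrative choices, and only the buffer_depth name is borrowed from the call above.

```python
from collections import deque


class ToyBufferedEngine:
    """Keeps the last buffer_depth observations and infers their mean x."""

    def __init__(self, buffer_depth=1):
        # A deque with maxlen behaves as a FIFO buffer of fixed depth.
        self.buffer = deque(maxlen=buffer_depth)

    def add_observation(self, observation):
        self.buffer.append(observation)

    def infer(self, agent_state):
        """Return (updated internal state, reward)."""
        if self.buffer:
            values = [observation["x"] for observation in self.buffer]
            agent_state["mean_x"] = sum(values) / len(values)
        return agent_state, 0


engine = ToyBufferedEngine(buffer_depth=5)
for x in range(8):  # observations with x = 0, 1, ..., 7
    engine.add_observation({"x": x})

agent_state, reward = engine.infer({"goal": 4})
print([observation["x"] for observation in engine.buffer])  # [3, 4, 5, 6, 7]
print(agent_state["mean_x"])                                # 5.0
```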

Quick-Bundles

Bundles are the objects that compose the three main components (task, user and assistant) into a game. A bundle forms the joint state, collects the rewards, and ensures that the two agents' observations, inferences and actions happen in a synchronized sequence.

They are useful because they let you orchestrate the interaction the way you want.
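What "forming the joint state, collecting the rewards and sequencing the agents" means can be sketched as a single round function in plain Python. Everything here is a hypothetical simplification rather than the Bundle implementation: the joint state is a nested dict, the reward keys are made-up labels, the assistant is passive, and the round is assumed to stop as soon as a transition reports is_done.

```python
def sign(value):
    return (value > 0) - (value < 0)


def user_policy(game_state):
    """Move x towards the user's goal; returns (action, reward)."""
    return sign(game_state["user"]["goal"] - game_state["task"]["x"]), 0


def assistant_policy(game_state):
    """A passive assistant that always plays 0; returns (action, reward)."""
    return 0, 0


def task_transition(game_state, action):
    """Apply an action to x, clipped to [-1, 4]; returns (reward, is_done)."""
    x = max(-1, min(4, game_state["task"]["x"] + action))
    game_state["task"]["x"] = x
    return -1, x == 4


def play_round(game_state):
    """One round: user turn, then assistant turn, collecting every reward."""
    rewards = {}
    action, rewards["user_policy"] = user_policy(game_state)
    rewards["task_after_user"], is_done = task_transition(game_state, action)
    if is_done:
        return game_state, rewards, True
    action, rewards["assistant_policy"] = assistant_policy(game_state)
    rewards["task_after_assistant"], is_done = task_transition(game_state, action)
    return game_state, rewards, is_done


# The joint state gathers the task state and both agents' internal states.
game_state = {"task": {"x": 0}, "user": {"goal": 4}, "assistant": {}}
total_reward, rounds_played, is_done = 0, 0, False
while not is_done:
    game_state, rewards, is_done = play_round(game_state)
    total_reward += sum(rewards.values())
    rounds_played += 1

print(rounds_played, total_reward, game_state["task"]["x"])  # 4 -7 4
```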

In most cases, there is no need to define a new Bundle, and you can straightaway use the standard existing Bundle. For example, you can create a bundle and interact with it like so:

class ExampleTaskWithoutAssistant(ExampleTask):
    def on_assistant_action(self, *args, **kwargs):
        return self.state, 0, False


example_task = ExampleTaskWithoutAssistant()
example_user = ExampleUser()
bundle = Bundle(task=example_task, user=example_user)
bundle.reset(go_to=1)

while True:
    state, rewards, is_done = bundle.step()
    if is_done:
        break

Note

Bundles also handle joint rendering as well as other practical things. More details can be found on Bundle’s reference page.

An overview of CoopIHC

Several implementations of user models, tasks and assistants exist in the CoopIHC-Zoo repository.
Several worked-out examples are given in this documentation. They should give you a good idea of what can be done with CoopIHC.