.. policy:

Policies
========

In *CoopIHC*, agents are assumed to make decisions (take actions) based on their observations as well as their internal state. Selecting an action may also carry a cost, for example the physical cost of moving an item, or a benefit, for example when having a large diversity of actions is rewarding.

*CoopIHC* provides a generic object called a policy, which specifies how the agent takes its actions. To specify a new policy, you will usually subclass ``BasePolicy``, although several predefined policies also exist.

Subclassing BasePolicy
-------------------------

.. start-quickstart-policy

To define a policy, simply subclass :py:class:`BasePolicy` and redefine its ``sample()`` method. Below, we show how ``ExamplePolicy`` is defined.

.. literalinclude:: ../../coopihc/policy/ExamplePolicy.py
    :linenos:
    :pyobject: ExamplePolicy

.. note::

    Don't forget to return a reward with the action.

.. note::

    You can put virtually anything inside this function: the output of a neural network, of a complex simulation process, or even the output of another bundle (see :doc:`modularity` for an example).

.. end-quickstart-policy
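To give an idea of how such a policy is used, the sketch below instantiates ``ExamplePolicy`` and attaches it to an agent. It is only illustrative: the top-level imports, the ``action_state`` keyword and the ``agent_policy`` keyword are assumptions based on the other examples in this documentation, so adjust them to your version of *CoopIHC*.

.. code-block:: python

    import numpy

    # Imports assumed to be exposed at the package top level
    from coopihc import BaseAgent, State, StateElement, discrete_space
    from coopihc.policy.ExamplePolicy import ExamplePolicy

    # An action state holding a single discrete action with values 0..4
    action_state = State()
    action_state["action"] = StateElement(
        0,
        discrete_space(numpy.arange(5)),
    )

    # Attach the policy to an agent
    # (agent_policy keyword assumed; see the agents documentation)
    user = BaseAgent(
        "user",
        agent_policy=ExamplePolicy(action_state=action_state),
    )

When the agent is part of a bundle, its policy's ``sample()`` method is called each time the agent acts, and the returned reward is collected together with the other rewards.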
Besides subclassing ``BasePolicy`` yourself, there are a few predefined policies which you may find useful.

Explicit Likelihood Discrete (ELLD) Policy
--------------------------------------------

The Explicit Likelihood Discrete (ELLD) policy is used in cases where the agent's behavior is straightforward enough to be specified by an analytical model. Below, we define a simple probabilistic model which assigns a different probability to each possible discrete action.

.. literalinclude:: ../../coopihc/examples/basic_examples/policy_examples.py
    :language: python
    :linenos:
    :start-after: [start-elld-def-model]
    :end-before: [end-elld-def-model]

You can then define your policy and attach the model to it:

.. literalinclude:: ../../coopihc/examples/basic_examples/policy_examples.py
    :language: python
    :linenos:
    :start-after: [start-elld-attach]
    :end-before: [end-elld-attach]

BIGDiscretePolicy
--------------------

The Bayesian Information Gain (BIG) policy is a reimplementation of BIGpoint, introduced by Liu et al. [1]_. The main ideas/assumptions are:

* A user wants the task to go to some goal state :math:`\Theta`.
* The assistant can put the task into a number of states :math:`X`.
* The user can perform actions from a given set :math:`Y`.
* A model :math:`p(Y=y | X=x, \Theta = \theta)` of user behavior exists. This model is exploited by the assistant and is not necessarily the true model.

After the policy has been defined, make sure to call:

* ``attach_set_theta``, to specify the potential goal states;
* ``attach_transition_function``, to specify how the task state evolves after an assistant action.

You can find an example implementation in CoopIHC-Zoo's pointing module. Below are the important steps:

.. warning::

    This example is outdated.

.. code-block:: python

    TASK_SIZE = 30
    TARGETS = [1, 5, 6, 19]

    action_state = State()
    action_state["action"] = StateElement(
        0,
        autospace([i for i in range(TASK_SIZE)]),
        out_of_bounds_mode="error",
    )

    # Define the user policy model that the assistant will use
    user_policy_model = XXX

    # Define the policy
    agent_policy = BIGDiscretePolicy(action_state, user_policy_model)

    # Specify the potential goal states of the user.
    # Here, the potential goals are all cases where a target may be the user's goal.
    set_theta = [
        {
            ("user_state", "goal"): StateElement(
                t,
                discrete_space(numpy.array(list(range(TASK_SIZE)))),
            )
        }
        for t in TARGETS
    ]
    # Attach this set to the policy
    self.policy.attach_set_theta(set_theta)

    # Define the predicted future observation of the user due to the assistant action
    def transition_function(assistant_action, observation):
        """What future observation will the user see due to the assistant action?"""
        # always do this
        observation["assistant_action"]["action"] = assistant_action
        # specific to BIGpointer
        observation["task_state"]["position"] = assistant_action
        return observation

    # Attach it to the policy
    self.policy.attach_transition_function(transition_function)

LinearFeedback
-----------------

RLPolicy
---------

The RLPolicy is a wrapper for a neural network trained via Deep Reinforcement Learning. For an example, head over to :ref:`Using Reinforcement Learning`.

WrapAsPolicy
-------------

.. [1] Liu, Wanyu, et al. "BIGnav: Bayesian information gain for guiding multiscale navigation." Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017.