coopihc.policy.BIGDiscretePolicy.BIGDiscretePolicy

class BIGDiscretePolicy(assistant_action_state, user_policy_model, *args, threshold=0.8, **kwargs)[source]

Bases: coopihc.policy.BasePolicy.BasePolicy

Bayesian Information Gain Policy, adapted from [1].

The main ideas/assumptions are:

  • A user wants the task to go to some goal state \(\Theta\)

  • The assistant can put the task in a number of states \(X\)

  • The user can perform actions from a given set \(Y\)

  • A model \(p(Y=y | X=x, \Theta = \theta)\) of user behavior exists

Make sure to call (see the usage sketch after the parameters below):

  • attach_set_theta, to specify the potential goal states

  • attach_transition_function, to specify how the task state evolves after an assistant action

[1] Liu, Wanyu, et al. “BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017.

Parameters
  • assistant_action_state (State<coopihc.base.State.State>) – action state of the assistant

  • user_policy_model (ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>) – user policy model. This may be the user's actual policy, but in practice it is usually a model of the user's policy. It must currently be an ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>.
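
A minimal usage sketch is given below. Only the names documented on this page (the constructor, attach_set_theta, attach_transition_function, find_best_action) come from the library; the shapes of set_theta and of the transition function are assumptions inferred from the descriptions above, and assistant_action_state, user_policy_model and set_theta are placeholders that must be built beforehand:

    # Usage sketch only. The argument shapes passed to attach_set_theta and
    # attach_transition_function are assumptions inferred from the text above,
    # not verified signatures.
    import copy

    from coopihc.policy.BIGDiscretePolicy import BIGDiscretePolicy

    # assistant_action_state, user_policy_model (an ELLDiscretePolicy) and
    # set_theta (one entry per potential goal state) are assumed to have been
    # built beforehand.
    policy = BIGDiscretePolicy(assistant_action_state, user_policy_model, threshold=0.8)

    # Potential goal states the user may be pursuing.
    policy.attach_set_theta(set_theta)

    # Assumed form: a function that returns the observation obtained after
    # applying a candidate assistant action to the current observation.
    def transition_function(assistant_action, observation):
        new_observation = copy.deepcopy(observation)
        # ... update the task state inside new_observation here ...
        return new_observation

    policy.attach_transition_function(transition_function)

    # Rank the candidate assistant actions by expected information gain.
    actions, information_gains = policy.find_best_action()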

Methods

HY__OoXx

\(H(Y | \Theta = \theta, X=x)\)

HY__Xx

\(H(Y |X=x)\)

IG

Information Gain \(\mathrm{IG}(X=x)\)

PYy_Xx

\(P(Y=y|X=x)\)

attach_set_theta

attach_transition_function

default_value

Apply this decorator to use bundle.game_state as the default value to observe when game_state is None

find_best_action

Computes the expected information gain associated with each possible future cursor position and ranks the candidate actions from most to least informative.

reset

Reset the policy

sample

Attributes

action

Return the last action.

action_keys

observation

Return the last observation.

parameters

state

unwrapped

HY__OoXx(potential_states, beliefs)[source]

\(H(Y | \Theta = \theta, X=x)\)

Computes the conditional entropy \(H(Y | \Theta = \theta, X=x) = -\mathbb{E}[\log(p(Y|\Theta = \theta, X=x))]\).

Parameters
  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

\(H(Y | \Theta = \theta, X=x)\)

Return type

float
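
The following standalone sketch illustrates the formula (it is not the library's internal code); the expectation is taken over the candidate goals, weighted by the beliefs, and over the user's responses:

    import numpy as np

    # Toy illustration of H(Y | Theta = theta, X=x): beliefs[i] is the current
    # belief that goal theta_i is the true goal, and p_y_given_theta_x[i][k] is
    # the user model p(Y=y_k | Theta=theta_i, X=x).
    def HY__OoXx_sketch(beliefs, p_y_given_theta_x):
        H = 0.0
        for b, p_y in zip(beliefs, p_y_given_theta_x):
            # Entropy of the user's response distribution for this goal,
            # weighted by the belief in that goal.
            H += b * sum(-p * np.log2(p) for p in p_y if p > 0)
        return H

    beliefs = [0.5, 0.5]
    p_y_given_theta_x = [[0.9, 0.1], [0.2, 0.8]]
    print(HY__OoXx_sketch(beliefs, p_y_given_theta_x))  # ~0.595 bits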

HY__Xx(potential_states, assistant_action, beliefs)[source]

\(H(Y |X=x)\)

Computes the conditional entropy \(H(Y |X=x) = -\mathbb{E}[\log(p(Y|X=x))]\).

Parameters
  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

\(H(Y |X=x)\)

Return type

float
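
A matching standalone sketch (again an illustration, not the library code): \(p(Y=y|X=x)\) is the belief-weighted mixture of the per-goal user models, and \(H(Y|X=x)\) is the entropy of that mixture:

    import numpy as np

    # Toy illustration of H(Y | X=x): mix the per-goal response models with the
    # beliefs, then take the entropy of the resulting distribution over Y.
    def HY__Xx_sketch(beliefs, p_y_given_theta_x):
        p_y = np.average(np.asarray(p_y_given_theta_x), axis=0, weights=beliefs)
        return float(sum(-p * np.log2(p) for p in p_y if p > 0))

    beliefs = [0.5, 0.5]
    p_y_given_theta_x = [[0.9, 0.1], [0.2, 0.8]]
    print(HY__Xx_sketch(beliefs, p_y_given_theta_x))  # ~0.993 bits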

IG(assistant_action, observation, beliefs)[source]

Information Gain \(\mathrm{IG}(X=x)\)

Computes the expected information gain \(\mathrm{IG}(X=x) = H(Y |X=x) - H(Y |\Theta = \theta, X=x)\) for a potential assistant action x.

Parameters
  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • observation (State<coopihc.base.State.State>) – current assistant observation

  • beliefs (list) – beliefs for each target

Returns

the expected information gain \(\mathrm{IG}(X=x)\)

Return type

float
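
As a toy check, using the numbers from the sketches above (illustrative assumptions, not library output): with beliefs \((0.5, 0.5)\) and response models \((0.9, 0.1)\) and \((0.2, 0.8)\) for the two goals, \(H(Y|X=x) \approx 0.993\) bits and \(H(Y|\Theta = \theta, X=x) \approx 0.595\) bits, so \(\mathrm{IG}(X=x) \approx 0.40\) bits. An action whose predicted user responses discriminate better between the goals yields a larger information gain.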

PYy_Xx(user_action, assistant_action, potential_states, beliefs)[source]

\(P(Y=y|X=x)\)

Computes the conditional probability \(P(Y=y|X=x)\), where X is the assistant outcome and Y the user’s response.

Parameters
  • user_action (StateElement<coopihc.base.StateElement.StateElement>) – user action y for which the condition is computed

  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

the conditional \(P(Y=y|X=x)\)

Return type

list

property action

Return the last action.

Returns

last action

Return type

StateElement<coopihc.base.StateElement.StateElement>

default_value()

Apply this decorator to use bundle.game_state as the default value to observe when game_state is None

find_best_action()[source]

Computes the expected information gain associated with each possible future cursor position and ranks the candidate actions from most to least informative.

Returns

(assistant actions, associated information gain)

Return type

tuple(list, list)
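
Since the returned actions are ordered from most to least informative, the top-ranked action can be picked directly (a sketch, reusing the hypothetical policy object from the earlier usage sketch):

    # Sketch: pick the most informative candidate action. The ordering of the
    # returned lists follows the description above (most informative first).
    actions, information_gains = policy.find_best_action()
    best_action = actions[0]
    best_gain = information_gains[0]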

property observation

Return the last observation.

Returns

last observation

Return type

State<coopihc.base.State.State>

reset(random=True)

Reset the policy

Parameters

random (bool, optional) – whether to reset the policy randomly; defaults to True. Provided in case BasePolicy is subclassed.