coopihc.policy.BIGDiscretePolicy.BIGDiscretePolicy

class BIGDiscretePolicy(assistant_action_state, user_policy_model, *args, threshold=0.8, **kwargs)[source]

Bases: coopihc.policy.BasePolicy.BasePolicy

Bayesian Information Gain Policy, adapted from [1].

The main ideas/assumptions are:

  • A user wants the task to go to some goal state \(\Theta\)

  • The assistant can put the task in a number of states \(X\)

  • The user can perform actions from a given set \(Y\)

  • A model \(p(Y=y | X=x, \Theta = \theta)\) of user behavior exists

Make sure to call (see the usage sketch after the parameters below):

  • attach_set_theta, to specify the potential goal states

  • attach_transition_function, to specify how the task state evolves after an assistant action

[1] Liu, Wanyu, et al. “BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017.

Parameters
  • assistant_action_state (State<coopihc.base.State.State>) – action state of the assistant

  • user_policy_model (ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>) – user policy model. This may be the user's actual policy, but in practice it is usually a model of the user's policy. It must currently be an ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>.
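
A minimal usage sketch is given below. Only the names documented on this page (the constructor, attach_set_theta, attach_transition_function, find_best_action) come from the library; the shapes of set_theta and of the transition function are assumptions inferred from the descriptions above, and assistant_action_state, user_policy_model and set_theta are placeholders that must be built beforehand:

    # Usage sketch only. The argument shapes passed to attach_set_theta and
    # attach_transition_function are assumptions inferred from the text above,
    # not verified signatures.
    import copy

    from coopihc.policy.BIGDiscretePolicy import BIGDiscretePolicy

    # assistant_action_state, user_policy_model (an ELLDiscretePolicy) and
    # set_theta (one entry per potential goal state) are assumed to have been
    # built beforehand.
    policy = BIGDiscretePolicy(assistant_action_state, user_policy_model, threshold=0.8)

    # Potential goal states the user may be pursuing.
    policy.attach_set_theta(set_theta)

    # Assumed form: a function that returns the observation obtained after
    # applying a candidate assistant action to the current observation.
    def transition_function(assistant_action, observation):
        new_observation = copy.deepcopy(observation)
        # ... update the task state inside new_observation here ...
        return new_observation

    policy.attach_transition_function(transition_function)

    # Rank the candidate assistant actions by expected information gain.
    actions, information_gains = policy.find_best_action()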

Methods

HY__OoXx

\(H(Y | \Theta = \theta, X=x)\)

HY__Xx

\(H(Y |X=x)\)

IG

Information Gain \(\mathrm{IG}(X=x)\)

PYy_Xx

\(P(Y=y|X=x)\)

attach_set_theta

attach_transition_function

default_value

Apply this decorator to use bundle.game_state as the default value to observe when game_state is None

find_best_action

Computes the expected information gain associated with each possible future cursor position and ranks the candidate actions from most to least informative.

reset

Reset the policy

sample

Attributes

action

Return the last action.

action_keys

observation

Return the last observation.

parameters

state

unwrapped

HY__OoXx(potential_states, beliefs)[source]

\(H(Y | \Theta = \theta, X=x)\)

Computes the conditional entropy \(H(Y | \Theta = \theta, X=x) = -\mathbb{E}[\log(p(Y|\Theta = \theta, X=x))]\).

Parameters
  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

\(H(Y | \Theta = \theta, X=x)\)

Return type

float
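
The following standalone sketch illustrates the formula (it is not the library's internal code); the expectation is taken over the candidate goals, weighted by the beliefs, and over the user's responses:

    import numpy as np

    # Toy illustration of H(Y | Theta = theta, X=x): beliefs[i] is the current
    # belief that goal theta_i is the true goal, and p_y_given_theta_x[i][k] is
    # the user model p(Y=y_k | Theta=theta_i, X=x).
    def HY__OoXx_sketch(beliefs, p_y_given_theta_x):
        H = 0.0
        for b, p_y in zip(beliefs, p_y_given_theta_x):
            # Entropy of the user's response distribution for this goal,
            # weighted by the belief in that goal.
            H += b * sum(-p * np.log2(p) for p in p_y if p > 0)
        return H

    beliefs = [0.5, 0.5]
    p_y_given_theta_x = [[0.9, 0.1], [0.2, 0.8]]
    print(HY__OoXx_sketch(beliefs, p_y_given_theta_x))  # ~0.595 bits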

HY__Xx(potential_states, assistant_action, beliefs)[source]

\(H(Y |X=x)\)

Computes the conditional entropy \(H(Y |X=x) = -\mathbb{E}[\log(p(Y|X=x))]\).

Parameters
  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

\(H(Y |X=x)\)

Return type

float
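
A matching standalone sketch (again an illustration, not the library code): \(p(Y=y|X=x)\) is the belief-weighted mixture of the per-goal user models, and \(H(Y|X=x)\) is the entropy of that mixture:

    import numpy as np

    # Toy illustration of H(Y | X=x): mix the per-goal response models with the
    # beliefs, then take the entropy of the resulting distribution over Y.
    def HY__Xx_sketch(beliefs, p_y_given_theta_x):
        p_y = np.average(np.asarray(p_y_given_theta_x), axis=0, weights=beliefs)
        return float(sum(-p * np.log2(p) for p in p_y if p > 0))

    beliefs = [0.5, 0.5]
    p_y_given_theta_x = [[0.9, 0.1], [0.2, 0.8]]
    print(HY__Xx_sketch(beliefs, p_y_given_theta_x))  # ~0.993 bits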

IG(assistant_action, observation, beliefs)[source]

Information Gain \(\mathrm{IG}(X=x)\)

Computes the expected information gain \(\mathrm{IG}(X=x) = H(Y |X=x) - H(Y |\Theta = \theta, X=x)\) for a potential assistant action x.

Parameters
  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • observation (State<coopihc.base.State.State>) – current assistant observation

  • beliefs (list) – beliefs for each target

Returns

the expected information gain \(\mathrm{IG}(X=x)\)

Return type

float
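
As a toy check, using the numbers from the sketches above (illustrative assumptions, not library output): with beliefs \((0.5, 0.5)\) and response models \((0.9, 0.1)\) and \((0.2, 0.8)\) for the two goals, \(H(Y|X=x) \approx 0.993\) bits and \(H(Y|\Theta = \theta, X=x) \approx 0.595\) bits, so \(\mathrm{IG}(X=x) \approx 0.40\) bits. An action whose predicted user responses discriminate better between the goals yields a larger information gain.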

PYy_Xx(user_action, assistant_action, potential_states, beliefs)[source]

\(P(Y=y|X=x)\)

Computes the conditional probability \(P(Y=y|X=x)\), where X is the assistant outcome and Y the user’s response.

Parameters
  • user_action (StateElement<coopihc.base.StateElement.StateElement>) – user action y for which the condition is computed

  • assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated

  • potential_states (iterable) – collection of potential goal states

  • beliefs (list) – beliefs for each target

Returns

the conditional \(P(Y=y|X=x)\)

Return type

list

property action

Return the last action.

Returns

last action

Return type

StateElement<coopihc.base.StateElement.StateElement>

default_value()

Apply this decorator to use bundle.game_state as the default value to observe when game_state is None

find_best_action()[source]

Computes the expected information gain associated with each possible future cursor position and ranks the candidate actions from most to least informative.

Returns

(assistant actions, associated information gain)

Return type

tuple(list, list)
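
Since the returned actions are ordered from most to least informative, the top-ranked action can be picked directly (a sketch, reusing the hypothetical policy object from the earlier usage sketch):

    # Sketch: pick the most informative candidate action. The ordering of the
    # returned lists follows the description above (most informative first).
    actions, information_gains = policy.find_best_action()
    best_action = actions[0]
    best_gain = information_gains[0]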

property observation

Return the last observation.

Returns

last observation

Return type

State<coopihc.base.State.State>

reset(random=True)

Reset the policy

Parameters

random (bool, optional) – whether to reset the policy randomly; defaults to True. Provided in case BasePolicy is subclassed.