coopihc.policy.BIGDiscretePolicy.BIGDiscretePolicy
- class BIGDiscretePolicy(assistant_action_state, user_policy_model, *args, threshold=0.8, **kwargs)[source]
Bases:
coopihc.policy.BasePolicy.BasePolicy
Bayesian Information Gain Policy, adapted from 1.
The main ideas/assumptions are:
A user wants the task to go to some goal state \(\Theta\)
The assistant can put the task in a number of states (X)
The user can perform a given set of actions Y
A model \(p(Y=y|X=x, \Theta = \theta)\) exists for user behavior
Make sure to call:
attach_set_theta, to specify the potential goal states
attach_transition_function, to specify how the task state evolves after an assistant action
- 1
Liu, Wanyu, et al. “BIGnav: Bayesian Information Gain for Guiding Multiscale Navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017.
- Parameters
assistant_action_state (State<coopihc.base.State.State>) – action state of the assistant
user_policy_model (ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>) – user policy model. This may be the user’s actual policy, but in practice it has to be a model of that policy. It must currently be an ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>.
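A minimal usage sketch, tying together the constructor and the two required attach_* calls documented above. The placeholder objects, the contents of set_theta, and the transition function signature are illustrative assumptions, not part of this page:

    from coopihc.policy.BIGDiscretePolicy import BIGDiscretePolicy

    # assistant_action_state and user_policy_model are placeholders here:
    # the former is the assistant's action state, the latter an ELLDiscretePolicy
    # modelling p(Y=y | X=x, Theta=theta).
    policy = BIGDiscretePolicy(
        assistant_action_state, user_policy_model, threshold=0.8
    )

    # Both calls below are required (see above). The contents of set_theta and
    # the transition function signature are assumptions for illustration.
    policy.attach_set_theta(set_theta)  # set_theta: collection of potential goal states

    def transition_function(assistant_action, observation):
        """Return the task state reached from `observation` after `assistant_action` (assumed signature)."""
        ...

    policy.attach_transition_function(transition_function)

    # Rank candidate assistant actions by expected information gain.
    actions, information_gains = policy.find_best_action()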
Methods
\(H(Y |\Theta = \theta, X=x)\)
\(H(Y |X=x)\)
Information Gain \(\mathrm{IG}(X=x)\)
\(P(Y=y|X=x)\)
attach_set_theta
attach_transition_function
Apply this decorator to use bundle.game_state as the default value for observe when game_state is None
Finds the expected information gain associated with each possible future cursor position and ranks them from most to least informative.
Reset the policy
sample
Attributes
Return the last action.
action_keys
Return the last observation.
parameters
state
unwrapped
- HY__OoXx(potential_states, beliefs)[source]
\(H(Y |\Theta = \theta, X=x)\)
Computes the conditional entropy \(H(Y |\Theta = \theta, X=x) = -\mathbb{E}[\log(p(Y|\Theta = \theta, X=x))]\).
- Parameters
potential_states (iterable) – collection of potential goal states
beliefs (list) – beliefs for each target
- Returns
\(H(Y |\Theta = \theta, X=x)\)
- Return type
float
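As a plain NumPy illustration of the quantity being computed (not the library’s implementation), this conditional entropy is the belief-weighted average of the entropy of the user’s action distribution under each potential goal state:

    import numpy as np

    # Illustrative user model p(Y=y | X=x, Theta=theta): one row per potential goal state.
    p_y_given_theta_x = np.array([
        [0.7, 0.2, 0.1],  # user action distribution if the goal is theta_0
        [0.1, 0.8, 0.1],  # user action distribution if the goal is theta_1
    ])
    beliefs = np.array([0.6, 0.4])  # current belief over the two goal states

    # H(Y | Theta=theta, X=x) = -E[log p(Y | Theta=theta, X=x)], averaging the
    # per-goal entropies under the current belief.
    entropy_per_goal = -np.sum(p_y_given_theta_x * np.log(p_y_given_theta_x), axis=1)
    H_Y_theta_x = beliefs @ entropy_per_goal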
- HY__Xx(potential_states, assistant_action, beliefs)[source]
\(H(Y |X=x)\)
Computes the conditional entropy \(H(Y |X=x) = -\mathbb{E}[\log(p(Y|X=x))]\).
- Parameters
assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated
potential_states (iterable) – collection of potential goal states
beliefs (list) – beliefs for each target
- Returns
\(H(Y |X=x)\)
- Return type
float
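Continuing the same illustrative NumPy setup, \(H(Y|X=x)\) is the entropy of the user action distribution once the goal is marginalized out under the current belief (a sketch of the quantity, not the library code):

    import numpy as np

    # Same illustrative numbers as above.
    p_y_given_theta_x = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    beliefs = np.array([0.6, 0.4])

    # Marginalize the goal out: p(Y=y | X=x) = sum_theta p(theta) p(Y=y | X=x, theta)
    p_y_given_x = beliefs @ p_y_given_theta_x

    # H(Y | X=x) = -E[log p(Y | X=x)]
    H_Y_x = -np.sum(p_y_given_x * np.log(p_y_given_x))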
- IG(assistant_action, observation, beliefs)[source]
Information Gain \(\mathrm{IG}(X=x)\)
Computes the expected information gain \(\mathrm{IG}(X=x) = H(Y |X=x) - H(Y |\Theta = \theta, X=x)\) for a potential assistant action x.
- Parameters
assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated
observation (State<coopihc.base.State.State>) – current assistant observation
beliefs (list) – beliefs for each target
- Returns
the expected information gain \(\mathrm{IG}(X=x)\)
- Return type
float
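Using the two entropies sketched above, the expected information gain of a candidate action x is their difference, i.e. the mutual information between the user’s response and the goal given x (illustrative NumPy only, not the library code):

    import numpy as np

    p_y_given_theta_x = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    beliefs = np.array([0.6, 0.4])

    # Marginal and conditional entropies as in the two sketches above.
    p_y_given_x = beliefs @ p_y_given_theta_x
    H_Y_x = -np.sum(p_y_given_x * np.log(p_y_given_x))
    entropy_per_goal = -np.sum(p_y_given_theta_x * np.log(p_y_given_theta_x), axis=1)
    H_Y_theta_x = beliefs @ entropy_per_goal

    # IG(X=x) = H(Y | X=x) - H(Y | Theta=theta, X=x): larger values mean the user's
    # response to x is expected to reveal more about the goal.
    IG_x = H_Y_x - H_Y_theta_x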
- PYy_Xx(user_action, assistant_action, potential_states, beliefs)[source]
\(P(Y=y|X=x)\)
Computes the conditional probability \(P(Y=y|X=x)\), where X is the assistant outcome and Y the user’s response.
- Parameters
user_action (StateElement<coopihc.base.StateElement.StateElement>) – user action y for which the condition is computed
assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated
potential_states (iterable) – collection of potential goal states
beliefs (list) – beliefs for each target
- Returns
the conditional \(P(Y=y|X=x)\)
- Return type
list
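The conditional probability factors over the potential goal states, as in the illustrative NumPy sketch below. This page only specifies that the method returns a list; how the per-goal terms are packaged is an assumption here:

    import numpy as np

    p_y_given_theta_x = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    beliefs = np.array([0.6, 0.4])

    y = 1  # index of the user action being evaluated

    # Per-goal terms p(Y=y | X=x, Theta=theta) and their belief-weighted sum P(Y=y | X=x).
    per_goal_terms = p_y_given_theta_x[:, y]
    p_y_x = beliefs @ per_goal_terms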
- property action
Return the last action.
- Returns
last action
- Return type
StateElement<coopihc.base.StateElement.StateElement>
- default_value()
Apply this decorator to use bundle.game_state as the default value for observe when game_state is None
- find_best_action()[source]
Finds the expected information gain associated with each possible future cursor position and ranks them from most to least informative.
- Returns
(assistant actions, associated information gain)
- Return type
tuple(list, list)
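A short usage note, assuming a configured policy as in the sketch in the class description above; picking the first entry reflects the documented most-to-least-informative ordering, and is an assumed typical usage:

    # The ranking is from most to least informative.
    actions, gains = policy.find_best_action()
    best_action = actions[0]   # top-ranked assistant action (assumed typical usage)
    best_gain = gains[0]       # its expected information gain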
- property observation
Return the last observation.
- Returns
last observation
- Return type
State<coopihc.base.State.State>
- reset(random=True)
Reset the policy
- Parameters
random (bool, optional) – whether to reset the policy randomly, defaults to True. Provided for compatibility when subclassing BasePolicy.