coopihc.policy.BIGDiscretePolicy.BIGDiscretePolicy
- class BIGDiscretePolicy(assistant_action_state, user_policy_model, *args, threshold=0.8, **kwargs)[source]
- Bases: coopihc.policy.BasePolicy.BasePolicy
- Bayesian Information Gain Policy, adapted from 1. The main ideas/assumptions are:
- A user wants the task to go to some goal state \(\Theta\)
- The assistant can put the task in a number of states \(X\)
- The user can perform actions from a given set \(Y\)
- A model \(p(Y=y|X=x, \Theta=\theta)\) of user behavior exists
 - Make sure to call (a usage sketch follows the parameter list below):
- attach_set_theta, to specify the potential goal states
- attach_transition_function, to specify how the task state evolves after an assistant action
 - 1
- Liu, Wanyu, et al. “Bignav: Bayesian information gain for guiding multiscale navigation.” Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. 2017. 
 - Parameters
- assistant_action_state (State<coopihc.base.State.State>) – action state of the assistant 
- user_policy_model (ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>) – user policy model. This may be the actual policy of the user, but in practice it will be a model of the user policy. This policy must currently be an ELLDiscretePolicy<coopihc.policy.ELLDiscretePolicy.ELLDiscretePolicy>. 
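- The decision rule implemented by this policy can be sketched numerically without the coopihc API. The snippet below is a minimal illustration, assuming a tabular user model \(p(y|x, \theta)\) stored as a numpy array: at each step the assistant picks the state \(x\) with the highest expected information gain, observes a simulated user action \(y\), and updates its belief over the goals with Bayes' rule. All names and values are illustrative, not part of coopihc.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y, n_theta = 4, 3, 5          # candidate assistant states, user actions, goals
true_theta = 2                       # goal the simulated user actually wants

# Hypothetical tabular user model p(Y=y | X=x, Theta=theta), normalized over y
user_model = rng.random((n_x, n_y, n_theta))
user_model /= user_model.sum(axis=1, keepdims=True)

beliefs = np.full(n_theta, 1 / n_theta)
for step in range(5):
    # IG(x) = H(Y | X=x) - H(Y | Theta, X=x), evaluated for every candidate x
    p_y = user_model @ beliefs                       # P(Y | X=x), shape (n_x, n_y)
    h_y_x = -np.sum(p_y * np.log(p_y), axis=1)
    h_y_theta_x = -np.einsum("t,xyt->x", beliefs, user_model * np.log(user_model))
    x = int(np.argmax(h_y_x - h_y_theta_x))          # most informative assistant state

    # Simulate the user's response, then update the belief with Bayes' rule
    y = rng.choice(n_y, p=user_model[x, :, true_theta])
    beliefs = beliefs * user_model[x, y, :]
    beliefs /= beliefs.sum()

print(beliefs.round(3))   # the belief mass should shift toward the true goal
```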
 
- Methods
- HY__OoXx: \(H(Y|\Theta=\theta, X=x)\)
- HY__Xx: \(H(Y|X=x)\)
- IG: Information Gain \(\mathrm{IG}(X=x)\)
- PYy_Xx: \(P(Y=y|X=x)\)
- attach_set_theta
- attach_transition_function
- default_value: Apply this decorator to use bundle.game_state as the default value of observe when game_state is None
- find_best_action: Computes the expected information gain associated with each possible future cursor position and ranks the candidates from most to least informative.
- reset: Reset the policy
- sample
- Attributes
- action: Return the last action.
- action_keys
- observation: Return the last observation.
- parameters
- state
- unwrapped
 - HY__OoXx(potential_states, beliefs)[source]
- \(H(Y|\Theta=\theta, X=x)\) - Computes the conditional entropy \(H(Y|\Theta=\theta, X=x) = -\mathbb{E}[\log(p(Y|\Theta=\theta, X=x))]\). - Parameters
- potential_states (iterable) – collection of potential goal states 
- beliefs (list) – beliefs for each target 
 
- Returns
- \(H(Y|\Theta=\theta, X=x)\) 
- Return type
- float 
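- A minimal numerical sketch of this quantity, assuming the likelihoods \(p(y|x, \theta)\) for one fixed assistant action x are available as a plain numpy array (illustrative values, not coopihc objects):

```python
import numpy as np

# p(Y=y | X=x, Theta=theta_t) for one fixed assistant action x (illustrative values)
likelihoods = np.array([[0.7, 0.2, 0.1],    # theta_0
                        [0.1, 0.1, 0.8]])   # theta_1
beliefs = np.array([0.6, 0.4])              # current belief over the two goals

# H(Y | Theta, X=x) = -E[log p(Y | Theta, X=x)], expectation over Y and Theta
h = -np.sum(beliefs[:, None] * likelihoods * np.log(likelihoods))
print(h)
```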
 
 - HY__Xx(potential_states, assistant_action, beliefs)[source]
- \(H(Y |X=x)\) - Computes the conditional entropy \(H(Y |X=x) = -\mathbb{E}[\log(p(Y|X=x))]\). - Parameters
- assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated 
- potential_states (iterable) – collection of potential goal states 
- beliefs (list) – beliefs for each target 
 
- Returns
- \(H(Y |X=x)\) 
- Return type
- float 
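- Using the same kind of illustrative arrays as above, this entropy is that of the marginal \(P(Y|X=x)\) obtained by averaging the user model over the current beliefs (a sketch, not the coopihc implementation):

```python
import numpy as np

likelihoods = np.array([[0.7, 0.2, 0.1],    # p(y | x, theta_0)
                        [0.1, 0.1, 0.8]])   # p(y | x, theta_1)
beliefs = np.array([0.6, 0.4])

p_y = beliefs @ likelihoods                 # P(Y=y | X=x), marginalized over the goals
h = -np.sum(p_y * np.log(p_y))              # H(Y | X=x)
print(h)
```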
 
 - IG(assistant_action, observation, beliefs)[source]
- Information Gain \(\mathrm{IG}(X=x)\) - Computes the expected information gain \(\mathrm{IG}(X=x) = H(Y|X=x) - H(Y|\Theta=\theta, X=x)\) for a potential assistant action x. - Parameters
- assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated 
- observation (State<coopihc.base.State.State>) – current assistant observation 
- beliefs (list) – beliefs for each target 
 
- Returns
- the expected information gain \(\mathrm{IG}(X=x)\) 
- Return type
- float 
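- Reusing the same illustrative likelihoods and beliefs as in the entropy sketches above, the information gain is simply the difference between the two entropies (a sketch, not the coopihc implementation):

```python
import numpy as np

likelihoods = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8]])
beliefs = np.array([0.6, 0.4])

p_y = beliefs @ likelihoods
h_y_x = -np.sum(p_y * np.log(p_y))                                           # H(Y | X=x)
h_y_theta_x = -np.sum(beliefs[:, None] * likelihoods * np.log(likelihoods))  # H(Y | Theta, X=x)
ig = h_y_x - h_y_theta_x                                                     # IG(X=x)
print(ig)
```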
 
 - PYy_Xx(user_action, assistant_action, potential_states, beliefs)[source]
- \(P(Y=y|X=x)\) - Computes the conditional probability \(P(Y=y|X=x)\), where X is the assistant outcome and Y the user’s response. - Parameters
- user_action (StateElement<coopihc.base.StateElement.StateElement>) – user action y for which the conditional probability is computed 
- assistant_action (StateElement<coopihc.base.StateElement.StateElement>) – assistant action to be evaluated 
- potential_states (iterable) – collection of potential goal states 
- beliefs (list) – beliefs for each target 
 
- Returns
- the conditional \(P(Y=y|X=x)\) 
- Return type
- list 
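- The conditional is obtained by marginalizing the user model over the potential goal states, weighted by the current beliefs, i.e. \(P(Y=y|X=x) = \sum_\theta P(Y=y|X=x, \Theta=\theta)\, p(\theta)\). A small numerical sketch with illustrative values (not coopihc objects):

```python
import numpy as np

likelihoods = np.array([[0.7, 0.2, 0.1],    # p(y | x, theta_0)
                        [0.1, 0.1, 0.8]])   # p(y | x, theta_1)
beliefs = np.array([0.6, 0.4])

y = 0                                       # index of the user action being evaluated
p = np.sum(beliefs * likelihoods[:, y])     # P(Y=y | X=x) = sum_theta p(y | x, theta) p(theta)
print(p)                                    # 0.6 * 0.7 + 0.4 * 0.1 = 0.46
```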
 
 - property action
- Return the last action. - Returns
- last action 
- Return type
- StateElement<coopihc.base.StateElement.StateElement> 
 
 - default_value()
- Apply this decorator to use bundle.game_state as the default value of observe when game_state is None 
 - find_best_action()[source]
- Computes the expected information gain associated with each possible future cursor position and ranks the candidates from most to least informative. - Returns
- (assistant actions, associated information gain) 
- Return type
- tuple(list, list) 
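- The ranking step can be illustrated with plain numpy (the candidate_actions and igs values below are hypothetical; this is not the coopihc implementation): candidates are sorted by their expected information gain, most informative first.

```python
import numpy as np

candidate_actions = [0, 1, 2, 3]            # hypothetical assistant actions
igs = np.array([0.12, 0.51, 0.08, 0.33])    # hypothetical IG(X=x) values

order = np.argsort(igs)[::-1]               # most to least informative
ranked_actions = [candidate_actions[i] for i in order]
ranked_igs = igs[order].tolist()
print(ranked_actions, ranked_igs)           # [1, 3, 0, 2] [0.51, 0.33, 0.12, 0.08]
```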
 
 - property observation
- Return the last observation. - Returns
- last observation 
- Return type
- State<coopihc.base.State.State> 
 
 - reset(random=True)
- Reset the policy - Parameters
- random (bool, optional) – whether to reset the policy randomly, defaults to True. Provided for use when subclassing BasePolicy.