coopihc.policy.LinearFeedback.LinearFeedback
- class LinearFeedback(action_state, state_indicator, *args, feedback_gain='identity', noise_function=None, noise_func_args=(), **kwargs)
Bases: coopihc.policy.BasePolicy.BasePolicy
Linear feedback policy, which applies a feedback gain to a selected component of the observation and passes the result to a noise function.
For example with:
state_indicator = ('user_state', 'substate1', slice(0,2,1))
feedback_gain = -numpy.eye(2)
noise_function = f(action, observation, *args)
noise_func_args = (1,2)
You will get:
obs = observation['user_state']['substate1'][slice(0,2,1)]
action = -(-numpy.eye(2)) @ obs
return f(action, observation, *(1,2))
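The computation described above can be sketched with plain NumPy, without CoopIHC itself. This is a minimal illustration, not the library's implementation: the helpers `select_component` and `linear_feedback` are hypothetical names, the observation is modelled as a nested dict, the sign convention (the gain is negated before being applied) is an assumption read from the example, and `f` is a deterministic stand-in for a noise function.

```python
import numpy


def select_component(observation, state_indicator):
    """Walk the nested observation using each item of state_indicator in turn."""
    component = observation
    for key in state_indicator:
        component = component[key]
    return component


def linear_feedback(observation, state_indicator, feedback_gain,
                    noise_function=None, noise_func_args=()):
    """Hypothetical helper: gain applied to the selected component, then noise."""
    obs = numpy.asarray(select_component(observation, state_indicator), dtype=float)
    # Assumed sign convention: the gain is negated before it is applied.
    action = -feedback_gain @ obs
    if noise_function is None:
        return action
    return noise_function(action, observation, *noise_func_args)


def f(action, observation, offset, scale):
    """Stand-in for a noise function; deterministic so the result is easy to check."""
    return scale * action + offset


observation = {"user_state": {"substate1": numpy.array([1.0, 2.0, 3.0])}}
state_indicator = ("user_state", "substate1", slice(0, 2, 1))

noiseless = linear_feedback(observation, state_indicator, -numpy.eye(2))
noisy = linear_feedback(observation, state_indicator, -numpy.eye(2), f, (1, 2))
print(noiseless)  # [1. 2.]
print(noisy)      # [3. 5.]
```

With `feedback_gain = -numpy.eye(2)` the two negations cancel, so the noiseless action equals the first two entries of the selected substate.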
You can change the feedback gain online via the set_feedback_gain() method.
- Parameters
action_state (State<coopihc.base.State.State>) – see BasePolicy<coopihc.policy.BasePolicy.BasePolicy>
state_indicator (iterable) – specifies which component of the observation is used as feedback information, e.g. ('user_state', 'substate1', slice(0,2,1))
feedback_gain (str or numpy.ndarray, optional) – Feedback gain matrix, defaults to “identity”, which creates a negative identity matrix.
noise_function (function, optional) – a function that produces a noise sample to add to the generated action, defaults to None
noise_func_args (tuple, optional) – arguments to the function above, defaults to ()
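As a rough illustration of the last three parameters, the sketch below shows one way the "identity" default could resolve to a negative identity matrix and how a noise function receives `noise_func_args`. Both `resolve_gain` and `gaussian_noise` are hypothetical helpers written for this example, not CoopIHC functions; the `(action, observation, *args)` calling convention is taken from the example above, and the seed argument is an assumption made only to keep the output reproducible.

```python
import numpy


def resolve_gain(feedback_gain, size):
    """Turn the string 'identity' into a negative identity matrix of the given size."""
    if isinstance(feedback_gain, str):
        if feedback_gain != "identity":
            raise ValueError(f"unknown feedback gain: {feedback_gain!r}")
        return -numpy.eye(size)
    # Anything else is treated as an explicit gain matrix.
    return numpy.asarray(feedback_gain, dtype=float)


def gaussian_noise(action, observation, sigma, seed):
    """Hypothetical noise function: adds zero-mean Gaussian noise to the action."""
    rng = numpy.random.default_rng(seed)
    return action + rng.normal(0.0, sigma, size=numpy.shape(action))


gain = resolve_gain("identity", 2)
action = numpy.array([1.0, 2.0])

# noise_func_args would be (0.1, 0) here: sigma first, then the seed.
noisy_action = gaussian_noise(action, None, *(0.1, 0))
print(gain)
print(noisy_action.shape)
```

Passing the standard deviation through `noise_func_args` rather than hard-coding it keeps the noise function reusable across policies with different noise levels.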
Methods
default_value() – Apply this decorator to use bundle.game_state as default value to observe if game_state = None
reset() – Reset the policy
sample()
set_feedback_gain() – Set feedback gain.
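To make the idea of changing the gain online concrete, here is a small toy class that mirrors only the method names listed above. `ToyLinearFeedback` is a hypothetical stand-in and does not subclass BasePolicy; the argument names of `sample` and `set_feedback_gain`, and the negated-gain convention, are assumptions made for this sketch.

```python
import numpy


class ToyLinearFeedback:
    """Hypothetical stand-in for illustration; not the CoopIHC class."""

    def __init__(self, state_indicator, feedback_gain):
        self.state_indicator = state_indicator
        self.feedback_gain = numpy.asarray(feedback_gain, dtype=float)

    def set_feedback_gain(self, gain):
        # Replace the gain between two calls to sample().
        self.feedback_gain = numpy.asarray(gain, dtype=float)

    def sample(self, observation):
        # Select the feedback component by walking the nested observation.
        component = observation
        for key in self.state_indicator:
            component = component[key]
        # Assumed convention: the gain is negated before being applied.
        return -self.feedback_gain @ numpy.asarray(component, dtype=float)


policy = ToyLinearFeedback(("user_state", "substate1", slice(0, 2, 1)), -numpy.eye(2))
observation = {"user_state": {"substate1": numpy.array([1.0, 2.0, 3.0])}}

first = policy.sample(observation)
policy.set_feedback_gain(numpy.diag([2.0, 0.5]))
second = policy.sample(observation)
print(first)   # [1. 2.]
print(second)  # [-2. -1.]
```

The same observation yields a different action after the gain is replaced, which is the behaviour the set_feedback_gain() method is meant to provide.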
Attributes
action – Return the last action.
action_keys
observation – Return the last observation.
parameters
state
unwrapped
- property action
Return the last action.
- Returns
last action
- Return type
StateElement<coopihc.base.StateElement.StateElement>
- default_value()
Apply this decorator to use bundle.game_state as default value to observe if game_state = None
- property observation
Return the last observation.
- Returns
last observation
- Return type
State<coopihc.base.State.State>
- reset(random=True)
Reset the policy
- Parameters
random (bool, optional) – whether to reset the policy randomly, defaults to True. Kept in the signature for classes that subclass BasePolicy.