coopihc.policy.LinearFeedback.LinearFeedback

class LinearFeedback(action_state, state_indicator, *args, feedback_gain='identity', noise_function=None, noise_func_args=(), **kwargs)

Bases: coopihc.policy.BasePolicy.BasePolicy

Linear feedback policy. It applies a feedback gain to a selected component of the observation and passes the result to a noise function.

For example with:

state_indicator = ('user_state', 'substate1', slice(0,2,1))

feedback_gain = -numpy.eye(2)

noise_function = f(action, observation, *args)

noise_func_args = (1,2)

You will get

obs = observation['user_state']['substate1'][slice(0,2,1)]
action = -(-numpy.eye(2)) @ obs
return f(action, observation, *(1,2))
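
The computation above can be reproduced with plain NumPy. This is a minimal sketch under stated assumptions, not CoopIHC's implementation: a nested dict stands in for the observation State, the helper `linear_feedback` is hypothetical, and `f` is a made-up deterministic function that only illustrates the `f(action, observation, *args)` signature.

```python
import numpy

# Stand-in for a CoopIHC observation: a plain nested dict (assumption for illustration).
observation = {"user_state": {"substate1": numpy.array([0.5, -1.0, 3.0])}}

state_indicator = ("user_state", "substate1", slice(0, 2, 1))
feedback_gain = -numpy.eye(2)
noise_func_args = (1, 2)


def f(action, observation, scale, offset):
    """Hypothetical noise_function: scales and shifts the action deterministically."""
    return scale * action + offset


def linear_feedback(observation, state_indicator, feedback_gain,
                    noise_function, noise_func_args):
    """Hypothetical helper mirroring the three lines of the example above."""
    # Index into the observation with each item of state_indicator in turn.
    obs = observation
    for key in state_indicator:
        obs = obs[key]
    # Apply the negated gain to the selected component.
    action = -feedback_gain @ obs
    # Hand the result to the user-supplied function with its extra arguments.
    return noise_function(action, observation, *noise_func_args)


result = linear_feedback(observation, state_indicator, feedback_gain, f, noise_func_args)
print(result)
```

With `feedback_gain = -numpy.eye(2)` the two negations cancel, so `action` equals the first two entries of `substate1` before `f` is applied.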

You can change the feedback gain online via the set_feedback_gain() method.

Parameters

action_state (State<coopihc.base.State.State>) – see BasePolicy<coopihc.policy.BasePolicy.BasePolicy>
state_indicator (iterable) – specifies which component is used as feedback information e.g. ('user_state', 'substate1', slice(0,2,1))
feedback_gain (str or numpy.ndarray, optional) – Feedback gain matrix, defaults to “identity”, which creates a negative identity matrix.
noise_function (function, optional) – a function that produces a noise sample to add to the generated action, defaults to None
noise_func_args (tuple, optional) – arguments to the function above, defaults to ()
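
As an illustration of the noise_function and noise_func_args parameters, here is a minimal sketch of a Gaussian noise function with the `f(action, observation, *args)` signature. It assumes, as the parameter description states, that the returned sample is added to the generated action. The names `gaussian_noise` and `rng` are hypothetical and not part of CoopIHC.

```python
import numpy

# Hypothetical module-level generator, seeded so runs are repeatable.
rng = numpy.random.default_rng(seed=0)


def gaussian_noise(action, observation, mean, std):
    """Hypothetical noise_function: one Gaussian sample per action component."""
    return rng.normal(loc=mean, scale=std, size=numpy.shape(action))


action = numpy.array([0.5, -1.0])

# The extra positional arguments correspond to noise_func_args = (0.0, 0.1).
noise = gaussian_noise(action, None, 0.0, 0.1)
noisy_action = action + noise
print(noisy_action)
```

The observation argument is unused here, but keeping it in the signature lets a noise function depend on the observed state, for example for state-dependent noise.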

Methods

`default_value`	Apply this decorator to use bundle.game_state as default value to observe if game_state = None
`reset`	Reset the policy
`sample`
`set_feedback_gain`	Set the feedback gain.

Attributes

`action`	Return the last action.
`action_keys`
`observation`	Return the last observation.
`parameters`
`state`
`unwrapped`

property action

Return the last action.

Returns: last action
Return type: StateElement<coopihc.base.StateElement.StateElement>

default_value(): Apply this decorator to use bundle.game_state as default value to observe if game_state = None

property observation

Return the last observation.

Returns: last observation
Return type: State<coopihc.base.State.State>

reset(random=True)

Reset the policy

Parameters: random (bool, optional) – whether to reset the policy randomly, defaults to True. Provided for subclasses of BasePolicy.

set_feedback_gain(gain)

Set the feedback gain. This is only needed if the gain must change after initialization; otherwise, it is recommended to set the gain when initializing the policy.

Parameters: gain (numpy.ndarray) – feedback gain matrix
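
The effect of changing the gain after initialization can be sketched with a toy class. `ToyLinearFeedback` is a hypothetical stand-in, not the CoopIHC class: it only stores a gain and applies the negated gain to an array, which is enough to show how later samples change once the gain is replaced.

```python
import numpy


class ToyLinearFeedback:
    """Hypothetical stand-in for illustration: stores a gain and applies -gain @ obs."""

    def __init__(self, feedback_gain):
        self.feedback_gain = feedback_gain

    def set_feedback_gain(self, gain):
        # Replace the stored gain; later calls to sample() use the new matrix.
        self.feedback_gain = gain

    def sample(self, obs):
        return -self.feedback_gain @ obs


policy = ToyLinearFeedback(-numpy.eye(2))
obs = numpy.array([0.5, -1.0])

before = policy.sample(obs)  # gain is -I, so the action equals obs

policy.set_feedback_gain(2 * numpy.eye(2))
after = policy.sample(obs)   # gain is 2I, so the action equals -2 * obs

print(before, after)
```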