coopihc.policy.RLPolicy.RLPolicy

class RLPolicy(action_state, model_path, learning_algorithm, env, wrappers, library, *args, **kwargs)

Bases: coopihc.policy.BasePolicy.BasePolicy

Wrap a trained net as a CoopIHC policy.

A CoopIHC-compatible policy object that wraps a policy trained via deep reinforcement learning.

Example code:

# action_state
action_state = State()
action_state["action"] = StateElement(0, autospace([-5 + i for i in range(11)]))

# env
env = TrainGym(
    bundle,
    train_user=True,
    train_assistant=False,
)

# Using PPO from stable_baselines3, with some wrappers
model_path = "saved_model.zip"
learning_algorithm = "PPO"
wrappers = {
    "observation_wrappers": [MyObservationWrapper],
    "action_wrappers": [MyActionWrapper],
}
library = "stable_baselines3"

trained_policy = RLPolicy(
    action_state, model_path, learning_algorithm, env, wrappers, library
)

Note

Currently, only policies trained with Stable Baselines3 (stable_baselines3) are supported.
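
One way to picture this restriction is as a name-based lookup: the library string selects a module, and learning_algorithm names a class inside it. The following is a minimal sketch of that idea, not CoopIHC's actual implementation; resolve_algorithm, SUPPORTED_LIBRARIES and the importer argument are hypothetical names introduced for illustration.

```python
import importlib

# Assumption: a single supported library, as stated in the note above.
SUPPORTED_LIBRARIES = ("stable_baselines3",)


def resolve_algorithm(library, learning_algorithm, importer=importlib.import_module):
    """Return the class named `learning_algorithm` from the module `library`.

    Hypothetical helper; `importer` is injectable so the lookup can be
    exercised without the real library installed.
    """
    if library not in SUPPORTED_LIBRARIES:
        raise NotImplementedError(
            f"Unsupported library {library!r}; expected one of {SUPPORTED_LIBRARIES}"
        )
    module = importer(library)
    try:
        return getattr(module, learning_algorithm)
    except AttributeError:
        raise ValueError(
            f"{library} has no algorithm named {learning_algorithm!r}"
        ) from None
```

With Stable Baselines3 installed, resolve_algorithm("stable_baselines3", "PPO") would return the PPO class, whose load method can then read a file such as "saved_model.zip".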

Parameters

action_state (see BasePolicy) – see BasePolicy
model_path (string) – path to the saved model
learning_algorithm (string) – name of the learning algorithm
env (gym.Env) – environment before any wrappers were applied
wrappers (dictionary) – observation and action wrappers
library (string) – name of the training library. Currently, only stable_baselines3 is supported.
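
To make the roles of env and wrappers concrete, the sketch below rebuilds a wrapped environment from an unwrapped one and a wrapper dictionary shaped like the one in the example code. It assumes that each list holds wrapper classes taking the environment as their only argument, and that observation wrappers are applied before action wrappers, in list order; the documentation does not state either point. apply_wrappers and the stand-in classes are hypothetical.

```python
def apply_wrappers(env, wrappers):
    """Wrap `env` with the classes listed in `wrappers` (assumed order)."""
    for key in ("observation_wrappers", "action_wrappers"):
        for wrapper_cls in wrappers.get(key, []):
            env = wrapper_cls(env)
    return env


class DummyEnv:
    """Stand-in for the unwrapped gym.Env."""


class DummyWrapper:
    """Stand-in wrapper that keeps a reference to the wrapped environment."""

    def __init__(self, env):
        self.env = env


class MyObservationWrapper(DummyWrapper):
    pass


class MyActionWrapper(DummyWrapper):
    pass


wrappers = {
    "observation_wrappers": [MyObservationWrapper],
    "action_wrappers": [MyActionWrapper],
}

# The outermost layer is the last wrapper applied.
wrapped = apply_wrappers(DummyEnv(), wrappers)
```

Passing the environment "before any wrappers were applied" together with the wrapper dictionary keeps the two concerns separate, so the same wrappers used during training can be reapplied when the policy is used.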

Methods

`default_value`	Decorator that uses bundle.game_state as the default value to observe when game_state is None.
`reset`	Reset the policy
`sample`

Attributes

`action`	Return the last action.
`action_keys`
`observation`	Return the last observation.
`parameters`
`state`
`unwrapped`

property action

Return the last action.

Returns: last action
Return type: State<coopihc.base.StateElement.StateElement>

default_value(): Apply this decorator to use bundle.game_state as default value to observe if game_state = None
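
The default-substitution pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes the decorated method takes game_state as its first argument after self, and that the object exposes a bundle attribute with a game_state. FakeBundle and Observer are hypothetical stand-ins.

```python
import functools


def default_value(func):
    """Substitute self.bundle.game_state when game_state is None."""

    @functools.wraps(func)
    def wrapper(self, game_state=None, *args, **kwargs):
        if game_state is None:
            game_state = self.bundle.game_state
        return func(self, game_state, *args, **kwargs)

    return wrapper


class FakeBundle:
    """Stand-in bundle holding a game state."""

    game_state = {"turn": 0}


class Observer:
    """Stand-in object whose method falls back to the bundle's state."""

    def __init__(self, bundle):
        self.bundle = bundle

    @default_value
    def observe(self, game_state=None):
        return game_state
```

Calling Observer(FakeBundle()).observe() returns the bundle's game state, while an explicitly passed game_state is used unchanged.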

property observation

Return the last observation.

Returns: last observation
Return type: State<coopihc.base.State.State>

reset(random=True)

Reset the policy

Parameters: random (bool, optional) – reset the policy, defaults to True. Kept in the signature for classes that subclass BasePolicy.