coopihc.policy.RLPolicy.RLPolicy
- class RLPolicy(action_state, model_path, learning_algorithm, env, wrappers, library, *args, **kwargs)
Bases: coopihc.policy.BasePolicy.BasePolicy
Wrap a trained network as a CoopIHC policy.
A CoopIHC-compatible policy object that wraps a policy trained with deep reinforcement learning.
Example code:
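```python
# Assumes `bundle`, `MyObservationWrapper` and `MyActionWrapper` are defined
# elsewhere, and that State, StateElement, autospace, TrainGym and RLPolicy
# have been imported from CoopIHC.

# action_state: a single action taking integer values from -5 to 5
action_state = State()
action_state["action"] = StateElement(0, autospace([-5 + i for i in range(11)]))

# env: the bundle exposed as a Gym environment, training only the user
env = TrainGym(
    bundle,
    train_user=True,
    train_assistant=False,
)

# Using PPO from stable_baselines3, with some wrappers
model_path = "saved_model.zip"
learning_algorithm = "PPO"
wrappers = {
    "observation_wrappers": [MyObservationWrapper],
    "action_wrappers": [MyActionWrapper],
}
library = "stable_baselines3"

trained_policy = RLPolicy(
    action_state, model_path, learning_algorithm, env, wrappers, library
)
```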
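Conceptually, a wrapped RL policy turns each CoopIHC observation into the input the network was trained on, queries the network, and maps the network's output back to an action. The sketch below illustrates only that loop; it is not CoopIHC's implementation. `SketchRLPolicy`, `ConstantModel` and the index-to-value action mapping are hypothetical, and the model is assumed to expose a stable_baselines3-style `predict(observation, deterministic=...)` method that returns an `(action, state)` pair.

```python
from typing import Any, Callable, Tuple


class SketchRLPolicy:
    """Hypothetical, simplified stand-in for a policy wrapping a trained network."""

    def __init__(
        self,
        model: Any,
        observation_transform: Callable[[Any], Any],
        action_transform: Callable[[Any], Any],
    ) -> None:
        self.model = model
        self.observation_transform = observation_transform
        self.action_transform = action_transform

    def sample(self, observation: Any) -> Any:
        # 1. Convert the CoopIHC observation into what the network was trained on
        obs = self.observation_transform(observation)
        # 2. Query the trained network for a deterministic action
        action, _ = self.model.predict(obs, deterministic=True)
        # 3. Map the network output back into the CoopIHC action space
        return self.action_transform(action)


class ConstantModel:
    """Dummy model with an SB3-like predict signature, for illustration only."""

    def predict(self, observation: Any, deterministic: bool = True) -> Tuple[int, None]:
        # Always returns action index 7, whatever the observation
        return 7, None


# Assumed mapping: network outputs indices 0..10, CoopIHC values are -5..5
values = [-5 + i for i in range(11)]
policy = SketchRLPolicy(
    model=ConstantModel(),
    observation_transform=lambda o: [o["position"]],
    action_transform=lambda idx: values[idx],
)
print(policy.sample({"position": 3}))  # prints 2
```

Keeping the two transforms separate from the model mirrors the split between observation and action wrappers in the example above.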
Note
Currently, only policies trained with stable_baselines3 are supported.
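Given that restriction, a constructor like this plausibly validates `library` and resolves `learning_algorithm` by name before loading the saved model. With stable_baselines3 installed, the resolved class's `load` classmethod (for example `PPO.load(model_path)`) restores a saved model. The sketch below uses a hypothetical registry of dummy classes so that it runs without stable_baselines3; `resolve_algorithm`, `ALGORITHMS` and `DummyPPO` are illustrative names, not part of CoopIHC.

```python
from typing import Dict, Type

SUPPORTED_LIBRARIES = {"stable_baselines3"}


class DummyPPO:
    """Hypothetical placeholder for stable_baselines3.PPO (not the real class)."""

    def __init__(self, path: str) -> None:
        self.path = path

    @classmethod
    def load(cls, path: str) -> "DummyPPO":
        # The real PPO.load deserializes the saved .zip; here we only record the path
        return cls(path)


# Hypothetical name -> class registry standing in for the library's algorithms
ALGORITHMS: Dict[str, Type] = {"PPO": DummyPPO}


def resolve_algorithm(library: str, learning_algorithm: str) -> Type:
    """Reject unsupported libraries, then look the algorithm class up by name."""
    if library not in SUPPORTED_LIBRARIES:
        raise NotImplementedError(
            f"Library {library!r} is not supported; use 'stable_baselines3'."
        )
    try:
        return ALGORITHMS[learning_algorithm]
    except KeyError:
        raise ValueError(f"Unknown learning algorithm {learning_algorithm!r}") from None


model = resolve_algorithm("stable_baselines3", "PPO").load("saved_model.zip")
print(type(model).__name__, model.path)  # prints: DummyPPO saved_model.zip
```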
- Parameters
action_state (see BasePolicy) – see BasePolicy
model_path (string) – path to the saved model
learning_algorithm (string) – name of the learning algorithm, e.g. "PPO"
env (gym.Env) – environment before any wrappers were applied
wrappers (dictionary) – observation and action wrappers, given as lists of wrapper classes under the keys "observation_wrappers" and "action_wrappers"
library (string) – name of the training library. Currently, only stable_baselines3 is supported.
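Because `env` must be passed before any wrappers were applied, the `wrappers` dictionary presumably describes how to rebuild the wrapped environment used during training. The sketch below assumes each wrapper is a class whose constructor takes the environment as its only argument (as with `gym.Wrapper` subclasses) and that observation wrappers are applied before action wrappers, each in list order; that ordering is an assumption and must match whatever was used at training time. `apply_wrappers`, `BaseEnv` and `TaggingWrapper` are hypothetical.

```python
from typing import Any, Dict, List


def apply_wrappers(env: Any, wrappers: Dict[str, List[type]]) -> Any:
    """Wrap `env` with each wrapper class, in list order (hypothetical helper)."""
    for wrapper_cls in wrappers.get("observation_wrappers", []):
        env = wrapper_cls(env)
    for wrapper_cls in wrappers.get("action_wrappers", []):
        env = wrapper_cls(env)
    return env


class BaseEnv:
    name = "env"


class TaggingWrapper:
    """Stand-in for a gym.Wrapper: keeps the inner env and records the nesting."""

    tag = "wrapper"

    def __init__(self, env: Any) -> None:
        self.env = env
        self.name = f"{self.tag}({env.name})"


class MyObservationWrapper(TaggingWrapper):
    tag = "obs"


class MyActionWrapper(TaggingWrapper):
    tag = "act"


wrapped = apply_wrappers(
    BaseEnv(),
    {"observation_wrappers": [MyObservationWrapper], "action_wrappers": [MyActionWrapper]},
)
print(wrapped.name)  # prints: act(obs(env))
```

The printed name shows the nesting: the action wrapper ends up outermost, wrapping the observation wrapper, which wraps the raw environment.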
Methods
default_value() – Decorator: when game_state is None, use bundle.game_state as the default value to observe.
reset([random]) – Reset the policy.
sample
Attributes
action – Return the last action.
action_keys
observation – Return the last observation.
parameters
state
unwrapped
- property action
Return the last action.
- Returns
last action
- Return type
State<coopihc.base.StateElement.StateElement>
- default_value()
Decorator: when game_state is None, use bundle.game_state as the default value to observe.
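A decorator with this behaviour can be written with `functools.wraps`. The sketch below is illustrative only: it assumes the decorated method takes an optional `game_state` argument and that the policy reaches its bundle through a `self.bundle` attribute. Both are assumptions, and CoopIHC's actual signature may differ. `FakeBundle` and `ObservingPolicy` are hypothetical.

```python
import functools
from typing import Any, Callable, Optional


def default_value(method: Callable) -> Callable:
    """Fill in game_state from self.bundle.game_state when it is None (sketch)."""

    @functools.wraps(method)
    def wrapper(self, game_state: Optional[Any] = None, *args, **kwargs):
        if game_state is None:
            # Fall back to the bundle's current game state
            game_state = self.bundle.game_state
        return method(self, game_state, *args, **kwargs)

    return wrapper


class FakeBundle:
    game_state = {"turn_index": 0}


class ObservingPolicy:
    def __init__(self, bundle: FakeBundle) -> None:
        self.bundle = bundle

    @default_value
    def observe(self, game_state=None):
        # Return whichever state was actually observed
        return game_state


policy = ObservingPolicy(FakeBundle())
print(policy.observe())                   # {'turn_index': 0}  (falls back to the bundle)
print(policy.observe({"turn_index": 5}))  # {'turn_index': 5}  (explicit argument wins)
```

`functools.wraps` keeps the decorated method's name and docstring, so introspection and generated documentation still show `observe` rather than `wrapper`.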
- property observation
Return the last observation.
- Returns
last observation
- Return type
State<coopihc.base.State.State>
- reset(random=True)
Reset the policy
- Parameters
random (bool, optional) – whether to reset the policy randomly, defaults to True. The argument is present so that subclasses of BasePolicy can make use of it.
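As a hypothetical illustration of how a subclass might use the flag, the sketch below keeps a last action and, on reset, either draws it at random or sets it to a fixed value. `SketchBasePolicy` and `LastActionPolicy` are illustrative names, not CoopIHC classes. Because the parameter is called `random`, the standard-library module is imported as `rnd` to avoid shadowing it inside `reset`.

```python
import random as rnd
from typing import List, Optional


class SketchBasePolicy:
    """Hypothetical base: reset accepts `random` but does nothing with it."""

    def reset(self, random: bool = True) -> None:
        pass


class LastActionPolicy(SketchBasePolicy):
    """Hypothetical subclass that stores a last action and honours `random`."""

    def __init__(self, actions: List[int], seed: Optional[int] = None) -> None:
        self.actions = actions
        self.rng = rnd.Random(seed)
        self.last_action = actions[0]

    def reset(self, random: bool = True) -> None:
        if random:
            # Draw a random initial action from the allowed set
            self.last_action = self.rng.choice(self.actions)
        else:
            # Deterministic reset to the first allowed action
            self.last_action = self.actions[0]


fixed = LastActionPolicy([-5 + i for i in range(11)])
fixed.reset(random=False)
print(fixed.last_action)  # -5

randomized = LastActionPolicy([-5 + i for i in range(11)], seed=0)
randomized.reset()
print(randomized.last_action in randomized.actions)  # True
```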