coopihc.policy.RLPolicy.RLPolicy
- class RLPolicy(action_state, model_path, learning_algorithm, env, wrappers, library, *args, **kwargs)
- Bases: coopihc.policy.BasePolicy.BasePolicy
- Wrap a trained net as a CoopIHC policy.
- A policy object compatible with CoopIHC that wraps a policy trained via deep reinforcement learning.
- Example code:

    # action_state
    action_state = State()
    action_state["action"] = StateElement(0, autospace([-5 + i for i in range(11)]))

    # env
    env = TrainGym(
        bundle,
        train_user=True,
        train_assistant=False,
    )

    # Using PPO from stable_baselines3, with some wrappers
    model_path = "saved_model.zip"
    learning_algorithm = "PPO"
    wrappers = {
        "observation_wrappers": [MyObservationWrapper],
        "action_wrappers": [MyActionWrapper],
    }
    library = "stable_baselines3"
    trained_policy = RLPolicy(
        action_state, model_path, learning_algorithm, env, wrappers, library
    )

- Note: Currently only supports policies obtained via stable baselines 3.
- Parameters
- action_state (see BasePolicy) – see BasePolicy
- model_path (string) – path to the saved model 
- learning_algorithm (string) – name of the learning algorithm 
- env (gym.Env) – environment before any wrappers were applied 
- wrappers (dictionary) – observation and action wrappers 
- library (string) – name of the training library. Currently, only stable_baselines3 is supported. 
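
The wrappers argument groups wrapper classes under two keys, "observation_wrappers" and "action_wrappers". As a minimal sketch of how such a dictionary could be applied, in order, to an unwrapped environment, the snippet below uses plain stand-in classes rather than gym or CoopIHC. The names DemoEnv, DemoWrapper, DemoObservationWrapper, DemoActionWrapper and apply_wrappers are hypothetical, and the ordering (observation wrappers first, then action wrappers) is an assumption, not documented RLPolicy behavior.

```python
class DemoEnv:
    """Stand-in for the unwrapped environment (hypothetical, not gym.Env)."""


class DemoWrapper:
    """Minimal wrapper that only stores the environment it wraps."""

    def __init__(self, env):
        self.env = env

    @property
    def unwrapped(self):
        # Walk down the wrapper chain until the base environment is reached.
        inner = self.env
        while isinstance(inner, DemoWrapper):
            inner = inner.env
        return inner


class DemoObservationWrapper(DemoWrapper):
    """Hypothetical observation wrapper."""


class DemoActionWrapper(DemoWrapper):
    """Hypothetical action wrapper."""


def apply_wrappers(env, wrappers):
    """Apply observation wrappers, then action wrappers (assumed order)."""
    for key in ("observation_wrappers", "action_wrappers"):
        for wrapper_cls in wrappers.get(key, []):
            env = wrapper_cls(env)
    return env


wrappers = {
    "observation_wrappers": [DemoObservationWrapper],
    "action_wrappers": [DemoActionWrapper],
}
base_env = DemoEnv()
wrapped_env = apply_wrappers(base_env, wrappers)
```

Passing the environment as it was before any wrappers were applied, together with the wrapper classes, keeps both the raw and the wrapped views of the environment reachable from one object.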
 
 - Methods
 - default_value – Apply this decorator to use bundle.game_state as default value to observe if game_state = None
 - reset – Reset the policy
 - sample
 - Attributes
 - action – Return the last action.
 - action_keys
 - observation – Return the last observation.
 - parameters
 - state
 - unwrapped

 - property action
- Return the last action.
- Returns
- last action 
- Return type
- State<coopihc.base.StateElement.StateElement> 
 
 - default_value()
- Apply this decorator to use bundle.game_state as the default value to observe when game_state is None.
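
A decorator of this kind could be sketched as follows. This only illustrates the pattern described above and is not CoopIHC's implementation: DemoBundle, DemoObserver and the observe signature are hypothetical names introduced for the example.

```python
import functools


def default_value(func):
    """Fall back to self.bundle.game_state when game_state is None."""

    @functools.wraps(func)
    def wrapper(self, game_state=None, *args, **kwargs):
        if game_state is None:
            game_state = self.bundle.game_state
        return func(self, game_state, *args, **kwargs)

    return wrapper


class DemoBundle:
    """Hypothetical stand-in that holds a game state."""

    def __init__(self, game_state):
        self.game_state = game_state


class DemoObserver:
    """Hypothetical object whose observe() uses the decorator."""

    def __init__(self, bundle):
        self.bundle = bundle

    @default_value
    def observe(self, game_state=None):
        # Return the state that was actually observed.
        return game_state


observer = DemoObserver(DemoBundle({"turn": 0}))
from_bundle = observer.observe()          # falls back to the bundle's state
explicit = observer.observe({"turn": 3})  # an explicit argument is kept
```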
 - property observation
- Return the last observation.
- Returns
- last observation 
- Return type
- State<coopihc.base.State.State> 
 
 - reset(random=True)
- Reset the policy.
- Parameters
- random (bool, optional) – reset the policy, defaults to True. The argument is present so that subclasses of BasePolicy can use it.