coopihc.bundle.wrappers.Train.TrainGym
- class TrainGym(bundle, *args, train_user=False, train_assistant=False, observation_dict=None, reset_dic={}, reset_turn=None, filter_observation=None, **kwargs)[source]
Bases:
gym.core.Env
Generic wrapper to make a Bundle compatible with gym.Env.
Read more on the Train class.
- Parameters
bundle (Bundle <coopihc.bundle.Bundle.Bundle>) – bundle to convert to a gym.Env
train_user (bool, optional) – whether to train the user, defaults to False
train_assistant (bool, optional) – whether to train the assistant, defaults to False
observation_dict – to filter out observations, you can apply a dictionary, defaults to None, e.g.:
filterdict = OrderedDict( { "user_state": OrderedDict({"goal": 0}), "task_state": OrderedDict({"x": 0}), } )
You can always filter out observations later using an ObservationWrapper; the difference in performance between the two approaches is unknown. A construction sketch using such a filter follows the parameter list below.
- Parameters
reset_dic (dict, optional) – During training, the bundle will be repeatedly reset. Pass the reset_dic here if needed (see Bundle reset mechanism), defaults to {}
reset_turn (int, optional) – During training, the bundle will be repeatedly reset. Pass the reset_turn here (see Bundle reset_turn mechanism), defaults to None, which selects 1 if the user is trained and 3 otherwise
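The following construction sketch is illustrative rather than part of the original documentation. It assumes an already-built Bundle instance named bundle (task, user and assistant, as described in the Bundle documentation), reuses the filterdict example above, and imports TrainGym from the module path shown in the page title.

from collections import OrderedDict
from coopihc.bundle.wrappers.Train import TrainGym

# bundle is assumed to be an already-constructed Bundle (task + user + assistant),
# built as described in the Bundle documentation.
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)

env = TrainGym(
    bundle,
    train_user=True,              # learn the user's policy
    train_assistant=False,        # keep the assistant's behavior fixed
    observation_dict=filterdict,  # keep only these substates in the observation
    reset_dic={},                 # forwarded to the bundle's reset mechanism
    reset_turn=None,              # None -> 1 here, since the user is trained
)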
Methods
- close: See Bundle and gym API
- convert_space
- get_action_space: Create a gym.spaces.Dict out of the action states of the Bundle.
- get_agent_observation_space: Same as get_action_space for observations.
- render: See Bundle and gym API
- reset: Resets the environment to an initial state and returns an initial observation.
- seed: Sets the seed for this env's random number generator(s).
- step: Run one timestep of the environment's dynamics.
Attributes
- action_space
- metadata
- observation_space
- reward_range
- spec
- unwrapped: Completely unwrap this env.
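Since get_action_space and get_agent_observation_space build gym.spaces.Dict spaces from the bundle's states, the resulting environment can be inspected like any other gym.Env. A minimal sketch, assuming env is the TrainGym instance constructed above and that the action_space and observation_space attributes are populated from these helpers:

import gym

# Both spaces are assumed to be gym.spaces.Dict instances built by
# get_action_space / get_agent_observation_space from the bundle's states.
assert isinstance(env.action_space, gym.spaces.Dict)
assert isinstance(env.observation_space, gym.spaces.Dict)
print(env.action_space)
print(env.observation_space)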
- reset()[source]
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- seed(seed=None)
Sets the seed for this env’s random number generator(s).
- Note:
Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.
- Returns:
- list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
- step(action)[source]
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
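A usage sketch of the seed/reset/step cycle, illustrative only: it assumes env is the TrainGym instance constructed above and uses a random placeholder policy sampled from the action space.

env.seed(42)       # seed the environment's random number generator(s)
obs = env.reset()  # initial observation for a new episode

done = False
while not done:
    action = env.action_space.sample()          # placeholder policy: random action
    obs, reward, done, info = env.step(action)  # advance the bundle by one timestep

obs = env.reset()  # each reset() starts a fresh, independent episode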
- property unwrapped
Completely unwrap this env.
- Returns:
gym.Env: The base non-wrapped gym.Env instance
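A short sketch, assuming wrapped_env is env (the TrainGym instance above) wrapped in one or more additional gym wrappers, such as an ObservationWrapper used for filtering:

base_env = wrapped_env.unwrapped  # strips all gym wrappers stacked on top
assert base_env is env            # the underlying TrainGym instance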