coopihc.bundle.wrappers.Train.TrainGym
- class TrainGym(bundle, *args, train_user=False, train_assistant=False, observation_dict=None, reset_dic={}, reset_turn=None, filter_observation=None, **kwargs)[source]
- Bases: gym.core.Env
- Generic wrapper to make a Bundle compatible with gym.Env. Read more on the Train class. A construction sketch follows the parameter list.
- Parameters
- bundle (Bundle <coopihc.bundle.Bundle.Bundle>) – bundle to convert to a gym.Env 
- train_user (bool, optional) – whether to train the user, defaults to False
- train_assistant (bool, optional) – whether to train the assistant, defaults to False
- observation_dict – to filter out observations, you can supply a dictionary, defaults to None. e.g.:
 
```python
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)
```

- You can always filter out observations later using an ObservationWrapper. The difference in performance between the two approaches is unknown.
- reset_dic (dict, optional) – During training, the bundle will be repeatedly reset. Pass the reset_dic here if needed (see Bundle reset mechanism), defaults to {} 
- reset_turn (int, optional) – During training, the bundle will be repeatedly reset. Pass the reset_turn here (see Bundle reset_turn mechanism), defaults to None, which selects 1 if the user is trained and 3 otherwise
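- Example: a minimal construction sketch. Here `bundle` stands in for an existing Bundle whose task, user, and assistant are assumed to be defined elsewhere:

```python
# Sketch: wrap an existing CoopIHC Bundle so it can be trained with gym tooling.
# `bundle` is an assumed, pre-built Bundle instance (definitions not shown).
from collections import OrderedDict

from coopihc.bundle.wrappers.Train import TrainGym

# Same filter as the observation_dict example above: keep only the listed substates.
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)

env = TrainGym(
    bundle,                       # assumed Bundle instance
    train_user=True,              # train the user; the assistant stays fixed
    train_assistant=False,
    observation_dict=filterdict,
)
```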
 
- Methods (see Bundle and the gym API)
  - convert_space – Create a gym.spaces.Dict out of the action states of the Bundle.
  - get_agent_observation_space – Same as get_action_space for observations.
  - reset – Resets the environment to an initial state and returns an initial observation.
  - seed – Sets the seed for this env's random number generator(s).
  - step – Run one timestep of the environment's dynamics.
- Attributes
  - action_space
  - metadata
  - observation_space
  - reward_range
  - spec
  - unwrapped – Completely unwrap this env.
- reset()[source]
- Resets the environment to an initial state and returns an initial observation.
- Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
- observation (object): the initial observation. 
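
- A usage sketch, assuming the old-style gym API in which reset() returns only the observation:

```python
# Sketch: start a new episode; `env` is the TrainGym instance from the
# construction example above.
obs = env.reset()  # initial observation, filtered if observation_dict was set
```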
 
 - seed(seed=None)
- Sets the seed for this env’s random number generator(s).
- Note:
- Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators. 
- Returns:
- list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
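
- A usage sketch, assuming the pre-v0.26 gym seeding API:

```python
# Sketch: seed the env's generator(s); a list of the seeds used is returned.
seeds = env.seed(42)
main_seed = seeds[0]  # the "main" seed a reproducer should pass back to seed()
```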
 
 
 - step(action)[source]
- Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
- Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
- action (object): an action provided by the agent 
- Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
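
- A sketch of the standard interaction loop; sampling from action_space is a placeholder for a trained policy:

```python
# Sketch: classic (observation, reward, done, info) gym loop.
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
```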
 
 - property unwrapped
- Completely unwrap this env.
- Returns:
- gym.Env: The base non-wrapped gym.Env instance
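
- A sketch, using gym’s FlattenObservation (an ObservationWrapper) as an example of an extra wrapper layered on top:

```python
# Sketch: `unwrapped` strips wrappers stacked on top of TrainGym.
from gym.wrappers import FlattenObservation

wrapped = FlattenObservation(env)  # example extra wrapper
assert wrapped.unwrapped is env    # recovers the underlying TrainGym
```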