coopihc.bundle.wrappers.Train.TrainGym

class TrainGym(bundle, *args, train_user=False, train_assistant=False, observation_dict=None, reset_dic={}, reset_turn=None, filter_observation=None, **kwargs)[source]

Bases: gym.core.Env

Generic Wrapper to make bundles compatible with gym.Env

This is a Wrapper to make a Bundle compatible with gym.Env. Read more in the Train class documentation.

Parameters
  • bundle (Bundle <coopihc.bundle.Bundle.Bundle>) – bundle to convert to a gym.Env

  • train_user (bool, optional) – whether to train the user, defaults to False

  • train_assistant (bool, optional) – whether to train the assistant, defaults to False

  • observation_dict (OrderedDict, optional) – to filter out observations, you can apply a dictionary, defaults to None, e.g.:

filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)

You can always filter out observations later using an ObservationWrapper. The difference in performance between the two approaches is unknown.

Parameters
  • reset_dic (dict, optional) – During training, the bundle will be repeatedly reset. Pass the reset_dic here if needed (see Bundle reset mechanism), defaults to {}

  • reset_turn (int, optional) – During training, the bundle will be repeatedly reset. Pass the reset_turn here (see Bundle reset_turn mechanism), defaults to None, in which case the turn is set to 1 if the user is trained and to 3 otherwise
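As a hypothetical sketch (not coopihc's actual implementation), the observation_dict filter above can be thought of as selecting substates from the full bundle observation by key; the names `apply_filter` and `full_observation` are illustrative only:

```python
from collections import OrderedDict

# Hypothetical sketch of how an observation_dict filter selects
# substates from a full bundle observation. The actual filtering
# inside TrainGym may differ; only the keys of the filter dict
# matter here, mirroring the filterdict example above.
def apply_filter(observation, filterdict):
    filtered = OrderedDict()
    for state_key, substates in filterdict.items():
        filtered[state_key] = OrderedDict(
            (key, observation[state_key][key]) for key in substates
        )
    return filtered

full_observation = {
    "user_state": {"goal": 4, "belief": 0.2},
    "task_state": {"x": 1.0, "y": 2.0},
}
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)
filtered = apply_filter(full_observation, filterdict)
# keeps only user_state["goal"] and task_state["x"]
```

The same keyed structure is what an ObservationWrapper would operate on if you chose to filter after the fact instead.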

Methods

close

See Bundle and gym API

convert_space

get_action_space

Create a gym.spaces.Dict out of the action states of the Bundle.

get_agent_observation_space

get_observation_space

Same as get_action_space for observations.

render

See Bundle and gym API

reset

Resets the environment to an initial state and returns an initial observation.

seed

Sets the seed for this env's random number generator(s).

step

Run one timestep of the environment's dynamics.

Attributes

action_space

metadata

observation_space

reward_range

spec

unwrapped

Completely unwrap this env.

close()[source]

See Bundle and gym API

get_action_space()[source]

Create a gym.spaces.Dict out of the action states of the Bundle.

get_observation_space()[source]

Same as get_action_space for observations.

render(mode)[source]

See Bundle and gym API

reset()[source]

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

seed(seed=None)

Sets the seed for this env’s random number generator(s).

Note:

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns:
list<bigint>: the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
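The seed contract above can be illustrated with a minimal pure-Python stand-in; `SeededEnv` is a hypothetical example, not part of coopihc or gym:

```python
import random

# Minimal stand-in illustrating the seed() contract: the env keeps a
# seeded generator and returns the list of seeds used, with the "main"
# seed first so a reproducer can pass it back to seed().
class SeededEnv:
    def seed(self, seed=None):
        if seed is None:
            # Draw a main seed when none is provided.
            seed = random.randrange(2**31)
        self.rng = random.Random(seed)
        return [seed]

env = SeededEnv()
seeds = env.seed(42)
# seeds[0] is the main seed; two envs seeded identically draw
# identical random values.
```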

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment

reward (float): amount of reward returned after previous action

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
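The (observation, reward, done, info) contract can be exercised with a minimal stand-in environment; `CountdownEnv` is a hypothetical example, not part of coopihc, but a real TrainGym instance exposes the same reset/step interface:

```python
# Minimal stand-in environment illustrating the gym step() contract.
# Each step decrements a counter; the episode ends when it reaches zero.
class CountdownEnv:
    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self):
        # Return the initial observation for a fresh episode.
        self.state = self.start
        return self.state

    def step(self, action):
        # Advance one timestep and return (observation, reward, done, info).
        self.state -= 1
        done = self.state <= 0
        reward = 1.0 if done else 0.0
        info = {"remaining": self.state}
        return self.state, reward, done, info


env = CountdownEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    # The action is ignored in this sketch; a real env would use it.
    obs, reward, done, info = env.step(None)
    total_reward += reward
```

Note that, per the gym API, the caller is responsible for calling reset() again once done is True.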

property unwrapped

Completely unwrap this env.

Returns:

gym.Env: The base non-wrapped gym.Env instance