coopihc.bundle.wrappers.Train.TrainGym
- class TrainGym(bundle, *args, train_user=False, train_assistant=False, observation_dict=None, reset_dic={}, reset_turn=None, filter_observation=None, **kwargs)[source]
- Bases: gym.core.Env
- Generic wrapper to make a Bundle compatible with gym.Env. Read more on the Train class. A construction sketch follows the parameter list.
- Parameters
- bundle (Bundle <coopihc.bundle.Bundle.Bundle>) – bundle to convert to a gym.Env 
- train_user (bool, optional) – whether to train the user, defaults to False
- train_assistant (bool, optional) – whether to train the assistant, defaults to False
- observation_dict – to filter out observations, you can supply a dictionary, defaults to None. e.g.:
 
```python
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)
```

- You can always filter out observations later using an ObservationWrapper. The difference in performance between the two approaches is unknown.
- reset_dic (dict, optional) – During training, the bundle will be repeatedly reset. Pass the reset_dic here if needed (see Bundle reset mechanism), defaults to {} 
- reset_turn (int, optional) – During training, the bundle will be repeatedly reset. Pass the reset_turn here (see Bundle reset_turn mechanism), defaults to None, which selects 1 if the user is trained and 3 otherwise
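- Example: a minimal construction sketch. Here `bundle` stands in for an existing Bundle whose task, user, and assistant are assumed to be defined elsewhere:

```python
# Sketch: wrap an existing CoopIHC Bundle so it can be trained with gym tooling.
# `bundle` is an assumed, pre-built Bundle instance (definitions not shown).
from collections import OrderedDict

from coopihc.bundle.wrappers.Train import TrainGym

# Same filter as the observation_dict example above: keep only the listed substates.
filterdict = OrderedDict(
    {
        "user_state": OrderedDict({"goal": 0}),
        "task_state": OrderedDict({"x": 0}),
    }
)

env = TrainGym(
    bundle,                       # assumed Bundle instance
    train_user=True,              # train the user; the assistant stays fixed
    train_assistant=False,
    observation_dict=filterdict,
)
```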
 
- Methods (see Bundle and the gym API)
  - convert_space – Create a gym.spaces.Dict out of the action states of the Bundle.
  - get_agent_observation_space – Same as get_action_space for observations.
  - reset – Resets the environment to an initial state and returns an initial observation.
  - seed – Sets the seed for this env's random number generator(s).
  - step – Run one timestep of the environment's dynamics.
- Attributes
  - action_space
  - metadata
  - observation_space
  - reward_range
  - spec
  - unwrapped – Completely unwrap this env.
- reset()[source]
- Resets the environment to an initial state and returns an initial observation.
- Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
- observation (object): the initial observation. 
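
- A usage sketch, assuming the old-style gym API in which reset() returns only the observation:

```python
# Sketch: start a new episode; `env` is the TrainGym instance from the
# construction example above.
obs = env.reset()  # initial observation, filtered if observation_dict was set
```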
 
 - seed(seed=None)
- Sets the seed for this env’s random number generator(s).
- Note:
- Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators. 
- Returns:
- list<bigint>: Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
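
- A usage sketch, assuming the pre-v0.26 gym seeding API:

```python
# Sketch: seed the env's generator(s); a list of the seeds used is returned.
seeds = env.seed(42)
main_seed = seeds[0]  # the "main" seed a reproducer should pass back to seed()
```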
 
 
 - step(action)[source]
- Run one timestep of the environment’s dynamics. When the end of an episode is reached, you are responsible for calling reset() to reset this environment’s state.
- Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
- action (object): an action provided by the agent 
- Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
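
- A sketch of the standard interaction loop; sampling from action_space is a placeholder for a trained policy:

```python
# Sketch: classic (observation, reward, done, info) gym loop.
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
```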
 
 - property unwrapped
- Completely unwrap this env.
- Returns:
- gym.Env: The base non-wrapped gym.Env instance
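
- A sketch, using gym’s FlattenObservation (an ObservationWrapper) as an example of an extra wrapper layered on top:

```python
# Sketch: `unwrapped` strips wrappers stacked on top of TrainGym.
from gym.wrappers import FlattenObservation

wrapped = FlattenObservation(env)  # example extra wrapper
assert wrapped.unwrapped is env    # recovers the underlying TrainGym
```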