coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.Transitions

class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.Transitions(obs: numpy.ndarray, acts: numpy.ndarray, infos: numpy.ndarray, next_obs: numpy.ndarray, dones: numpy.ndarray)

Bases: coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.TransitionsMinimal

A batch of obs-act-obs-done transitions.

Methods

`register_datapipe_as_function`
`register_function`

Attributes

`functions`
`next_obs`	New observation.
`dones`	Boolean array indicating episode termination.

acts: np.ndarray

Actions. Shape: (batch_size, ) + action_shape.

dones: numpy.ndarray

Boolean array indicating episode termination. Shape: (batch_size, ).

dones[i] is true iff next_obs[i] is the last observation of an episode.
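As a minimal sketch of how these fields line up, the snippet below builds the arrays for one short episode with plain NumPy. The names `episode_obs` and `episode_acts` and their values are hypothetical illustration data, and the slicing scheme is an assumption about how such a batch is typically assembled, not code taken from this module.

```python
import numpy as np

# Hypothetical episode: 4 observations of shape (2,) and the 3 actions
# taken between them.
episode_obs = np.arange(8, dtype=np.float32).reshape(4, 2)
episode_acts = np.array([0, 1, 0])

# Each transition pairs an observation with the one that follows it.
obs = episode_obs[:-1]
next_obs = episode_obs[1:]

# Only the final transition ends the episode, so only its flag is True.
dones = np.zeros(len(episode_acts), dtype=bool)
dones[-1] = True

# One (here empty) info dict per transition, stored as an object array.
infos = np.array([{} for _ in episode_acts])
```

Here `next_obs[-1]` is the last observation of the episode, which is why `dones[-1]` is the only true entry.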

infos: np.ndarray

Array of info dicts. Shape: (batch_size, ).

next_obs: numpy.ndarray

New observation. Shape: (batch_size, ) + observation_shape.

The i’th observation next_obs[i] in this array is the observation after the agent has taken action acts[i].

Invariants:

next_obs.dtype == obs.dtype
len(next_obs) == len(obs)
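The two invariants can be checked directly on a pair of arrays. The helper below is a hypothetical stand-in written for illustration (`check_invariants` is not part of this module), and whether the class itself performs such a check on construction is not stated here.

```python
import numpy as np


def check_invariants(obs: np.ndarray, next_obs: np.ndarray) -> None:
    """Raise ValueError if either documented invariant is violated."""
    if next_obs.dtype != obs.dtype:
        raise ValueError(
            f"dtype mismatch: obs is {obs.dtype}, next_obs is {next_obs.dtype}"
        )
    if len(next_obs) != len(obs):
        raise ValueError(
            f"length mismatch: {len(obs)} obs vs {len(next_obs)} next_obs"
        )


# Passes silently: same dtype and same batch size.
check_invariants(np.zeros((3, 2)), np.ones((3, 2)))
```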

obs: np.ndarray

Previous observations. Shape: (batch_size, ) + observation_shape.

The i’th observation obs[i] in this array is the observation seen by the agent when choosing action acts[i]. obs[i] is not required to be from the timestep preceding obs[i+1].
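Because consecutive rows need not be consecutive timesteps, a batch can be reordered freely as long as every field is indexed with the same permutation. The sketch below illustrates this with plain NumPy arrays; the data and variable names are hypothetical and do not come from this module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 3 transitions with observation shape (2,).
obs = np.arange(6).reshape(3, 2)
acts = np.array([10, 11, 12])
next_obs = obs + 100  # marker values so that pairs are easy to recognise

# Apply one shared permutation to every field so that row i still
# describes a single transition after shuffling.
perm = rng.permutation(len(obs))
shuffled_obs = obs[perm]
shuffled_acts = acts[perm]
shuffled_next_obs = next_obs[perm]
```

After shuffling, `shuffled_obs[i]`, `shuffled_acts[i]` and `shuffled_next_obs[i]` still belong together, even though row `i` and row `i + 1` may come from unrelated timesteps.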