coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.Transitions

class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.Transitions(obs: numpy.ndarray, acts: numpy.ndarray, infos: numpy.ndarray, next_obs: numpy.ndarray, dones: numpy.ndarray)

Bases: coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.TransitionsMinimal

A batch of obs-act-obs-done transitions.
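The field layout can be illustrated with a plain frozen dataclass. `TransitionsSketch` below is a hypothetical stand-in written for this example, not the coopihczoo class (which also inherits from `TransitionsMinimal`). It only shows the per-field shapes and dtypes described on this page, assuming a batch of 4 transitions with 3-dimensional observations and 2-dimensional actions.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class TransitionsSketch:
    """Hypothetical stand-in with the same fields as Transitions."""

    obs: np.ndarray       # (batch_size,) + observation_shape
    acts: np.ndarray      # (batch_size,) + action_shape
    infos: np.ndarray     # (batch_size,), dtype=object, one info dict per transition
    next_obs: np.ndarray  # (batch_size,) + observation_shape, same dtype as obs
    dones: np.ndarray     # (batch_size,), dtype=bool


batch_size, obs_shape, act_shape = 4, (3,), (2,)
batch = TransitionsSketch(
    obs=np.zeros((batch_size,) + obs_shape, dtype=np.float32),
    acts=np.zeros((batch_size,) + act_shape, dtype=np.float32),
    # Dicts are stored as opaque objects, so this is a 1-D array of length batch_size.
    infos=np.array([{} for _ in range(batch_size)], dtype=object),
    next_obs=np.ones((batch_size,) + obs_shape, dtype=np.float32),
    dones=np.array([False, False, False, True]),
)
print(batch.acts.shape, batch.infos.shape, batch.dones.dtype)  # (4, 2) (4,) bool
```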

Methods

register_datapipe_as_function

register_function

Attributes

functions

next_obs

New observation.

dones

Boolean array indicating episode termination.

acts: np.ndarray

Actions. Shape: (batch_size,) + action_shape.

dones: numpy.ndarray

Boolean array indicating episode termination. Shape: (batch_size, ).

dones[i] is true iff next_obs[i] is the last observation of an episode.
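To make the dones convention concrete, the sketch below flattens complete episodes into aligned arrays. The helper `flatten_episodes` and its episode format (a sequence of T + 1 observations paired with T actions) are assumptions made for illustration; every episode is assumed to run to termination, so only its final transition gets dones = True.

```python
import numpy as np


def flatten_episodes(episodes):
    """Hypothetical helper: episodes is a list of (obs_seq, act_seq) pairs,
    where obs_seq has exactly one more entry than act_seq."""
    obs, acts, next_obs, dones = [], [], [], []
    for ep_obs, ep_acts in episodes:
        ep_obs, ep_acts = np.asarray(ep_obs), np.asarray(ep_acts)
        if len(ep_acts) < 1 or len(ep_obs) != len(ep_acts) + 1:
            raise ValueError("each episode needs T >= 1 actions and T + 1 observations")
        obs.append(ep_obs[:-1])      # observation seen when choosing acts[i]
        next_obs.append(ep_obs[1:])  # observation after taking acts[i]
        acts.append(ep_acts)
        ep_dones = np.zeros(len(ep_acts), dtype=bool)
        ep_dones[-1] = True          # next_obs of the final step ends the episode
        dones.append(ep_dones)
    return (np.concatenate(obs), np.concatenate(acts),
            np.concatenate(next_obs), np.concatenate(dones))


episodes = [
    ([[0.0], [1.0], [2.0]], [0, 1]),  # 2 transitions
    ([[10.0], [11.0]], [1]),          # 1 transition
]
obs, acts, next_obs, dones = flatten_episodes(episodes)
print(dones)  # [False  True  True]
```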

infos: np.ndarray

Array of info dicts. Shape: (batch_size,).

next_obs: numpy.ndarray

New observation. Shape: (batch_size, ) + observation_shape.

The i’th observation next_obs[i] in this array is the observation the agent receives after taking action acts[i].

Invariants:
  • next_obs.dtype == obs.dtype

  • len(next_obs) == len(obs)
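The two invariants above, together with the shape conventions stated for the other attributes, can be checked mechanically. `invariant_violations` is a hypothetical helper, not part of the module; it returns a list of violated conditions (empty when the batch is consistent) rather than raising, so a caller can report every problem at once.

```python
import numpy as np


def invariant_violations(obs, acts, infos, next_obs, dones):
    """Hypothetical validator: returns violated invariants (empty list if valid)."""
    problems = []
    # Invariants listed under next_obs.
    if next_obs.dtype != obs.dtype:
        problems.append("next_obs.dtype != obs.dtype")
    if len(next_obs) != len(obs):
        problems.append("len(next_obs) != len(obs)")
    # Shape conventions from the attribute descriptions (batch_size = len(obs)).
    for name, arr in (("acts", acts), ("infos", infos)):
        if len(arr) != len(obs):
            problems.append(f"len({name}) != len(obs)")
    if dones.shape != (len(obs),) or dones.dtype != bool:
        problems.append("dones is not a bool array of shape (batch_size,)")
    return problems


obs = np.zeros((3, 2), dtype=np.float32)
infos = np.array([{} for _ in range(3)], dtype=object)
good = invariant_violations(obs, np.zeros(3, dtype=np.int64), infos,
                            np.ones((3, 2), dtype=np.float32), np.array([False, False, True]))
bad = invariant_violations(obs, np.zeros(3, dtype=np.int64), infos,
                           np.ones((3, 2), dtype=np.float64), np.array([False, False, True]))
print(good)  # []
print(bad)   # ['next_obs.dtype != obs.dtype']
```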

obs: np.ndarray

Previous observations. Shape: (batch_size, ) + observation_shape.

The i’th observation obs[i] in this array is the observation seen by the agent when choosing action acts[i]. obs[i] is not required to be from the timestep preceding obs[i+1].
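Because each index i describes a self-contained transition, a batch can be shuffled (for example, when drawing minibatches for behavioral cloning) as long as one permutation is applied to every array. The NumPy sketch below uses a toy trajectory; the variable names are illustrative only.

```python
import numpy as np

# A trajectory of 5 observations yields 4 consecutive transitions.
traj_obs = np.arange(5, dtype=np.float32).reshape(5, 1)
obs, next_obs = traj_obs[:-1], traj_obs[1:]
acts = np.array([0, 1, 0, 1])
dones = np.array([False, False, False, True])

# One shared permutation keeps every (obs, act, next_obs, done) tuple intact,
# even though obs_s[i] no longer comes from the timestep preceding obs_s[i+1].
rng = np.random.default_rng(0)
perm = rng.permutation(len(obs))
obs_s, acts_s, next_obs_s, dones_s = obs[perm], acts[perm], next_obs[perm], dones[perm]

# In this toy trajectory next_obs is always obs + 1, and that pairing survives.
print(np.array_equal(next_obs_s, obs_s + 1))  # True
```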