coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.TrajectoryWithRew¶
- class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.TrajectoryWithRew(obs: numpy.ndarray, acts: numpy.ndarray, infos: Optional[numpy.ndarray], terminal: bool, rews: numpy.ndarray)[source]¶
- Bases: coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.Trajectory
- A Trajectory that additionally includes reward information.
- Attributes
 - acts: numpy.ndarray¶
- Actions, shape (trajectory_len, ) + action_shape. 
 - infos: Optional[numpy.ndarray]¶
- An array of info dicts, length trajectory_len. 
 - obs: numpy.ndarray¶
- Observations, shape (trajectory_len + 1, ) + observation_shape. 
 - rews: numpy.ndarray¶
- Reward, shape (trajectory_len, ), dtype float. 
 - terminal: bool¶
- Does this trajectory (fragment) end in a terminal state? Episodes are always terminal. Trajectory fragments are also terminal when they contain the final state of an episode (even if missing the start of the episode).
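The attribute descriptions above imply a shape contract between the fields: one more observation than actions, and one reward per action. The following is a minimal sketch, assuming the class can be instantiated directly with the fields shown in the signature above; the array shapes and values are illustrative only.

```python
import numpy as np

from coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original import (
    TrajectoryWithRew,
)

# Illustrative shapes: a 3-step trajectory with 4-dimensional observations
# and scalar (discrete) actions.
trajectory_len = 3

trajectory = TrajectoryWithRew(
    obs=np.zeros((trajectory_len + 1, 4)),        # one more observation than actions
    acts=np.zeros((trajectory_len,), dtype=int),  # one action per step
    infos=None,                                   # info dicts are optional
    terminal=True,                                # this fragment ends the episode
    rews=np.zeros((trajectory_len,), dtype=float),  # one float reward per action
)

# Shape contract stated in the attribute documentation above.
assert len(trajectory.obs) == len(trajectory.acts) + 1
assert trajectory.rews.shape == trajectory.acts.shape[:1]
```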