coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.rollout

coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.rollout(policy, venv, sample_until, *, unwrap: bool = True, exclude_infos: bool = True, **kwargs)[source]

Generate policy rollouts.

By default, the .infos field of each Trajectory is set to None to save space (see exclude_infos below).

Args:
  policy: Can be any of the following:
    1. A stable_baselines3 policy or algorithm trained on the gym environment.
    2. A Callable that takes an ndarray of observations and returns an ndarray of corresponding actions.
    3. None, in which case actions will be sampled randomly.
  venv: The vectorized environments.
  sample_until: End condition for rollout sampling.
  unwrap: If True, then save original observations and rewards (instead of potentially wrapped observations and rewards) by calling unwrap_traj().
  exclude_infos: If True, then exclude infos from pickle by setting this field to None. Excluding infos can save a lot of space during pickles.
  verbose: If True, then print out rollout stats before saving.
  **kwargs: Passed through to generate_trajectories.

Returns:

Sequence of trajectories satisfying sample_until. Additional trajectories may be collected to avoid biasing the process towards short episodes; the user should truncate if required.
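
The policy and sample_until interfaces described above can be sketched as follows. This is a minimal, self-contained illustration, not a call into the library: constant_policy and sample_until_5_episodes are hypothetical names chosen for this example, and the predicate signature (trajectories collected so far -> bool) is an assumption based on the description of sample_until as an "end condition for rollout sampling".

```python
import numpy as np

# Hypothetical example of `policy` form 2: a Callable mapping an
# ndarray of observations to an ndarray of corresponding actions.
def constant_policy(obs: np.ndarray) -> np.ndarray:
    # One action per observation row in the (vectorized) batch.
    return np.zeros(len(obs), dtype=np.int64)

# Hypothetical `sample_until` end condition: given the trajectories
# collected so far, return True once sampling should stop
# (here: after at least 5 complete episodes).
def sample_until_5_episodes(trajectories) -> bool:
    return len(trajectories) >= 5

# A batch of 4 observations (3 features each) yields 4 actions.
actions = constant_policy(np.zeros((4, 3)))
```

Passing None as the policy instead would sample actions randomly, and a trained stable_baselines3 policy or algorithm (form 1) would be passed in directly.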