coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.rollout
- coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.rollout(policy, venv, sample_until, *, unwrap: bool = True, exclude_infos: bool = True, **kwargs)
Generate policy rollouts.
The .infos field of each Trajectory is set to None to save space.
- Args:
  - policy: Can be any of the following:
    1) A stable_baselines3 policy or algorithm trained on the gym environment.
    2) A Callable that takes an ndarray of observations and returns an ndarray of corresponding actions.
    3) None, in which case actions will be sampled randomly.
  - venv: The vectorized environments.
  - sample_until: End condition for rollout sampling.
  - unwrap: If True, then save original observations and rewards (instead of potentially wrapped observations and rewards) by calling unwrap_traj().
  - exclude_infos: If True, then exclude infos from the pickle by setting this field to None. Excluding infos can save a lot of space during pickling.
  - verbose: If True, then print out rollout stats before saving.
  - **kwargs: Passed through to generate_trajectories.
- Returns:
Sequence of trajectories satisfying sample_until. Additional trajectories may be collected to avoid biasing the sampling process towards short episodes; the user should truncate the result if an exact count is required.
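To make the call concrete, the following is a minimal usage sketch, not a prescribed workflow: it assumes stable_baselines3 and gym are installed, uses CartPole-v1 purely as a stand-in environment, and defines sample_until as a plain callable over the trajectories collected so far. The import path mirrors the module documented above.

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

from coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original import rollout

# Vectorized environment required by the `venv` argument
# (CartPole-v1 is only an illustrative stand-in).
venv = DummyVecEnv([lambda: gym.make("CartPole-v1")])

# Any stable_baselines3 policy/algorithm is accepted for `policy`;
# a plain Callable over observations or None would also work.
expert = PPO("MlpPolicy", venv).learn(total_timesteps=10_000)

# `sample_until` is the end condition: a callable that receives the
# trajectories collected so far and returns True when sampling should stop.
sample_until = lambda trajectories: len(trajectories) >= 10

trajectories = rollout(expert, venv, sample_until)
# More than 10 trajectories may come back; truncate if an exact count is needed.
print(f"Collected {len(trajectories)} trajectories.")
```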