coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC
- class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC(*, observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, policy: Optional[stable_baselines3.common.policies.ActorCriticPolicy] = None, demonstrations=None, batch_size: int = 32, optimizer_cls: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, optimizer_kwargs: Optional[Mapping[str, Any]] = None, ent_weight: float = 0.001, l2_weight: float = 0.0, device: Union[str, torch.device] = 'auto')
- Bases: object
- Behavioral cloning (BC).
- Recovers a policy via supervised learning from observation-action pairs.
- Methods
  - make_data_loader: Converts demonstration data to Torch data loader.
  - save_policy: Save policy to a path.
  - set_demonstrations
  - train: Train with supervised learning for some number of epochs.
- Attributes
  - policy
- static make_data_loader(transitions, batch_size: int, data_loader_kwargs: Optional[Mapping[str, Any]] = None)
- Converts demonstration data to Torch data loader.
- Args:
  - transitions: Transitions expressed directly as a types.TransitionsMinimal object, a sequence of trajectories, or an iterable of transition batches (mappings from keywords to arrays containing observations, etc.).
  - batch_size: The size of the batch to create. Does not change the batch size if transitions is already an iterable of transition batches.
  - data_loader_kwargs: Arguments to pass to th_data.DataLoader.
- Returns:
  - An iterable of transition batches.
- Raises:
  - ValueError: if transitions is an iterable over transition batches with batch size not equal to batch_size; or if transitions is a types.TransitionsMinimal object or a sequence of trajectories with total timesteps less than batch_size.
  - TypeError: if transitions is an unsupported type.
 
- save_policy(policy_path)
- Save policy to a path. Can be reloaded by .reconstruct_policy().
- Args:
  - policy_path: path to save policy to.
 
- train(*, n_epochs: Optional[int] = None, n_batches: Optional[int] = None, on_epoch_end: Optional[Callable[[], None]] = None, on_batch_end: Optional[Callable[[], None]] = None, log_interval: int = 500, log_rollouts_venv: Optional[stable_baselines3.common.vec_env.base_vec_env.VecEnv] = None, log_rollouts_n_episodes: int = 5, progress_bar: bool = True, reset_tensorboard: bool = False)
- Train with supervised learning for some number of epochs.
- Here an ‘epoch’ is just a complete pass through the expert data loader, as set by self.set_expert_data_loader().
- Args:
  - n_epochs: Number of complete passes made through expert data before ending training. Provide exactly one of n_epochs and n_batches.
  - n_batches: Number of batches loaded from dataset before ending training. Provide exactly one of n_epochs and n_batches.
  - on_epoch_end: Optional callback with no parameters to run at the end of each epoch.
  - on_batch_end: Optional callback with no parameters to run at the end of each batch.
  - log_interval: Log stats after every log_interval batches.
  - log_rollouts_venv: If not None, then this VecEnv (whose observation and action spaces must match self.observation_space and self.action_space) is used to generate rollout stats, including average return and average episode length. If None, then no rollouts are generated.
  - log_rollouts_n_episodes: Number of rollouts to generate when calculating rollout stats. Non-positive number disables rollouts.
  - progress_bar: If True, then show a progress bar during training.
  - reset_tensorboard: If True, then start plotting to Tensorboard from x=0 even if .train() logged to Tensorboard previously. Has no practical effect if .train() is being called for the first time.
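The interaction between n_epochs, n_batches, and the two callbacks can be sketched as a bare loop skeleton. This is a minimal sketch under stated assumptions, not the code behind train(): the name `run_training_loop` is hypothetical, the data loader is assumed to be re-iterable (for example a list), and the exact point at which training stops when n_batches is reached is a guess.

```python
import itertools
from typing import Callable, Iterable, Optional


def run_training_loop(
    data_loader: Iterable,
    *,
    n_epochs: Optional[int] = None,
    n_batches: Optional[int] = None,
    on_epoch_end: Optional[Callable[[], None]] = None,
    on_batch_end: Optional[Callable[[], None]] = None,
) -> int:
    """Hypothetical loop skeleton; returns the number of batches processed."""
    # Exactly one of the two stopping criteria must be given.
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches.")
    if n_batches is not None and n_batches <= 0:
        return 0

    batches_seen = 0
    epochs = range(n_epochs) if n_epochs is not None else itertools.count()
    for _ in epochs:
        epoch_start = batches_seen
        for _batch in data_loader:
            # A real trainer would compute the loss and step the optimizer here.
            batches_seen += 1
            if on_batch_end is not None:
                on_batch_end()
            if n_batches is not None and batches_seen >= n_batches:
                return batches_seen
        if batches_seen == epoch_start:
            break  # Empty data loader: stop instead of looping forever.
        if on_epoch_end is not None:
            on_epoch_end()
    return batches_seen
```

In this sketch, a three-batch loader run with `n_epochs=2` processes six batches and fires the epoch callback twice, while `n_batches=4` stops partway through the second pass.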