coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC
- class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC(*, observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, policy: Optional[stable_baselines3.common.policies.ActorCriticPolicy] = None, demonstrations=None, batch_size: int = 32, optimizer_cls: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, optimizer_kwargs: Optional[Mapping[str, Any]] = None, ent_weight: float = 0.001, l2_weight: float = 0.0, device: Union[str, torch.device] = 'auto')
Bases: object
Behavioral cloning (BC).
Recovers a policy via supervised learning from observation-action pairs.
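As a rough illustration of what "supervised learning from observation-action pairs" means, the sketch below fits a toy lookup-table policy by majority vote. It is not this class's implementation (which trains a neural-network policy with a Torch optimizer); the function name clone_tabular_policy and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def clone_tabular_policy(pairs):
    """Toy behavioral cloning: map each observation to its most frequent expert action.

    `pairs` is an iterable of (observation, action) tuples with hashable entries.
    This is a hypothetical helper for illustration, not part of the BC class.
    """
    counts = defaultdict(Counter)
    for obs, act in pairs:
        counts[obs][act] += 1
    # For every observation seen in the demonstrations, keep the modal action.
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}


# Hypothetical expert demonstrations: (observation, action) pairs.
demos = [("low", "up"), ("low", "up"), ("low", "stay"), ("high", "down")]
policy = clone_tabular_policy(demos)
print(policy)  # {'low': 'up', 'high': 'down'}
```

The real class generalizes to unseen observations through function approximation; the table above can only repeat what it has seen.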
Methods
- make_data_loader: Converts demonstration data to Torch data loader.
- save_policy: Save policy to a path.
- set_demonstrations
- train: Train with supervised learning for some number of epochs.
Attributes
policy
- static make_data_loader(transitions, batch_size: int, data_loader_kwargs: Optional[Mapping[str, Any]] = None)
Converts demonstration data to Torch data loader.
- Args:
- transitions: Transitions expressed directly as a types.TransitionsMinimal object, a sequence of trajectories, or an iterable of transition batches (mappings from keywords to arrays containing observations, etc.).
- batch_size: The size of the batch to create. Does not change the batch size if transitions is already an iterable of transition batches.
- data_loader_kwargs: Arguments to pass to th_data.DataLoader.
- Returns:
An iterable of transition batches.
- Raises:
- ValueError: if transitions is an iterable over transition batches with batch size not equal to batch_size; or if transitions is a types.TransitionsMinimal object or a sequence of trajectories with total timesteps less than batch_size.
- TypeError: if transitions is an unsupported type.
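The batching contract described above can be sketched in plain Python. This is a minimal stand-in, not the Torch-based implementation: make_batches is a hypothetical helper, the "obs" and "acts" keys are assumed names, and dropping a trailing partial batch is an assumption of this sketch rather than documented behavior.

```python
def make_batches(obs, acts, batch_size):
    """Split parallel observation/action sequences into fixed-size batches.

    Hypothetical stand-in for a data loader. Raises ValueError when there are
    fewer timesteps than `batch_size`, mirroring the documented error case.
    """
    if len(obs) != len(acts):
        raise ValueError("obs and acts must have the same length")
    if len(obs) < batch_size:
        raise ValueError(
            f"only {len(obs)} timesteps available, fewer than batch_size={batch_size}"
        )
    batches = []
    # Assumption: a trailing partial batch is dropped so every batch is full-size.
    for start in range(0, len(obs) - batch_size + 1, batch_size):
        batches.append(
            {
                "obs": obs[start:start + batch_size],
                "acts": acts[start:start + batch_size],
            }
        )
    return batches


batches = make_batches([0, 1, 2, 3, 4], ["a", "b", "c", "d", "e"], batch_size=2)
print(len(batches))  # 2
```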
- save_policy(policy_path)
Save policy to a path. Can be reloaded by .reconstruct_policy().
- Args:
policy_path: path to save policy to.
- train(*, n_epochs: Optional[int] = None, n_batches: Optional[int] = None, on_epoch_end: Optional[Callable[[], None]] = None, on_batch_end: Optional[Callable[[], None]] = None, log_interval: int = 500, log_rollouts_venv: Optional[stable_baselines3.common.vec_env.base_vec_env.VecEnv] = None, log_rollouts_n_episodes: int = 5, progress_bar: bool = True, reset_tensorboard: bool = False)
Train with supervised learning for some number of epochs.
Here an ‘epoch’ is a complete pass through the expert data loader, as set by self.set_demonstrations().
- Args:
- n_epochs: Number of complete passes made through expert data before ending training. Provide exactly one of n_epochs and n_batches.
- n_batches: Number of batches loaded from dataset before ending training. Provide exactly one of n_epochs and n_batches.
- on_epoch_end: Optional callback with no parameters to run at the end of each epoch.
- on_batch_end: Optional callback with no parameters to run at the end of each batch.
- log_interval: Log stats after every log_interval batches.
- log_rollouts_venv: If not None, then this VecEnv (whose observation and action spaces must match self.observation_space and self.action_space) is used to generate rollout stats, including average return and average episode length. If None, then no rollouts are generated.
- log_rollouts_n_episodes: Number of rollouts to generate when calculating rollout stats. Non-positive number disables rollouts.
- progress_bar: If True, then show a progress bar during training.
- reset_tensorboard: If True, then start plotting to Tensorboard from x=0 even if .train() logged to Tensorboard previously. Has no practical effect if .train() is being called for the first time.
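The interaction between n_epochs, n_batches, and the two callbacks can be sketched as a plain loop. This is a hypothetical outline of the stopping logic only, assuming positive limits and a re-iterable data loader; run_training is not part of this class, and the gradient step, logging, and rollouts are omitted.

```python
def run_training(data_loader, *, n_epochs=None, n_batches=None,
                 on_epoch_end=None, on_batch_end=None):
    """Iterate over `data_loader` until the epoch or batch limit is reached.

    Hypothetical sketch. Returns (epochs_completed, batches_seen).
    """
    # Exactly one of the two limits must be given.
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches")
    epochs_done = 0
    batches_seen = 0
    while True:
        seen_this_epoch = 0
        for _batch in data_loader:
            # A real trainer would take one optimizer step on the batch here.
            batches_seen += 1
            seen_this_epoch += 1
            if on_batch_end is not None:
                on_batch_end()
            if n_batches is not None and batches_seen >= n_batches:
                return epochs_done, batches_seen
        if seen_this_epoch == 0:
            raise ValueError("data_loader yielded no batches")
        # One complete pass through the data loader counts as one epoch.
        epochs_done += 1
        if on_epoch_end is not None:
            on_epoch_end()
        if n_epochs is not None and epochs_done >= n_epochs:
            return epochs_done, batches_seen


loader = ["batch0", "batch1", "batch2"]  # stand-in for an iterable of batches
epoch_log = []
result_epochs = run_training(loader, n_epochs=2,
                             on_epoch_end=lambda: epoch_log.append("epoch"))
result_batches = run_training(loader, n_batches=4)
print(result_epochs, result_batches)  # (2, 6) (1, 4)
```

With n_batches=4 and three batches per pass, training stops partway through the second pass, so only one full epoch is counted.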