coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC

class coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original.BC(*, observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, policy: Optional[stable_baselines3.common.policies.ActorCriticPolicy] = None, demonstrations=None, batch_size: int = 32, optimizer_cls: Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, optimizer_kwargs: Optional[Mapping[str, Any]] = None, ent_weight: float = 0.001, l2_weight: float = 0.0, device: Union[str, torch.device] = 'auto')[source]

Bases: object

Behavioral cloning (BC).

Recovers a policy via supervised learning from observation-action pairs.
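
Minimal usage sketch based on the constructor signature above (the environment choice, the expert_transitions variable, and the epoch count are illustrative assumptions, not part of the documented API):

    import gym

    from coopihczoo.teaching.scripts_to_sort.behavioral_cloning_original import BC

    # Illustrative environment; any Gym env with matching spaces would do.
    env = gym.make("CartPole-v1")

    # expert_transitions is assumed to already hold expert demonstration data
    # (e.g. a types.TransitionsMinimal object or a sequence of trajectories).
    bc_trainer = BC(
        observation_space=env.observation_space,
        action_space=env.action_space,
        demonstrations=expert_transitions,
        batch_size=32,
    )

    # Run one supervised-learning pass over the expert data.
    bc_trainer.train(n_epochs=1)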

Methods

make_data_loader
    Converts demonstration data to a Torch data loader.

save_policy
    Save the policy to a path.

set_demonstrations
    Set the expert demonstrations used for training.

train
    Train with supervised learning for some number of epochs.

Attributes

policy

static make_data_loader(transitions, batch_size: int, data_loader_kwargs: Optional[Mapping[str, Any]] = None)[source]

Converts demonstration data to Torch data loader.

Args:
    transitions: Transitions expressed directly as a types.TransitionsMinimal
        object, a sequence of trajectories, or an iterable of transition
        batches (mappings from keywords to arrays containing observations,
        etc.).
    batch_size: The size of the batch to create. Does not change the batch
        size if transitions is already an iterable of transition batches.
    data_loader_kwargs: Arguments to pass to th_data.DataLoader.

Returns:
    An iterable of transition batches.

Raises:
    ValueError: If transitions is an iterable over transition batches with a
        batch size not equal to batch_size, or if transitions is a
        TransitionsMinimal object or a sequence of trajectories with fewer
        total timesteps than batch_size.
    TypeError: If transitions is an unsupported type.
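
A hedged sketch of calling this static method (the transitions variable and the "obs"/"acts" batch keys are assumptions about the demonstration format, not guaranteed by this page):

    # transitions is assumed to be a types.TransitionsMinimal-style object.
    data_loader = BC.make_data_loader(transitions, batch_size=32)

    for batch in data_loader:
        # Each batch is assumed to map keywords to arrays; the exact keys
        # depend on how the demonstrations were stored.
        obs = batch["obs"]
        acts = batch["acts"]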

save_policy(policy_path)[source]

Save policy to a path. Can be reloaded by .reconstruct_policy().

Args:
    policy_path: Path to save the policy to.
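
Usage sketch (the file name is an arbitrary, illustrative choice):

    # Persist the learned policy to disk for later reloading.
    bc_trainer.save_policy("bc_policy.pt")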

train(*, n_epochs: Optional[int] = None, n_batches: Optional[int] = None, on_epoch_end: Optional[Callable[[], None]] = None, on_batch_end: Optional[Callable[[], None]] = None, log_interval: int = 500, log_rollouts_venv: Optional[stable_baselines3.common.vec_env.base_vec_env.VecEnv] = None, log_rollouts_n_episodes: int = 5, progress_bar: bool = True, reset_tensorboard: bool = False)[source]

Train with supervised learning for some number of epochs.

Here an ‘epoch’ is just a complete pass through the expert data loader, as set by self.set_demonstrations().

Args:
    n_epochs: Number of complete passes made through expert data before ending
        training. Provide exactly one of n_epochs and n_batches.
    n_batches: Number of batches loaded from the dataset before ending
        training. Provide exactly one of n_epochs and n_batches.
    on_epoch_end: Optional callback with no parameters to run at the end of
        each epoch.
    on_batch_end: Optional callback with no parameters to run at the end of
        each batch.
    log_interval: Log stats after every log_interval batches.
    log_rollouts_venv: If not None, then this VecEnv (whose observation and
        action spaces must match self.observation_space and
        self.action_space) is used to generate rollout stats, including
        average return and average episode length. If None, then no rollouts
        are generated.
    log_rollouts_n_episodes: Number of rollouts to generate when calculating
        rollout stats. A non-positive number disables rollouts.
    progress_bar: If True, show a progress bar during training.
    reset_tensorboard: If True, start plotting to Tensorboard from x=0 even if
        .train() logged to Tensorboard previously. Has no practical effect if
        .train() is being called for the first time.
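
A hedged sketch of training by batch count with rollout evaluation (venv is an assumed VecEnv whose spaces match the policy; the numbers are illustrative):

    bc_trainer.train(
        n_batches=5000,             # provide exactly one of n_epochs / n_batches
        log_interval=500,           # log stats every 500 batches
        log_rollouts_venv=venv,     # assumed VecEnv used only for rollout stats
        log_rollouts_n_episodes=5,  # episodes per rollout evaluation
        progress_bar=True,
    )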