# Algorithms

## PPO (Proximal Policy Optimization)

An on-policy algorithm that optimizes a clipped surrogate objective.
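The clipped objective can be illustrated with a small self-contained function (a sketch of the math only, not Drill's internals — Drill computes this for you inside `train!`):

```julia
# PPO's clipped surrogate objective for a single sample.
# The importance ratio r(θ) = π_new(a|s) / π_old(a|s) is clipped to
# [1 - ε, 1 + ε], and the minimum keeps the update pessimistic.
function clipped_surrogate(logp_new, logp_old, advantage; clip_range = 0.2f0)
    ratio = exp(logp_new - logp_old)                  # importance ratio r(θ)
    unclipped = ratio * advantage
    clipped = clamp(ratio, 1 - clip_range, 1 + clip_range) * advantage
    return min(unclipped, clipped)                    # pessimistic bound
end

# A large positive ratio with positive advantage is capped at 1 + ε:
clipped_surrogate(0.5f0, 0.0f0, 1.0f0)  # ratio ≈ 1.65, objective = 1.2
```

Clipping removes the incentive to move the policy far from the one that collected the data, which is what lets PPO safely take several `epochs` of gradient steps on each rollout.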
```julia
using Drill
using Zygote
using ClassicControlEnvironments
env = BroadcastedParallelEnv([CartPoleEnv() for _ in 1:4])
obs_space = state_space(env)
act_space = action_space(env)
max_steps = 100_000
ppo = PPO(;
    gamma = 0.99f0,
    gae_lambda = 0.95f0,
    clip_range = 0.2f0,
    ent_coef = 0.0f0,
    vf_coef = 0.5f0,
    max_grad_norm = 0.5f0,
    n_steps = 2048,
    batch_size = 64,
    epochs = 10,
    learning_rate = 3f-4,
)
model = ActorCriticLayer(obs_space, act_space)
agent = Agent(model, ppo)
train!(agent, env, ppo, max_steps)
```

## SAC (Soft Actor-Critic)
An off-policy algorithm with entropy regularization and twin Q-networks.
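The soft (entropy-regularized) target that the twin Q-networks regress toward can be sketched as follows (illustrative only; Drill builds these updates internally during `train!`):

```julia
# SAC's soft Bellman target with twin critics.
# Taking the minimum of the two target critics curbs overestimation bias,
# and the entropy bonus -α·logπ rewards stochastic, exploratory policies.
function soft_target(reward, q1_next, q2_next, logp_next;
                     gamma = 0.99f0, alpha = 0.2f0, done = false)
    next_value = min(q1_next, q2_next) - alpha * logp_next
    return reward + (done ? 0.0f0 : gamma * next_value)
end

soft_target(1.0f0, 5.0f0, 4.0f0, -1.0f0)  # 1 + 0.99 * (4.0 + 0.2) = 5.158
```

The `tau` parameter below controls the Polyak averaging rate at which the target critics track the online critics.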
```julia
using Drill
using Zygote
using ClassicControlEnvironments
# SAC requires a continuous action space, so use a continuous-control task
env = BroadcastedParallelEnv([PendulumEnv() for _ in 1:4])
obs_space = state_space(env)
act_space = action_space(env)
max_steps = 100_000
sac = SAC(;
    learning_rate = 3f-4,
    buffer_capacity = 1_000_000,
    batch_size = 256,
    tau = 0.005f0,
    gamma = 0.99f0,
    train_freq = 1,
    gradient_steps = 1,
    ent_coef = AutoEntropyCoefficient(),
)
model = SACLayer(obs_space, act_space)
agent = Agent(model, sac)
train!(agent, env, sac, max_steps)
```

### Entropy Coefficient
```julia
# Fixed entropy
sac = SAC(ent_coef = FixedEntropyCoefficient(0.2f0))
# Automatic tuning (default)
sac = SAC(ent_coef = AutoEntropyCoefficient())
```
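With automatic tuning, the standard SAC formulation adjusts α by gradient descent on J(α) = E[−α(log π(a|s) + H̄)], where H̄ is a target entropy (conventionally −dim(action space) for continuous actions). A self-contained sketch of that objective, not Drill's implementation:

```julia
# Automatic entropy-coefficient objective, α = exp(log_alpha).
# The gradient raises α when policy entropy falls below the target
# and lowers α when entropy exceeds it.
function alpha_loss(log_alpha, mean_logp, target_entropy)
    return -exp(log_alpha) * (mean_logp + target_entropy)
end

target = -1.0                   # conventional target for a 1-D action space
alpha_loss(0.0, -3.0, target)   # entropy above target ⇒ positive loss ⇒ α shrinks
alpha_loss(0.0, 1.0, target)    # entropy exactly on target ⇒ zero gradient
```

A fixed coefficient skips this optimization entirely and keeps α constant throughout training.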