Pendulum Reinforcement Learning — Cross‑Entropy Method

A single‑file demo of a pendulum environment and a simple policy trained with the cross‑entropy method (CEM). Click Train and watch the policy learn to keep the pendulum upright.
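For readers who want to see what one CEM training loop looks like, here is a minimal sketch in plain Python. It is not the demo's own code: the function name `cem`, its parameters, the default values, and the small noise floor added to the standard deviation are all illustrative assumptions. The idea is to sample a population of weight vectors from a diagonal Gaussian, score each one, keep the top fraction as elites, and refit the Gaussian to those elites.

```python
import random

def cem(evaluate, dim, iters=30, pop_size=60, elite_frac=0.2, init_std=0.5, seed=0):
    """Cross-entropy method over a diagonal Gaussian (illustrative sketch).

    evaluate: maps a weight vector (list of floats) to a scalar return.
    All default values are placeholders, not confirmed settings of the demo.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(1, int(pop_size * elite_frac))
    history = []  # (best return, mean return) for each iteration

    for _ in range(iters):
        # Sample a population of candidate weight vectors.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(pop_size)
        ]
        returns = [evaluate(w) for w in population]

        # Keep the candidates with the highest returns.
        order = sorted(range(pop_size), key=lambda i: returns[i], reverse=True)
        elites = [population[i] for i in order[:n_elite]]

        # Refit the Gaussian to the elites; the 1e-3 floor (an assumption)
        # keeps the search from collapsing to a single point.
        mean = [sum(e[j] for e in elites) / n_elite for j in range(dim)]
        std = [
            (sum((e[j] - mean[j]) ** 2 for e in elites) / n_elite) ** 0.5 + 1e-3
            for j in range(dim)
        ]
        history.append((max(returns), sum(returns) / pop_size))

    return mean, history


# Toy objective standing in for an episode return (hypothetical target).
TARGET = [0.3, -0.2, 0.1, 0.0]

def toy_return(w):
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))

weights, history = cem(toy_return, dim=4)
```

In a pendulum setting, `evaluate` would instead run one episode with the candidate weights and return the accumulated reward; the toy objective above only keeps the sketch self-contained.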

(Left/Right arrows apply ± torque)
9.8
0.02
Angle θ (rad): 0.00
Angular velocity ω (rad/s): 0.00
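The two readouts above are the pendulum's state. As a rough illustration of how such a state could be advanced by one time step under an applied torque, here is a sketch. It assumes that θ is measured from the upright position, that mass and length are both 1, that there is no damping, and that integration is semi-implicit Euler. The function name `step` and the defaults `g=9.8` and `dt=0.02` are assumptions too; the demo may use different conventions.

```python
import math

def step(theta, omega, torque, g=9.8, length=1.0, mass=1.0, dt=0.02):
    """Advance the pendulum state by one time step (illustrative sketch).

    Assumes theta = 0 is upright, so gravity pushes the pendulum away
    from theta = 0 and a positive torque increases omega.
    """
    # Angular acceleration from gravity plus the applied torque.
    alpha = (g / length) * math.sin(theta) + torque / (mass * length ** 2)

    # Semi-implicit Euler: update velocity first, then angle.
    omega = omega + alpha * dt
    theta = theta + omega * dt

    # Wrap the angle into [-pi, pi).
    theta = (theta + math.pi) % (2 * math.pi) - math.pi
    return theta, omega


# Starting slightly off upright with no torque, the pendulum begins to fall.
theta, omega = step(0.1, 0.0, 0.0)
```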
500
2.0
60
20%
0.50
Training progress
Iter: 0
Best return
Mean return
Policy: w·[cosθ, sinθ, ω, 1]
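The policy above is a dot product between a weight vector w and the feature vector [cosθ, sinθ, ω, 1]. A minimal sketch of that computation follows. The function name `policy`, the clipping step, and the `max_torque` default are assumptions added for illustration; the page only states the dot product itself.

```python
import math

def policy(w, theta, omega, max_torque=2.0):
    """Linear policy: torque = w . [cos(theta), sin(theta), omega, 1].

    Clipping to +/- max_torque is an assumption, not stated by the demo.
    """
    features = [math.cos(theta), math.sin(theta), omega, 1.0]
    torque = sum(wi * fi for wi, fi in zip(w, features))
    return max(-max_torque, min(max_torque, torque))


# With only the bias weight set, the policy outputs a constant torque.
constant_torque = policy([0.0, 0.0, 0.0, 0.5], theta=0.0, omega=0.0)
```

The trailing 1 in the feature vector acts as a bias term, so the fourth weight sets the torque the policy applies when the other features contribute nothing.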