Pong from Pixels (illustrative concept)

Watch it learn to play Pong.

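[Figure: Pong court with the CPU opponent on the left and the agent, learning from pixels, on the right. Side panels show the Q-values (the network's action choice), the value V(s) (“will it win this point?”), and the average reward per episode. A training-time slider runs from episode 0 (random play, ε = 1.00, pure exploration) to about 700 episodes (trained).]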

Illustrative concept: the animation and curve are representative, not measured results. The shipped page uses Will's real per-episode reward log and gameplay recorded at each milestone checkpoint. The right paddle is the agent, and its skill is tied to the slider, so the rally you see comes from the same point in training that the curve shows.
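The Q-value and V(s) panels and the ε readout map onto standard value-based reinforcement learning quantities. The sketch below assumes a DQN-style agent with ε-greedy exploration, which the page does not confirm: the network outputs one Q-value per action, the chosen action is the argmax, V(s) under that greedy policy is the largest Q-value, and ε = 1.00 means every action is picked at random. The action names and function names are hypothetical, for illustration only.

```python
import random
from typing import Sequence

# Hypothetical paddle actions; the page does not list the agent's action set.
ACTIONS = ("STAY", "UP", "DOWN")


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the highest Q-value (the network's choice)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def state_value(q_values: Sequence[float]) -> float:
    """V(s) under the greedy policy: the best Q-value available in state s."""
    return max(q_values)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:  # rng.random() is in [0, 1), so epsilon=1.0 always explores
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


if __name__ == "__main__":
    q = [0.1, 0.7, -0.2]  # made-up Q-values for one frame
    rng = random.Random(0)
    print(ACTIONS[greedy_action(q)], state_value(q))
    print(ACTIONS[epsilon_greedy(q, epsilon=1.0, rng=rng)])  # episode 0: random play
```

If each point pays +1 to the winner and -1 to the loser (an assumption here), a V(s) near +1 reads as “likely to win this point”, which matches the panel's caption.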