Pong from Pixels · Illustrative concept

Watch it learn to play Pong.

← CPU opponent (left paddle) · Agent, learning from pixels (right paddle) →

Q-values: network's action choice

Value V(s): “will it win this point?”

Average reward / episode

Training time → Episode 0 · ε 1.00 · Exploration

episode 0 · random play → ~700 episodes · trained

Illustrative concept: the animation and curve are representative, not measured results. The shipped page uses Will's real per-episode reward log and recorded gameplay at each milestone checkpoint. The right paddle is the agent; its skill is tied to the slider, so the rally you see corresponds to the same point on the reward curve.
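
The Q-values, V(s), and ε readouts above can be related with a minimal sketch. It assumes a Q-learning style agent with ε-greedy exploration and takes V(s) to be the largest Q-value; the page does not state either of these. The action names and the functions `state_value` and `epsilon_greedy` are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical action set for the agent's paddle (not given on the page).
ACTIONS = ["up", "stay", "down"]


def state_value(q_values):
    """Assumed definition: V(s) is the best Q-value available in state s."""
    return max(q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index.

    With probability epsilon, pick a uniformly random action (exploration);
    otherwise pick the action with the highest Q-value (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.5, -0.2]  # made-up Q-values for one frame
    print("V(s) =", state_value(q))
    print("greedy action:", ACTIONS[epsilon_greedy(q, epsilon=0.0)])
    print("exploring action:", ACTIONS[epsilon_greedy(q, epsilon=1.0)])
```

Under these assumptions, ε = 1.00 at episode 0 means every action is random, which matches the "random play" end of the slider; as ε shrinks, the agent follows its Q-values more often.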