Pong AI — DQN Self-Play | Zakaria Kassemi

How the AI works

The Pong AI uses a deep Q-network (DQN) trained with self-play. Two agents compete; each one trains against a frozen checkpoint of its opponent, so the difficulty scales up automatically as both improve.
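As a rough illustration of the frozen-checkpoint idea, here is a minimal sketch. The `Agent` class, `train_step`, and the `sync_every` refresh interval are hypothetical stand-ins, not the project's actual code; a real agent would run a DQN update where the placeholder increments a number.

```python
import copy


class Agent:
    """Hypothetical stand-in for a DQN agent."""

    def __init__(self, name):
        self.name = name
        self.weights = [0.0]  # placeholder for network parameters
        self.updates = 0

    def train_step(self):
        # Placeholder for one DQN gradient update against the frozen opponent.
        self.weights[0] += 0.1
        self.updates += 1


def self_play(agent_a, agent_b, total_steps, sync_every):
    """Train both agents, each facing a frozen snapshot of the other."""
    frozen_a = copy.deepcopy(agent_a)
    frozen_b = copy.deepcopy(agent_b)
    for step in range(1, total_steps + 1):
        # In a full loop, agent_a would play against frozen_b and
        # agent_b against frozen_a before each update.
        agent_a.train_step()
        agent_b.train_step()
        if step % sync_every == 0:
            # Refresh the snapshots so each side faces a stronger opponent.
            frozen_a = copy.deepcopy(agent_a)
            frozen_b = copy.deepcopy(agent_b)
    return frozen_a, frozen_b
```

Between refreshes the opponent does not change, which gives each learner a stationary target; the periodic refresh is what makes the difficulty ratchet upward.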

State, actions, reward

State: paddle and ball positions and the ball's velocity.
Actions: move the paddle up, down, or stay.
Reward: +1 for scoring, -1 for conceding.
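One way to encode this in code is sketched below. The field names, the inclusion of both paddles in the state, the action numbering, and the zero reward on non-scoring steps are all assumptions made for illustration; the text above only fixes the general contents of the state, the three actions, and the +1/-1 rewards.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    # Numbering is an assumption; only the three choices come from the text.
    STAY = 0
    UP = 1
    DOWN = 2


@dataclass
class GameState:
    # Hypothetical layout: both paddle positions, ball position, ball velocity.
    paddle_y: float
    opponent_y: float
    ball_x: float
    ball_y: float
    ball_vx: float
    ball_vy: float

    def to_vector(self):
        """Flatten the state into the list a Q-network would take as input."""
        return [
            self.paddle_y,
            self.opponent_y,
            self.ball_x,
            self.ball_y,
            self.ball_vx,
            self.ball_vy,
        ]


def reward(scored: bool, conceded: bool) -> float:
    """+1 for scoring, -1 for conceding, 0 otherwise (the 0 is an assumption)."""
    if scored:
        return 1.0
    if conceded:
        return -1.0
    return 0.0
```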

Why self-play matters

Against a fixed opponent an agent can overfit. Self-play creates an ever-improving curriculum: as one side gets better, the other must too, pushing both toward strong, general play.

Staying sharp

The networks are bootstrapped with an analytic intercept policy, which moves the paddle toward the ball's predicted landing point, and they stay anchored to that policy during training. The anchor acts as a regularizer: the networks can fine-tune beyond the analytic baseline without the catastrophic forgetting that can make self-play agents collapse over time.
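The sketch below shows one plausible reading of both pieces. It assumes screen coordinates (y grows downward), a point-sized ball that reflects perfectly off the top and bottom walls, and an anchor implemented as a cross-entropy penalty between a softmax over Q-values and the intercept policy's action. The function names, the `deadband` tolerance, and `anchor_weight` are hypothetical; the actual regularizer may take a different form.

```python
import math

STAY, UP, DOWN = 0, 1, 2  # assumed action numbering


def predict_intercept_y(ball_x, ball_y, vx, vy, paddle_x, height):
    """Predict the ball's y when it reaches paddle_x, folding wall bounces."""
    if vx == 0 or (paddle_x - ball_x) / vx < 0:
        # Ball is not heading toward this paddle: default to the center.
        return height / 2.0
    t = (paddle_x - ball_x) / vx
    # Unfold the bounces: travel in a straight line, then mirror back
    # into [0, height] using a period of twice the field height.
    y = (ball_y + vy * t) % (2.0 * height)
    if y > height:
        y = 2.0 * height - y
    return y


def intercept_action(paddle_y, target_y, deadband):
    """Move toward the predicted landing point; hold still when close."""
    if target_y < paddle_y - deadband:
        return UP  # smaller y is higher on screen
    if target_y > paddle_y + deadband:
        return DOWN
    return STAY


def anchored_loss(td_loss, q_values, teacher_action, anchor_weight):
    """TD loss plus a penalty for disagreeing with the intercept policy."""
    m = max(q_values)
    log_z = m + math.log(sum(math.exp(q - m) for q in q_values))
    cross_entropy = log_z - q_values[teacher_action]
    return td_loss + anchor_weight * cross_entropy
```

With `anchor_weight` set to zero this reduces to the plain TD loss; larger values pull the network's preferred action back toward the analytic policy.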

What you see on screen

You watch two learned policies rally against each other. Neither paddle is driven by a hand-coded controller: both are networks that were bootstrapped from the intercept policy, refined through self-play, and keep their edge.

