Pong AI — DQN Self-Play

How two Deep Q-Network agents learn Pong through self-play, each training against the other's checkpoint.

How the AI works

Pong uses DQN with self-play. Two agents compete; each one trains against a frozen checkpoint of its opponent, so the difficulty scales up automatically as both improve.
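
The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's training code: the `Agent` class, its `train_against` method, and the `refresh_every` parameter are hypothetical names, and the DQN update is replaced by a counter so that the checkpoint logic stays visible.

```python
import copy


class Agent:
    """Hypothetical stand-in for a DQN agent (no real network here)."""

    def __init__(self, name):
        self.name = name
        self.updates = 0  # placeholder for network weights

    def train_against(self, frozen_opponent):
        # A real agent would play episodes against `frozen_opponent`
        # and apply DQN updates; here we only count training rounds.
        self.updates += 1


def self_play(agent_a, agent_b, rounds, refresh_every):
    """Train both agents, each against a frozen copy of the other."""
    frozen_a = copy.deepcopy(agent_a)
    frozen_b = copy.deepcopy(agent_b)
    for step in range(1, rounds + 1):
        agent_a.train_against(frozen_b)
        agent_b.train_against(frozen_a)
        # Periodically replace the frozen opponents with fresh snapshots,
        # so each side faces a stronger opponent as training progresses.
        if step % refresh_every == 0:
            frozen_a = copy.deepcopy(agent_a)
            frozen_b = copy.deepcopy(agent_b)
    return frozen_a, frozen_b
```

Freezing the opponent between refreshes keeps each agent's training target stable for a while; how often the real system refreshes its checkpoints is not stated here, so `refresh_every` is purely illustrative.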

State, actions, reward

  • State: paddle and ball positions and the ball's velocity.
  • Actions: move the paddle up, down, or stay.
  • Reward: +1 for scoring, -1 for conceding.
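
One way to encode the three items above is sketched below. The field names, the inclusion of both paddles in the state, and the reward of 0 on steps where nobody scores are assumptions made for illustration; the exact encoding used by the agents is not specified here.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Action(IntEnum):
    STAY = 0
    UP = 1
    DOWN = 2


@dataclass(frozen=True)
class PongState:
    # Hypothetical layout: positions first, then the ball's velocity.
    paddle_y: float
    opponent_y: float
    ball_x: float
    ball_y: float
    ball_vx: float
    ball_vy: float

    def as_vector(self) -> List[float]:
        """Flatten the state into the input vector fed to a Q-network."""
        return [
            self.paddle_y, self.opponent_y,
            self.ball_x, self.ball_y,
            self.ball_vx, self.ball_vy,
        ]


def reward(scored: bool, conceded: bool) -> float:
    """+1 for scoring, -1 for conceding, 0 otherwise (the 0 is an assumption)."""
    if scored:
        return 1.0
    if conceded:
        return -1.0
    return 0.0
```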

Why self-play matters

Against a fixed opponent, an agent can overfit to that opponent's particular habits. Self-play creates an ever-improving curriculum: as one side gets better, the other must improve to keep up, pushing both toward strong, general play.

Staying sharp

The networks are bootstrapped with an analytic intercept policy, which moves the paddle toward the ball's predicted landing point, and they stay anchored to that policy during training. The anchor acts as a regularizer: the networks can fine-tune without the catastrophic forgetting that can make self-play agents collapse over time.
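
A rough sketch of both ideas follows. It assumes screen coordinates (y grows downward, walls at 0 and `height`), a paddle that returns to the center when the ball moves away, and an anchor implemented as a cross-entropy penalty between the softmax of the Q-values and the analytic action. All function names, the `dead_zone` parameter, and the form of the penalty are assumptions; the regularizer actually used may differ.

```python
import math

STAY, UP, DOWN = 0, 1, 2


def predict_intercept(ball_x, ball_y, vx, vy, paddle_x, height):
    """Predict the y at which the ball reaches paddle_x, including wall bounces."""
    if vx == 0:
        return height / 2  # ball has no horizontal motion: wait at center
    t = (paddle_x - ball_x) / vx
    if t < 0:
        return height / 2  # ball is moving away from this paddle
    y = ball_y + vy * t  # position if there were no walls
    # Fold the unbounded path back into [0, height] by mirror reflection.
    period = 2 * height
    y = y % period
    return period - y if y > height else y


def intercept_action(paddle_y, target_y, dead_zone=1.0):
    """Analytic policy: step toward the predicted landing point."""
    if target_y < paddle_y - dead_zone:
        return UP  # smaller y is higher on screen
    if target_y > paddle_y + dead_zone:
        return DOWN
    return STAY


def anchored_loss(td_loss, q_values, teacher_action, weight):
    """TD loss plus a penalty for disagreeing with the analytic policy."""
    m = max(q_values)
    log_sum_exp = m + math.log(sum(math.exp(q - m) for q in q_values))
    cross_entropy = log_sum_exp - q_values[teacher_action]
    return td_loss + weight * cross_entropy
```

With `weight` set to 0 the loss reduces to the plain TD loss; larger values keep the network closer to the analytic baseline at the cost of less freedom to improve on it.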

What you see on screen

You watch two learned policies rally against each other. Neither paddle is driven by a hand-coded AI during play: both are controlled by networks that refined their game through self-play and keep their edge.
