
Can You Beat My Dog at Chess?

By Michael Cooper

How simple games reveal the real pace of AI agent progress.

On my personal website, I run a small free arcade. It includes simple games like Tic-Tac-Toe, more complex ones like Chess and Allies & Adversaries, and original simulations like Power Broker—a political strategy game designed to explore incentives, negotiation, and second-order effects.

At first glance, it looks like a hobby.

In reality, it's one of the most effective ways I've found to understand how fast AI agents are actually improving.

The Dogs

In every game, I name the AI difficulty levels after my dogs.

Bella (Easy) — my puppy. Curious, fast, inconsistent.

Coop (Medium) — me. Decent strategy, occasional overconfidence.

Bentley (Hard) — my older dog. Calm, patient, and ruthless.

It started as a joke.

But over time, it became a surprisingly accurate way to track how AI agents behave as they mature. Not just how "smart" they are, but how stable, patient, and self-correcting they become.

The question the arcade quietly asks is simple:

Can you beat my dog at chess?

Why Games Work When Benchmarks Don't

Games are unusually honest environments for AI.

They have:

  • Clear rules
  • Observable state
  • No hiding behind demos
  • Immediate feedback
  • Binary outcomes

You can't explain away a bad move in chess. You either saw it—or you didn't.

That makes games ideal for testing agent behavior over time, especially when you rebuild the same game repeatedly with newer tools.

October vs January: The Difference Is Not Subtle

Between October and January, the change in agent behavior was dramatic.

Not incremental. Not theoretical. Obvious.

Across rebuilds of the same games, using the same prompts and architectures, I observed consistent improvements in:

  • Run length – agents could operate far longer without degrading
  • Context survival – less drift after resets
  • Error recovery – mistakes were corrected instead of compounding
  • Decision patience – fewer rushed or random moves
  • State awareness – better understanding of "what just happened"

Bentley didn't get better because I tuned difficulty.

Bentley got better because the agents did.

The difference between an October Bentley and a January Bentley is the difference between a clever demo and a credible system.

The Quiet Breakthrough: Self-Evaluation

One of the most important changes wasn't raw reasoning power—it was self-evaluation.

Modern agents are increasingly able to:

  • Observe their own output
  • Compare it to expected outcomes
  • Identify failure modes
  • Adjust behavior mid-run

This is where tools like Playwright matter enormously.

When an agent can interact with a real interface, inspect state, replay actions, and validate outcomes, it stops behaving like a one-shot responder and starts behaving like a system.

Games amplify this effect because feedback is immediate and unforgiving. A bad move is visible. A lost position is undeniable. The agent has to reconcile intent with outcome.

That loop—act, observe, correct—is improving fast.
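That loop can be sketched in a few lines. The toy below is not tied to any real framework: a "player" steps toward a target, observes whether it overshot, and shrinks its step size when it did. In a browser game, the observe step would be real UI inspection (for instance via Playwright); here it is a plain function, and every name is illustrative.

```python
def run(target: int = 10, max_turns: int = 50) -> list[int]:
    """Toy act-observe-correct loop.

    Act: step toward the target. Observe: check for overshoot.
    Correct: halve the step when a move overshoots.
    """
    position, step = 0, 8
    moves: list[int] = []
    for _ in range(max_turns):
        # Act: move by the current step in the direction of the target.
        direction = 1 if position < target else -1
        position += direction * step
        moves.append(position)
        # Observe: immediate, unambiguous feedback. Did we land on it?
        if position == target:
            break
        overshot = (direction == 1 and position > target) or (
            direction == -1 and position < target
        )
        # Correct: overshooting means the step was too big, so shrink it.
        if overshot and step > 1:
            step //= 2
    return moves
```

With the defaults, `run()` walks 8 → 16 → 12 → 8 → 10: one overshoot, one correction, then convergence. The point isn't the arithmetic; it's that the correction only exists because the feedback is undeniable.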

Context Windows Are the Real Constraint

If there's one hard limit I kept running into, it wasn't model intelligence.

It was context management.

Games make this painfully obvious.

Long-running sessions stress:

  • Memory boundaries
  • Instruction decay
  • Goal drift
  • Accidental overwrites
  • Reset recovery

Between October and January, agents got significantly better at operating within constrained context windows. Not by magically remembering everything—but by becoming more selective, more structured, and more disciplined about what matters.
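That "selective and structured" discipline can be sketched concretely. The function below is illustrative, not from any particular agent framework, and it budgets in characters rather than tokens for simplicity: pinned instructions always survive, and the newest turns are kept while the oldest are dropped first.

```python
def pack_context(system: str, turns: list[str], budget: int) -> list[str]:
    """Fit a conversation into a fixed context budget.

    Sizes are measured in characters here for simplicity; a real
    system would count tokens. The system instructions are pinned
    and always survive; recent turns are kept, oldest dropped first.
    """
    kept: list[str] = []
    remaining = budget - len(system)
    for turn in reversed(turns):  # walk newest-first
        if len(turn) > remaining:
            # Stop at the first turn that doesn't fit, so the kept
            # window stays contiguous (no gaps mid-conversation).
            break
        kept.append(turn)
        remaining -= len(turn)
    return [system] + list(reversed(kept))
```

So `pack_context("sys", ["aaaa", "bbbb", "cccc"], budget=12)` keeps the instructions and the two newest turns, and silently drops the oldest. It's crude, but it is the shape of the tradeoff: deciding what matters, not remembering everything.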

The takeaway is simple but uncomfortable:

AI performance is increasingly limited by how we manage context, not by how smart the models are.

Games surface this faster than almost anything else.

Same Game, Different Brains

One of the most valuable practices in this arcade has been rebuilding the same basic game across different AI tools.

Claude. Gemini. Codex.

Same rules. Same objectives. Same constraints.

Wildly different approaches.

Some models favor:

  • Explicit planning
  • Careful explanation
  • Conservative moves

Others:

  • Explore aggressively
  • Recover quickly
  • Optimize via iteration rather than foresight

None of them are "right." But seeing the differences side-by-side builds intuition fast.

You don't learn this from benchmarks. You learn it by watching how each agent struggles—and how that struggle changes month to month.

Why Static End States Matter

Every game eventually ends up as a static page, hosted simply, with the source code published.

That's intentional.

Static artifacts remove excuses.

No background services. No orchestration magic. No operational crutches.

If the experience is good as a frozen artifact, the system design worked. If it's fragile, confusing, or inconsistent, it didn't.

That discipline makes evaluation honest: OK. Good. Great.

Then move on.

The Real Lesson

AI isn't just getting smarter.

It's getting:

  • More patient
  • More stable
  • Better at self-correction
  • More tolerant of imperfect instructions
  • More capable of operating as a system

The speed of that change—from October to January—was impossible to miss when rebuilding the same games over and over.

That's why I'd encourage others to do something similar.

You don't need a chess engine. You don't need a political simulation.

You need:

  • A bounded system
  • A real endpoint
  • A willingness to rebuild it over time

Name it something human. Ship it. Play it.

And then ask the question that matters:

Can you still beat the dog?

Because lately... that's getting a lot harder.