Select Your Opponent
About
Between 1996 and 1997, reigning chess world champion Garry Kasparov played a pair of six-game matches against Deep Blue, a supercomputer developed by IBM. Kasparov lost the second match, the first defeat of a reigning world chess champion by a computer under tournament conditions.
While this was an important milestone for artificial intelligence and for chess, Deep Blue relied mainly on brute computational force, evaluating millions of positions and outcomes. It could match Kasparov's ability, but not his flair.
Now, three decades later, artificial intelligence allows us to capture both. For the first time, anyone can experience firsthand not only a player's ability, but their unique approach to the game.
Select a pre-configured opponent above and experience it for yourself.
Our Work:
1. Preference Optimization in Chess.
We demonstrate, to our knowledge for the first time, that Direct Preference Optimization (DPO) can be successfully applied to a non-language, structured decision domain with discrete legal-action constraints.
Notably:
- DPO yields a roughly 2× improvement in the mean log-probability gap between the Grandmaster's chosen move and Maia-2's next-best alternative.
- DPO achieves these gains with negligible additional Kullback–Leibler (KL) divergence from the base Maia-2 policy compared to the strongest supervised fine-tuning (SFT) baseline.
- Preference optimization produces stronger shallow-search performance with less stylistic drift.
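The preference-pair setup above can be illustrated with the standard DPO objective, where the Grandmaster's played move is the "chosen" action and an alternative move (e.g. Maia-2's next-best) is the "rejected" one. This is a minimal sketch in plain Python, not the project's released code; the function name and β value are illustrative:

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO objective for a single preference pair.

    Each argument is the log-probability of a move under either the
    fine-tuned policy or the frozen reference policy (e.g. base Maia-2).
    """
    # Implicit reward margin: how much more the fine-tuned policy
    # prefers the chosen move over the rejected one, measured
    # relative to the reference policy.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)): small when the policy already agrees
    # with the Grandmaster's preference, larger when it does not.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Anchoring both terms to the reference policy is what keeps the KL divergence from the base model small: the loss only rewards *relative* preference shifts, not wholesale changes to the move distribution.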
2. Grandmaster-Style Modeling.
We provide empirical evidence that DPO captures fine-grained stylistic signals more effectively than standard supervised baselines, improving relative move preference alignment while preserving overall policy stability. We further demonstrate that inference-time integration with Stockfish search can significantly reduce tactical errors without erasing learned stylistic characteristics, even as search depth increases.
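One way such an inference-time integration could work, shown purely as an illustration and not as the method used here, is a weighted mixture of the stylistic policy's move distribution with a softmax over shallow engine evaluations. All names and parameters below are assumptions for the sketch:

```python
import math

def blend_with_engine(style_probs, engine_cp, temperature=100.0, weight=0.5):
    """Hypothetical blend of a stylistic policy with shallow engine evals.

    style_probs: {move: probability} from the fine-tuned policy.
    engine_cp:   {move: centipawn score} from a shallow engine search.
    weight:      0.0 = pure style, 1.0 = pure engine.
    """
    # Turn centipawn scores into a distribution via a temperature softmax
    # (shifting by the max keeps the exponentials numerically stable).
    max_cp = max(engine_cp.values())
    exp_cp = {m: math.exp((cp - max_cp) / temperature)
              for m, cp in engine_cp.items()}
    z = sum(exp_cp.values())
    engine_probs = {m: v / z for m, v in exp_cp.items()}
    # Geometric mixture: preserves stylistic preferences among
    # roughly equal moves while down-weighting tactical blunders.
    scores = {m: style_probs[m] ** (1 - weight) * engine_probs[m] ** weight
              for m in style_probs}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}
```

Because the mixture is multiplicative, a move the engine scores as clearly losing is suppressed regardless of its stylistic probability, while the ordering among tactically sound moves is still dominated by the learned style.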
3. Public Release and Evaluation Framework.
We release all fine-tuned models and provide a standardized evaluation suite, enabling interactive analysis and community exploration of stylistic chess AI.
By bridging preference-based optimization with neural chess policies, our work advances the goal of personalized, interpretable chess AI. Rather than pushing beyond human limits of strength, we focus on faithfully modeling human expertise—opening new avenues for education, historical analysis, and human–AI collaboration in strategic domains.