The last time we talked chess with ChatGPT, it offered plenty of strange comments and bad chess advice. Naturally, we stopped there, right?
Of course not. We doubled down, adding Google’s AI, Gemini, to the mix and pitting the two AIs against each other in a battle of “wits” to see which model is better at chess.
So who won? Read on to find out!
The Match
What better way to put the AIs to the test than to have them play each other? First, here is the game on its own, without annotations. How many hilarious mistakes can you spot?
Here is the game again with some of our favorite moments, including quotes from the AIs themselves and the stories behind some of the funniest moves.
ChatGPT’s “brilliant” move convinced Gemini it had no good reply, resulting in its “resignation.”
The Review
The result isn’t the only sign that ChatGPT is better at chess; the accuracy numbers show it too.
Here is how each side’s moves were classified (book moves excluded).
Gemini’s mistakes and ChatGPT’s misses tell much of the story: one AI kept handing the other chances, and the other kept declining the gifts. The good news for ChatGPT is that, unlike Gemini, it made more “Good” or better moves than mistakes and blunders. The bad news for Gemini is, well, almost everything that happened.
White’s 21.Ra1 has turned into a sort of Poisoned Pawn Variation, giving Black some options. The crucial thing is resisting the urge to take the poisoned piece on b5.
The Approach
To begin, ChatGPT was asked for a move to start a chess game, and Gemini was asked to respond. Subsequent moves were requested with the formula: “White/Black responded [1…c5, 2.Nf3, etc.]. Play White’s/Black’s nth move.” Each game took place in a single chat thread, allowing the AI to retain the memory of the whole game.
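As a rough illustration, the prompt formula above could be assembled like this. This is a minimal sketch with our own hypothetical names; `build_prompt` and the move-list format are assumptions, not anything from the actual experiment:

```python
# Hypothetical sketch of the prompt formula described above.
# `moves` is the game so far as standard-notation strings, e.g. "1...c5".
def build_prompt(moves, side, move_number):
    opponent = "Black" if side == "White" else "White"
    history = ", ".join(moves)
    return (f"{opponent} responded [{history}]. "
            f"Play {side}'s move number {move_number}.")

print(build_prompt(["1...c5", "2.Nf3"], "Black", 2))
# → White responded [1...c5, 2.Nf3]. Play Black's move number 2.
```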
In fairness to the AIs, this approach essentially amounts to playing blindfold chess. In fairness to us, though, they should be able to reconstruct the position far faster than a person could.
Should be. But because these are language models, translating words into a physical position is a challenge. The issue persists even when the entire game is provided at once, and as the previous article showed, ChatGPT can’t even reconstruct a position from its FEN.
Another issue is that both of these AIs are built to equivocate. That’s useful when a user asks a weighty philosophical question, but not so useful when you just want it to play a damn chess move. If an AI listed several candidate moves without committing to one, it was asked to pick a single move.
Now for the exciting part: what happened when an AI tried an illegal move, a capture with a piece that isn’t on the board, or a move that doesn’t even exist? Given their poor board vision, both AIs attempted plenty of each kind. When one did, it was informed the move was illegal and told to choose another; this would typically produce a legal move, although still not a very good one. If it made three straight illegal moves, it was given the list of legal moves to choose from.
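The refereeing procedure above can be sketched as a simple retry loop. This is only an illustration under our own assumptions: `ask_model` stands in for whatever sends a prompt to the AI and returns its reply, and legality is checked against a precomputed set of legal moves rather than a real chess engine.

```python
import random

def referee_move(ask_model, legal_moves, max_attempts=3):
    """Ask the AI for a move, rejecting illegal replies up to three times.

    ask_model: callable taking a prompt string and returning the AI's reply.
    legal_moves: set of legal moves in the current position, in SAN.
    """
    for _ in range(max_attempts):
        reply = ask_model("Play your move.").strip()
        if reply in legal_moves:
            return reply
        ask_model(f"{reply} is illegal. Please choose another move.")
    # Three straight illegal moves: hand the AI the legal options.
    options = ", ".join(sorted(legal_moves))
    reply = ask_model(f"Choose one of these legal moves: {options}").strip()
    return reply if reply in legal_moves else random.choice(sorted(legal_moves))
```

In practice, a library such as python-chess could generate the legal-move set for each position and keep the board state between turns.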
The final illegal move count: Gemini 32, ChatGPT 6. That tracks; it would have been absurd if the AI capable of winning had also been the one committing more infractions. It also means nearly 80% of ChatGPT’s attempted moves were legal, compared to about 50% of Gemini’s.
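Those percentages follow from the illegal-move counts. A quick back-of-the-envelope check, assuming a game of roughly 30 moves per side (the exact length isn’t restated here, so treat the figure as illustrative):

```python
# Legal-move rate = moves actually played / (moves played + illegal attempts).
# The ~30 moves per side is an assumption for illustration only.
def legal_rate(moves_played, illegal_attempts):
    return moves_played / (moves_played + illegal_attempts)

print(f"ChatGPT: {legal_rate(30, 6):.0%}")   # in the ballpark of 80%
print(f"Gemini:  {legal_rate(30, 32):.0%}")  # roughly half
```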
Conclusion
That is what happens when two large language models try to play chess against each other. Which of the outcomes surprised you most? Should ChatGPT, as the winner of this match, take on some real chess bots next? Could it beat Martin? Or rather, how quickly would Stockfish win?
So far, all we know is that it wouldn’t be wise to bet your life on ChatGPT finding a hanging queen. But if you had to choose between Gemini and ChatGPT, you already know which one to pick.
Feel free to repeat this experiment yourself and let us know how it goes in the comments!