Jan 7, 2007

[Guide] Perft scores - validating the move generation

I should have done this much earlier, but here we are.

A perft score is the node count from a plain mini-max search on a certain position. For example the start position at depth 1 has 20 nodes, since there are 20 possible moves (nodes). At depth 2 we have 400 nodes, since for every move white makes black can answer with 20 different moves (20*20=400). And so on.

This is used to make sure our move generation works as it should. Now how should we know how many nodes a certain position is supposed to have at a certain depth? Well you could either count them by hand (I rather not do that :) or use another engine which you know has a correct move generator.

Or you could use validated results available on different web sites. Like Sharper.

Or just a test suite, which is basically a text file with fen-strings followed by the correct results at each depth.

Here are the correct number of moves each depth should return, starting with the initial position:

1  20
2 400
3 8902
4 197281
5 4865609
6 119060324
7 3195901860
8 84998978956
9 2439530234167
10 69352859712417

Those values are correct and if your engine does not return the same figures, something is wrong.

My Perft code looks like this:
System.out.println(miniMax(board, 5));

private long miniMax(Board board, int depth)
long nodes = 0;
if(depth == 0) return 1;
Vector moves = board.generateMoves();

for(int i = 0; i < moves.size(); i++)
nodes += miniMax(board, depth-1);

return nodes;
As you can see it is a mini-max routine which returns '1' on every leaf of the tree. Every node collects the result, and when the search is done you have the final result which is returned and printed.

If you get a conflicting result, something is wrong with your move generation and you will have to go bug-hunting.

Divide the trouble

If you get a result that differs from the validated amount of nodes at a certain depth you know you have a bug somewhere. But unfortunately the number says nothing about where it might be.

To make it easier you can write a 'divide'-method that instead of just printing the total number of nodes, prints every initial move, and then the number of child moves to it. The output from the initital position could look like this:
Searching with depth 5

Move Nodes
Nc3 234656
Na3 198572
Nh3 198502
Nf3 233491
a3 181046
a4 217832
b3 215255
b4 216145
c3 222861
c4 240082
d3 328511
d4 361790
e3 402988
e4 405385
f3 178889
f4 198473
g3 217210
g4 214048
h3 181044
h4 218829

Total nodes: 4865609
Now you need to compare it to another verified engine with a divide function. Sharper has it and also Roce. I used Roce when I verified my results, just start the .exe-file and write 'help' to get the commands.

When you see a conflicting result for some move, you play it on the board and do another divide-search with one less depth. And so on until you reach depth 1 and you will see all the moves and what move you are missing.

It takes time

Testing deeper depths can take a lot of time. We are talking hours and days for 10 ply searches. This is of course because no part of the tree gets cut off like in alpha-beta searches. And even though we do no kind of evaluation in the end, there are insane amounts of nodes to find (from the initial position 10 plies results in nearly 70 trillion nodes).

You will find any errors you might have eventually so it is well worth it. But I recommend you find some good test suite and run every position to about depth 6. This should catch all bugs with alot less time.

Here is the test suite from Roce.

perftsuite.epd (right click and choose 'Save as' to download the file)

To run the test suite automatically you will of course have to do some more programming or just take the fen-strings from the file and check the results manually. I leave that to you.


Anonymous said...

I don't know if you noticed but in the perftsuite.epd line #60 and line #63 and exactly the same.

Anonymous said...

Hello -- first, thank you for this wonderful blog -- it has inspired me to get into chess programming myself (I just have a move generator at the moment). Chess programming is so challenging, yet fascinating at the same time.

Anyway, I wanted to add to this post -- you can speed up your perft function significantly (roughly 30% in my experience) by simply returning the number of moves if depth == 1. This way, you avoid calling your do/undo functions for a great number of positions (all of your leaf nodes). It makes sense because what's the point of doing/undoing moves if all you're going to do for depth == 0 positions is return "1"?


m = movegen(pos)
if depth > 1 {
for every move {
nodes += perft(pos)
} else {
return m
return nodes

Unknown said...

Perhaps you can confirm that, but it seems that stats given in perft suite file are false if you take into account the draw by insufficient material to checkmate. For line 3 for example stats are 15/66/1197/7059/133812/763438, with detection of insufficient material it should be 15/66/1197/7059/133987/764643.

Unknown said...

Correction of my previous post: stats are 15/66/1197/7059/133987/764643 should be 15/66/1197/7059/133812/763438

Unknown said...

Perft must not take into account draws by repetition

Unknown said...

Hey Everyone,

I have implemented command line tool for testing chess engine.

It takes list of cases from this post and sends it to your engine comparing the result with etalon engine. If there is some mismatch it will make a move (updates fen) and tries again with new fen.

As result it will show you FEN of board that your engine do not generate valid moves.

Check it here https://chessdiag.codeplex.com/