Mar 16, 2007

[Other] Evaluating new features

I ran a couple of tournaments to test the new features: LMR, SEE, and an attempt at more dynamic time management.

Trying LMR

The first tournament is a 200-game match between Mediocre v0.241b and a version with LMR that decides what to reduce based on whether the move is a capture, the current depth, and whether the position before the move had one of the kings in check. It does this after searching the first four moves to regular depth (usually the hash move, the killer moves and one more).

Obviously this is quite a loose way of handling the reductions, so plenty of moves get reduced. This made me unsure how sound the LMR would be, so the result came as a bit of a surprise:
   Engine               Score
1: Mediocre LMR       132,0/200
2: Mediocre v0.241b    68,0/200
It seems the LMR really pays off, and that is with hardly any limits on what gets reduced after the first four moves.
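The reduction conditions described above can be sketched roughly like this. This is just an illustration of the idea, not Mediocre's actual code; the class, method and constant names are mine:

```java
public class LmrSketch {
    static final int FULL_DEPTH_MOVES = 4; // hash move, killers and one more
    static final int REDUCTION_LIMIT  = 3; // do not reduce near the horizon

    /**
     * Decide whether the move at index moveNumber (0-based, in sorted
     * order) may have its search depth reduced.
     */
    static boolean canReduce(int moveNumber, int depthLeft,
                             boolean isCapture, boolean givesCheck) {
        return moveNumber >= FULL_DEPTH_MOVES  // first moves get full depth
            && depthLeft > REDUCTION_LIMIT     // not within 3 plies of horizon
            && !isCapture                      // captures are never reduced
            && !givesCheck;                    // keep the check extension alive
    }

    public static void main(String[] args) {
        // Fifth quiet, non-checking move far from the horizon: reduce it.
        System.out.println(canReduce(4, 6, false, false)); // true
        // The same move as a capture is left at full depth.
        System.out.println(canReduce(4, 6, true, false));  // false
    }
}
```

In the search, a move passing `canReduce` would be searched at `depthLeft - 1` minus the reduction, and re-searched at full depth if it unexpectedly raises alpha.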

This was the basis of the assumption I made in my last post that the upcoming Mediocre v0.25b would have a rating of about 2050 (about 100 points higher than v0.241b).

The 'previous position in check' seems quite useless as a condition when deciding what to reduce; I simply put it in there because the information comes free from the check extension heuristic. A better condition, obviously, is checking whether the move up for reduction is a checking move, and if it is, not reducing it. Without that condition the check extension is effectively cancelled out.

So in the next test tournament I changed that and also added a couple of different setups with SEE.

Trying LMR and different setups of SEE

There are numerous ways to use the static exchange evaluation in the search (as well as in the positional evaluation, but I have not gotten to that yet).

It can be used for better ordering of both quiescent-search moves and ordinary search moves. In the quiescent search we can also prune losing captures, since they should not turn up anything interesting (good sacrifices are handled by the ordinary search), as mentioned by Ed Schröder.
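The quiescence idea can be sketched like this. It is a simplified illustration, not the engine's code: the captures are represented by their SEE scores alone, whereas in the real search each score would be attached to a move:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SeeQuiescence {
    /**
     * Order captures by their SEE score (best exchange first) and drop
     * losing captures (SEE < 0) from the quiescent search entirely.
     */
    static List<Integer> orderAndPrune(List<Integer> seeScores) {
        List<Integer> kept = new ArrayList<>();
        for (int score : seeScores) {
            if (score >= 0) {       // keep equal and winning exchanges
                kept.add(score);
            }
        }
        // Search the most promising exchanges first.
        Collections.sort(kept, Collections.reverseOrder());
        return kept;
    }

    public static void main(String[] args) {
        // e.g. a pawn win, a piece hung on a defended square,
        // an even trade, and a clean piece win
        System.out.println(orderAndPrune(List.of(100, -200, 0, 300)));
        // prints [300, 100, 0]
    }
}
```

The same SEE scores can be reused for ordering captures in the ordinary search, where losing captures are kept (they might be sound sacrifices) but searched last.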

So we have a few possible combinations. For this tournament I decided that LMR with the 'current' in-check condition is better than the 'previous' in-check condition (based on a few mini tests they surprisingly enough seem quite equal), so I went with that for all the SEE versions. Every other condition for LMR is the same (four moves to full depth, do not reduce within 3 plies of the horizon, and do not reduce captures).
clmr -> LMR does not reduce checking moves
plmr -> LMR does not reduce moves when the previous position was in check
ss -> SEE sorting in ordinary search
qs -> SEE sorting for quiescent moves
qsp -> Same as qs but prunes losing captures
nt -> New time management (did not work out too well)
If not mentioned, it is the same as in v0.241b (the quiescent search being ordered with the ordinary MVV/LVA setup, for example).

The tournament had a total of 375 games (25 games between all engines) and these were the results:
   Engine                     Score
1: v0.25b (clmr,ss,qsp)     79,5/125
2: v0.25b (clmr,ss)         71,0/125
3: v0.25b (clmr,ss,qs)      66,5/125
4: v0.25b (clmr,ss,qsp,nt)  66,5/125
5: v0.25b (plmr)            57,0/125
6: v0.241b                  34,5/125
The version with quiescent pruning seems to be the best, but it does worse with my new clumsy time management.

As usual there are too few games to draw any real conclusions, but I will go with quiescent-search pruning, since it seems to make a difference for the better, and SEE ordering in the ordinary search is clearly beneficial.

Splitting the table up into version-vs-version statistics is rather useless with so few games, but here is a table of v0.241b against the others:
-----------------Mediocre v0.241b------------------
v0.241b - v0.25b (clmr,ss)        : 6,5/25  4-16-5  26%
v0.241b - v0.25b (clmr,ss,qs)     : 8,5/25  7-15-3  34%
v0.241b - v0.25b (clmr,ss,qsp)    : 6,0/25  4-17-4  24%
v0.241b - v0.25b (clmr,ss,qsp,nt) : 5,5/25  4-18-3  22%
v0.241b - v0.25b (plmr)           : 8,0/25  5-14-6  32%
In conclusion v0.241b got horribly beaten by all the new versions. :)


The last analysis of this tournament was done with Elostat, a utility by Frank Schubert for determining rating differences between engines from PGN files. I set the starting rating to 2090 to give v0.241b a rating of about 1953, as suggested by Le fou numérique.

These were the results:
  Program                     Elo    +   -  Score
1 v0.25b (clmr,ss,qsp)     : 2172   56  55  63.6%
2 v0.25b (clmr,ss)         : 2131   55  55  56.8%
3 v0.25b (clmr,ss,qsp,nt)  : 2110   55  54  53.2%
4 v0.25b (clmr,ss,qs)      : 2110   54  54  53.2%
5 v0.25b (plmr)            : 2066   54  54  45.6%
6 v0.241b                  : 1952   60  62  27.6%
I cut out a few statistics to make the table fit; the layout of this page is not very forgiving. :) The '+' and '-' columns are the error margins for the Elo ratings. Using the error margins, it seems Mediocre is about to gain somewhere between 105 and 338 Elo points.
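For reference, ratings like these come from inverting the standard Elo expectancy formula against the observed scores. A minimal sketch of the formula itself (the class and method names are mine, not Elostat's):

```java
public class EloModel {
    /**
     * Expected score (0..1) for a player rated eloDiff points
     * above the opponent, under the standard Elo model.
     */
    static double expectedScore(double eloDiff) {
        return 1.0 / (1.0 + Math.pow(10.0, -eloDiff / 400.0));
    }

    public static void main(String[] args) {
        // Equal ratings give an expected score of 50%.
        System.out.println(expectedScore(0));   // 0.5
        // v0.25b (clmr,ss,qsp) at 2172 vs v0.241b at 1952: a 220-point gap.
        System.out.println(expectedScore(220)); // about 0.78
    }
}
```

So a 220-point rating gap predicts roughly 78% for the stronger side, which is in the same ballpark as the 76% (19/25) the qsp version actually scored against v0.241b.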

Not too bad I guess. :)


Anonymous said...

google "bayeselo" --- better than elostat

Jonatan Pettersson said...

Great, I'll take a look at it.