Oct 15, 2011

[Test] Gauntlet engine test done

Turned out like this for v0.34:
    Engine              Score
01: Mediocre v0.34   667,0/1000
02: Counter 1.2       70,0/100
03: Knightx 1.92      67,5/100
04: iCE 0.2           53,0/100
05: Horizon 4.4       42,5/100
06: TJchess 1.01      38,5/100
07: Roce 0.0390       30,5/100
08: Wing 2.0a         13,0/100
09: Adam 3.3          11,5/100
10: Bikjump 2.01       4,0/100
11: Lime 66            2,5/100

At the start I was debating whether to use 8 or 10 engines, and since Lime and Bikjump are a bit too far behind I think I'll just drop them and go with 8. That way I can have 800 games played overnight, 100 against each engine.

Elo-wise it looks like this:
Rank Name            Elo    +    -  games  score  draws
   1 Counter 1.2     330   56   53    111    72%    22%
   2 Knightx 1.92    305   60   57    111    66%     5%
   3 iCE 0.2         208   55   54    111    55%     9%
   4 Mediocre v0.34  168   20   20   1118    65%     7%
   5 Horizon 4.4     104   55   56    110    42%     6%
   6 TJchess 1.01     94   55   56    110    41%     9%
   7 Roce 0.0390      20   56   59    110    32%     9%
   8 Wing 2.0a      -178   69   82    111    14%     4%
   9 Adam 3.3       -179   69   82    111    13%     5%
  10 Lime 66        -274   82  105    111     8%     3%
  11 Bikjump 2.01   -288   85  111    111     8%     0%
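The ratings above come from a rating tool that accounts for opponent strength, so they won't match a naive calculation exactly, but the basic relation between score percentage and Elo difference is the standard logistic formula. A minimal sketch (the `elo_diff` helper is mine, not from the rating tool):

```python
import math

def elo_diff(score):
    """Approximate Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic Elo model."""
    return -400 * math.log10(1 / score - 1)

# An even score implies no rating difference:
print(round(elo_diff(0.50)))  # 0

# Mediocre's 65% score against the field corresponds to roughly
# +108 Elo over the average opponent:
print(round(elo_diff(0.65)))  # 108
```

This is only the two-player approximation; a tool like bayeselo fits all results simultaneously, which is why the listed error margins shrink with more games.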

Mediocre is ranked slightly too high in this field. But I'll stick with it (minus Lime and Bikjump) and add higher-rated engines later on.

3 comments:

John said...

Have you done any tests (or read any tests) about how closely short time control results mimic long time control results?

I suppose tuning the evaluation function should be fairly similar. But once tweaks start affecting the branching factor, it seems the gain/loss will depend on the time controls used, right?

Jonatan Pettersson said...

It seems most programmers of the high-end engines use these super-fast time controls for most of their testing.

And from what I've seen (and read) they mimic longer time controls very well.

The only case where you shouldn't use fast time controls is when you have a good reason to assume that there would be a difference (say when you change time management for example).

Not sure what you mean by tweaks that affect the branching factor; changing the evaluation can result in completely different trees as well.

However, the point is if you play enough games it will all even out.

If you messed up the search tree and can't reach as deep anymore, it will show at 0.25 seconds per move just as well as at 20 minutes per move.

John said...

I meant instances where the gain might depend on the approximate depth the engine reaches. Consider a highly cooked-up example where you implement some move-ordering change that makes the entire code run 10% slower (i.e. irrespective of depth), but decreases the branching factor by 2%.

So if you test this change at very fast controls, where the depths reached are around 4-6, you might not notice any difference. But at longer controls, the branching factor reduction will overcome the flat slowdown.
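The trade-off in this hypothetical example can be checked with a little arithmetic. Modelling the tree size crudely as bf**depth, the modified engine's time to reach a given depth relative to the original is 1.10 * 0.98**depth, so the change only pays off past the break-even depth (the 10%/2% numbers are the cooked-up ones from the comment, not measured values):

```python
import math

NODE_SLOWDOWN = 1.10  # hypothetical: each node costs 10% more
BF_FACTOR = 0.98      # hypothetical: branching factor shrinks by 2%

def time_ratio(depth, base_bf=2.0):
    """Time for the modified engine relative to the original, assuming
    the searched tree grows as bf**depth (a crude idealisation)."""
    old = base_bf ** depth
    new = NODE_SLOWDOWN * (BF_FACTOR * base_bf) ** depth
    return new / old  # simplifies to 1.10 * 0.98**depth

# Break-even depth: where 1.10 * 0.98**d == 1
break_even = math.log(1 / NODE_SLOWDOWN) / math.log(BF_FACTOR)
print(round(break_even, 1))  # about 4.7 plies

for d in (4, 6, 10, 20):
    print(d, round(time_ratio(d), 3))
# depth  4 -> 1.015  (net loss: slowdown dominates)
# depth  6 -> 0.974  (net win)
# depth 10 -> 0.899
# depth 20 -> 0.734
```

Under these assumptions the change is a slight loss at depth 4 and an increasing win beyond depth 5, which is exactly why a blitz-only test around depths 4-6 could miss a change that helps at longer time controls.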