Oct 15, 2011

[Test] Gauntlet engine test done

Turned out like this for v0.34:
    Engine              Score
01: Mediocre v0.34   667,0/1000
02: Counter 1.2       70,0/100
03: Knightx 1.92      67,5/100
04: iCE 0.2           53,0/100
05: Horizon 4.4       42,5/100
06: TJchess 1.01      38,5/100
07: Roce 0.0390       30,5/100
08: Wing 2.0a         13,0/100
09: Adam 3.3          11,5/100
10: Bikjump 2.01       4,0/100
11: Lime 66            2,5/100

At the start I was debating whether to use 8 or 10 engines, and since Lime and Bikjump are a bit too far behind I think I'll just drop them and go with 8. That way I can have 800 games played overnight, 100 against each engine.

Elo-wise it looks like this:
Rank Name            Elo    +    -  games  score  draws
   1 Counter 1.2     330   56   53    111    72%    22%
   2 Knightx 1.92    305   60   57    111    66%     5%
   3 iCE 0.2         208   55   54    111    55%     9%
   4 Mediocre v0.34  168   20   20   1118    65%     7%
   5 Horizon 4.4     104   55   56    110    42%     6%
   6 TJchess 1.01     94   55   56    110    41%     9%
   7 Roce 0.0390      20   56   59    110    32%     9%
   8 Wing 2.0a      -178   69   82    111    14%     4%
   9 Adam 3.3       -179   69   82    111    13%     5%
  10 Lime 66        -274   82  105    111     8%     3%
  11 Bikjump 2.01   -288   85  111    111     8%     0%
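The ratings above come from a rating tool that accounts for opponent strength, so they won't match a naive calculation exactly, but the basic relation between score percentage and Elo difference is the standard logistic formula. A minimal sketch (the `elo_diff` helper is mine, not from the rating tool):

```python
import math

def elo_diff(score):
    """Approximate Elo difference implied by a score fraction (0 < score < 1),
    using the standard logistic Elo model."""
    return -400 * math.log10(1 / score - 1)

# An even score implies no rating difference:
print(round(elo_diff(0.50)))  # 0

# Mediocre's 65% score against the field corresponds to roughly
# +108 Elo over the average opponent:
print(round(elo_diff(0.65)))  # 108
```

This is only the two-player approximation; a tool like bayeselo fits all results simultaneously, which is why the listed error margins shrink with more games.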

Mediocre is ranked slightly too high in this field. But I'll stick with it (minus Lime and Bikjump) and add higher-rated engines later on.

3 comments:

John said...

Have you done any tests (or read any tests) about how closely short time control results mimic long time control results?

I suppose tuning the evaluation function should be fairly similar. But once tweaks start affecting the branching factor, it seems the gain/loss will depend on the time controls used, right?

Jonatan Pettersson said...

It seems most programmers of the high-end engines use these super-fast time controls for most of their testing.

And from what I've seen (and read) they mimic longer time controls very well.

The only case where you shouldn't use fast time controls is when you have a good reason to assume that there would be a difference (say when you change time management for example).

Not sure what you mean by tweaks that affect the branching factor; changing the evaluation can result in completely different trees as well.

However, the point is if you play enough games it will all even out.

If you messed up the search tree and can't reach as deep anymore, it will show at 0.25 seconds per move just as well as at 20 minutes per move.

John said...

I meant instances where the gain might depend on the approximate depth the engine reaches. Consider a highly cooked-up example where you implement some move-ordering change that makes the entire code run 10% slower (i.e. irrespective of depth), but decreases the branching factor by 2%.

So if you test this change at very fast controls, where the depths reached are around 4-6, you might not notice any difference. But at longer controls, the branching factor reduction will overcome the flat slowdown.
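The trade-off in this hypothetical example can be checked with a little arithmetic. Modelling the tree size crudely as bf**depth, the modified engine's time to reach a given depth relative to the original is 1.10 * 0.98**depth, so the change only pays off past the break-even depth (the 10%/2% numbers are the cooked-up ones from the comment, not measured values):

```python
import math

NODE_SLOWDOWN = 1.10  # hypothetical: each node costs 10% more
BF_FACTOR = 0.98      # hypothetical: branching factor shrinks by 2%

def time_ratio(depth, base_bf=2.0):
    """Time for the modified engine relative to the original, assuming
    the searched tree grows as bf**depth (a crude idealisation)."""
    old = base_bf ** depth
    new = NODE_SLOWDOWN * (BF_FACTOR * base_bf) ** depth
    return new / old  # simplifies to 1.10 * 0.98**depth

# Break-even depth: where 1.10 * 0.98**d == 1
break_even = math.log(1 / NODE_SLOWDOWN) / math.log(BF_FACTOR)
print(round(break_even, 1))  # about 4.7 plies

for d in (4, 6, 10, 20):
    print(d, round(time_ratio(d), 3))
# depth  4 -> 1.015  (net loss: slowdown dominates)
# depth  6 -> 0.974  (net win)
# depth 10 -> 0.899
# depth 20 -> 0.734
```

Under these assumptions the change is a slight loss at depth 4 and an increasing win beyond depth 5, which is exactly why a blitz-only test around depths 4-6 could miss a change that helps at longer time controls.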