Nov 12, 2011

[Tournament] GECCO - Standings day 1

    Name        Rating  Score  Perfrm  Upset   Results
--  ----------  ------  -----  ------  ------  --------------
 1  +Spike      [1872]  3.0    [2168]  [  10]  +10w +03w +04b
 2  +Nightmare  [1833]  2.5    [2060]  [  24]  +08w =04b +07w
 3  +Rookie     [1747]  2.0    [1874]  [   0]  +09w -01b +06w
 4  -Tornado    [1882]  1.5    [1793]  [   0]  +05b =02w -01w
 5  +Baron      [   0]  1.5    [1793]  [2587]  -04w =07b +09b
 6  -Deuterium  [   0]  1.5    [1748]  [2587]  =07b +10w -03b
 7  -Goldbar    [1824]  1.0    [1594]  [   0]  =06w =05w -02b
 8  -Spartacus  [   0]  1.0    [1594]  [1675]  -02b -09w +10b
 9  +Mediocre   [   0]  1.0    [1565]  [1675]  -03b +08b -05w
10  -microMax   [   0]  0.0    [1340]  [   0]  -01b -06b -08w

Mediocre's walkover was against Rookie which I really thought I had a chance against. Too bad.

I guess microMax should be possible to beat, and then we'll see who the other opponents are. Looking at the board, it would be Deuterium and Goldbar. With some luck, perhaps a 4.0 score isn't impossible.

We'll see tomorrow.

[Tournament] GECCO - Game 3

Game 3 underway against The Baron. Have no high hopes for this one. :)

-

A solid loss as expected, but Mediocre played quite well, I'd have to say. It ended up with some overextended pawns and the kings on the wrong side (The Baron had a pawn majority on the queenside, making the pawn ending a really simple win).



Game 4 starts tomorrow at 8:30 CET.

[Tournament] GECCO - Game 2

Spartacus played strangely in the endgame but still almost held the draw thanks to opposite-colored bishops.

I have only a 20% adjustment towards a draw for opposite-colored bishops. That might be slightly too little, but rather too little than too much, I guess.
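A minimal sketch of what such a 20% adjustment could look like, assuming it simply keeps 80% of the score. The class, the constant name and the boolean flag are made up for illustration; how the engine actually detects the ending isn't shown here.

```java
class OppositeBishopScaling {
    // Hypothetical constant: share of the score that is kept when the
    // position has opposite-colored bishops (i.e. 20% pulled towards a draw).
    static final int OPPOSITE_BISHOP_KEEP_PERCENT = 80;

    // eval is in centipawns; oppositeBishops is assumed to be computed elsewhere.
    static int adjustForOppositeBishops(int eval, boolean oppositeBishops) {
        if (!oppositeBishops) {
            return eval;
        }
        return eval * OPPOSITE_BISHOP_KEEP_PERCENT / 100;
    }

    public static void main(String[] args) {
        System.out.println(adjustForOppositeBishops(175, true));  // 140
        System.out.println(adjustForOppositeBishops(175, false)); // 175
    }
}
```

Scaling both positive and negative scores the same way keeps the adjustment symmetric for white and black.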


[Tournament] Mediocre in GECCO

Mediocre is participating in a long time control tournament today and tomorrow.

http://marcelk.net/chess/GECCO/2011/GECCC.html

Unfortunately I had connection issues during the first game and had to forfeit it. The second game is now underway against Spartacus and everything seems to be going fine: 14 moves in, Mediocre says it's up by +1.75. :)

I'm using a version of Mediocre that is a few weeks old, with the changes to search but none of the recent evaluation dabbling.

Nov 7, 2011

[Info] Yay me

Up to my 10th failed attempt at tuning my passed pawn eval.

On the last attempt I wasted 20,000 games.

I have tables looking like this:

Rank: 1 2 3 4 5 6 7 8
Value: {0,10,20,30,60,120,150,0}

That is, an increasingly higher evaluation the closer the passer is to promotion.

This table can then be stretched in all kinds of directions during the tuning (increasing/decreasing all values, or increasing the differences between them) using two "knobs", so the table only needs two values to tune instead of six.
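A rough sketch of the two-knob idea, assuming one knob scales the differences and the other shifts every value. The names `scalePercent` and `offset` are hypothetical, and the real knobs may well be defined differently.

```java
import java.util.Arrays;

class PassedPawnTable {
    // Base table from the post, indexed by rank - 1.
    static final int[] BASE = {0, 10, 20, 30, 60, 120, 150, 0};

    // scalePercent stretches the differences between ranks (100 = unchanged),
    // offset moves all values up or down. Ranks 1 and 8 stay at 0.
    static int[] stretch(int[] base, int scalePercent, int offset) {
        int[] out = new int[base.length];
        for (int i = 1; i < base.length - 1; i++) {
            out[i] = base[i] * scalePercent / 100 + offset;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two numbers decide all six tunable entries.
        System.out.println(Arrays.toString(stretch(BASE, 150, -5)));
        // [0, 10, 25, 40, 85, 175, 220, 0]
    }
}
```

The tuner then only has to explore a two-dimensional space, at the price of never changing the shape of the curve itself.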

Now, I had reversed the values when preparing for the tuning... so instead of giving 150 centipawns for being one square from queening I gave it 10.

The tuning tried to compensate and the best it came up with was:

Rank: 1 2 3 4 5 6 7 8
Value: {0,-83,-60,-14,-9,17,25,0}

Quite a good effort, but I find it hard to believe an 8 cp difference between the 6th and 7th ranks is optimal.

Fixed the problem and running the tuning for the 11th time. :)

Nov 6, 2011

[Info] Fun little problem

Still tuning. I've moved on to passed pawn eval, which is another of those problem areas (I've always thought Mediocre neglected passed pawns, but any attempts to manually tune it have resulted in heavily overvaluing them).

While running my tests I ran into an interesting little problem. Since I use aspiration windows, it's quite common that re-searches occur (when the result is outside the window, i.e. a sudden drop/rise in evaluation between iterations).

Now, if you're not careful, it's possible for this window to bounce back and forth, i.e. failing low, then high, and never getting past the iteration since it keeps re-searching.

I have all those safety measures in place, but with the extreme evaluation numbers the tuning can come up with, the score got so high it surpassed the "infinity" score. This will obviously fix itself after a few iterations of the testing (giving a passed pawn the value of 20 queens is probably not going to help), but my aspiration windows went berserk since I check the result like this:

if(eval <= alpha) {
...
} else if (eval >= beta) {
...
}

With alpha and beta set to - and + "infinity" we have the maximum window, which should never cause a re-search (mate in 1 is lower than infinity, obviously). But as I said, with these extreme evaluation parameters it did.

Easy fix, just a bit silly and quite hard to find.
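The post doesn't spell out the fix, so the following is only one plausible guard, not necessarily what went into Mediocre: re-search only while the failing bound can still be opened, and clamp whatever comes back. `INFINITY`, `WINDOW` and the `search` callback are stand-ins invented for this sketch.

```java
import java.util.function.IntBinaryOperator;

class AspirationSketch {
    static final int INFINITY = 32000; // hypothetical "infinity" score
    static final int WINDOW = 50;      // hypothetical half-width of the window

    // search is a stand-in for the real alpha-beta search: (alpha, beta) -> score.
    static int searchWithAspiration(IntBinaryOperator search, int previousScore) {
        int alpha = Math.max(previousScore - WINDOW, -INFINITY);
        int beta = Math.min(previousScore + WINDOW, INFINITY);
        while (true) {
            int eval = search.applyAsInt(alpha, beta);
            if (eval <= alpha && alpha > -INFINITY) {
                alpha = -INFINITY; // fail low: open the lower bound and re-search
            } else if (eval >= beta && beta < INFINITY) {
                beta = INFINITY;   // fail high: open the upper bound and re-search
            } else {
                // Inside the window, or the failing bound is already fully open:
                // accept the result, clamped so it can never exceed "infinity".
                return Math.max(-INFINITY, Math.min(INFINITY, eval));
            }
        }
    }

    public static void main(String[] args) {
        // An evaluation gone wild (bigger than INFINITY) no longer loops forever.
        System.out.println(searchWithAspiration((a, b) -> 50000, 0)); // 32000
    }
}
```

Each bound can be opened only once, so the loop runs at most three searches per iteration no matter what the evaluation returns.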

Nov 5, 2011

[Info] CLOP Windows executable

Rémi Coulom was nice enough to create a Windows executable of CLOP (I ran it through Qt Creator before, which was a bit silly).

If you haven't tried his software before I urge you to do it, it's quite awesome:

http://remi.coulom.free.fr/CLOP/

Some changes in this version as well:

2011-11-05: 0.0.9
  • Stronger regularization (avoid overfitting in high dimensions)
  • "Merge Replications" option in gui -> faster, better display
  • Performance optimization of display and loading of large data files
  • Removed "-ansi" option for Windows compilation
  • Shrinking parameter ranges does not lose integer data any more
  • Removed confusing columns: max-1, max-2, ...
  • More explanations in the doc: biased win rate + GammaParameter

Link to the release post here.

[Info] The results are in

So I've run the mobility tuning overnight, with more than 10,000 games (probably too few for extreme accuracy, but good enough for me).

All four parameters (mentioned in my last post) started to home in on their values very well. For example the first parameter looks like this:


The mean value was -370 which can clearly be seen in the plot.

So on to all of the results and some comparisons (sanity checks I guess). These were the resulting values (meaning of these explained in my last post):

  • MOBILITY_SAFE_MULTI = -370

  • MOBILITY_UNSAFE_MULTI = 783

  • MOBILITY_ONE_TRAPPED_MULTI = 767

  • MOBILITY_ZERO_TRAPPED_MULTI = 1343


I'll do four different examples.

  1. Average piece - a piece with 4 safe squares and 1 unsafe

  2. Good piece - a piece with 12 safe squares and 3 unsafe

  3. Bad piece - a piece on the fourth rank with 1 safe square and 1 unsafe

  4. Trapped piece - a piece on the fourth rank with 0 safe squares and 1 unsafe


So comparisons (using the examples above):

  1. 4*2+5 = 13 (before tuning)
    -370*4/100+783*5/100 = 25 (after tuning)

  2. 12*2+15 = 39
    -370*12/100+783*15/100 = 73

  3. 1*2+2-4*5/2 = -6
    -370*1/100+783*2/100-767*4/100 = -18

  4. 0*2+1-4*5 = -19
    -370*0/100+783*1/100-1343*4/100 = -46


So very reasonable numbers, just a bit higher in all directions just as I suspected (always nice when you have a guess and testing confirms it).
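For reference, the after-tuning arithmetic above can be reproduced with a small sketch like the one below. It assumes each term is divided by 100 separately with integer division, which is what makes the four results come out as listed. The method signature and the `rankFactor` parameter (4 for the fourth-rank examples) are hypothetical.

```java
class MobilitySketch {
    static final int MOBILITY_SAFE_MULTI = -370;
    static final int MOBILITY_UNSAFE_MULTI = 783;
    static final int MOBILITY_ONE_TRAPPED_MULTI = 767;
    static final int MOBILITY_ZERO_TRAPPED_MULTI = 1343;

    // safe/unsafe are square counts; rankFactor is the rank-based penalty
    // weight used in the examples (4 for a piece on the fourth rank).
    static int mobility(int safe, int unsafe, int rankFactor) {
        int score = MOBILITY_SAFE_MULTI * safe / 100
                  + MOBILITY_UNSAFE_MULTI * (safe + unsafe) / 100;
        if (safe == 0) {
            score -= MOBILITY_ZERO_TRAPPED_MULTI * rankFactor / 100;
        } else if (safe == 1) {
            score -= MOBILITY_ONE_TRAPPED_MULTI * rankFactor / 100;
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(mobility(4, 1, 4));  // average piece: 25
        System.out.println(mobility(12, 3, 4)); // good piece: 73
        System.out.println(mobility(1, 1, 4));  // bad piece: -18
        System.out.println(mobility(0, 1, 4));  // trapped piece: -46
    }
}
```

Note that the "unsafe" multiplier is applied to the total number of squares (safe plus unsafe), which is why a negative safe multiplier still leaves every safe square worth a net bonus.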

First interesting thing is the negative safe square number. I'm sure it's not trying to penalize safe squares, but rather put all bonus in the total squares (as those include safe squares as well), meaning safe vs unsafe squares is really not that important.

Second thing is my apparently good guesstimate of giving a piece with one safe square half the penalty of a piece with zero safe squares, which the tuning seems to confirm by almost doubling the factor (767 to 1343).

-

Time to run a test with all these new values. Will be very interesting.

Nov 4, 2011

[Info] The mobility tuning

I think the results of the mobility tuning are going to be quite interesting (and hopefully useful), so I'm going to scribble down the specifics of the test.

I'm going to test the following parameters:

MOBILITY_SAFE_MULTI
Every safe square that a piece can reach on the board, times this constant, divided by 100.

So basically:

MOBILITY_SAFE_MULTI = 500
Safe squares = 4
Equals 500*4/100 = 20 centipawns bonus

MOBILITY_UNSAFE_MULTI
Every safe square plus every unsafe square (i.e. squares that are protected by lesser valued pieces). Same equation as above.

MOBILITY_ONE_TRAPPED_MULTI
If the piece only has one safe square it's penalized by the rank it's on plus 1. (and weighed as above)

MOBILITY_ZERO_TRAPPED_MULTI
If the piece has no safe squares it's penalized by the rank it's on plus 1. (and weighed as above)


After 1100 games I got the following numbers

MOBILITY_SAFE_MULTI = -141
MOBILITY_UNSAFE_MULTI = 865
MOBILITY_ONE_TRAPPED_MULTI = -174
MOBILITY_ZERO_TRAPPED_MULTI = -133

Not too happy about those negative numbers, but this would result in the following score for, say, a bishop with 4 safe and 1 unsafe squares (quite a reasonable assumption):

-141*4/100 + 865*5/100 = 38

So basically it's favoring the unsafe squares and trying to reduce the safe squares evaluation. (I wonder if this indicates that the total number of squares matters more than the safe/unsafe distinction.)

I'll leave it overnight and see what it comes up with.

[Plan] Tuning

I've started some tuning with CLOP, which is an excellent piece of software. Of course I'd want software completely focused on chess engine tuning, i.e. choose parameters, choose opponents, and receive optimal parameters and expected gain. But this is very much good enough.

First I tuned the futility levels, which resulted in a quite expected (but hard to guess) increase in value for the shallow nodes. From:

120, 120, 310, 310, 400

to

210, 230, 260, 260, 460

So basically a higher margin before skipping nodes close to the leaves (remember, futility pruning checks how far behind you are in the search and, if the arbitrary margin doesn't get you back on the plus side, simply stops searching). Then a slightly lower margin in nodes further from the leaves, and a slightly higher one again for the nodes 5 plies from the leaves.
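As a reminder of what those five numbers do, here is a bare-bones sketch of the margin check using the tuned values. It assumes the table is indexed by plies from the leaves (1 to 5) and leaves out everything else a real engine would check (being in check, captures, promotions and so on). The method name and signature are made up for illustration.

```java
class FutilitySketch {
    // Tuned margins from above, for nodes 1..5 plies from the leaves.
    static final int[] FUTILITY_MARGIN = {210, 230, 260, 260, 460};

    // True if even the margin can't lift the static evaluation above alpha,
    // in which case searching this node further is considered futile.
    static boolean isFutile(int pliesFromLeaf, int staticEval, int alpha) {
        if (pliesFromLeaf < 1 || pliesFromLeaf > FUTILITY_MARGIN.length) {
            return false; // too far from the leaves: never prune here
        }
        return staticEval + FUTILITY_MARGIN[pliesFromLeaf - 1] <= alpha;
    }

    public static void main(String[] args) {
        System.out.println(isFutile(1, -300, 0)); // true: -300 + 210 is still below alpha
        System.out.println(isFutile(5, -300, 0)); // false: -300 + 460 gets back above alpha
    }
}
```

A larger margin means fewer nodes get skipped, so the tuned values are more cautious at 1, 2 and 5 plies from the leaves than the old ones.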

Pretty much what I expected (I borrowed the previous values from Crafty and generally thought they were a bit optimistic).

The next thing I tuned was the king positioning table in the endgame, which looked like this:

-20 -15 -10 -10 -10 -10 -15 -20
-15 -5 0 0 0 0 -5 -15
-10 0 5 5 5 5 0 -10
-10 0 5 10 10 5 0 -10
-10 0 5 10 10 5 0 -10
-10 0 5 5 5 5 0 -10
-15 -5 0 0 0 0 -5 -15
-20 -15 -10 -10 -10 -10 -15 -20

Again this was just randomly chosen (in this case I think I simply went with gut feeling to pick the numbers). And I always suspected them to be too low, that is, not giving enough credit for having the king in the center in endgames.

Tuning gave this:

-187 -157 -128 -128 -128 -128 -157 -187
-157 -99 -70 -70 -70 -70 -99 -157
-128 -70 -41 -41 -41 -41 -70 -128
-128 -70 -41 -12 -12 -41 -70 -128
-128 -70 -41 -12 -12 -41 -70 -128
-128 -70 -41 -41 -41 -41 -70 -128
-157 -99 -70 -70 -70 -70 -99 -157
-187 -157 -128 -128 -128 -128 -157 -187

So a whole bunch lower, a bit surprising... But bigger differences between center and edges (about two pawns' difference). I did a quick test of this, and over 200 games it seems it's certainly better (about 60% win rate over the old values).
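As a sanity check, the tuned table is consistent with the old one being scaled and shifted as a whole. The sketch below checks that for the seven distinct values. The scale of 5.83 and the offset of -70 were fitted by hand to the two tables above for illustration; they are not values reported by CLOP.

```java
class KingTableCheck {
    // The seven distinct values of the old table and their tuned counterparts.
    static final int[] OLD_VALUES   = {-20, -15, -10, -5, 0, 5, 10};
    static final int[] TUNED_VALUES = {-187, -157, -128, -99, -70, -41, -12};

    // Hand-fitted for this illustration only.
    static final double SCALE = 5.83;
    static final int OFFSET = -70;

    static int stretch(int oldValue) {
        return (int) Math.round(oldValue * SCALE) + OFFSET;
    }

    public static void main(String[] args) {
        for (int i = 0; i < OLD_VALUES.length; i++) {
            System.out.println(OLD_VALUES[i] + " -> " + stretch(OLD_VALUES[i])
                    + " (tuned: " + TUNED_VALUES[i] + ")");
        }
    }
}
```

Since only relative differences between squares matter when comparing king placements, the -70 shift is probably less significant than the roughly sixfold stretch.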

I have a feeling my evaluation is so badly tuned that I'll be seeing a lot of these quite extreme numbers, and I might have to pass through all the variables a few times until I get them all right.

But I love this tuning business (which I've done very little of in the past). Simply pass in a few parameters, wait a couple of hours, and out comes an improved engine. No effort whatsoever. Silly really. :)

Next up is mobility, where I have no idea how valid the current evaluation is. Currently I do something like count the available squares for a piece, and give twice that number in centipawns along with half the number of unsafe squares (protected by lesser valued pieces).

This gives a really arbitrary number, and I have no idea how good it is. It will be really interesting to see what CLOP comes up with.