Oct 19, 2011

[Plan] Moving on to evaluation

I'm starting to be quite happy with the results of my rewritten search. A somewhat bigger test gave this:

   Program    Elo    +    -   Games   Score    Av.Op.   Draws
1  M1-1    :  2435   15   15   1508   59.9 %   2365     25.7 %
2  M34     :  2365   15   15   1508   40.1 %   2435     25.7 %

This is comfortably within the bounds for my new engine being superior: the 70 Elo gap is far larger than the ±15 error margins. The rating difference is probably inflated quite a bit due to this being a self test (as Thomas Petzke pointed out), but nonetheless it's an improvement for sure.
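As a rough sanity check on the table, the score percentage can be converted to a rating difference with the standard logistic Elo formula. This is only a sketch: the class and method names are made up for illustration, and the rating tool that produced the table may well use a somewhat different model.

```java
/** Sanity check only: converts between score fraction and Elo difference. */
final class EloDiff {

    /** Elo difference implied by a score fraction p (0 < p < 1), logistic model. */
    static double fromScore(double p) {
        return 400.0 * Math.log10(p / (1.0 - p));
    }

    /** Expected score fraction for a rating difference d, same model. */
    static double expectedScore(double d) {
        return 1.0 / (1.0 + Math.pow(10.0, -d / 400.0));
    }

    public static void main(String[] args) {
        // Score of M1-1 in the test above: 59.9 %
        System.out.printf("Elo difference: %.1f%n", fromScore(0.599));
        // Rating gap in the table: 2435 - 2365 = 70
        System.out.printf("Expected score: %.3f%n", expectedScore(70));
    }
}
```

With a 59.9 % score this comes out at roughly 70 Elo, which agrees with the 2435 vs 2365 in the table.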

So time to move on to the evaluation revamp.

Having looked through it for the first time in a couple of years, I'm reminded how extremely complicated it is. :) It has a lot of Ed Schröder's recommendations in it (which doesn't exactly simplify things).

I honestly don't think a complete rewrite is feasible. The best course of action is probably to prepare it for some auto-tuning and then start playing massive amounts of games.

However, there are a few things that could be looked over before that.

  • Game phase interpolation (or tapered eval) - Seems every other engine has this now and I can clearly see the benefits. Basically don't switch sharply between the different phases of the game, but rather do it smoothly with separate scores for the different phases.

    Simply put, let's say we evaluate as a middle game and get an evaluation of +200, while an evaluation as endgame gives +400 (maybe the king is close to the centre of the board, which would give a penalty in the middle game but a bonus in the endgame). Now if we're clearly in the middle game (queens and rooks and minor pieces all over), we would just take the +200. And if we're clearly in the endgame (maybe one or two pieces) we take the +400. But if we're in the middle of transitioning to the endgame (perhaps we have a few minor pieces but also queens) we shouldn't take one or the other straight off, but rather interpolate between the two. If we decide we're exactly halfway between middle game and endgame, we would take a +300 score.

    I want to get this done first before poking around any more in the evaluation, since I'll probably need to split up a whole bunch of scores.

    I'll get back to this. (maybe a new guide is in order)

  • Endgame knowledge - Currently a KBKP endgame is evaluated heavily in favour of the side with the bishop (somewhere around +200 centipawns), when it clearly should be close to 0. KRKB gives the side with the rook a clear win, when it is normally a draw. And there's plenty more of that. Since I have no plans of adding tablebases (and even if I do I can't assume they'll always be present), I will need to put in some basic endgame knowledge.

  • Passed pawns - Something is not right here. Need to take a good look at how I evaluate them (a simple comparison to Crafty's evaluation showed Mediocre's scores to be far too low).

  • Tropism - I'm not sure how much is gained (or lost) from this. I just put it in because I could (it's a fairly simple piece of code). Maybe take a look at it and see how others do it.

  • Manual tuning - Find a good way to look at the individual evaluation parameters (even more so than the current one I have) and see if I can spot any obvious flaws. For example I'm quite sure the king's endgame piece-square table has too low values (the king isn't rewarded enough for moving out in the endgame).

  • Auto-tuning - I've had a look at CLOP which seems really nice, and also Joona Kiiski explained how he tunes Stockfish (forum post), which was identified as SPSA but with self-play.

    However I decide to do it, I need to prepare Mediocre's evaluation for it, which means making more parameters accessible from outside the evaluation.
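One way to make the parameters reachable from outside could look something like the sketch below. Everything in it is hypothetical (the class, the parameter names and the default values are invented for illustration and are not Mediocre's actual code); the point is just that a tuner can read and overwrite values by name between games.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for evaluation parameters that a tuner can adjust by name. */
final class EvalParams {
    private final Map<String, Integer> values = new LinkedHashMap<>();

    EvalParams() {
        // Made-up parameter names and defaults, for illustration only.
        values.put("passedPawnBonus", 20);
        values.put("kingTropismWeight", 5);
    }

    /** Returns the current value, failing loudly on an unknown name. */
    int get(String name) {
        Integer v = values.get(name);
        if (v == null) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        return v;
    }

    /** Overwrites an existing parameter; unknown names are rejected to catch typos. */
    void set(String name, int value) {
        if (!values.containsKey(name)) {
            throw new IllegalArgumentException("Unknown parameter: " + name);
        }
        values.put(name, value);
    }

    /** Copy of all parameters, e.g. for logging what a tuning run ended up with. */
    Map<String, Integer> snapshot() {
        return new LinkedHashMap<>(values);
    }

    public static void main(String[] args) {
        EvalParams params = new EvalParams();
        params.set("passedPawnBonus", 35); // what a tuner might do between games
        System.out.println(params.snapshot());
    }
}
```

The evaluation would then call `get` (or cache the values at the start of a game) instead of using hard-coded constants.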


So first, tapered eval.
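A minimal sketch of the interpolation described in the list above, using the +200/+400 example. The phase weights (1 for minor pieces, 2 for rooks, 4 for queens) and all names are assumptions picked for illustration, not what Mediocre will necessarily end up with.

```java
/** Tapered-eval sketch; names and phase weights are illustrative assumptions. */
final class TaperedEval {
    // Assumed phase weights per piece (pawns and kings count as 0).
    static final int KNIGHT_PHASE = 1;
    static final int BISHOP_PHASE = 1;
    static final int ROOK_PHASE = 2;
    static final int QUEEN_PHASE = 4;

    // Phase of the starting position: 4 knights, 4 bishops, 4 rooks, 2 queens.
    static final int MAX_PHASE =
            4 * KNIGHT_PHASE + 4 * BISHOP_PHASE + 4 * ROOK_PHASE + 2 * QUEEN_PHASE;

    /** Piece counts are totals for both sides. 0 = pure endgame, MAX_PHASE = full middle game. */
    static int gamePhase(int knights, int bishops, int rooks, int queens) {
        int phase = knights * KNIGHT_PHASE + bishops * BISHOP_PHASE
                + rooks * ROOK_PHASE + queens * QUEEN_PHASE;
        // Promotions can push the material above the starting amount, so clamp.
        return Math.min(phase, MAX_PHASE);
    }

    /** Blends the middle game and endgame scores linearly according to the phase. */
    static int interpolate(int mgScore, int egScore, int phase) {
        return (mgScore * phase + egScore * (MAX_PHASE - phase)) / MAX_PHASE;
    }

    public static void main(String[] args) {
        int mg = 200;
        int eg = 400;
        System.out.println(interpolate(mg, eg, MAX_PHASE)); // all pieces on the board
        System.out.println(interpolate(mg, eg, 0));         // no pieces left
        // Two queens, two knights and two bishops left: exactly half of MAX_PHASE.
        System.out.println(interpolate(mg, eg, gamePhase(2, 2, 0, 2)));
    }
}
```

With these weights the three calls give 200, 400 and 300, matching the example: the pure middle game score, the pure endgame score, and the halfway point.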
