Apr 29, 2007

[New Version] v0.32 - Hung pieces evaluation and general tweaks

Changes:
  • Killer moves verification to avoid generating unnecessary moves
  • The finalEval is now started with globalBestMove to make sure we always get the right move there (I'm not sure this was ever a problem, but just in case)
  • Passed pawn values halfed in middle/opening
  • Fixed bug with x-ray attacks for queens
  • Simplified castling rights code
  • Added time search to line input and fixed some bugs
  • Dynamic null move depth
  • Futility check moves outside while loop
  • Fixed bug with black queen on 2nd rank bonus
  • Increased bonus for rook on 2nd/7th rank
  • Searched moves instead of HASH_EXACT to determine LMR and PVS (results in more reduced moves)
  • Hung piece check in the evaluation
  • Tempo bonus in evaluation
Note: Of these changes the hung pieces in evaluation probably helped the most. I am still having trouble adjusting everything to work together but this release should be a noticeable improvement. More to come...

mediocre_v0.32

[Other] Some confirmed improvement

The changes I mentioned yesterday plus some tweaking to the late move reduction and principal variation search resulted in a noticeable improvement in a 200 game match against v0.311. About 50-60 elo points or so.

I did some tweaking to the aspiration windows as well and I am going to test it extensively today, hopefully it helps a bit as well and I can finally release v0.32.

Apr 28, 2007

[Other] Many attempted improvements

A couple of days ago I started testing every change I made for the new version one by one to see if I could find some obvious flaw in the code.

I implemented one change and ran a few test suites on it to see if it helped or made things worse, and then tried implementing one more and so on.

These are the changes I have tried so far:
  • Killer moves verification (avoids generating unnecessary moves)
  • Passed pawn values halved in middle/opening
  • Fixed a bug with queen's x-ray attacks
  • Dynamic null move depth
  • Slightly optimized futility pruning
  • Increased bonus for rooks/queen on 2nd/7th rank
  • Fixed bug with black queen on 2nd rank bonus
All these changes made no difference at all in self-play, and resulted in slightly worse results in the test suites.

However by testing them one by one I noticed the worse result in the test suites was only due to the last change. Without it the results were actually slightly better than without the changes.

The bug (as I mentioned before I think) was black queen getting the bonus for being on the 7th rank instead of on the 2nd rank. This is obviously a bug and should be fixed, but resulted in quite a few points less in the test suites I tried for some reason, this is just a coincidence and should not affect general play in that way.

So I am going to stick with these changes, even though they actually do not seem to affect playing strength at all. I will not release a new version until there is a clear improvement however, but at least I have gotten somewhere.

I have a strong feeling the dynamic null move pruning (reduces 3 ply if the remaining depth is deeper) helps more at longer time controls, but at least it does not hurt the play at shorter time controls.

Apr 24, 2007

[Other] Testing and testing

I have been playing around with a few new versions of Mediocre.

One version (which I call Mediocre v0.32full) includes all the tweaks I have tried so far, with things like; adaptive null move pruning, evaluation based LMR (it actually reduces more moves now), fractional plies, mate and move history in the board object, new extensions, restructured futility pruning, and some changes to evaluation.

I have had very mixed results with this version and until yesterday it played much weaker than v0.311.

Yesterday I tried only allowing null moves if the depth left was atleast 2 plies (which is the common way of doing it) and suddenly it started playing on par with v0.311. The results from the btm test suite is about as good as v0.311 but v0.32full searches about 1.5 deeper on average, which should be explained mainly by the greater number of moves reduced (apparently they are not all sound as of now).

I think there is a great deal of potential and there are probably a few more 'bugs' like this to work out.

Apr 20, 2007

[Other] Fixed

I have now updated and uploaded the correct files. Dangerous to do changes too early in the morning I suppose. :)

[Other] Whoops

I accidently updated the wrong sources so version 0.311 does not use null moves. I will upload the correct version in a few minutes. Sorry. :)

Apr 19, 2007

[New Version] v0.311 - Bug fix for winboard protocol

Changes:
  • Fixed a bug that made the engine enter an infinite loop if it was mated using the winboard protocol
  • Reworked the ended game draw detection using the winboard protocol (does not affect the search)
Note: This is merely a bug fix and nothing else has been changed, however in very fast games (less than 2 minutes or so) this version should play better than the previous while using the winboard protocol since the old end game detection was hideous. :)

mediocre_v0.311

[Bug] Fixed the winboard protocol problem

When using the winboard protocol along with the Winboard interface and Mediocre got mated the engine entered an infinite loop. This happened because Winboard sends the mating move to the engine while for example Arena does not, it simply states the result once the game is over.

So what happened was the mating move was played on the board and then Mediocre started searching, but since it had no legal moves at the root node (since it was mated) the searchRoot returned -INFINITY as evaluation which got caught by the window check and a research was made, and since the evaluation returned never changed from -INFINITY the loop never exited.

Some poor programming from my side of course. I changed it so the root moves are generated before the iterative deepening loop (since they never change in the current search) and if no legal moves are found an empty pv is returned and handled by the input loop.

Thanks to H.G. Muller for sending me some games that made it easy to locate the bug.

[Bug] Something with the winboard protocol

A few people have reported problems while using Mediocre with the winboard protocol.

I am quite certain the problem is with the 'end of game'-verification, I just have to locate it.

The reason I did not notice this earlier is I pretty much only use UCI when I run tournaments.

I will release a hotfix when I find the problem.

Apr 15, 2007

[Guide] Fractional plies

Fractional plies are used to make it easier to tune extensions (and reductions).

If you have an extension that might not be worth extending a full ply, you could extend it 0.5 plies instead and only if you get two of those in the same branch the branch is extended with one ply.

The easiest way to make this work is change the way the search handles the depth altogether.

In Mediocre the call to alphaBeta from the root will look like this:
eval=-alphaBeta(board,current_depth*PLY-PLY,-beta,-alpha,true,1);
Where PLY is a fixed number that represents a full ply, 16 for instance. So if we wanted to search to 10 plies we would call alphaBeta with 10*16-16 = 144, which means we call it with 9 plies left to go.

If we wanted to extend one ply we add 16 to the call, so the call would be depth-PLY+16.

To extend 0.5 plies the call would be depth-PLY+8 and so on.

When we run out of 'depth' the quiescent search is called and this happens pretty much automatically like this:
if(depth < PLY) do_quiescent();
Meaning we do not have a full ply left to search so we start the quiescent search (which does not keep track of depth so we do not have to worry about it there).

Fractional plies is very easy to implement (basically replace all depth-1 with depth-PLY) and is very handy to have.

[Other] Test sets

Up until now I have been testing additions to Mediocre by trying it on random positions and basically look at the node count and time to decide if the addition was an improvement or not.

Once I started to feel satisfied with a new version I ran as many games as I had patience for, somewhere between 100 and 1000, against the last version and a couple of other engines.

I never really ran into any problems using this crude way of testing since pretty much every addition and change resulted in a very noticeable improvement.

However recently I have been having some serious trouble determining if new changes actually improves the engine or not, so it is time to refine my methods a bit.

New way of testing

Quick test: Run the YATS test sets selected.pgn and tony.pgn at 5 seconds per position, and analyze the results with Pro Deo 1.4. This takes 3.5 minutes and gives a general idea of the effect of the change(s).

Mediocre v0.31 got the following results:
Testset        : selected.pgn
Level : 5 seconds
Engine : Mediocre 0.31

Positions : 26
Found : 1 (3.8%)

Maximum points : 260
Scored points : 145 (55.8%)

Maximum time : 2:10
Used time : 2:05

Testset : tony.pgn
Level : 5 seconds
Engine : Mediocre 0.31

Positions : 16
Found : 2 (12.5%)

Maximum points : 160
Scored points : 69 (43.1%)

Maximum time : 1:20
Used time : 1:17
Full test: Run the btm.pgn (Beat the Masters) YATS set at 1 minute per position. This takes 2 hours 46 minutes and should give a very good idea of the strength of the engine.

Mediocre v0.31 got this result:
Testset        : btm.pgn
Level : 60 seconds
Engine : Mediocre 0.31

Positions : 166
Found : 32 (19.3%)

Maximum points : 1660
Scored points : 993 (59.8%)

Maximum time : 2:46:00
Used time : 2:20:43
If this results in an expected improvement to the engine run a 100 3-minute game match between the previous version and the new version using the sherwin50.pgn test set (by Michael Sherwin). 100 games means two games per opening so each version gets to play both sides.

By using a fixed set of openings a lot of the randomness is taken out and it should be easy to determine if the new engine is an improvement and by how much.

100 3-minute games take somewhere between 5-10 hours, probably closer to 5.

Why YATS

YATS is a testing system created by Ed Schröder, that instead of giving 0 points for a wrong answer and 1 point for a right answer rewards 0-10 points depending on what move the engine chooses. 10 for the 'correct' move and less for alternative moves that might still be good.

For a mediocre engine, like Mediocre, this is a perfect way of analyzing its strength. It might not find the 'best' move very often, but usually find a decent alternative move in the same position.

This way even one analyzed position might supply some interesting information, an old version might find an alternative move that gives 2 points while the newer version finds a slightly better alternative that gives 5 points. The traditional test sets would reward 0 points for both versions.

Further reading on the YATS-site.

And stick to it

Having set up this testing structure I do not intend to change it for quite some time, except for perhaps adding a fixed depth search to the YATS sets as well.

By having it unchanged like this it will be much easier to determine individual strength of the different versions over time without having to run thousands of games just to avoid the inherent randomness of openings and in fact chess in general.

Apr 14, 2007

[Other] Stack done

I now have a working stack done for the Board class. It keeps track of trivial things like castling rights and en passant squares to make unmakeMove possible, but also what moves were played to reach the position and if the position is a check or not.

Things like determining if the last move played (i.e. in the last instance of the recursive alpha-beta) was a promotion, or a capture, or a check etc. are very simple now.

I should have done this a long time ago, but well, the first thing I wrote was the Board class and I simply did not realize what was really needed in it back then.

As I said the makeMove method now determines if the position before making the move is a check or not. I do not know if this is really nescessary, but considering pretty much every node uses this information in one way or another it is probably ok.

The problem is deciding if the next move to be played will be a check without actually playing it on the board. This is another matter though and I will probably implement a gen_checks to take care of that.

Anyway the way it works now the move generation is about 20% slower than without the checks, but all in all it should turn out to be slightly faster when used in the search. And most importantly it will make things like futility pruning and LMR much easier to adjust.

[Other] The joys of bug hunting

After some reconstructing of the makeMove, umakeMove, makeNull and unmakeNull methods I ran into a long streak of errors popping up in just about all positions.

I sat for a few hours commenting out all complicating code like transposition tables, null moves, futility pruning, quiescent search etc. trying to isolate the error but it kept appearing at the exact same spots. The perft numbers worked perfectly, so I had to look for the error in the search.

So I started printing the moves to see if I could find a pattern. I had found a position where the error appeared after 'only' 3000 nodes so the print out was atleast manageable.

At first I thought it was the legal move check that was the problem because that was the most obvious thing that was made differently from a plain perft search.

However, after 4 hours or so I concluded it had something to do with captures and it seemed to involve the hash move.

By then the time was 2am so I decided to try in the morning. And with a fresh mind I immedatiatly drew the obvious conclusion... hash moves depend on zobrist keys to work correctly of course, and there is a code segment in the makeMove method that updates the zobrist for captures.

And of course there the problem was, I had 'simplified' that part of the code and instead of removing black pieces from the zobrist I removed white and vice versa.

So it was the 'other' thing the search does differently execept for legal moves, it uses a transposition table. :)

This was so much fun I almost considered starting to put in intentional bugs just to enjoy solving them. Or not.

Apr 13, 2007

[Other] Some restructuring

I have taken a look at the different ways I could handle the extensions and it is a complete mess. :)

Currently the search is doing an inCheck at the beginning of each node to determine if we should extend for check. That inCheck is then used by null-moves and futility pruning for the current position.

After that futility pruning does another inCheck to determine if the next position will be in check, this is also done by the late move reduction.

So the worst case scenario does three inCheck lookups for one node. Which is just bad.

I have had a 'history'-array for quite some time that keeps track of passed castling rights, en passant squares and the fifty moves rule. Recently I extended that idea by adding another array that kept track of passed zobrist keys and captures to make the unmakeMove faster.

So basically there already is a stack keeping track of passed positions, only that it lacks information about inCheck and it does not remember what moves was made (only if they were a capture or not).

I am going to complete the stack with the needed information and make the inCheck while making each move so we always have that information.

I do not expect this to speed up the search that much, however the code will be much easier to modify which in turn will help make everything faster.

Apr 10, 2007

[Other] More work to be done

After 200 games the two versions finished with about equal score after a winning streak from v0.31 in the end.

I have a 'gut feeling' the v0.32 is be better, I can not see how it would not be. However at this point I am not looking for differences that take thousands of games to determine.

Mediocre is not nearly good enough to start chasing 1-2 elo points here and there, I want atleast 50-70 points (about 60% won games) before I am satisfied.

I think I will leave futility pruning and the evaluation as it is for a while and start looking at how extensions are handled. Currently Mediocre has only a very crude check extension, so I will take a look at extensions for mate threat (if null move returns a mate value the branch is searched more carefully), recapture, single-response and something regarding passed pawns.

I will need a new way of handling the extensions and probably make use of fractional plies for some of them. And while doing this I will take care of the repeated work Mediocre currently is doing for determining if a position is in check (it is done in the check extension, late move reduction and futility pruning, this should only be done once).

Off I go. :)

Apr 9, 2007

[Other] Settled for a futility

After some testing I decided which futility pruning I am going to use. These are the highlights:
  • Use full evaluation and margins 125 centipawns for frontier nodes and 500 for pre-frontier nodes, if the evaluation + margin does not reach alpha the move is considered for futility pruning
  • Consider the type of move, do not prune moves that
    • give check
    • move a pawn/queen/rook to the 7th rank
    • capture a bishop so the opponent loses the bishop pair
    • capture an pawn on the 2nd or 3rd rank
    • capture a queen
    The considerations are taken because these moves are likely to change the score more than average (e.g. moving a pawn to 7th rank will always make it a passed pawn, capturing the queen usually alters the king safety quite a bit etc.), thanks to Mark Lefler for these ideas
  • If the move is a capture make sure to add the value of the captured piece to the score and see if it still is below alpha minus the margin
  • If the move still is getting pruned after these considerations a final check is made to see if the evaluation is above the best evaluation of the node, if it is that evaluation is recorded so the node does not risk exiting without any evaluation
H.G. Muller pointed out to me that if one non-capture is futile, so is the rest so basically that check could be done before even generating the non-captures. However since Mediocre needs the move to be played on the board in order to determine if it is a checking move I can not do that just yet.

This setup is the best I found, but there does not seem to be a big difference between the implementations I tested. I really like the idea of not pruning moves that are likely to change the score however, and the way it is done here is very simple and effective.

Stumbling upon great things

Once I decided on this matter I went on to some other things, mainly bug fixes and such, but there was one thing that I stumbled upon that really seems to pay off. I wanted a pawn hash table since the pawn structure rarely changes and there is a lot of unnescessary work done in the evaluation.

However the implementation would take quite some work since the zobrist key for the pawns would have to be updated every time a pawn is moving and also the current pawn evaluation uses the king positions as well so that would have to be sorted out.

So I decided to try that later and instead I tried implementing a simple evaluation hash table that keeps track of passed evaluations so if we run into the position again that work does not have to be repeated.

This took about 5 lines of code and speeded up the search a huge amount. It almost feels like I did something wrong since I have read so much about how memory lookups are costly and barely worth it. But Mediocre took another little leap in strength with this change.

I will play a test tournament overnight and if the results are good I think it is worth releasing Mediocre v0.32.

Apr 8, 2007

[Other] Tweaking v0.31

I have yet to come up with the perfect implementation of the futility pruning in Mediocre. I ran a 300 game match between four different versions and the only conclusion was it is worth including captures in the futility, and I am already doing that. Using a limited material evaluation versus a full evaluation seems to be pretty much equal in strength.

There are a few more things to try out however, like not pruning pawns advancing to the seventh rank or similar, since they will most likely increase the score quite a bit.

There are also some adjustments to the evaluation that needs to be done. In v0.3 I had only a small bonus for rooks on the seventh rank since I did not check if there were actually pawns there (or opponent king on eight rank). Now in v0.31 I added that check so a rook on seventh should be valued higher than before.

The biggest flaw I found was with the x-ray attacks. A queen running diagonally into an own rook kept getting attack points behind it, this is obviously a bug and will be fixed.

Once I am happy with these things I will give candidate pawns another shot. I will most likely have to run a big test with them turned on and off to see if they are worth the effort in Mediocre. I am not expecting a big improvement from them in any case.

Apr 6, 2007

[New Version] v0.31 - Futility pruning and additions to evaluation

Changes:
  • Made a few arrays static in the SEE, speeding it up quite a bit
  • Futility pruning added
  • X-ray attacks in the evaluation (e.g. queen attacking empty squares behind own bishop)
  • Eval is adjusted towards 0 if a draw is likely
  • Pawns are valued slightly less but becomes a bit more valuable in endings
  • Knight outposts added
  • King tropism added
  • Rook/queen on 7th rank only awarded if enemy king on 8th or enemy pawns still on 7th
  • Fixed bug with bishop positioning and also one with pawn evaluation
Note: One particular implementation of futility pruning (and no additions to evaluation) performed slightly better against v0.3, however I believe this setup works better in general. Only testing will show so I am releasing this now since it is quite an improvement and make any needed adjustments in future releases.

Also I commented out the candidate pawns code. It follows the definition used in Fruit, but it got a bit messy and I did not notice any improvement from it. I will have to keep trying in that area.

mediocre_v0.31

Apr 5, 2007

[Other] Silly bug with bishops

I am currently reshaping the evaluation a bit and while doing that I noticed a very subtle bug. Instead of using the piece table for black bishops I used the table for black knights. This resulted in black thinking fiancetto was a bad thing since the knights get a penalty for standing on the G7/B7 squares, and also a smaller penalty for the D7/E7 squares (which are quite natural for the bishops).

I had been thinking about adding even more incentive for developing the bishops since it looked like they were rarely developed properly. This bug was probably the problem though.

[Other] New archive site address

Thanks to Wang Zhen the Mediocre archives now have a new ad-free home!

The new address is http://mediocrechess.varten.org/.

Apr 4, 2007

[Other] Finally futility

After hours of despair and a bunch of helpful tips that I could not get to work, I decided to post one of my implementations of futility on the Talkchess.com forum. And finally I got the mystery solved. :)

I had overlooked the possibility of all moves in a node fitting the requirements for futility pruning. In some of my implementations this caused a very faulty value being recorded for the node, and in other the mate detection thought no legal moves existed and returned the stalemate score (0).

After having sorted this out, the new implementation of fuility pruning in Mediocre is working like a charm.

I started a little tournament to test it out and the results so far looks quite promising:
   Engine            Score                              Me
1: Mediocre dev 0.31 20,5/30 ..............................
2: Mediocre 0.3 9,5/30 00=0=010101=110=0=00=0000=0001
And this is with futility pruning for only non-captures, so there is still some in there up for grabs.

Apr 2, 2007

[Other] Struggling with futility pruning

About two months ago I tried implementing futility pruning. I failed miserably and decided to blame it on the general structure of Mediocre at the time. :)

Once again I am trying to get it to work and obviously I am doing something terribly wrong. I have tried three different setups and run 100 games matches against version 0.3, all three setups resulted in weaker play.

The theory is so simple that I must be missing something.

Well I will keep trying this time.

Apr 1, 2007

[Other] Mediocre website is up

I uploaded the website. For now you can find it at http://hem.passagen.se/maragor/, sorry for the horrible ad covering half the page, it was the easiest solution. When a get a tad better cash flow I will probably get a better location for it.

It is very simple and only intented as a bit more structured way of reading the guides and downloading older versions of Mediocre, atleast for now.

As I said earlier I will be adding content to the site from the blog now and then.

One nice thing is the links to old Jim Ablett executables of Mediocre which I have not had up anywhere before.

Well enjoy. :)