I've been writing a paper for an evening course I've been taking, related to chess engine searches.
To get clean output from the search I had to turn off most of the features, like killer moves, PVS, aspiration windows etc.
Funny thing, when I was going to turn off the futility pruning, I noticed it was already turned off... :) I apparently accidentally returned false for "use futility pruning" even when it met the requirements.
That means Mediocre v0.4 is playing without it.
I ran a quick 128-game test, and turning it on seems to gain some 30-40 Elo points in self-play. Not huge, but definitely silly not to have.
I'll be sure to turn it on again in the next release. :)
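For readers who haven't seen this kind of check, here is a minimal sketch of a futility pruning gate. It is not Mediocre's actual code: the class name, the parameters and the margins are all made up for illustration. The bug described above amounts to the method answering false at the final step even though every requirement was met.

```java
// Hypothetical sketch of a futility pruning gate; not Mediocre's actual code.
final class FutilitySketch {
    // Assumed margins in centipawns, indexed by remaining depth (index 0 unused).
    static final int[] MARGIN = {0, 200, 500};

    /**
     * Returns true if quiet moves at this node may be skipped because the
     * static evaluation plus a margin still cannot reach alpha.
     */
    static boolean useFutilityPruning(int depth, boolean inCheck, boolean pvNode,
                                      int staticEval, int alpha) {
        if (depth < 1 || depth >= MARGIN.length) return false; // only close to the leaves
        if (inCheck || pvNode) return false;                    // too risky to prune here
        // A gate that returned false here unconditionally would silently
        // switch the feature off, which is the kind of bug described above.
        return staticEval + MARGIN[depth] <= alpha;
    }
}
```

With these assumed margins, a node at depth 1 that evaluates to -300 with alpha at 0 qualifies for pruning, while one that evaluates to -100 does not.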
Dec 7, 2011
Nov 29, 2011
[Info] Jim Ablett's compile of Mediocre v0.4
Jim has compiled Mediocre v0.4 and I added it to my sourceforge page.
I haven't had time to test it myself, but previous experience has it that Jim's compiles are far stronger than the Java version, so I'd recommend using that.
Jim's page
Mediocre v0.4 JA compile
Nov 27, 2011
[New Version] v0.4 - Ponder, revamped search, UCI only
Changes:
- Any hash move used is now verified, which fixes a very rare occurrence of Mediocre crashing
- The transposition table is now using the full 64-bit Zobrist keys
- The search was completely rewritten, possibly catching some bugs. Should help quite a bit in playing strength
- Ponder implemented
- Removed the dependency of a settings file, things like hash sizes are now done through the UCI protocol
- Removed the semi-working xboard protocol entirely. Sorry.
Mediocre is, as mentioned, a UCI-only engine from here on. This also means I've removed the old settings file; use the UCI settings commands mentioned in the readme file instead.
Download here
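The first two items in the change list can be sketched roughly as follows. This is an illustration under my own assumptions, not the code shipped in v0.4: the class, the table layout and the representation of moves as plain ints are all hypothetical. The idea is that a slot only counts as a hit when the full 64-bit key matches, and that a hash move is only played if it is actually legal in the current position.

```java
// Hypothetical sketch; names and layout are assumptions, not Mediocre's code.
final class TranspositionSketch {
    private final long[] keys;  // full 64-bit Zobrist key stored per slot
    private final int[] moves;  // best move stored for that position (0 = none)

    /** The size must be a power of two so the index can be taken with a mask. */
    TranspositionSketch(int sizePowerOfTwo) {
        keys = new long[sizePowerOfTwo];
        moves = new int[sizePowerOfTwo];
    }

    private int index(long zobrist) {
        return (int) (zobrist & (keys.length - 1));
    }

    void store(long zobrist, int move) {
        int i = index(zobrist);
        keys[i] = zobrist;
        moves[i] = move;
    }

    /** Returns the stored move, or 0 if the slot holds a different position. */
    int probe(long zobrist) {
        int i = index(zobrist);
        return keys[i] == zobrist ? moves[i] : 0;
    }

    /** A hash move is only usable if it is among the position's legal moves. */
    static boolean isUsable(int hashMove, int[] legalMoves) {
        if (hashMove == 0) return false;
        for (int m : legalMoves) {
            if (m == hashMove) return true;
        }
        return false;
    }
}
```

Comparing the whole key makes it much less likely that two different positions sharing a slot are mistaken for each other, and the legality check catches the rare cases that still slip through.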
Nov 25, 2011
[Info] Testing results
So some testing to confirm I didn't do anything silly.
M1-1 is a version with 64-bit Zobrist keys in the transposition table, removal of the notion of "row" and some evaluation fixes, but without the tapered eval (see previous posts for more info).
Against the Mediocre v1.0 beta it turned out like this:
  Program     Elo    +    -  Games   Score  Av.Op.   Draws
1 M1-1     : 2401    6    6  11029  50.4 %    2399  24.5 %
2 M1B      : 2399    6    6  11029  49.6 %    2401  24.5 %
So pretty much equal, which is good enough. The worst scenario here would be the beta version being slightly stronger, but only by a few Elo points at most.
And against some other engines just to confirm.
  Program     Elo    +    -  Games   Score  Av.Op.   Draws
1 counter  : 2593   15   15   2048  76.4 %    2389  23.4 %
2 M1-1     : 2392    8    8   6154  48.2 %    2405  14.4 %
3 adam     : 2337   14   15   2048  42.5 %    2389   9.3 %
4 bikjump  : 2294   15   15   2048  36.6 %    2389  10.6 %

  Program     Elo    +    -  Games   Score  Av.Op.   Draws
1 counter  : 2580   14   14   2048  75.1 %    2388  25.4 %
2 M1B      : 2390    8    8   5854  47.5 %    2407  15.8 %
3 adam     : 2343   14   14   2048  43.7 %    2388   9.3 %
4 bikjump  : 2290   16   16   1748  36.4 %    2388  12.1 %
The newer version seems to be holding up.
I'll release a new version with this during the weekend, probably on Sunday.
Then I have a steady foundation to start tackling the evaluation again.
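As a rough sanity check on numbers like the ones in this post, a score percentage can be converted to an Elo difference with the usual logistic formula. The sketch below is only that formula; the class and method names are made up, and the rating tool that produced the tables may use a different model, so its numbers will not match exactly.

```java
// Sketch of the standard logistic Elo formula; rating tools may use other models.
final class EloSketch {
    /** Elo difference implied by a score fraction strictly between 0 and 1. */
    static double eloDiff(double score) {
        if (score <= 0.0 || score >= 1.0) {
            throw new IllegalArgumentException("score must be in (0, 1)");
        }
        return -400.0 * Math.log10(1.0 / score - 1.0);
    }
}
```

For example, `EloSketch.eloDiff(0.504)` comes out just under 3 Elo, in line with "pretty much equal", while a 60% score corresponds to roughly 70 Elo.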
Nov 23, 2011
[Info] So wrong again, but at least closer
So yeah, my imagined strength increase mentioned in the last post was non-existent of course.
But, the tapered eval seems to be holding up as the culprit of my recent failures.
I've tried to home in on the exact version after Mediocre v1.0 Beta that did the best, using all kinds of combinations with and without 64-bit hash tables, tapered eval and removal of the notion of "row".
The results are... inconclusive.
However, a version with everything except the specific addition of tapered eval seems to be playing at least equal with the beta version. So I think I'll just go with that one, do a new release (to get a firm base to build from), and then start with my evaluation tampering.
I'll post some testing results in a day or two. (not going to leave any doubt this time)
Nov 18, 2011
[Info] Importance of thorough testing
Lately I've been struggling with one of those "super versions" that seems to beat everything I throw at it.
When I got done with my search improvements I did some really extensive testing against Mediocre v0.34 and concluded that the new version had almost exactly a 60% win rate against it.
So I tagged that version and called it Mediocre v1.0 beta.
Then I committed three things to the trunk of svn: renaming of row to file, tapered eval and the change from 32 bit to 64 bit keys in the transposition table (along with a sanity check of all tt moves).
I thought I'd tried all of these extensively, scoring more or less equal to v1.0 beta, which I deemed ok since the changes were more or less needed for readability, stability, and future work.
During the past weeks any change I made, no matter how tiny it seemed, got slaughtered by 1.0 beta. All my evaluation tweaking seemed to give results, but against 1.0 beta it still lost.
Now, the newer (uncommitted) versions had some utility changes that I really wanted to have committed (things like the mirror evaluation test). So I took those changes and added them to the 1.0 beta tag one by one, testing quite extensively between every change.
After I'd moved over all the utility, I thought I might just as well try the three things I'd committed after 1.0 beta. This is how that testing went:
- Row to file change: This should just have been a readability change (the usage of "row" had lingered around since the very first version of Mediocre, while the correct terminology is of course "file"). But it turned out while doing this I'd changed the rank, file and distance methods to static (rather than instance methods). This seems to be a very good move since they're called a lot, and suddenly 1.0 beta was playing better, quite a bit better.
- 32- to 64-bit keys and hash move validation: I thought that, if anything, this would be the culprit, since messing around with the transposition tables is very likely to introduce bugs. Now that I've re-added it, it seems to give a tiny but noticeable strength increase.
- Tapered eval: Horrible horrible reduction in strength. I have no idea how I missed this, but it seems to completely ruin the evaluation. Here's the actual culprit and I'll be much more careful when trying to put it back.
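For context, tapered eval in its common form blends a middlegame score and an endgame score according to how much material is left. The sketch below shows that common formulation only; the phase weights, the names and the maximum phase of 24 are assumptions for illustration and not what Mediocre used.

```java
// Common formulation of tapered eval; weights and names are assumptions.
final class TaperedEvalSketch {
    // Assumed phase weights: 1 per minor piece, 2 per rook, 4 per queen.
    // The starting position then has 8 + 2*4 + 4*2 = 24.
    static final int MAX_PHASE = 24;

    /** Game phase from remaining material, MAX_PHASE = opening, 0 = bare endgame. */
    static int phase(int minors, int rooks, int queens) {
        int p = minors + 2 * rooks + 4 * queens;
        return Math.min(p, MAX_PHASE); // promotions could push it past the maximum
    }

    /** Linear blend of the middlegame and endgame scores by phase. */
    static int taper(int mgScore, int egScore, int phase) {
        return (mgScore * phase + egScore * (MAX_PHASE - phase)) / MAX_PHASE;
    }
}
```

Since every evaluation term passes through a blend like this, a mistake in the phase calculation or in one set of weights affects the whole evaluation, which is one way a change like this can cost a lot of strength without any single term looking wrong.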
So, the moral of the story: never assume you did enough testing if you see signs that you didn't.
Nov 14, 2011
[Tournament] GECCO - Final results
 1 Spike      wwbwbw  xrtnbd  111==1  5
 2 Nightmare  wbwbbw  ctgsrb  1=1=1=  4.5
 3 Tornado    bwwbbw  bnsdgm  1=0111  4.5
 4 Rookie     -bwbwb  msdbnc  101=01  3.5
 5 Baron      wbbwwb  tgmrsn  0=1===  3
 6 Goldbar    wwbbwb  dbnctx  ==0101  3
 7 Deuterium  bwbwwb  gxrtcs  =10010  2.5
 8 Mediocre   -bw-bb  rcbxxt  010010  2
 9 Spartacus  bwbwbw  nmxgdr  001000  1
10 micro-Max  bbw-ww  sdcmmg  000100  1
Not what I'd hoped for, but with two forfeits I guess that's what I deserve. At least Mediocre won the two games it should have and played very well against The Baron, though pretty horribly against Tornado.
Next time Mediocre will be in the top half. :)
[Tournament] GECCO - Game 6
Bit unlucky with the pairing and got Tornado here. Mediocre had the bishop pair and felt quite comfortable, but underestimated the insanely strong white knight that ultimately led to an unstoppable pair of passed pawns. Not much to say about this loss, Tornado was just better.
[Tournament] GECCO - Game 5
A second chance against micro-Max. It started out a bit crazy and then turned into an endgame where Mediocre had the upper hand from the start.
[Tournament] GECCO - Game 4
Forfeit against micro-Max... yeah, I overslept (and was a bit hungover after a late Saturday night...). I was connected to the server, but for some reason Mediocre couldn't start the game. No idea why.