Oct 26, 2011

[Info] Profiling

As a response to John's question in a comment in the last post, here's a breakdown of where Mediocre spends its time. Sorted on time spent in each method (based on a 30 second snapshot).



Keep in mind that this does not include subsequent method calls, it's just the time spent inside the actual method, so including subsequent calls, the evaluation would look like this:



Basically more than half the time spent in evaluation, and a third of that is generating attacks for the different pieces.

Also high in the list is the Board.isAttacked-method, due to "is in check" calls in every node in the search, i.e. isAttacked is mostly used to see if the king is in check.

I do think the things related to making and generating moves are in a good place, around 15% of the total time. Maybe slightly high but not too bad.

If I used bitboards instead, with same (or better) performance in generating and making/unmaking moves. I would gain an obscene speed upgrade for "free", as the attack-checks cost a fraction compared to my current setup.

20 comments:

Thomas Petzke said...

I was also trying to get more speed out the engine lately. I have tried a lot, most of it failed and the 1 thing that worked messed up the code quite a bit. And I'm not sure whether its worth it.

In Olivers ChessWar (Div G) plays an engine called Sjakk, it is written in Visual Basic and runs with less then 100k nps. It scores better than my iCE that runs at 700k+ nps. After a chat with the author I thought a lot whether putting my efforts in speeding up the engine brings the biggest bang for the buck.

Jonatan Pettersson said...

You're right in that chasing tiny speed upgrades is probably not worth it. For a fairly sloppy engine (like Mediocre) it would take hundreds of tiny changes to actually notice a substantial difference.

However, comparing Sjakk to iCE like you do is skewing the analysis a bit. If you slowed iCE down to 100k nps or sped Sjakk up to 700k, I'm positive you'd see a huge difference in strength. It would be like playing one engine on a seven times slower/faster computer.

What I'm proposing though, is changing the board represenation to something that's as fast, or faster, and then for virtually free cut 33% out of the evaluation time (my attack generation bits).

Considering my perft6 takes almost 40 seconds I think I have even more to gain but looking it over.

Keep in mind that this isn't done by removing any logic at all, the engine would play exactly the same when using fixed depth. Just 20-30% faster (hopefully).

Thomas Petzke said...

I read somewhere that 20% speed increase gives you about 13 ELO points. Whether it's worth depends on the effort you have to undertake to get that 20%.

If I see an engine with 100k nps that plays better then my engine it shows that there is a lot of potential beyond speed that I have not lifted yet. This makes me think that trying to speed up the engine should not be a top priority for me.

But iCE calculates perft 6 in 3.4 sec (1.1 sec when the hash table helps) so I squeezed out here a lot already. This might be of course totally different for you.

Jonatan Pettersson said...

Yes, 13 elo point for 20% faster engine seems about right.

(I seems to remember someone saying 70 points for 100% increase)

So let's say I could get my perft 6 down to close to yours, say 4 seconds. That would mean 1000% speed increase and a 700 elo gain right? :)

But yeah you're right, chasing speed increases is hardly to way to go if you have other issues going on (like I'm sure most engines below say 2900 elo or so has).

Abel Belzunces said...
This comment has been removed by the author.
Abel Belzunces said...

Hi Jonatan

Nice you are working back on your engine, I've noticed that at least the latest public version does not handle correctly this position:

-8/8/8/2p5/1pp5/brpp4/1pprp2P/qnkbK3 w - - 0 1

It cannot find the mate.

Jonatan Pettersson said...

Oh interesting, thanks. I'll have a look.

Thomas Petzke said...

iCE doesn't find the mate either, but this is probably rather good. Having tuned the engine to handle such an artificial position would hurt strength in real games. Underpromotion to a knight usually isn't that helpful.

Anonymous said...

Hi Jonathan, Glad you are back!

One way to increase the perft performance is to avoid making every move twice. ATM, you make the move, check legality, unmake the move and then make it again when traversing down the tree. If you move the legality check to the traversing down part you can avoid the first move. This won't help the performance during games, though...

GeorgeK

Jonatan Pettersson said...

You bring up an interesting point "Anonymous".

My perft 6 takes 40 seconds, but I haven't done -anything- to improve on the actual execution of the perft procedure.

That obviously means no caching whatsoever, but no other optimizations whatever they may be.

I simply generate all legal moves, make the move, and call recursively. Nothing more nothing less.

To be honest I'm not that interested in improving the perft speed just for the sake of it, unless it's due to improvements in the actual parts that are used in the real search.

But that means I might not be able to compare to other engines as easily. I don't know.

Jonatan Pettersson said...

On that mate position I'm quite surprised it didn't find it, but I guess the under promotions are really far down in priority so it gets hard.

Moving the pawn to h7 made Mediocre find the mate, in 35 seconds and 49 plies, so it's not completely blind to it.

But as Thomas says, it would certainly be possible to change the search around to find this mate really fast, but that would hurt "normal" searches way too much.

Thomas Petzke said...

If you execute each move via makeMove() in order to check whether it is legal you do way to much (e.g. all that hash signature update stuff).

In most cases it is pretty easy to determine whether moves are legal. When I start to generate moves I do a quick check first whether the side to move has potentially a pinned piece. If there is none I do no validation at all, all generated none king moves are valid.

Even if there is a possible pin you can tell pretty easy that most moves are safe because they have no potential to expose the king to an enemy slider.

At the end only a small fraction of the moves requires a complexer validation.

Generating pseudo legal moves and detect in search that they are not legal might be still a bit faster but I never actually considered executing illegal moves. This just does not feel right.

Jonatan Pettersson said...

What I currently do (in search) is generate pseudolegal moves and then make each of them on the board and before using it, see if the king is in check.

It's a really easy way to do it, but I guess if I could check for pinned pieces that could have some potential.

I'm worried that my 0x88 board representation might not let that be done so easily though.

Matthew Brades said...

If you want to change to bitboards - do it in Sagan. IIRC it is much easier
to bit-twiddle etc. in C than it is in Java.

Just a thought.

Matthew:out

Jonatan Pettersson said...

Yeah you're probably right. Might as well pick up Sagan again if I want to change to bitboards.

I actually spent some time abstracting out the board representation in Mediocre and it wasn't impossible, but quite cumbersome. Essentially nothing in the evaluation can be saved, too much depending on the board.

Anonymous said...

Out of curiosity, what do you think is missing from the Java bit twiddling? The most painful for me was so far that you can't use booleans as ints, but the rest is OK, I think.

GK

Thomas Petzke said...

The way you do it is easy but can be optimized without sacrificing the design. The inCheck() call to see whether your move was legal is probably expensive and you have to call it twice because you want to know whether the side to move is in check too (for extensions and this kind of stuff).

Or was the inCheck() test the one thing that was trivial with 0x88, then never mind. In bitboards it is not so I came up with those upfront validation ideas.

Matthew Brades said...

Me again,
Do you use a precomputed attack array as described on Bruce Moreland's site?

That might improve matters.

Matthew:out

Abel Belzunces said...

Such an artificial position is not meant to be tuned, but it helps to find strange errors, on the other hand all "big" engines find the mate without special tunning...

Alexander Litvinov said...

Hi. Nice blog. I learned a lot from some of your guides. I'm writing a chess engine in C # and I have yet ready move generation, NegaScout method and the transposition table. But I'm worried about the performance. While my engine is able count on depth 6 in acceptable period of time and my perft 6 takes 119 seconds. I need to improve the move generator. Can you describe the way moves generation into your engine or give links to articles on which you learned?
Sorry for my bad english.