The basics: We start with the basics. The min-max.

If we think a bit about what the engine should do, the answer should be something like; play through the the possible moves and answers as deep as we want to go and then evaluate the position.

This is quite true actually. Take a look at the image to the left. White makes a move, black answers, white makes another move and evaluates the position.

The best evaluation is the line abd, scoring 14. So white might decide to play the inital move a. But we can not forget we have an opponent as well, and he tries to minimize the evaluation.

So black will pick the moves that will result in the lowest score, while white picks the moves that results in the highest score. We can call them min and max instead, and there you have the name for the algorithm.

I illustrate this in the image to the left. We look at it from the bottom up (since there is where the evaluation starts). As you can see, in the left tree white (max) chooses the moves a and d in response to black's (min) a and b moves, since they give the highest evaluations.

Now black chooses his move a, since this gives the lowest evalation. We now have the final evaluation 3 of the left tree that starts with white's move a.

We now do the same thing in the right tree, and as you can see the evalation of white's move b is 4.

So white will play b, since that gives him (at worst) 4 in evaluation.

The engine will always assume the opponent making the best moves. Even if white had the chance of getting an evaluation of 14 by starting with the move a, if black accidently answered with b, we assume black will always play the correct move.

The problem with mini-max is we are not cutting off any lines at any point. We simply look at all the lines, make sure each side picks the right moves and finally we get an evaluation. And as stated in an earlier post, there are MANY lines to go through.

I will not give a code example here since I will not use this algorithm, or rather, I will not use it in this simple form, but an optimized version called Alpha-Beta.

An improvement: While min-max searches all possible lines that can arise from a certain move, Alpha-Beta uses a more intelligent approach that cuts off as much as 80% of the lines, without losing any accuracy at all.

To explain how alpha-beta search works I will use an example (slightly tweaked) from Bruce Moreland's page (link on the right).

This is how it goes; You (your name is Max, what else) and your friend (Min) has a number of bags filled with dollar bills. You will get to pick a bag and then Min will pick one bill from the bag and give it to you. Knowing Min is a greedy bastard you know he will give you the least valuable bill in the bag.

If you only knew about the min-max algorithm you would check every bill in all the bags, and pick the bag which had the most valuable lowest bill.

Now try the alpha-beta way instead.

You start with looking through the first bag. There you find a 100-dollar bill and a 20-dollar bill. Knowing Min will give you the 20-dollar bill if you pick this bag, that is what this bag is worth.

You now start looking in the second bag. First you find a 100-dollar bill, and since it is better than the 20-dollar bill in the first bag, the second bag is now the best so far. You then find a 50-dollar bill, it is worse than the 100-dollar bill, but still better then the 20-dollar bill from the first, so you still want to pick the second bag which is now worth 50 dollars.

So far it works exactly like min-max. Now you move on to the third bag, and you pick up the first bill which is a 10-dollar bill. Since this bill is worse than the 50-dollar bill from the second bag, you can discard the rest of the contents of the bag without looking at it. No matter what more you find in it, it will not matter since you will get the 10-dollar bill. There could be a 1000-dollar bill in there but you will not get it since Min will give you the 10. There could also be a 1-dollar bill in there, but that does not matter either since you have already discarded the bag as worse than the second bag.

So your final choice will be the second bag, and you saved the hassle of looking at all the bills in the third bag.

Putting it into practice: When writing the code we will have two numbers passed around, called Alpha and Beta (there is the name of the algorithm).

If you are Max and the alpha >= beta, you know that Min can force a position that is worse than the best you had so far.

The same applies if you are Min with the alpha < beta.

If any of these occur you can stop looking at the current move and move on to the next. This is like finding the 10-dollar bill in the third bag.

I will now show a bit of code that is used for alphaBeta search. This whole thing could be made by using two methods, one called max and one called min, and then passing numbers between them, having the checks between alpha and beta being like above.

But to get code that is easier to maintain I am going to make one method that will be called recursively and have a tweaked recursive call to simulate max and min switching side:

1 private int alphaBeta(int ply, int alpha, int beta)

2 {

3 if(ply == 0)

4 return positionEvaluation;

5

6 Vector legalMoves = generateMoves();

7 for(int i = 0; i < legalMoves.size(); i++)

8 {

9 makeMove(legalMoves.get(i));

10 eval = -alphaBeta(ply-1, -beta, -alpha);

11 unmakeMove(legalMoves.get(i));

12

13 if(eval >= beta)

14 return beta;

15

16 if(eval > alpha)

17 alpha = eval;

18 }

19 return alpha;

20 }

On line 1 we start with calling the alphaBeta-method with a certain depth we want to use, a very low number for alpha, and a very high number for beta. The numbers for alpha and beta should be lower and higher than the mate value, to make sure we get new values for them when searching.

On line 3 we check if we reached the depth we wanted. If we did we evaluate the position and return the number.

On line 6 and 7, if we did not reach the depth yet, we generate all legal moves in the position and go through them one by one.

On line 9 we make the move on the board.

On line 10 we call the method recursively. As you can see the call is tweaked a bit, this is done to simulate switching from Max to Min. By putting -beta for alpha, and -alpha for beta and then negating the answer we get the right cutoffs from the checks below it.

On line 11 we unmake the move on the board to reset it.

On line 13 we have the beta cutoff. The evaluation of the move is 'too good' so we know that the opponent will never go this way and we can stop looking at the moves in this line. Of course this can both be Max and Min doing the cutoff, depending on where in the recursive call we are.

On line 16 we check if the move is better than our previous best move. If it is update the alpha.

On line 20 we are done with the evaluation and the resulting evaluation is returned. Since we assume Max was the one calling the method, we return the alpha.

I hope that was clear enough. There are not that many lines of code, but being a recursive function it can be very confusing.

Something about move order: If we ordered the moves so in every line we always look at the worst moves first, we would never exceed the beta and get a cutoff. If this happened the algorithm would work the same as Min-max since we would not be able to cutoff any lines.

If on the other hand we ordered the moves so we always looked at the best move first we would be able to cutoff pretty much all lines.

Since we can not know beforehand what moves will be the best we will have to make our best guess. One thing is to always evaluate captures first, they tend to be better than random moves.

Once we start with iterative deepening (more about this later) we can use passed results to order the moves as well.

What we will get: Once we have our alpha-beta search done, the engine can start playing. And even if we only use the most basic of evaluations (piece values) it should play fairly well, or atleast not get mated too easily.

So on to writing the alpha-beta search.

## 4 comments:

Hello,

your work is very interesting.

i have a little question : how you will use the alpha beta without a history class or arraylist of move ?

what protocol will you implement ? uci , xboard ?

Regards

Yves

The code in this post is only an example of how the alpha-beta works, you need program specific code to get the actual moves.

I wrote a bit about this in the "Extracting principal variation"-post.

The easiest way though is to simply add something like:

if(ply == PLY_DEPTH)

globalBestMove = currentMove;

That way when you are at the root and get a new best evaluation, you save the move that generated it.

I will be using xboard/winboard protocol. I am not sure how to make uci using Java, if anyone can help with that I'm eager ears.

Hi, i am unable to access your source code. Please, do you mind emailing it to mishaelharry@gmail.com

Thanks.

Great Article. thanks for sharing!

همراه وب | طراحی سایت | طراحی گرافیک سایت | طراحی قالب وردپرس | اخذ اقامت ترکیه | اجاره ملک در ترکیه | خرید ملک در ترکیه | ثبت شرکت در ترکیه | سرمایه گذاری در ترکیه

Post a Comment