Unit-3
Adversarial Search and Games
What is adversarial search explain with an
example?
•Adversarial search is search when there is an "enemy" or
"opponent" changing the state of the problem at every step
in a direction you do not want.
•Examples: chess, business, trading, war.
•You change the state, but you do not control the next state:
the opponent will change it in an unpredictable way.
Why is game playing problem considered AI
problem?
•Game Playing is an important domain of artificial intelligence.
•Games don't require much knowledge; the only knowledge we
need to provide is the rules, the legal moves, and the conditions of
winning or losing the game.
•A plausible-move generator is needed so that only good moves are generated.
What are the characteristics of adversarial
search?
•Adversarial Search
•Two player.
•Turn taking.
•Zero-sum.
•Perfect information — deterministic, fully observable.
•A small number of possible actions per turn.
•Precise, formal rules.
What are the elements required to define a
game formally as a kind of search problem
explain it?
•A game can be formally defined as a kind of search
problem with the following components:
•The initial state, which includes the board position and
identifies the player to move.
•A successor function, which returns a list of (move, state)
pairs, each indicating a legal move and the resulting state.
•A terminal test, which determines when the game is over; states
where the game has ended are called terminal states.
•A utility function, which gives a numeric value for the terminal
states, e.g., win (+1), loss (-1), or draw (0).
What is game theory in
AI?
Game theory is basically a branch of mathematics that is used to
model strategic interaction between different players (agents), all
of which are equally rational, in a context with predefined rules (of
playing or maneuvering) and outcomes.
Which is the best way to go for game playing
problem?
We use a heuristic approach, as it avoids brute-force computation
that would require looking at hundreds of thousands of positions.
e.g., a chess competition between a human and an AI-based computer.
AI Adversarial search
•Adversarial search is a game-playing technique where the agents are surrounded
by a competitive environment. A conflicting goal is given to the agents
(multiagent).
•These agents compete with one another and try to defeat one another in order to
win the game.
•Such conflicting goals give rise to the adversarial search.
•Here, game-playing means discussing those games where human
intelligence and logic factor is used, excluding other factors such as luck factor.
•Tic-tac-toe, chess, checkers, etc., are such type of games where no luck
factor works, only mind works.
•Mathematically, this search is based on the concept of ‘Game Theory.’
•According to game theory, a game is played between two players. To complete
the game, one has to win the game and the other automatically loses.
•Techniques required to get the best optimal solution
•There is always a need to choose those algorithms which provide
the best optimal solution in a limited time. So, we use the following
techniques which could fulfill our requirements:
•Pruning: A technique which allows ignoring the unwanted portions
of a search tree which make no difference in its final result.
•Heuristic Evaluation Function: It allows to approximate the cost
value at each level of the search tree, before reaching the goal
node.
• Elements of Game Playing search
• To play a game, we use a game tree to know all the possible choices and to pick the best
one out. There are following elements of a game-playing:
• S0: It is the initial state from where a game begins.
• PLAYER (s): It defines which player is having the current turn to make a move in the
state.
• ACTIONS (s): It defines the set of legal moves to be used in a state.
• RESULT (s, a): It is a transition model which defines the result of a move.
• TERMINAL-TEST (s): It defines that the game has ended and returns true.
• UTILITY (s,p): It defines the final value with which the game has ended. This function is
also known as Objective function or Payoff function. The price which the winner will
get i.e.
• (-1): If the PLAYER loses.
• (+1): If the PLAYER wins.
• (0): If there is a draw between the PLAYERS.
• For example, in chess or tic-tac-toe, there are three possible outcomes:
to win, to lose, or to draw the match, with values +1, -1, or 0.
•Let’s understand the working
of the elements with the help
of a game tree designed
for tic-tac-toe. Here, the node
represents the game state and
edges represent the moves
taken by the players.
•A game-tree for tic-tac-toe
•INITIAL STATE (S0): The top node in the game-tree represents the
initial state and shows all the possible choices from which to pick one.
•PLAYER (s): There are two players, MAX and MIN. MAX begins the
game by picking the best move and placing X in an empty square.
•ACTIONS (s): Both players can make moves in the empty
boxes, turn by turn.
•RESULT (s, a): The moves made by MIN and MAX decide the
outcome of the game.
•TERMINAL-TEST(s): When all the boxes are filled, or one player
completes a line, the game reaches its terminating state.
•UTILITY: At the end, we get to know who wins, MAX or MIN, and
accordingly the payoff is given to them.
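The formal elements above can be sketched directly as Python functions for tic-tac-toe. This is an illustrative sketch (the board encoding as a 9-cell tuple is an assumption, not from the slides):

```python
# Formal game elements for tic-tac-toe: S0, PLAYER, ACTIONS, RESULT,
# TERMINAL-TEST, and UTILITY. Board = tuple of 9 cells: 'X', 'O', or None.

S0 = (None,) * 9  # initial state: empty board

def player(s):
    """MAX ('X') moves when X and O counts are equal, else MIN ('O')."""
    return 'X' if s.count('X') == s.count('O') else 'O'

def actions(s):
    """Legal moves are the indices of empty squares."""
    return [i for i, cell in enumerate(s) if cell is None]

def result(s, a):
    """Transition model: place the current player's mark at square a."""
    board = list(s)
    board[a] = player(s)
    return tuple(board)

# The eight winning lines: rows, columns, diagonals.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(s):
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal_test(s):
    """True when a player has completed a line or the board is full."""
    return winner(s) is not None or all(c is not None for c in s)

def utility(s):
    """+1 if MAX ('X') wins, -1 if MIN ('O') wins, 0 for a draw."""
    w = winner(s)
    return +1 if w == 'X' else -1 if w == 'O' else 0
```

Any game with these six pieces can be plugged into the minimax machinery described in the next section.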
Types of algorithms in Adversarial
search
•In a normal search, we follow a sequence of actions to reach the goal
or to finish the game optimally.
•But in an adversarial search, the result depends on the players, who
decide the outcome of the game.
• It is also obvious that the solution for the goal state will be an
optimal solution, because each player tries to win the game along the
shortest path and under limited time.
•The following are the types of adversarial search:
•Minimax algorithm
•Alpha-beta pruning
Minimax
Strategy
•In artificial intelligence, minimax is a decision-making strategy
under game theory, which is used to minimize the losing chances in a
game and to maximize the winning chances.
• This strategy is also known as ‘Minmax,’ ’MM,’ or ‘Saddle point.’
•Basically, it is a two-player game strategy where if one wins, the other
loses the game.
•This strategy simulates games that we play in our day-to-day life:
if two persons are playing chess, the result will be in favor of
one player and will disfavor the other one.
• The player who makes the best use of effort and cleverness
will surely win.
•We can easily understand this strategy via game tree- where the nodes
represent the states of the game and edges represent the moves made by
the players in the game.
•Players will be two namely:
•MIN: Decrease the chances of MAX to win the game.
•MAX: Increases his chances of winning the game.
•They both play the game alternatively, i.e., turn by turn and following the
above strategy, i.e., if one wins, the other will definitely lose it. Both
players look at one another as competitors and will try to defeat
one-another, giving their best.
•In the minimax strategy, the result of the game, or the utility value, is generated
by a heuristic function and propagated from the leaf nodes up to the root
node.
•It follows the backtracking technique and backtracks to find the best choice.
•MAX will choose the path which increases its utility value, and MIN will
choose the opposite path, which helps it minimize MAX's utility value.
MINIMAX Algorithm
•MINIMAX algorithm is a backtracking algorithm where it backtracks to pick
the best move out of several choices.
• MINIMAX strategy follows the DFS (Depth-first search) concept.
•Here, we have two players MIN and MAX, and the game is played
alternatively between them, i.e., when MAX makes a move, the next turn is
MIN's.
•It means the move made by MAX is fixed and he cannot change it.
• The same concept is followed in the DFS strategy, i.e., we follow the same path
and cannot change it in the middle.
•That is why in the MINIMAX algorithm, we follow DFS instead of BFS.
•Keep on generating the game tree/ search tree till a limit d.
•Compute the move using a heuristic function.
•Propagate the values from the leaf node till the current position following
the minimax strategy.
•Make the best move from the choices.
•The two players, MAX and MIN, take part.
•MAX starts the game by choosing one path and propagating down all the
nodes of that path.
•Then, MAX backtracks to the initial node and chooses the best
path, where his utility value will be the maximum.
•After this, it is MIN's chance. MIN will also propagate through a path
and again backtrack, but MIN will choose the path which
minimizes MAX's winning chances, i.e., the utility value.
•So, if the level is minimizing, the node will accept the minimum value
from the successor nodes. If the level is maximizing, the node will
accept the maximum value from the successor.
Alpha-beta
Pruning
•Alpha-beta pruning is an advanced version of the MINIMAX algorithm.
•The drawback of the minimax strategy is that it explores each node in the tree deeply
to provide the best path among all the paths.
•This increases its time complexity.
•But, as we know, the performance measure is the first consideration for any
optimal algorithm.
•Therefore, alpha-beta pruning reduces this drawback of the minimax strategy by
exploring fewer nodes of the search tree.
•The method used in alpha-beta pruning is to cut off the search by exploring
a smaller number of nodes.
• It makes the same moves as a minimax algorithm does, but it prunes the
unwanted branches using the pruning technique (discussed under adversarial search).
•Alpha-beta pruning works on two threshold values, i.e., α (alpha) and β (beta).
•α is the best (highest) value the MAX player can guarantee so far. It is a lower bound,
initialized to negative infinity.
•β is the best (lowest) value the MIN player can guarantee so far. It is an upper bound,
initialized to positive infinity.
•So, each MAX node has an α-value, which never decreases,
•and each MIN node has a β-value, which never increases.
•Note: Alpha-beta pruning technique can be applied to trees of any
depth, and it is possible to prune the entire subtrees easily.
•Consider the example of a game tree where P and Q are two players.
•The game will be played alternatively, i.e., chance by chance.
•Let, P be the player who will try to win the game by maximizing
its winning chances.
•Q is the player who will try to minimize P’s winning chances.
•Here, α will represent the maximum value of the nodes, which will be
the value for P as well.
•β will represent the minimum value of the nodes, which will be the
value for Q.
• Either player may start the game. Following the DFS order, the player will choose one path and
will reach its depth, i.e., where he will find the TERMINAL value.
• If the game is started by player P, he will choose the maximum value in order to increase his winning
chances with the maximum utility value.
• If the game is started by player Q, he will choose the minimum value in order to decrease the
winning chances of P with the best possible minimum utility value.
• Both will play the game alternatively.
• Evaluation starts from the last level of the game tree, and the values are chosen
accordingly.
• Like in the figure below, the game is started by player Q. He will pick the leftmost TERMINAL
value and fix it as β. Now the next TERMINAL value is compared with the β-value: if it is
smaller than the current β-value, it replaces the β-value; otherwise there is no need to
replace it.
• After completing one part, the achieved β-value is moved up to the parent node and fixed as the
other threshold value, i.e., α.
• Now it is P's turn; he will pick the best maximum value. P will move on to explore the next part only
after comparing the values with the current α-value: only a value equal to or greater than the
current α-value replaces it; otherwise the branch is pruned.
• These steps are repeated until the result is obtained.
• So, the number of pruned nodes in this example is four, and MAX wins the game with the
maximum UTILITY value, i.e., 3.
Constraint Satisfaction Problems in
Artificial Intelligence
•We have seen many techniques, such as local search and
adversarial search, for solving different problems. The objective of every
problem-solving technique is the same, i.e., to find a solution that reaches the
goal. However, in adversarial search and local search there were
no constraints on the agents while solving the problems and
reaching their solutions.
Constraint satisfaction
•Constraint satisfaction depends on three components, namely:
•X: It is a set of variables.
•D: It is a set of domains where the variables reside. There is a
specific domain for each variable.
•C: It is a set of constraints which are followed by the set of variables.
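The three components X, D, and C can be written down directly as Python data. This is a small illustrative sketch (the map-coloring example and names are assumptions, not from the slides):

```python
# The three CSP components for a tiny map-coloring problem.

X = ['A', 'B', 'C']                              # X: set of variables
D = {v: ['red', 'green', 'blue'] for v in X}     # D: one domain per variable
C = [('A', 'B'), ('B', 'C')]                     # C: adjacent pairs must differ

def consistent(assignment):
    """True if no constraint is violated by the (partial) assignment."""
    return all(assignment[u] != assignment[v]
               for u, v in C
               if u in assignment and v in assignment)

print(consistent({'A': 'red', 'B': 'green'}))    # True
print(consistent({'A': 'red', 'B': 'red'}))      # False: A and B clash
```

Note that a partial assignment can be consistent even though it is not yet a solution; a solution is a complete, consistent assignment.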
Constraint Satisfaction Problems
•Solving Constraint Satisfaction Problems
•The requirements to solve a constraint satisfaction problem (CSP) are:
•A state-space.
•The notion of a solution.
•A state in the state-space is defined by assigning values to some or
all of the variables, e.g.,
•{X1=v1, X2=v2, and so on…}.
CSP cont….
•Constraint Types in CSP
•With respect to the variables, there are basically the following types
of constraints:
•Unary constraints: The simplest type of constraint, which
restricts the value of a single variable.
•Binary constraints: A constraint type which relates
two variables. For example, the requirement that x2 lie
between x1 and x3 can be expressed as two binary constraints,
one on (x1, x2) and one on (x2, x3).
•Global constraints: A constraint type which involves
an arbitrary number of variables.
CSP cont….
•Constraint Propagation
•In local state-spaces, the choice is only one, i.e., to search for
a solution. But in CSP, we have two choices either:
•We can search for a solution or
•We can perform a special type of inference called constraint
propagation.
•Constraint propagation is a special type of inference which helps
in reducing the number of legal values for the variables.
•The idea behind constraint propagation is local consistency.
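A standard way to enforce local (arc) consistency is the AC-3 algorithm. The slides do not give it, so the following is a compact sketch of that standard algorithm on an assumed toy problem (two integer variables with the constraint X < Y):

```python
# AC-3 constraint propagation: repeatedly remove domain values that
# have no supporting value in a neighboring variable's domain.

from collections import deque

def ac3(domains, neighbors, constraint):
    """domains: var -> set of values; constraint(x, vx, y, vy) -> bool."""
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # values of x with no supporting value in y's domain
        removed = {vx for vx in domains[x]
                   if not any(constraint(x, vx, y, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False                 # a domain emptied: inconsistent
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))     # re-check arcs into x
    return True

# Toy CSP: X < Y, both with domain {1, 2, 3}.
domains = {'X': {1, 2, 3}, 'Y': {1, 2, 3}}
neighbors = {'X': ['Y'], 'Y': ['X']}
ok = ac3(domains, neighbors,
         lambda a, va, b, vb: va < vb if (a, b) == ('X', 'Y') else va > vb)
print(ok, sorted(domains['X']), sorted(domains['Y']))  # True [1, 2] [2, 3]
```

Propagation alone prunes X=3 and Y=1 here without any search, which is exactly the "reducing the legal values" effect described above.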
CSP Problems
•Constraint satisfaction includes those problems which contain
some constraints to respect while solving the problem. CSPs include the
following problems:
•Graph Coloring: The problem where the constraint is that no
two adjacent regions (vertices) can have the same color.
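Graph coloring is the classic target for the other way to solve a CSP, backtracking search. The following is an illustrative sketch (the three-region map and names are assumptions, not from the slides):

```python
# Backtracking search for graph coloring: assign variables one at a
# time, checking the "adjacent regions differ" constraint, and undo
# (backtrack) when no value works.

def backtrack(assignment, variables, domains, adjacent):
    if len(assignment) == len(variables):
        return assignment                    # complete: a solution
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # constraint check against already-assigned neighbors
        if all(assignment.get(n) != value for n in adjacent[var]):
            assignment[var] = value
            solution = backtrack(assignment, variables, domains, adjacent)
            if solution is not None:
                return solution
            del assignment[var]              # undo and try the next value
    return None                              # dead end: trigger backtracking

variables = ['WA', 'NT', 'SA']
domains = {v: ['red', 'green', 'blue'] for v in variables}
adjacent = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA'], 'SA': ['WA', 'NT']}
print(backtrack({}, variables, domains, adjacent))
# → {'WA': 'red', 'NT': 'green', 'SA': 'blue'}
```

With all three regions mutually adjacent, each must take a distinct color, which is what the search finds.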
CSP Problems cont…
•Sudoku Playing: The puzzle where the constraint is that
no number from 1-9 can be repeated in the same row,
column, or 3×3 box.
CSP Problems cont…
•n-queens problem: In the n-queens
problem, the constraint is that
no two queens may be placed
in the same row, the same
column, or on the same diagonal.
•Crossword: In the crossword problem,
the constraint is that the
words must be formed correctly
and should be meaningful.
•Cryptarithmetic Problem: This problem has one most
important constraint: we cannot assign the same digit to
two different characters; each letter must stand for a unique digit.
•Latin square Problem: In this game, the task is to fill an
n×n grid with n symbols so that each symbol occurs exactly
once in every row and every column.
Monte Carlo Tree Search (MCTS)
• Monte Carlo Tree Search (MCTS) is a search technique in the field of Artificial Intelligence
(AI).
• It is a probabilistic and heuristic driven search algorithm that combines the classic tree
search implementations alongside machine learning principles of reinforcement learning.
• In tree search, there is always the possibility that the current best action is actually not the
optimal action.
• In such cases, the MCTS algorithm becomes useful, as it continues to evaluate other
alternatives periodically during the learning phase by executing them, instead of only the
currently perceived optimal strategy.
• This is known as the "exploration-exploitation trade-off".
• It exploits the actions and strategies that are found to be the best so far, but it must also
continue to explore the local space of alternative decisions and find out whether they could
replace the current best.
• Exploration helps in exploring and discovering the unexplored parts of the tree, which could
result in finding a more optimal path.
MCTS cont..
• Monte Carlo Tree Search (MCTS) algorithm:
In MCTS, nodes are the building blocks of the search tree.
These nodes are formed based on the outcome of a number of simulations.
The process of Monte Carlo Tree Search can be broken down
into four distinct steps, viz., selection, expansion, simulation,
and backpropagation. Each of these steps is explained in detail below.
The symbols used in the selection formula are:
Si = value of node i
x̄i = empirical mean value of node i
C = a constant
t = total number of simulations
• Selection: In this process, the MCTS algorithm traverses the current tree from the root node using a
specific strategy.
• The strategy uses an evaluation function to optimally select nodes with the highest estimated value.
• MCTS uses the Upper Confidence Bound (UCB) formula applied to trees as the strategy in the
selection process to traverse the tree.
• It balances the exploration-exploitation trade-off. During tree traversal, a node is selected based on
parameters that return the maximum value.
• The formula typically used for this purpose is the UCB value applied to trees,
Si = x̄i + C·√(ln t / ni), where ni is the number of simulations (visits) of node i.
• When traversing a tree during the
selection process, the child node that
returns the greatest value from the above
equation will be one that will get
selected. During traversal, once a child
node is found which is also a leaf node,
the MCTS jumps into the expansion step.
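The selection rule can be sketched as a few lines of Python. This is an illustrative example (the child statistics below are made-up numbers, not from the slides), using the standard UCB1 form mean + C·√(ln t / n):

```python
# UCB score for a child node in MCTS selection: the first term favors
# exploitation (high average reward), the second favors exploration
# (rarely visited children).

import math

def ucb(mean_value, visits, total_visits, c=1.41):
    if visits == 0:
        return math.inf              # unvisited children are tried first
    return mean_value + c * math.sqrt(math.log(total_visits) / visits)

# Three children of a node visited 30 times in total: (mean, visits).
children = [(0.7, 20), (0.5, 8), (0.3, 2)]
scores = [ucb(m, n, 30) for m, n in children]
best = max(range(len(children)), key=lambda i: scores[i])
print(best)   # → 2: the least-visited child wins despite its low mean
```

This shows the trade-off concretely: the child with the worst empirical mean is selected because its large exploration bonus outweighs the others' exploitation terms.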
• Expansion: In this process, a new child
node is added to the tree to that node
which was optimally reached during the
selection process.
• Simulation: In this process, a simulation
is performed by choosing moves or
strategies until a result or predefined
state is achieved.
• Backpropagation: After determining the
value of the newly added node, the
remaining tree must be updated. So, the
backpropagation process is performed,
where it backpropagates from the new
node to the root node. During the
process, the number of simulations stored
in each node, and its value, are updated.
Advantages of Monte Carlo
Tree Search:
1. MCTS is a simple algorithm to implement.
2. Monte Carlo Tree Search is a heuristic algorithm. MCTS can operate
effectively without any knowledge of the particular domain, apart
from the rules and end conditions, and can find its own moves and
learn from them by playing random playouts.
3. The MCTS can be saved in any intermediate state and that state
can be used in future use cases whenever required.
4. MCTS supports asymmetric expansion of the search tree based
on the circumstances in which it is operating.
Disadvantages of Monte Carlo
Tree Search:
1. As the tree grows rapidly after a few iterations, it requires
a huge amount of memory.
2. There is a bit of a reliability issue with Monte Carlo Tree Search: in
certain scenarios there might be a single branch or path that
leads to a loss against the opponent when implemented for
turn-based games. This is mainly due to the vast number of
combinations; each node might not be visited enough
times to understand its result or outcome in the long run.
3. The MCTS algorithm needs a huge number of iterations to be able to
decide the most efficient path effectively. So, there is a bit of a
speed issue there.
Stochastic Games
• Many games, such as those involving
dice tossing, have a random element
to reflect this unpredictability.
These are known as stochastic games.
Backgammon is a classic game that
mixes skill and luck: the legal moves
are determined by rolling the dice at
the start of each player's turn.
White, for example, has rolled a 6–5
and has four alternative moves in the
backgammon scenario shown in the figure below.
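Search in stochastic games uses the standard expectiminimax extension of minimax, where chance nodes (e.g., dice rolls) average the values of their outcomes weighted by probability. The following is an illustrative sketch with an assumed toy tree, not backgammon itself:

```python
# Expectiminimax over an explicit tree. Node encodings:
#   ('leaf', utility)                       terminal state
#   ('max', [children]) / ('min', [...])    decision nodes
#   ('chance', [(prob, child), ...])        chance node (e.g., a die roll)

def expectiminimax(node):
    kind = node[0]
    if kind == 'leaf':
        return node[1]                       # terminal utility
    children = node[1]
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of the outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a sure 3 and a fair coin flip between 12 and -4.
tree = ('max', [('leaf', 3),
                ('chance', [(0.5, ('leaf', 12)), (0.5, ('leaf', -4))])])
print(expectiminimax(tree))   # flip is worth 0.5*12 + 0.5*(-4) = 4.0
```

The gamble's expected value (4.0) beats the sure 3, so MAX prefers it; with a risk-free 5 instead, the choice would reverse.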
Partially Observable Games
•Partially observable stochastic games (POSGs) are among the
most general formal models that capture such dynamic
scenarios. The model captures stochastic events, partial
information of players about the environment, and the scenario
does not have a fixed horizon.
Which one is the example of
partially-observable?
•An example of a partially observable system is
•a card game in which some of the cards are discarded
into a pile face down.
•In this case, the observer is only able to view their own
cards and potentially those of the dealer.