### 540homework1.pdf

Details

Preview - Click download on the question page for the full document.

1/For a larger array, where width = 7 and height = 6, define the ConnectFour class
which is a version previously implemented tic-tac-toe.
Hint: ConnectFour class inherits from TicTacToe class and has an additional function
which forces the player only has to operate in the lowest empty cell of a column
2/Implement the Upper Confidence Bound for Trees algorithm from the family of
search algorithms of trees in Monte Carlo.
An iteration of the general MCTS algorithm is structured in four phases:
Figure 1 - The four phases of the general MCTS algorithm, Cameron Browne, 2010
- selection: a strategy for selecting an action to exploit (choice of a leaf node in the
tree)
- expansion: the construction of a new node in the tree (add the states of the children
to the node of leaf)
- simulation: choose a step at random and assume the utility of the current state
- back-propagation: update the utility function for parent nodes.
Figure 2 - The pseudo-code of the MCTS algorithm, Cameron Browne, 2010
Find the action that maximizes the following formula:
Representing a node in MCTS as a class:
- the number of visits, N, representing the number of simulations carried out from a
node or a descendant of it.
- the estimated value Q, indicating the quality of the node, based on the number of
games won from this node.
- N node a reference to the parent node.
- list of children, a dictionary which contains for each action a link to the following
node.
- it is a constant in the exploration and can be adjusted and can have a default value
of
Evolution of the algorithm:
1. If the algorithm starts with an empty tree (without memory), then a new node is
built. Otherwise, we select the sub-tree according to the last action of the opponent.
2. Until reaching the limit of the calculation budget:
- Starting from the root node, choose successively the following nodes until reaching
a final state or a node from which all possible actions have not been explored.
- For a node that is not final and from which all actions have not been explored, build
a child node for one of the unexplored actions.
- Simulate a game from the new node to a final state.
- Evaluate final status and determine if a reward is calculated.
- This reward is spread, by updating the statistics (number of visits) for each node at
the root.
Figure 3 - Pseudocode UCT, Cameron Browne, 2010
The function call should look like this:
To define the actions of the problem, we use the play game function which receives
the current game to be played and a strategy. The strategy itself is reduced to a
dictionary with the following structure:
where strategy function can be called by:
Example :
A possible exit scenario is shown on the next page:
3/Define the function query-player (game, state) to make the game more interactive
and display movements available / equal.