
AI Lab Manual: Search Algorithms

The document is a lab manual for an Artificial Intelligence course, detailing experiments on search algorithms, game search, Bayesian networks, and grid-world planning. It includes programming tasks for uninformed and informed search techniques, such as Breadth First Search, Depth First Search, Best First Search, and A* Search. It also covers game search (including MINIMAX), Bayesian network construction and inference, value iteration, and reinforcement learning. Each experiment states its aim and provides code for students to follow and complete in their lab sessions.

Uploaded by dhruv
Copyright © All Rights Reserved

Department of Computer Science and Engineering

LAB MANUAL
SUBJECT: Artificial Intelligence Lab
(BTCS-605-18)

3rd Year – 6th Semester
(Branch: CSE)

Name: Jaskaranpreet Singh

[Link]

CGC College of Engineering
Landran, Mohali-140307
INDEX

S. No. | Name of Experiment | Date | Remarks
1 | Write a program to conduct uninformed search and informed search. | |
2 | Write a program to conduct uninformed search (Breadth First Search). | |
3 | Write a program to conduct uninformed search (Depth First Search). | |
4 | Implementation of informed search (Best First Search). | |
5 | Implementation of informed search strategy (A* Search). | |
6 | Write a programme to conduct game search. | |
7 | Implementation of Game Search using MINIMAX Algorithm. | |
8 | Write a program to construct a Bayesian network from given data. | |
9 | Write a program to infer from the Bayesian network. | |
10 | Write a program to run value and policy iteration in a grid world. | |
11 | Write a program to do reinforcement learning in a grid world. | |

Experiment-1

Aim: Write a program to conduct uninformed and informed search.


Theory:
An uninformed (blind) search has no additional information about the distance from the current state to the goal. It can only generate successors and test whether a state is the goal.

An informed (heuristic) search uses additional information, typically a heuristic estimate of the distance from the current state to the goal, to decide which state to expand next.

Basis of comparison | Informed search | Uninformed search
Basic knowledge | Uses knowledge (a heuristic) to find the steps to the solution. | Uses no problem-specific knowledge.
Efficiency | Usually more efficient, as it consumes less time and cost. | Efficiency is moderate.
Cost | Low | Comparatively high
Performance | Finds the solution more quickly. | Slower than informed search.
Algorithms | Heuristic depth-first and breadth-first search, and A* search | Depth-first search, breadth-first search, and lowest-cost-first search
Experiment-2

Aim: Write a program to conduct Breadth First Search (Uninformed search) for a
graph.

Program
#include <iostream>
#include <list>
using namespace std;

// This class represents a directed graph using
// adjacency list representation
class Graph
{
    int V;          // No. of vertices

    // Pointer to an array containing adjacency lists
    list<int> *adj;

public:
    Graph(int V);   // Constructor
    ~Graph() { delete[] adj; }

    // function to add an edge to graph
    void addEdge(int v, int w);

    // prints BFS traversal from a given source s
    void BFS(int s);
};

Graph::Graph(int V)
{
    this->V = V;
    adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)
{
    adj[v].push_back(w); // Add w to v's list.
}

void Graph::BFS(int s)
{
    // Mark all the vertices as not visited
    bool *visited = new bool[V];
    for (int i = 0; i < V; i++)
        visited[i] = false;

    // Create a queue for BFS
    list<int> queue;

    // Mark the current node as visited and enqueue it
    visited[s] = true;
    queue.push_back(s);

    while (!queue.empty())
    {
        // Dequeue a vertex from queue and print it
        s = queue.front();
        cout << s << " ";
        queue.pop_front();

        // Get all adjacent vertices of the dequeued vertex s.
        // If an adjacent vertex has not been visited, mark it
        // visited and enqueue it
        for (int w : adj[s])
        {
            if (!visited[w])
            {
                visited[w] = true;
                queue.push_back(w);
            }
        }
    }
    delete[] visited;
}

// Driver program to test methods of graph class
int main()
{
    // Create a graph given in the above diagram
    Graph g(4);
    g.addEdge(0, 1);
    g.addEdge(0, 2);
    g.addEdge(1, 2);
    g.addEdge(2, 0);
    g.addEdge(2, 3);
    g.addEdge(3, 3);

    cout << "Following is Breadth First Traversal "
         << "(starting from vertex 2) \n";
    g.BFS(2);   // prints: 2 0 3 1

    return 0;
}
Output
Experiment-3

Aim: Write a program to conduct uninformed search (Depth First Search).

Program
#include <bits/stdc++.h>
using namespace std;

// Graph class represents a directed graph
// using adjacency list representation
class Graph
{
    int V;          // No. of vertices

    // Pointer to an array containing adjacency lists
    list<int> *adj;

    // A recursive function used by DFS
    void DFSUtil(int v, bool visited[]);

public:
    Graph(int V);   // Constructor
    ~Graph() { delete[] adj; }

    // function to add an edge to graph
    void addEdge(int v, int w);

    // DFS traversal of the vertices reachable from v
    void DFS(int v);
};

Graph::Graph(int V)
{
    this->V = V;
    adj = new list<int>[V];
}

void Graph::addEdge(int v, int w)
{
    adj[v].push_back(w); // Add w to v's list.
}

void Graph::DFSUtil(int v, bool visited[])
{
    // Mark the current node as visited and print it
    visited[v] = true;
    cout << v << " ";

    // Recur for all the vertices adjacent to this vertex
    for (int w : adj[v])
        if (!visited[w])
            DFSUtil(w, visited);
}

// DFS traversal of the vertices reachable from v.
// It uses the recursive DFSUtil()
void Graph::DFS(int v)
{
    // Mark all the vertices as not visited
    bool *visited = new bool[V];
    for (int i = 0; i < V; i++)
        visited[i] = false;

    // Call the recursive helper function to print DFS traversal
    DFSUtil(v, visited);
    delete[] visited;
}

// Driver code
int main()
{
    // Create a graph given in the above diagram
    Graph g(4);
    g.addEdge(0, 1);
    g.addEdge(0, 2);
    g.addEdge(1, 2);
    g.addEdge(2, 0);
    g.addEdge(2, 3);
    g.addEdge(3, 3);

    cout << "Following is Depth First Traversal"
            " (starting from vertex 2) \n";
    g.DFS(2);   // prints: 2 0 1 3

    return 0;
}
Output
Experiment-4

Aim: Implementation of Informed Search (Best First Search).


Program
#include <bits/stdc++.h>
using namespace std;
typedef pair<int, int> pi;

vector<vector<pi> > graph;

// Function for adding an undirected edge (x, y) with the given cost
void addedge(int x, int y, int cost)
{
    graph[x].push_back(make_pair(cost, y));
    graph[y].push_back(make_pair(cost, x));
}

// Function for implementing Best First Search.
// The frontier is a min-heap keyed on the cost of the edge that reached
// each node, so the cheapest frontier node is always expanded next.
// Prints the nodes in the order they are expanded until target is reached.
void best_first_search(int source, int target, int n)
{
    vector<bool> visited(n, false);
    // MIN HEAP priority queue; sorting in pq is done by the first value of the pair
    priority_queue<pi, vector<pi>, greater<pi> > pq;
    pq.push(make_pair(0, source));
    visited[source] = true;

    while (!pq.empty()) {
        int x = pq.top().second;
        // Display the node being expanded
        cout << x << " ";
        pq.pop();
        if (x == target)
            break;

        for (size_t i = 0; i < graph[x].size(); i++)
        {
            if (!visited[graph[x][i].second])
            {
                visited[graph[x][i].second] = true;
                pq.push(graph[x][i]);
            }
        }
    }
}

// Driver code to test above methods
int main()
{
    // No. of nodes
    int v = 14;
    graph.resize(v);

    // The nodes shown in the above example (by alphabets) are
    // implemented using integers: addedge(x, y, cost);
    addedge(0, 1, 3);
    addedge(0, 2, 6);
    addedge(0, 3, 5);
    addedge(1, 4, 9);
    addedge(1, 5, 8);
    addedge(2, 6, 12);
    addedge(2, 7, 14);
    addedge(3, 8, 7);
    addedge(8, 9, 5);
    addedge(8, 10, 6);
    addedge(9, 11, 1);
    addedge(9, 12, 10);
    addedge(9, 13, 2);

    int source = 0;
    int target = 9;

    // Function call
    best_first_search(source, target, v);   // prints: 0 1 3 2 8 9

    return 0;
}

Output
Experiment-5

Aim: Implementation of Informed Search Strategy (A* Search).

Program
#include <iostream>
#include <cmath>
#include <list>
#include <vector>
#include <algorithm>

class Vector2
{
int x, y;
public:
Vector2(int _x, int _y) : x(_x), y(_y) {}
Vector2() = default;
Vector2 operator +(const Vector2& other)
{
Vector2 temp;
temp.x = this->x + other.x;
temp.y = this->y + other.y;
return temp;
}
int getX() const { return x; }
int getY() const { return y; }

friend class Map;


};

struct Node
{
Vector2 position;
int G, H, F;
Node* parent = nullptr;

Node() = default;
Node(const Node& other) = default;
Node(Vector2 pos):position(pos) {};

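void calc(const Vector2& endPos) {
// H: Manhattan-distance heuristic to the goal
H = static_cast<int>(std::abs(static_cast<double>(position.getX() - endPos.getX())) +
std::abs(static_cast<double>(position.getY() - endPos.getY())));
// G: every step costs 1, so G counts the steps from the start
G = parent ? parent->G + 1 : 1;
F = G + H;
}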

bool operator==(const Node& other) const {
return (position.getX() == other.position.getX() && position.getY() == other.position.getY());
}
bool operator!=(const Node& other) const {
return !(*this == other);
}
bool operator<(const Node& other) const {
return(F < other.F);
}
};

class Map
{
std::vector<char> data;
int size;
public:
Map() = default;
Map(int _size) : size(_size) {
data.resize(size * size);
for (int i = 0; i < size * size; ++i) data[i] = '.';
}
void display() const{
for (int i = 1; i <= size * size; ++i) {
std::cout << data[i - 1] << " ";
if (!(i % size)) std::cout << "\n";
}
}
bool getIfInDanger(Vector2 position) const {
// clamp the position into the map before looking it up
if (position.y < 0) position.y = 0;
if (position.x < 0) position.x = 0;
if (position.y >= size) position.y = size - 1;
if (position.x >= size) position.x = size - 1;
return(data[position.getX() + (position.getY() * size)] == 'X');
}
void setElement(char&& symbol, Vector2 position) {
data[position.getX() + (position.getY() * size)] = symbol;
}
};

class Solver
{
Vector2 startPos, endPos;
std::vector<Vector2> directions;
Map map;
public:
Solver(Vector2 _startPos, Vector2 _endPos, int size) : startPos(_startPos),
endPos(_endPos){
Map temp(size);
map = temp;
map.setElement('X', Vector2(14, 15));
map.setElement('X', Vector2(15, 15));
map.setElement('X', Vector2(16, 15));
map.setElement('X', Vector2(16, 14));
map.setElement('X', Vector2(16, 13));

directions.resize(8);
directions[0] = Vector2(-1, 1);
directions[1] = Vector2(-1, 0);
directions[2] = Vector2(-1, -1);
directions[3] = Vector2(0, 1);
directions[4] = Vector2(0, -1);
directions[5] = Vector2(1, 1);
directions[6] = Vector2(1, 0);
directions[7] = Vector2(1, -1);
}
bool aStar() {
Node startNode(startPos);
Node goalNode(Vector2(endPos.getX(), endPos.getY()));

if (map.getIfInDanger(startNode.position) || map.getIfInDanger(goalNode.position)) {
std::cout << "Either the start or the end of this map is obstructed.";
return false;
}

// std::list never moves its elements, so parent pointers into
// closedList stay valid while the list grows.
std::list<Node> openList;
std::list<Node> closedList;

startNode.calc(endPos);
openList.push_back(startNode);

bool found = false;
while (!openList.empty()) {
// move the open node with the smallest F = G + H to the closed list
auto best = std::min_element(openList.begin(), openList.end());
closedList.push_back(*best);
openList.erase(best);
Node* current = &closedList.back();
if (*current == goalNode) { found = true; break; }

for (auto& direction : directions) {
Node successor(direction + current->position);

// skip cells that are off the map, blocked, or already closed
if (successor.position.getX() > 20 - 1 || successor.position.getY() > 20 - 1 ||
successor.position.getX() < 0 || successor.position.getY() < 0 ||
map.getIfInDanger(successor.position) ||
std::find(closedList.begin(), closedList.end(), successor) != closedList.end()) {
continue;
}
successor.parent = current;
successor.calc(endPos);

auto inOpen = std::find(openList.begin(), openList.end(), successor);
if (inOpen == openList.end()) {
openList.push_back(successor);
}
else if (successor.G < inOpen->G) {
// found a cheaper route to a node that is already open
*inOpen = successor;
}
}
}

if (!found) {
std::cout << "No path has been found\n";
return false;
}

// follow the parent pointers back from the goal and mark the path with 'Y'
for (const Node* n = &closedList.back(); n && *n != startNode; n = n->parent)
map.setElement('Y', n->position);

map.display();
return true;
}
};

int main()
{
Solver solve(Vector2(0,0),Vector2(19,19), 20);
solve.aStar();
}
Output
Experiment-6

Aim: Write a programme to conduct game search.

Program
#include <iostream>
#include<list>
#include <cstdlib>
#include<string>
#include <ctime>
using namespace std;

typedef struct{
int *row;
}WinList;

class Player {
private:
string name;
int score;
public:
Player() :Player {""}{}
Player(string n) :score{0}, name{n}{}

void won(){
//increment the score
score++;
}
int getScore(){ return this->score;}

string getName(){ return this->name;}


};

class Game {
private:
char board[9];
int emptyIndex[9];
int gameOn, againstComputer;
int emptyCount;
WinList winlist[8];

void displayBoard(){
cout <<endl;
cout << " | | "<<endl;
cout << " "<< board[0] <<" | "<<board[1]<<" | "<<board[2]<<endl;
cout << " | | "<<endl;
cout << "-----------"<<endl;
cout << " | | "<<endl;
cout << " "<< board[3] <<" | "<<board[4]<<" | "<<board[5]<<endl;
cout << " | | "<<endl;
cout << "-----------"<<endl;
cout << " | | "<<endl;
cout << " "<< board[6] <<" | "<<board[7]<<" | "<<board[8]<<endl;
cout << " | | "<<endl;
cout <<endl;
}

void computerInput(){
// the computer simply picks a random empty cell (indices 0-8)
int pos;
pos = rand()%9;
if(emptyIndex[pos] == 1){
if(emptyCount < 0)
return;
computerInput();
} else {
cout<< "Computer chose: " << pos+1 << endl;
emptyIndex[pos] =1;
emptyCount-=1;
board[pos] = 'O';
}
}

void playerInput(Player &player){
int pos;
cout << endl;
cout << "\t" << player.getName() <<" Turn: ";
cout <<"\t Enter the position " << endl;
cin >> pos;
pos -=1;
if(pos < 0 || pos > 8 || emptyIndex[pos] == 1){
cout << "-----Position not valid or not empty-------"<< endl;
playerInput(player);
} else {
emptyIndex[pos] =1;
emptyCount-=1;
board[pos] = (player.getName().compare("Player I") == 0) ? 'X' : 'O';
}
}

void checkWin(Player &p1,Player &p2){


int i,j,k;
bool flag = false;
char first_symbol;
for(i=0; i<8; i++){
first_symbol = board[winlist[i].row[0]];
if((first_symbol != 'X') && (first_symbol != 'O')){
flag = false;
continue;
}
flag = true;
for(j=0;j<3;j++){
if(first_symbol != board[winlist[i].row[j]]){
flag = false;
break;
}
}
if(flag){
gameOn = 0;
if(first_symbol == 'X'){
cout << "-----------------------"<< endl;
cout << "\t Player I WON"<< endl;
cout << "-----------------------"<< endl;
p1.won();
} else {
p2.won();
if(againstComputer){
cout << "-----------------------"<< endl;
cout << "\t Computer WON"<< endl;
cout << "-----------------------"<< endl;
} else {
cout << "-----------------------"<< endl;
cout << "\t Player II WON"<< endl;
cout << "-----------------------"<< endl;

}
}
displayScore(p1,p2);
break;
}
}
}

void play(Player &p1,Player &p2){


char rematch ='\0';
int hand = 0;
gameOn =1;
displayBoard();
while((emptyCount > 0) && (gameOn != 0)){

if(againstComputer)
hand == 1 ? computerInput(): playerInput(p2);
else
hand == 1 ? playerInput(p1): playerInput(p2);
hand= !hand;
displayBoard();
checkWin(p1,p2);
}
if (emptyCount <=0){
cout << " -----------------------"<< endl;
cout << "\t No WINNER"<< endl;
cout << " -----------------------"<< endl;
}
cout<< endl;
cout << "Rematch Y/N: ";
cin >> rematch;
if((rematch == 'Y')||(rematch == 'y')){
init();
play(p1,p2);
}

}
void displayScore(Player &p1, Player &p2){
cout << endl;
cout << "\t SCORE: \t";
if(againstComputer)
cout<<" Player I: " <<[Link]()<<" \t Computer: "<<[Link]()<< endl;
else
cout<<" Player I: " <<[Link]()<<" \t Player II: "<<[Link]()<< endl;
}

public:
Game(): emptyCount{0}, gameOn{1}, againstComputer{0}{
init();
winlist[0].row = new int[3]{0,1,2};
winlist[1].row = new int[3]{3,4,5};
winlist[2].row = new int[3]{6,7,8};
winlist[3].row = new int[3]{0,3,6};
winlist[4].row = new int[3]{1,4,7};
winlist[5].row = new int[3]{2,5,8};
winlist[6].row = new int[3]{0,4,8};
winlist[7].row = new int[3]{2,4,6};
}

void init(){
gameOn = 1;

emptyCount =0;
srand(time(0));
// the board has 9 cells (indices 0-8)
for(size_t i=0; i<9; i++){
emptyIndex[i] = 0;
board[i] = (i+1) +'0';
emptyCount++;
}
}
void onePlayerGame(){
//Creating Player
Player p("Player I");
Player c("Computer");
cout << " -----------------------"<< endl;
cout << "\t Player I: X \t Computer: O"<< endl;
cout << " -----------------------"<< endl;
cout << endl;
againstComputer = 1;
play(c,p);
}

void twoPlayerGame(){
//Creating Player
Player p("Player I");
Player c("Player II");
cout << " -----------------------"<< endl;
cout << "\t Player I: X \t Player II: O"<< endl;
cout << " -----------------------"<< endl;
cout << endl;
againstComputer = 0;
play(c,p);
}
};

int main()
{
int ch;

while(1){
cout<< " ----------MENU----------" << endl;
cout << "\t 1. 1 Player game" <<endl;
cout << "\t 2. 2 Player game" <<endl;
cout << "\t 3. To exit " <<endl;
cout <<" ------------------------" << endl;
cout << endl;
cout <<"\t Select an option" << endl;
cin >> ch;
switch(ch){
case 1:{
Game game;   // the constructor already calls init()
game.onePlayerGame();
}
break;
case 2:{
Game game;
game.twoPlayerGame();
}
break;
case 3:
return 0;
default:
cout << "OOPs Invalid Option! TRY AGAIN";
}

}
return 0;
}

Output
Experiment-7

Aim: Implementation of Game Search using MINIMAX Algorithm.


Program
#include<bits/stdc++.h>
using namespace std;

// Returns the optimal value a maximizer can obtain.


// depth is current depth in game tree.
// nodeIndex is index of current node in scores[].
// isMax is true if current move is
// of maximizer, else false
// scores[] stores leaves of Game tree.
// h is maximum height of Game tree
int minimax(int depth, int nodeIndex, bool isMax,
int scores[], int h)
{
// Terminating condition. i.e
// leaf node is reached
if (depth == h)
return scores[nodeIndex];

// If current move is maximizer,


// find the maximum attainable
// value
if (isMax)
return max(minimax(depth+1, nodeIndex*2, false, scores, h),
minimax(depth+1, nodeIndex*2 + 1, false, scores, h));

// Else (If current move is Minimizer), find the minimum


// attainable value
else
return min(minimax(depth+1, nodeIndex*2, true, scores, h),
minimax(depth+1, nodeIndex*2 + 1, true, scores, h));
}

// A utility function to find Log n in base 2


int log2(int n)
{
return (n==1)? 0 : 1 + log2(n/2);
}

// Driver code
int main()
{
// The number of elements in scores must be
// a power of 2.
int scores[] = {3, 5, 2, 9, 12, 5, 23, 23};
int n = sizeof(scores)/sizeof(scores[0]);
int h = log2(n);
int res = minimax(0, 0, true, scores, h);
cout << "The optimal value is : " << res << endl;
return 0;
}

Output
Experiment-8

Aim: Write a program to construct a Bayesian network from given data.

Program:
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel  # renamed BayesianNetwork in newer pgmpy releases
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# read Cleveland Heart Disease data
heartDisease = pd.read_csv('[Link]')
# missing values are marked with '?' in the data
heartDisease = heartDisease.replace('?', np.nan)

#display the data


print('Few examples from the dataset are given below')
print(heartDisease.head())

#Model Bayesian Network


model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'),
                       ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
                       ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

#Learning CPDs using Maximum Likelihood Estimators


print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing with Bayesian Network


print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

#computing the Probability of HeartDisease given Age


print('\n 1. Probability of HeartDisease given Age=28')
q=HeartDisease_infer.query(variables=['heartdisease'],evidence={'age':28})
print(q)

#computing the Probability of HeartDisease given cholesterol


print('\n 2. Probability of HeartDisease given cholesterol=100')
q=HeartDisease_infer.query(variables=['heartdisease'],evidence={'chol':100})
print(q)
Output
Experiment-9

Aim: Write a program to infer from the Bayesian network.


Program:
import random
from random import seed, randint
import numpy

def game(winningdoor, selecteddoor, change=False):
    assert winningdoor < 3
    assert winningdoor >= 0

    # The presenter removes the first door that is neither selected nor winning
    removeddoor = next(i for i in range(3) if i != selecteddoor and i != winningdoor)

    # The player decides to change their choice
    if change:
        selecteddoor = next(i for i in range(3) if i != selecteddoor and i != removeddoor)

    # The player wins if the final selection is the winning door
    return selecteddoor == winningdoor

if __name__ == '__main__':
    # one million random initial choices, each in {0, 1, 2}
    playerdoors = numpy.random.randint(0, 3, (1000 * 1000 * 1,))

    winningdoors = [d for d in playerdoors if game(1, d)]
    print("Winning percentage without changing choice: ", len(winningdoors) / len(playerdoors))

    winningdoors = [d for d in playerdoors if game(1, d, change=True)]
    print("Winning percentage while changing choice: ", len(winningdoors) / len(playerdoors))

from numpy import random

import numpy as np
import time

def MontyHallSimulation(N):
    ChoiceUnchanged = []
    ChoiceChanged = []
    NN = 1
    for i in range(0, N):

        # 1) The car is placed behind a random door.
        WinningDoor = random.choice(['Door 1', 'Door 2', 'Door 3'])

        # 2) The contestant selects a random door.
        FirstSelection = random.choice(['Door 1', 'Door 2', 'Door 3'])

        # 3) The host opens a door that is different from the contestant's choice
        #    and not the door with the car.
        HostOpens = list(set(['Door 1', 'Door 2', 'Door 3']) - set([FirstSelection, WinningDoor]))[0]

        # 4) The other door is neither the participant's selected door nor the opened door.
        OtherDoor = list(set(['Door 1', 'Door 2', 'Door 3']) - set([FirstSelection, HostOpens]))[0]

        # 5) Add "True" to a list where the participant DOES NOT change their selection AND
        #    their selection identified the door with the car.
        ChoiceUnchanged.append(FirstSelection == WinningDoor)

        # 6) Add "True" to a list where the participant DOES change their selection and their
        #    new selected door has the car behind it.
        ChoiceChanged.append(OtherDoor == WinningDoor)

    # NOTE: The boolean object "True" is equal to 1 and "False" is equal to 0.
    # As such, we can use the "sum" function to get the total number of wins for each strategy.
    print(f'\n{N:,} games were played\n'
          f'Chances of winning the car based on the following strategies:\n'
          f'Remaining with initial selection: {sum(ChoiceUnchanged) / N:.1%}\n'
          f'Switching doors: {sum(ChoiceChanged) / N:.1%}')

###############################
###### Run the Simulation######
###############################
Start_time = time.time()
MontyHallSimulation(N=100000)
print(f'\nSimulation Completed in: {round(time.time() - Start_time, 2)} Seconds')
Output
Experiment-10

Aim: Write a program to run value and policy iteration in a grid world.
Program
public class GridWorld2
{
// General settings
private static double Ra = -3; // reward in non-terminal states (used to initialise R[][])
private static double gamma = 1; // discount factor
private static double pGood = 0.8; // probability of taking intended action
private static double pBad = (1-pGood)/2; // 2 bad actions, split prob between them
private static int N = 10000; // max number of iterations of Value Iteration
private static double deltaMin = 1e-9; // convergence criterion for iteration

// Main data structures

private static double U[][]; // long-term utility


private static double Up[][]; // UPrime, used in updates
private static double R[][]; // instantaneous reward
private static char Pi[][]; // policy

private static int rMax = 3, cMax = 4;

public static void main(String[] args)


{
int r,c;
double delta = 0;

// policy: initially null


Pi = new char[rMax][cMax];

// initialise U'
Up = new double[rMax][cMax]; // row, col
for (r=0; r<rMax; r++) {
for (c=0; c<cMax; c++) {
Up[r][c] = 0;
}
}
// Don't initialise U: will set U=Uprime in iterations
U = new double[rMax][cMax];

// initialise R: set everything to Ra and then override the terminal states


R = new double[rMax][cMax]; // row, col
for (r=0; r<rMax; r++) {
for (c=0; c<cMax; c++) {
R[r][c] = Ra;
}
}
R[0][3] = 100; // positive sink state
R[1][3] = -100; // negative sink state
R[1][1] = 0; // unreachable state

// Now perform Value Iteration.


int n = 0;
do
{
// Simultaneous updates: set U = Up, then compute changes in
// Up using prev value of U.
duplicate(Up, U); // src, dest
n++;
delta = 0;
for (r=0; r<rMax; r++) {
for (c=0; c<cMax; c++) {
updateUPrime(r, c);
double diff = Math.abs(Up[r][c] - U[r][c]);
if (diff > delta)
delta = diff;
}
}
} while (delta > deltaMin && n < N);

// Display final matrix


System.out.print("After " + n + " iterations:\n");
for (r=0; r<rMax; r++) {
for (c=0; c<cMax; c++) {
System.out.printf("% 6.1f\t", U[r][c]);
}
System.out.print("\n");
}

// Before displaying the best policy, insert chars in the sinks and the non-moving block
Pi[0][3] = '+'; Pi[1][3] = '-'; Pi[1][1] = '#';

System.out.print("\nBest policy:\n");
for (r=0; r<rMax; r++) {
for (c=0; c<cMax; c++) {
System.out.print(Pi[r][c] + " ");
}
System.out.print("\n");
}
}

public static void updateUPrime(int r, int c)


{
// IMPORTANT: this modifies the value of Up, using values in U.

double a[] = new double[4]; // 4 actions


// If at a sink state or unreachable state, use that value
if ((r==0 && c==3) || (r==1 && c==3) || (r==1 && c==1)) {
Up[r][c] = R[r][c];
}
else
{
a[0] = aNorth(r,c)*pGood + aWest(r,c)*pBad + aEast(r,c)*pBad;
a[1] = aSouth(r,c)*pGood + aWest(r,c)*pBad + aEast(r,c)*pBad;
a[2] = aWest(r,c)*pGood + aSouth(r,c)*pBad + aNorth(r,c)*pBad;
a[3] = aEast(r,c)*pGood + aSouth(r,c)*pBad + aNorth(r,c)*pBad;

int best = maxindex(a);

Up[r][c] = R[r][c] + gamma * a[best];

// update policy
Pi[r][c] = (best==0 ? 'N' : (best==1 ? 'S' : (best==2 ? 'W': 'E')));
}
}

public static int maxindex(double a[])


{
int b=0;
for (int i=1; i<a.length; i++)
b = (a[b] > a[i]) ? b : i;
return b;
}

public static double aNorth(int r, int c)


{
// can't go north if at row 0 or if in cell (2,1)
if ((r==0) || (r==2 && c==1))
return U[r][c];
return U[r-1][c];
}

public static double aSouth(int r, int c)


{
// can't go south if at row 2 or if in cell (0,1)
if ((r==rMax-1) || (r==0 && c==1))
return U[r][c];
return U[r+1][c];
}

public static double aWest(int r, int c)


{
// can't go west if at col 0 or if in cell (1,2)
if ((c==0) || (r==1 && c==2))
return U[r][c];
return U[r][c-1];
}

public static double aEast(int r, int c)


{
// can't go east if at col 3 or if in cell (1,0)
if ((c==cMax-1) || (r==1 && c==0))
return U[r][c];
return U[r][c+1];
}

public static void duplicate(double[][]src, double[][]dst)


{
// Copy data from src to dst
for (int x=0; x<src.length; x++) {
for (int y=0; y<src[x].length; y++) {
dst[x][y] = src[x][y];
}
}
}
}

Output
Experiment-11
Aim: Write a program to do reinforcement learning in a grid world.
Program
import numpy as np
import random

gamma = 1  # discounting rate
gridSize = 4
rewardValue = -1
terminationStates = [[0, 0], [gridSize - 1, gridSize - 1]]
actions = [[-1, 0], [1, 0], [0, 1], [0, -1]]
numIterations = 1000

def actionValue(initialPosition, action):
    if initialPosition in terminationStates:
        finalPosition = initialPosition
        reward = 0
    else:
        # Compute final position
        finalPosition = np.array(initialPosition) + np.array(action)
        reward = rewardValue
        # If the action moves the finalPosition out of the grid, stay in same cell
        if -1 in finalPosition or gridSize in finalPosition:
            finalPosition = initialPosition
            reward = rewardValue

    #print(finalPosition)
    return finalPosition, reward

# Initialize valueMap and valueMap1
valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]

def policy_evaluation(numIterations, gamma, theta, valueMap):
    for i in range(numIterations):
        delta = 0
        for state in states:
            weightedRewards = 0
            # equiprobable random policy: each of the 4 actions has probability 1/4
            for action in actions:
                finalPosition, reward = actionValue(state, action)
                weightedRewards += 1/4 * (reward + gamma * valueMap[finalPosition[0], finalPosition[1]])
            valueMap1[state[0], state[1]] = weightedRewards
            delta = max(delta, abs(weightedRewards - valueMap[state[0], state[1]]))
        valueMap = np.copy(valueMap1)
        # stop once no state value changed by more than theta in this sweep
        if delta < theta:
            print(valueMap)
            break

valueMap = np.zeros((gridSize, gridSize))
valueMap1 = np.zeros((gridSize, gridSize))
states = [[i, j] for i in range(gridSize) for j in range(gridSize)]
policy_evaluation(1000, 1, 0.001, valueMap)
Output
