Techknowledge Publication: Artificial Intelligence and Soft Computing
Soft Computing
(Code : CSC703)
Strictly as Per the Choice Based Credit and Grading System (Revised 2016)
Copyright © by Authors. All rights reserved. No part of this publication may be reproduced, copied, or stored in a
retrieval system, distributed or transmitted in any form or by any means, including photocopy, recording, or other
electronic or mechanical methods, without the prior written permission of the publisher.
This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or
otherwise circulated without the publisher's prior written consent, in any form of binding or cover other than that in
which it is published, and without a similar condition, including this condition, being imposed on the subsequent
purchaser, and without limiting the rights under copyright reserved above.
First Edition : July 2019 (TechKnowledge Publications)
This edition is for sale in India, Bangladesh, Bhutan, Maldives, Nepal, Pakistan, Sri Lanka and designated countries in
South-East Asia. Sale and purchase of this book outside of these countries is unauthorized by the publisher.
ISBN 978-93-89424-35-5
Published by
TechKnowledge Publications
Dear Students,
We are extremely happy to present this book to you. We have divided the subject into small
chapters so that the topics can be arranged and understood properly. The topics within the chapters
have been arranged in a proper sequence to ensure smooth flow and understanding of the subject.
We present this book in the loving memory of Late Shri. Pradeepji Lunawat, our source
of inspiration. We are thankful to Prof. Arunoday Kumar, Shri. Shital Bhandari & Shri.
Chandroday Kumar for the encouragement and support that they have extended. We are also
thankful to Seema Lunawat for the ebooks, and to the staff members of TechKnowledge Publications and
others for their efforts to make this book as good as it is. We have jointly made every possible
effort to eliminate all the errors in this book. However, if you find any, please let us know.
We are also thankful to our family members and friends for their patience
and encouragement.
- Authors
2. To distinguish various search techniques and to make students understand knowledge representation and planning.
3. To become familiar with basics of Neural Networks and Fuzzy Logic.
1. Identify the various characteristics of Artificial Intelligence and Soft Computing techniques.
2. Choose an appropriate problem solving method for an agent to find a sequence of actions to reach the goal state.
3. Analyse the strengths and weaknesses of AI approaches to knowledge representation, reasoning and planning.
1.3 Soft Computing: Introduction of soft computing, soft computing vs. hard
computing, various types of soft computing techniques. (Refer Chapter-1)
2.2 Uninformed Search Methods: Depth Limited Search, Depth First Iterative
Deepening (DFID), Informed Search Method: A* Search
3.2 First order logic: syntax and Semantic, Knowledge Engineering in FOL Inference in
FOL : Unification, Forward Chaining, Backward Chaining and Resolution
3.3 Planning Agent, Types of Planning: Partial Order, Hierarchical Order, Conditional
Order (Refer Chapter-3)
4.1 Introduction to Fuzzy Set: Fuzzy set theory, Fuzzy set versus crisp set, Crisp
relation & fuzzy relations, membership functions,
4.2 Fuzzy Logic: Fuzzy Logic basics, Fuzzy Rules and Fuzzy Reasoning
4.3 Fuzzy inference systems : Fuzzification of input variables, defuzzification and fuzzy
controllers. (Refer Chapter-4)
5.0 Artificial Neural Network 12
Total 52
1.1 Introduction to Artificial Intelligence ..................................... 1-1
1.2 Foundations and Mathematical Treatments ............................ 1-1
1.2.1 Acting Humanly : The Turing Test Approach ....................... 1-1
1.2.2 Thinking Humanly : The Cognitive Modelling Approach ..... 1-2
1.2.4 Acting Rationally : The Rational Agent Approach ............... 1-3
1.3 Categorization of Intelligent Systems ..................................... 1-4
1.4.1 Computational Intelligence vs. Artificial Intelligence .......... 1-6
1.5 History of Artificial Intelligence .............................................. 1-6
1.6 Applications of Artificial Intelligence ...................................... 1-7
Syllabus :
Problem Solving Agent, Formulating Problems, Example Problems, Uninformed Search Methods : Depth Limited
Search, Depth First Iterative Deepening (DFID), Informed Search Method : A* Search, Optimization Problems : Hill
climbing Search, Simulated annealing, Genetic algorithm
2.2 Formulating Problems ........................................................... 2-1
2.2.1 Components of Problems Formulation ............................... 2-2
2.3 Measuring Performance of Problem Solving Algorithm / Agent ..... 2-5
2.4 Node Representation in Search Tree ...................................... 2-5
2.10.5 Pseudo Code .................................................................... 2-13
2.10.6 Performance Evaluation ................................................... 2-14
2.11 Bidirectional Search ............................................................ 2-14
2.11.1 Concept ........................................................................... 2-14
2.11.2 Process ............................................................................ 2-14
2.17.1(B) Steepest Ascent Hill Climbing ...................................... 2-34
2.17.1(C) Limitations of Hill Climbing .......................................... 2-34
2.17.1(D) Solutions on Problems in Hill Climbing ........................ 2-36
2.17.2 Simulated Annealing ........................................................ 2-36
2.17.2(A) Comparing Simulated Annealing with Hill Climbing ..... 2-37
2.22.1 Example of α-β Pruning .................................................... 2-56
2.22.3 Properties of α-β .............................................................. 2-60
3.1 A Knowledge Based Agent ...................................................... 3-1
3.1.1 Architecture of a KB Agent .................................................. 3-2
3.2.1 Description of the WUMPUS World ..................................... 3-4
3.2.2 PEAS Properties of WUMPUS World .................................... 3-5
3.2.3 Exploring a WUMPUS World ................................................ 3-6
3.3.1 Role of Reasoning in AI ........................................................ 3-9
3.4 Representation of Knowledge using Rules .............................. 3-9
3.9 Knowledge Engineering in First Order Logic .......................... 3-24
3.9.1 Knowledge Engineering Process ........................................ 3-24
3.12.1 Introduction to Planning .................................................. 3-36
3.12.2 Simple Planning Agent ..................................................... 3-37
3.13.1 Problem Solving and Planning ......................................... 3-38
3.14 Goal of Planning ................................................................. 3-38
3.14.1 Major Approaches ........................................................... 3-39
3.16 Planning as State-Space Search ........................................... 3-41
3.16.1 Example of State Space Search ........................................ 3-43
Syllabus :
Introduction to Fuzzy Set: Fuzzy set theory, Fuzzy set versus crisp set, Crisp relation & fuzzy relations, membership
functions, Fuzzy Logic: Fuzzy Logic basics, Fuzzy Rules and Fuzzy Reasoning, Fuzzy inference systems: Fuzzification of
input variables, defuzzification and fuzzy controllers.
4.1 Introduction to Fuzzy Set ........................................................ 4-1
4.3 Fuzzy Set Theory ..................................................................... 4-3
4.3.1 Fuzzy Set : Definition ........................................................... 4-3
4.3.2 Types of Universe of Discourse ............................................ 4-3
4.3.3 Different Notations for Representing Fuzzy Sets .................. 4-5
4.3.5 Important Terminologies related to Fuzzy Sets ..................... 4-7
4.3.6 Properties of Fuzzy Sets ..................................................... 4-10
4.8.3 Disadvantages of FLSs ....................................................... 4-43
4.9 Solved Problems ................................................................... 4-43
4.10 Design of Controllers (Solved Problems) .............................. 4-66
Chapter 5 : Artificial Neural Network 5-1 to 5-60
Syllabus :
Introduction : Fundamental concept : Basic Models of Architecture: Perceptron, Single layer Feed Forward ANN,
Multilayer Feed Forward ANN, Activation functions, Supervised Learning : Delta learning rule, Back Propagation
algorithm, Un-Supervised Learning algorithm : Self Organizing Maps
5.1 Fundamental Concept - Artificial Neural Networks ................. 5-1
5.1.1 Artificial Neural Networks ................................................... 5-1
5.1.2 Biological Neural Networks .................................................. 5-2
5.5.4 Multilayer Feedback Networks ........................................... 5-24
5.6 Supervised Learning .............................................................. 5-24
5.6.1 Perceptron Learning Rule ................................................... 5-24
5.6.1(D) Model for Multilayer Perceptron ................................... 5-27
5.6.1(E) The EX-OR Problem and Need for Multi-layer Perceptron ..... 5-27
5.6.1(F) Linearly and Non-Linearly Separable Patterns ................ 5-28
5.6.2 Delta Learning Rule ............................................................ 5-31
5.6.2(A) Delta Learning Model .................................................... 5-32
5.6.2(C) Algorithm-Delta Learning ............................................... 5-35
5.6.3 Back Propagation Algorithm ............................................... 5-36
5.6.3(A) Architecture of Back Propagation Network .................... 5-37
5.6.3(B) Algorithm (Error Back Propagation Training) .................. 5-39
5.8 Solved Problems ................................................................... 5-42
Syllabus :
Hybrid Approach - Fuzzy Neural Systems, Expert system : Introduction, Characteristics, Architecture, Stages in the
development of expert system,
6.1.3 Fuzzy Neural System ............................................................ 6-3
6.1.4 Adaptive Neuro - Fuzzy Inference System (ANFIS) ................ 6-4
6.2 Expert System ......................................................................... 6-6
6.2.1 Introduction ......................................................................... 6-6
6.2.2 Characteristics of Expert Systems ......................................... 6-7
6.3 Building Blocks of Expert Systems ............................................ 6-8
6.4 Development Phases of Expert Systems ................................. 6-11
6.5 Representing and Using Domain Knowledge ......................... 6-11
6.6 Expert System-Shell ............................................................... 6-15
6.8.1 Knowledge Elicitation ......................................................... 6-18
Chapter 1 : Introduction to Artificial Intelligence (AI) and Soft Computing
Unit I
Syllabus
1.2 Intelligent Agents : Agents and Environments, Rationality, Nature of Environment, Structure of Agent, Types of
Agent
1.3 Soft Computing : Introduction of soft computing, soft computing vs. hard computing, various types of soft
computing techniques.
1.1 Introduction to Artificial Intelligence
John McCarthy, who coined the term “Artificial Intelligence” in 1956, defined AI as “the science and
engineering of making intelligent machines”. An intelligent system should choose the next action based on the current
state of the system; in short, it should act intelligently or rationally. As AI has a very wide range of applications, it is
truly a universal field.
In simple words, an Artificially Intelligent system works like a human brain : a machine or software shows
intelligence while performing given tasks; such systems are called intelligent systems or expert systems. You can say
that such systems exhibit intelligence artificially.
Q. Explain the Turing test designed for a satisfactory operational definition of AI. (May 16, 5 Marks)
Definition 1 : “The art of creating machines that perform functions that require intelligence when performed by
people.” (Kurzweil, 1990)
Definition 2 : “The study of how to make computers do things at which, at the moment, people are better.” (Rich and
Knight, 1991)
AI&SC (MU-Sem. 7-Comp) 1-2 Intro. to Artificial Intelligence(AI) and Soft Computing
To judge whether a system can act like a human, Alan Turing designed a test known as the Turing test.
As shown in Fig. 1.2.1, in the Turing test a computer needs to interact with a human interrogator by answering his
questions in written format. The computer passes the test if the human interrogator cannot identify whether the written
responses are from a person or a computer. The Turing test remains valid even after 60 years of research.
For this test, the computer would need to possess the following capabilities:
1. Natural Language Processing (NLP) : This unit enables the computer to interpret the English language and
communicate successfully.
2. Knowledge Representation : This unit is used to store knowledge gathered by the system through input devices.
3. Automated Reasoning : This unit analyzes the knowledge stored in the system and makes new
inferences to answer questions.
4. Machine Learning: This unit learns new knowledge by taking current input from the environment and adapts to
new circumstances, thereby enhancing the knowledgebase of the system.
To pass the total Turing test, the computer will also need computer vision, which is required to perceive objects
in the environment, and robotics, to manipulate those objects.
Fig. 1.2.2 lists all the capabilities a computer needs to have in order to exhibit artificial intelligence. Mentioned above
are the six disciplines which implement most of the artificial intelligence.
Definition 1 : “The exciting new effort to make computers think ... machines with minds, in the full and literal sense”.
(Haugeland, 1985)
Definition 2 : “The automation of activities that we associate with human thinking, activities such as decision making,
problem solving, learning ...” (Hellman, 1978)
Cognitive science : It is an interdisciplinary field which combines computer models from Artificial Intelligence with
techniques from psychology in order to construct precise and testable theories of the working of the human mind.
In order to make machines think like humans, we need to first understand how humans think. Research has shown
that there are three ways in which a human's thinking pattern can be captured :
1. Introspection, through which humans can catch their own thoughts as they go by.
2. Psychological experiments can be carried out by observing a person in action.
3. Brain imaging can be done by observing the brain in action.
By capturing the human thinking pattern, it can be implemented in a computer system as a program, and if the
program's input-output behaviour matches that of a human, then it can be claimed that the system can operate like humans.
Definition 1 : “The study of mental faculties through the use of computational models”. (Charniak and McDermott,
1985)
Definition 2 : “The study of the computations that make it possible to perceive, reason, and act”.
The laws of thought are supposed to govern the operation of the mind, and their study initiated the field called
logic. Logic provides precise notations to express facts of the real world.
It also covers reasoning and “right thinking”, that is, an irrefutable thinking process. Computer programs based on
these logic notations were developed to create intelligent systems.
1. This approach is not suitable when 100% of the knowledge about a problem is not available.
2. A vast number of computations is required to implement even a simple human reasoning process; practically,
not all problems were solvable, because even problems with just a few hundred facts can exhaust the computational
resources of any computer.
Rational Agent
Agents perceive their environment through sensors over a prolonged time period, adapt to change, create and
pursue goals, and take actions through actuators to achieve those goals. A rational agent is one that does the “right”
thing and acts rationally so as to achieve the best outcome, even when there is uncertainty in knowledge.
The rational-agent approach has two advantages over the other approaches :
1. As compared to the other approaches, this is the more general approach, as rationality can be achieved by selecting
the correct inference from the several available.
2. Rationality has specific standards and is mathematically well defined and completely general and can be used to
develop agent designs that achieve it. Human behavior, on the other hand, is very subjective and cannot be
proved mathematically.
The two approaches thinking humanly and thinking rationally are based on the reasoning expected from
intelligent systems, while the other two, acting humanly and acting rationally, are based on the intelligent behaviour
expected from them.
As AI is a very broad concept, there are different types or forms of AI. The critical categories of AI can be based on the
capacity of the intelligent program, or what the program is able to do. Under this consideration there are three main
categories :
Weak AI is AI that specializes in one area; it is not a general-purpose intelligence. An intelligent agent built to solve a
particular problem or to perform a specific task is termed narrow intelligence or weak AI. For example, it took years
of AI development to be able to beat the chess grandmaster, and since then we have not been able to beat the
machines at chess. But that is all such a system can do, which it does extremely well.
Strong AI or general AI refers to intelligence demonstrated by machines in performing any intellectual task that a human
can perform. Developing strong AI is much harder than developing weak AI. Using artificial general intelligence,
machines can demonstrate human abilities like reasoning, planning, problem solving, comprehending complex ideas,
learning from their own experiences, etc. Many companies and corporations are working on developing a general
intelligence, but they are yet to complete it.
As defined by the leading AI thinker Nick Bostrom, “Super intelligence is an intellect that is much smarter than the best
human brains in practically every field, including scientific creativity, general wisdom and social skills.” Super
intelligence ranges from a machine which is just a little smarter than a human to a machine that is a trillion times
smarter. Artificial super intelligence is the ultimate power of AI.
1.4 Components of AI
AI is a vast field for research and it has got applications in almost all possible domains. By keeping this in mind,
components of AI can be identified as follows: (Refer Fig.1.4.1)
1. Perception
2. Knowledge representation
3. Learning
4. Reasoning
5. Problem solving
1. Perception
In order to work in the environment, intelligent agents need to scan the environment and the various objects in it.
An agent scans the environment using various sensors, like a camera, temperature sensor, etc. This is called
perception. After capturing various scenes, the perceiver analyses the different objects in them and extracts their
features and the relationships among them.
2. Knowledge representation
The information obtained from the environment through sensors may not be in the format required by the system.
Hence, it needs to be represented in standard formats for further processing, like learning various patterns, deducing
inferences, comparing with past objects, etc. There are various knowledge representation techniques, like Propositional
Logic and First Order Logic.
3. Learning
Learning is a very essential part of AI, and it happens in various forms. The simplest form of learning is by trial and
error. In this form, the program remembers the action that has given the desired output, discards the other trial
actions, and learns by itself. It is also called unsupervised learning. In the case of rote learning, the program simply
remembers problem-solution pairs or individual items. In another case, solutions to a few of the problems are given as
input to the system, based on which the system or program needs to generate solutions for new problems. This is
known as supervised learning.
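The rote-learning form described above can be sketched in a few lines: the program simply remembers problem-solution pairs and looks them up later. The `RoteLearner` class and the example pairs below are hypothetical illustrations, not from the text.

```python
# A minimal sketch of rote learning: remember problem-solution pairs
# exactly as seen, and recall them on demand (all names hypothetical).
class RoteLearner:
    def __init__(self):
        self.memory = {}              # problem -> remembered solution

    def learn(self, problem, solution):
        self.memory[problem] = solution

    def solve(self, problem):
        # Returns the remembered solution, or None for an unseen problem.
        return self.memory.get(problem)

learner = RoteLearner()
learner.learn("2+2", "4")
print(learner.solve("2+2"))   # remembered pair -> "4"
print(learner.solve("3+3"))   # never seen -> None
```

Note that a rote learner cannot generalize: anything not stored verbatim stays unknown, which is exactly what distinguishes it from the supervised learning described above.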
4. Reasoning
Reasoning is also called logic, or generating inferences from a given set of facts. Reasoning is carried out based on
strict rules of validity to perform a specified task. Reasoning can be of two types, deductive or inductive. In deductive
reasoning the truth of the premises guarantees the truth of the conclusion while, in the case of inductive
reasoning, the truth of the premises supports the conclusion, but the conclusion cannot depend fully on the premises. In
programming logic, generally deductive inferences are used. Reasoning involves drawing inferences that are relevant
to the given problem or situation.
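Deductive inference of this kind can be sketched as simple forward chaining over if-then rules, where the truth of the premises guarantees the conclusion. The facts and rules below are illustrative assumptions, not examples from the text.

```python
# A sketch of deductive reasoning by forward chaining: repeatedly apply
# rules of the form (premises -> conclusion) until no new fact is derived.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If all premises hold, the conclusion is guaranteed (deduction).
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    (["rains"], "ground_wet"),        # if it rains, the ground is wet
    (["ground_wet"], "slippery"),     # if the ground is wet, it is slippery
]
derived = forward_chain(["rains"], rules)
print(derived)
```

Starting from the single fact "rains", the chain deduces "ground_wet" and then "slippery"; each new fact follows with certainty, which is the defining property of deduction.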
5. Problem-solving
AI addresses a huge variety of problems, for example, finding winning moves in board games, planning actions
in order to achieve a defined task, identifying various objects from given images, etc. As per the type of problem,
there is a variety of problem-solving strategies in AI. Problem-solving methods are mainly divided into general-purpose
methods and special-purpose methods. General-purpose methods are applicable to a wide range of problems, while
special-purpose methods are customized to solve particular types of problems.
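A general-purpose method in this sense can be sketched as breadth-first search, which works for any problem posed as states, a successor function and a goal test. The tiny graph below is a made-up example; full search methods are covered in Chapter 2.

```python
from collections import deque

# A sketch of a general-purpose problem-solving method: breadth-first
# search over any state space given a start state, a goal state and a
# successor function (the graph here is purely illustrative).
def breadth_first_search(start, goal, successors):
    frontier = deque([[start]])       # queue of paths to expand
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path               # first complete path is the shortest
        for nxt in successors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None                       # goal unreachable

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(breadth_first_search("A", "D", lambda s: graph[s]))  # ['A', 'B', 'D']
```

The same routine solves any problem that can be phrased this way, which is what makes it general-purpose; a special-purpose method would instead exploit structure unique to one problem.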
Natural Language Processing involves enabling machines or robots to understand and process the language that humans
speak, and to infer knowledge from the speech input. It also involves active participation from the machine in the form
of dialog, i.e., NLP aims at text or verbal output from the machine or robot. The input and output of an NLP system can
be speech and written text respectively.
Computational Intelligence (CI) vs. Artificial Intelligence (AI) :
1. CI is the study of the design of intelligent agents, while AI is the study of making machines do things which, at
present, humans do better.
2. CI constructs the system starting from bottom-level computations, hence follows a bottom-up approach, while AI
analyses the overall structure of an intelligent system by following a top-down approach.
3. CI concentrates on low-level cognitive function implementation, while AI concentrates on high-level cognitive
structure design.
The term Artificial Intelligence (AI) was introduced by John McCarthy, in 1955. He defined artificial intelligence as
“The science and engineering of making intelligent machines”.
Mathematician Alan Turing and others presented studies based on logic-driven computational theories, which showed
that any computer program can work by simply shuffling “0” and “1” (i.e., electricity off and electricity on). Also, during
that time period, research was going on in areas like Automata, Neurology, Control theory, Information theory,
etc.
This inspired a group of researchers to think about the possibility of creating an electronic brain. In the year 1956 a
conference was conducted at the campus of Dartmouth College where the field of artificial intelligence research was
founded.
This conference was attended by John McCarthy, Marvin Minsky, Allen Newell and Herbert Simon, among others, who
were considered the pioneers of artificial intelligence research for a very long time. During that time period, Artificial
Intelligence systems were developed by these researchers and their students.
During that period these founders predicted that within a few years machines would be able to do any work that a man
can do, but they failed to recognize the difficulties that would be faced.
Meanwhile we will see the ideas, viewpoints and techniques which Artificial Intelligence has inherited from other
disciplines. They can be given as follows :
1. Philosophy : Theories of reasoning and learning have emerged, along with the viewpoint that the mind is
constituted by the operation of a physical system.
2. Mathematics : Formal theories of logic, probability, decision making and computation have emerged.
3. Psychology : Psychology has provided tools to investigate the human mind, and a scientific language in which to
express the resulting theories.
4. Linguistics : Theories of the structure and meaning of language have emerged.
5. Computer science : The tools which can make artificial intelligence a reality have emerged.
You must have seen the use of Artificial Intelligence in many sci-fi movies. To name a few, we have I, Robot, WALL-E,
The Matrix trilogy, Star Wars, etc. Many a time these movies show the positive potential of using AI, and sometimes
they also emphasize the dangers of using AI. There are also games based on such movies, which show us many probable
applications of AI.
Artificial Intelligence is commonly used for problem solving by analyzing and/or predicting the output of a system. AI can
provide solutions for constraint satisfaction problems. It is used in a wide range of fields, for example in diagnosing
diseases, in business, in education, in controlling robots, in the entertainment field, etc.
Fig. 1.6.1 shows few fields in which we have applications of artificial intelligence. There can be many fields in which
Artificially Intelligent Systems can be used.
1. Education
Training simulators can be built using artificial intelligence techniques. Software for pre-school children is developed
to enable learning through fun games. Automated grading, interactive tutoring and instructional theory are the current
areas of application.
2. Entertainment
Many movies, games and robots are designed to play a character. In games they can play as an opponent when a human
player is not available or not desirable.
3. Medical
AI has applications in the field of cardiology (CRG), neurology (MRI), embryology (sonography), complex operations of
internal organs, etc. It can also be used for organizing bed schedules, managing staff rotations, and storing and
retrieving patient information. Many expert systems are able to predict a disease and can provide medical
prescriptions.
4. Military
Training simulators can be used in military applications. Also, in areas where humans cannot reach, or in life-threatening
conditions, robots can very well be used to do the required jobs. When decisions have to be made quickly, taking into
account an enormous amount of information, and when lives are at stake, artificial intelligence can provide crucial
assistance. From developing intricate flight plans to implementing complex supply systems or creating training
simulation exercises, AI is a natural partner in the modern military.
5. Business and Manufacturing
The latest generation of robots is equipped with performance advances, a growing integration of vision, and an
enlarging capability to transform manufacturing.
Intelligent planners are available with AI systems, which can process large datasets and can consider all the
constraints to design plans satisfying all of them.
7. Voice technology
Voice recognition has improved a lot with AI. Systems are designed to take voice inputs, which is especially useful for
people with handicaps. Scientists are also developing intelligent machines to emulate the activities of a skillful musician.
Composition, performance, sound processing and music theory are some of the major areas of research.
8. Heavy industry
Huge machines involve risk in operating and maintaining them. Humanoid robots are well suited to replace human
operators. These robots are safe and efficient. Robots have proven to be more effective than humans in jobs of a
repetitive nature.
AI applications can be roughly classified based on the type of tools/approaches used for incorporating intelligence in the
system, forming sub-areas of AI. Various sub-domains/areas in intelligent systems can be given as follows : Natural
Language Processing, Robotics, Neural Networks and Fuzzy Logic. Fig. 1.7.1 shows these areas in intelligent systems.
1. Natural language processing : One of the applications of AI is in the field of Natural Language Processing (NLP). NLP
enables interaction between computers and human (natural) language. Practical applications of NLP are in
machine translation (e.g. the Lunar System), information retrieval, text categorization, etc. A few more applications are
extracting 3D information using vision, speech recognition, perception and image formation.
2. Robotics : One more major application of AI is in Robotics. A robot is an active agent whose environment is the
physical world. Robots can be used in manufacturing and material handling, in the medical field, in the military, etc., for
automating manual work.
3. Neural networks : Another application of AI is using Neural Networks. Neural Network is a system that works like
a human brain/nervous system. It can be useful for stock market analysis, in character recognition, in image
compression, in security, face recognition, handwriting recognition, Optical Character Recognition (OCR), etc.
4. Fuzzy logic : Apart from these, AI systems are developed with the help of Fuzzy Logic. Fuzzy Logic can be useful for
making approximations rather than having fixed and exact reasoning for a problem. You must have seen systems
like ACs, fridges and washing machines which are based on fuzzy logic (they call it “6th sense technology”!).
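The approximate reasoning that fuzzy logic provides can be sketched with a membership function that returns a degree of truth between 0 and 1 instead of a crisp yes/no. The temperature ranges below are illustrative assumptions; fuzzy sets are treated fully in Chapter 4.

```python
# A sketch of fuzzy (approximate) reasoning: a triangular membership
# function assigns a partial degree of truth rather than a crisp yes/no.
def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, then falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def warmth(temp_c):
    # Degree to which a temperature is "warm": peak at 25 degrees,
    # fading out at 15 and 35 (ranges chosen for illustration only).
    return triangular(temp_c, 15, 25, 35)

print(warmth(25))   # 1.0 -> fully warm
print(warmth(20))   # 0.5 -> partially warm
print(warmth(40))   # 0.0 -> not warm at all
```

A fuzzy controller in an AC or washing machine combines several such graded memberships through rules, rather than switching on a single hard threshold.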
Artificial Intelligence has touched each and every aspect of our life. From washing machine, Air conditioners, to smart
phones everywhere AI is serving to ease our life. In industry, AI is doing marvellous work as well. Robots are doing the
sound work in factories. Driverless cars have become a reality. WiFi-enabled Barbie uses speech-recognition to talk and
io led
listen to children. Companies are using AI to improve their product and increase sales. AI saw significant advances in
machine learning. Following are the areas in which AI is showing significant advancements.
1. Deep learning
Convolutional Neural Networks, which enable the concept of deep learning, are the topmost area of focus in Artificial
Intelligence in today's era. Many problem and application areas of AI, like natural language and text processing,
speech recognition, computer vision, information retrieval and multimodal information processing, are empowered by
deep learning.
2. Machine learning
The goal of machine learning is to program computers to use example data or past experience to solve a given
problem. Many successful applications of machine learning include systems that analyze past sales data to predict
customer behaviour, optimize robot behaviour so that a task can be completed using minimum resources, and extract
knowledge from bioinformatics data.
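As a minimal sketch of learning from example data, the snippet below fits a straight line to made-up past sales figures by least squares and uses it to predict the next period. The function name and all numbers are purely illustrative assumptions.

```python
# A sketch of machine learning from past data: ordinary least-squares
# fit of a line y = slope * x + intercept to example (x, y) pairs.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]   # made-up past sales data
m, b = fit_line(months, sales)
print(m * 5 + b)                   # predicted sales for month 5 -> 18.0
```

The "experience" here is just four past observations; the fitted slope and intercept are the learned model, and prediction is a single evaluation of it.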
3. AI replacing workers
In industry where there are safety hazards, robots are doing a good job. Human resources are being replaced by
robots rapidly. People are worried to see that the white-collar jobs of data processing are also being done exceedingly well
by intelligent programs. A study from The National Academy of Sciences brought together technologists,
economists and social scientists to figure out what's going to happen.
The concepts of smarter homes, smarter cars and a smarter world are evolving rapidly with the invention of the Internet
of Things. The future is not far when each and every object will be wirelessly connected to something in order to perform
some smart actions without any human instructions or interference. The worry is how the mined data can potentially
be exploited.
AI&SC (MU-Sem. 7-Comp) 1-10 Intro. to Artificial Intelligence(AI) and Soft Computing
5. Emotional AI
Emotional AI, where AI can detect human emotions, is another upcoming and important area of research. A computer's ability to understand speech will lead to an almost seamless interaction between human and computer. With increasingly accurate cameras and voice and facial recognition, computers are better able to detect our emotional state. Researchers are exploring how this new knowledge can be used in education, to treat depression, to accurately predict medical diagnoses, and to improve customer service and online shopping.
Using AI, customers' buying and behavioural patterns can be studied, and systems can be built that predict a purchase or help the customer figure out the perfect item. AI can be used to find out what will make the customer happy or unhappy. For example, if a customer shopping online likes a dress pattern but needs darker shades and thicker material, the computer understands the need and brings out a new set of perfectly matching clothing.
7. Ethical AI
With all the evolution happening in technology in every walk of life, ethics must be considered at the forefront of research. For example, in the case of a driverless car, if a decision has to be made between whether to hit a cat or a lady, with both at an unavoidable distance in front of the car, it is an ethical decision. In such cases, how the program should decide who is more valuable is a question. These are not problems to be solved by computer engineers or research scientists alone, but someone has to come up with an answer.
An agent is something that perceives its environment through sensors and acts upon that environment through effectors. The organs or devices through which the agent perceives its environment are called sensors. Sensors collect percepts or inputs from the environment and pass them to the processing unit.
Actuators or effectors are the organs or tools using which the agent acts upon the environment. Once the sensor
senses the environment, it gives this information to nervous system which takes appropriate action with the help of
actuators.
In case of human agents we have hands, legs as actuators or effectors.
After understanding what an agent is, let's try to figure out the sensors and actuators for a robotic agent. Can you think of the sensors and actuators in case of a robotic agent?
The robotic agent has cameras, infrared range finders, scanners, etc. used as sensors, while various types of motors,
screen, printing devices, etc. used as actuators to perform action on given input.
The agent function is the description of all the functionalities the agent is supposed to perform. The agent function provides a mapping from percept sequences to the desired actions. It can be represented as f : P* → A.
An agent program is a computer program that implements the agent function in a language suitable for the architecture. The agent program needs to be installed on a device in order to run the device accordingly; that device must have some form of sensors to sense the environment and actuators to act upon it. Hence, an agent is a combination of architecture (hardware) and program (software).
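The mapping f : P* → A can be sketched as a simple lookup table over the percept sequence observed so far, in the spirit of a table-driven agent program. The percept names and actions below follow the vacuum-cleaner example used in this chapter; the table entries themselves are illustrative, not a complete specification.

```python
# Sketch of the agent function f : P* -> A as a lookup table.
# The table maps a tuple of percepts (the sequence so far) to an action.

class TableDrivenAgent:
    def __init__(self, table):
        self.percepts = []   # the percept sequence P* observed so far
        self.table = table   # maps tuple-of-percepts -> action

    def program(self, percept):
        """The agent program: record the new percept, look up an action."""
        self.percepts.append(percept)
        return self.table.get(tuple(self.percepts), "NoOp")

# Illustrative (incomplete) table for the two-square vacuum world.
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}

agent = TableDrivenAgent(table)
first = agent.program(("A", "Clean"))   # looked up from the 1-percept row
second = agent.program(("B", "Dirty"))  # looked up from the 2-percept row
```

In practice such tables grow unmanageably large, which is exactly why the agent designs discussed later (reflex, model-based, goal-based, utility-based) compute actions instead of tabulating them.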
Take the simple example of a vacuum cleaner agent. You might have seen a vacuum cleaner agent in "WALL-E" (an animated movie). Let's understand how to represent the percepts (inputs) and actions (outputs) used in case of a vacuum cleaner agent.
As shown in Fig. 1.9.4, there are two blocks A and B having some dirt. The vacuum cleaner agent is supposed to sense the dirt and collect it, thereby making the room clean. In order to do that, the agent must have a camera to see the dirt and a mechanism to move forward, backward, left and right to reach the dirt. Also, it should absorb the dirt. Based on the percepts, actions will be performed, for example : Move left, Move right, Absorb, No Operation.
Hence the sensors for the vacuum cleaner agent can be a camera and a dirt sensor, and the actuators can be a motor to make it move and an absorption mechanism. The percept-action pairs can be represented as :
[A, Dirty], [B, Clean], [A, absorb], [B, Nop], etc.
1.9.2 Definitions of Agent
Various definitions exist for an agent. Let's see a few of them.
IBM states that agents are software entities that carry out some set of operations on behalf of a user or another program.
FIPA : The Foundation for Intelligent Physical Agents (FIPA) states that an agent is a computational process that implements the autonomous, communicating functionality of an application.
Another definition, by Russell and Norvig, is : "An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators".
F. Mills and R. Stufflebeam's definition says that "An agent is anything that is capable of acting upon information it perceives. An intelligent agent is an agent capable of making decisions about how it acts based on experience".
From the above definitions we can understand that an agent is (as per Terziyan, 1993) :
o Goal-oriented o Creative
o Adaptive o Mobile
o Social o Self-configurable
In case of intelligent agents, the software modules are responsible for exhibiting intelligence. The generally observed capabilities of an intelligent agent are as follows :
o Ability to remain autonomous (Self-directed)
o Responsive
o Goal-Oriented
An intelligent agent is one which can take input from the environment through its sensors and act upon the environment through its actuators. Its actions are always directed towards achieving a goal.
For example, suppose your hand comes close to a hot pan (some hot element); the skin (sensor) then asks your brain if it knows "what action should be taken when you go near hot elements?" The brain will inform your hands (actuators) that you should immediately move away from the hot element, otherwise it will get burnt. Once this signal reaches your hand, you will take your hand away from the hot pan.
The agent keeps taking input from the environment and goes through these states every time. In the above example, if your action takes too much time, your hand will be burnt. The new task will then be to find a solution for the burnt hand; think about the states which will be followed in this situation. As per Wooldridge and Jennings, "An intelligent agent is one that is capable of taking flexible self-governed actions".
They say for an intelligent agent to meet design objectives, flexible means three things:
1. Reactiveness 2. Pro-activeness
3. Social ability
1. Reactiveness : It means giving a reaction to a situation within a stipulated time frame. An agent can perceive the environment and respond to the situation in a particular time frame; in case of reactiveness, reacting within the situation's time frame is what matters most. You can understand this with the above example : if the agent takes too much time to take its hand away from the hot pan, the agent's hand will be burnt.
2. Pro-activeness : It is controlling a situation rather than just responding to it. Intelligent agents show goal-directed behaviour by taking the initiative. For example, if you are playing chess, then winning the game is the main objective. Here we try to control the situation rather than just responding to each individual action, which means that capturing or losing any of the 16 pieces is not important in itself; whether the action helps to checkmate your opponent is more important.
3. Social ability : Intelligent agents can interact with other agents (and also humans). Take the automatic car driver example, where the agent might have to interact with another agent or a human being while driving the car.
Following are a few more features of an intelligent agent.
o Self-Learning : An intelligent agent changes its behaviour based on its previous experience. This agent keeps
updating its knowledge base all the time.
o Movable/Mobile : An Intelligent agent can move from one machine to another while performing actions.
o Self-governing : An Intelligent agent has control over its own actions.
Q. Define rationality and rational agent. Give an example of rational action performed by any intelligent agent.
(Dec. 15, 5 Marks)
For problem solving, if an agent makes a decision based on some logical reasoning, then the decision is called a "rational decision". Just as humans have the ability to make right decisions based on their experience and logical reasoning, an agent should also be able to make correct decisions, based on what it knows from the percept sequence and the actions it has carried out, using its knowledge.
Agents perceive their environment through sensors over a prolonged time period and adapt to change to create and
pursue goals and take actions through actuators to achieve those goals. A rational agent is the one that does “right”
things and acts rationally so as to achieve the best outcome even when there is uncertainty in knowledge.
A rational agent is an agent that has clear preferences, can model uncertainty via expected values of variables or
functions of variables, and always chooses to perform the action with the optimal expected outcome for itself from
among all feasible actions. A rational agent can be anything that makes decisions, typically a person, a machine, or
software program.
Rationality depends on four main criteria : first, the performance measure which defines the criterion of success for an agent; second, the agent's prior knowledge of the environment; third, the actions that the agent can perform; and last, the agent's percept sequence to date.
The performance measure is one of the major criteria for measuring the success of an agent's performance. Take the vacuum-cleaner agent's example : the performance measure of a vacuum-cleaner agent can depend upon various factors like its dirt cleaning ability, the time taken to clean that dirt, the consumption of electricity, etc.
For every percept sequence a built-in knowledge base is updated, which is very useful for decision making because it stores the consequences of performing a particular action. If the consequences lead to the desired goal, we get a good performance measure factor; else, if the consequences do not lead to the desired goal state, we get a poor performance measure factor.
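A performance measure like the one described for the vacuum-cleaner agent can be sketched as a simple scoring function. The particular weights (+10 per square cleaned, -1 per movement, as a stand-in for electricity consumed) are arbitrary illustrative choices, not values from the text.

```python
# Hedged sketch of a performance measure for the vacuum-cleaner agent:
# reward each cleaning action, charge a small cost for each movement.
# The weights (+10, -1) are arbitrary choices made for illustration.

def performance_measure(actions_taken):
    """Score a sequence of actions; 'Suck' earns reward, moves cost energy."""
    score = 0
    for action in actions_taken:
        if action == "Suck":
            score += 10      # dirt cleaned: good for the performance measure
        elif action in ("Left", "Right"):
            score -= 1       # movement consumes electricity
    return score

score = performance_measure(["Suck", "Right", "Suck"])  # 10 - 1 + 10 = 19
```

A rational agent is then simply one whose chosen actions maximize the expected value of such a measure, given its percepts and knowledge.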
(a) Agent's finger is hurt while using nail and hammer (b) Agent is using nail and hammer efficiently
Fig. 1.10.1
For example, see Fig. 1.10.1. If the agent hurts its finger while using a nail and hammer, then while using them the next time the agent will be more careful and the probability of not getting hurt will increase. In short, the agent will learn to use the nail and hammer efficiently.
Q. Describe different types of environments applicable to AI agents. (Dec. 13, May 15, 10 Marks)
Environments are called partially observable when sensors cannot provide errorless information at any given time for every internal state, as the environment is never seen completely at any point of time.
There can also be unobservable environments, where the agent's sensors fail to provide any information about internal states.
For example, in case of an automated car driver system, the automated car cannot predict what the other drivers are thinking while driving their cars. It is only on the basis of the information gathered by its sensors that the automated car driver can take actions.
Multi-agent environment is further classified as Co-operative multi-agent and Competitive multi-agent. Now, you
might be thinking in case of an automated car driver system which type of agent environment do we have?
Let's understand it with the help of the automated car driving example. For a car driving system 'X', another car, say 'Y', is considered an agent. 'Y' tries to maximize its performance measure, and the input taken by car 'Y' depends on car 'X'. Thus it can be said that an automated car driving system has a cooperative multi-agent environment.
Whereas in case of “chess game” when two agents are operating as opponents, and trying to maximize their own
performance, they are acting in competitive multi agent environment.
An environment is called deterministic environment, when the next state of the environment can be completely
determined by the previous state and the action executed by the agent.
For example, in case of the vacuum cleaner world, the 8-puzzle problem or the chess game, the next state of the environment depends solely on the current state and the action performed by the agent.
In a stochastic environment, the uncertainty about actions is quantified in terms of probabilities. The environment changes while the agent is taking action, hence the next state of the world does not depend merely on the current state and the agent's action; there are changes happening in the environment irrespective of the agent's action. An automated car driving system has a stochastic environment, as the agent cannot predict the behaviour of the other drivers. In such cases, if we have a partially observable environment, then the environment is considered to be stochastic.
If the environment is deterministic except for the actions of other agents, then the environment is strategic. That is, in a game like chess, the next state of the environment depends not only on the current action of the agent but is also influenced by the strategies developed by both opponents for future moves.
We have one more type of environment in this category : when the environment is not fully observable or is non-deterministic, it is called an uncertain environment.
An episodic task environment is one where the agent's experience is divided into atomic incidents or episodes. The current incident is different from the previous incident, and there is no dependency between the current and the previous incident. In each incident the agent receives an input from the environment and then performs a corresponding action.
Generally, classification tasks are considered episodic. Consider the example of a pick and place robot agent, which is used to detect defective parts on the conveyor belt of an assembly line. Here, every time the agent will make the decision based on the current part; there will not be any dependency between the current and the previous decision.
In sequential environments, as the name suggests, the previous decision can affect all future decisions. The next action of the agent depends on what action it has taken previously and what action it is supposed to take in future.
For example, in checkers the previous move can affect all the following moves. A sequential environment can also be understood with the help of the automatic car driving example, where the current decision can affect the next decisions : if the agent is applying the brakes, then it has to press the clutch and shift down the gear as the next consequent actions.
For example, in the chess game or any puzzle like the blocks world problem or the 8-puzzle, if we introduce a timer and the agent's performance is calculated by the time taken to play the move or to solve the puzzle, then it is called a semi-dynamic environment.
Lastly, if the environment changes while the agent is performing some task, then it is called a dynamic environment. In this type of environment, the agent's sensors have to continuously keep sending signals to the agent about the current state of the environment so that appropriate action can be taken with immediate effect.
The automatic car driver example comes under a dynamic environment, as the environment keeps changing all the time.
You have seen discrete and continuous signals in earlier semesters. When a signal has distinct, quantized, clearly defined values, it is considered a discrete signal.
In the same way, when there are distinct and clearly defined inputs and outputs, or percepts and actions, it is called a discrete environment. For example, the chess environment has a finite number of distinct inputs and actions.
When a continuous input signal is received by an agent, and all the percepts and actions cannot be defined beforehand, it is called a continuous environment. For example : an automatic car driving system.
In a known environment, the output for all probable actions is given. Obviously, in case of unknown
environment, for an agent to make a decision, it has to gain knowledge about - how the environment works.
Table 1.11.1 summarizes a few task environments and their characteristics.
Table 1.11.1 : Task environments

Task environment   Car driving     Part-picking   Crossword   Soccer game     Checkers with
                                   Robot          puzzle                      clock
Agents             Multi agent     Single agent   Single      Multi agent     Multi agent
                   (cooperative)                  agent       (competitive)   (competitive)
Q. Give PEAS description for a robot soccer player. Characterize its environment. (May 16, 5 Marks)
Q. What are PEAS descriptor ? Give PEAS descriptors for Part – picking Robot. (May 13, Dec. 14, 3 Marks)
PEAS : PEAS stands for Performance Measure, Environment, Actuators, and Sensors. It is the shorthand used to specify the task environment of an agent.
You might have seen driverless/self-driving car videos from Audi, Volvo, Mercedes, etc. To develop such driverless cars we first need to define the PEAS parameters.
Performance Measure : It is the objective function to judge the performance of the agent. For example, in case of the pick and place robot, the number of correct parts in a bin can be the performance measure.
Environment : It is the real environment where the agent needs to perform actions.
Actuators : These are the tools, equipment or organs using which the agent performs actions in the environment.
Sensors : These are the devices through which the agent takes input from the environment.
Let's write the PEAS descriptor for an automated car driver system.
1. Performance Measure
(i) Safety : The automated system should be able to drive the car safely without dashing anywhere.
(ii) Optimum speed : The automated system should be able to maintain the optimal speed depending upon the surroundings.
(iii) Comfortable journey : The automated system should be able to give a comfortable journey to the end user, i.e. depending upon the road it should ensure the comfort of the end user.
(iv) Maximize profits : The automated system should provide good mileage on various roads, and the amount of energy consumed to automate the system should not be very high. Such features ensure that the user benefits from the automated features of the system, which can help maximize profits.
2. Environment
(i) Roads : The automated car driver should be able to drive on any kind of road, ranging from city roads to highways.
(ii) Traffic conditions : You will find different sets of traffic conditions for different types of roads. The automated system should be able to drive efficiently in all types of traffic conditions. Sometimes traffic conditions are created by pedestrians, animals, etc.
(iii) Clients : Automated cars are created depending on the client’s environment. For example, in some countries
you will see left hand drive and in some countries there is a right hand drive. Every country/state can have
different weather conditions. Depending upon such constraints automated car driver should be designed.
3. Actuators
(i) Steering wheel, which can be used to direct the car in the desired direction (i.e. right/left).
(ii) Accelerator, gear, etc., which can be used to increase or decrease the speed of the car.
(iv) Light signals and horn, which can be very useful as indicators for an automated car.
4. Sensors : To take input from the environment in the car driving example, cameras, sonar system, speedometer, GPS, engine sensors, etc. are used as sensors.
(iii) Actuators : Arm with tooltips, to pick and drop parts from one place to another.
(iv) Sensors : Camera to scan the position from where the part should be picked, and joint angle sensors which are used to sense obstacles and move to the appropriate place.
a. Healthy patient : The system should make use of sterilized instruments to ensure the safety (health) of the patient.
b. Minimize costs : The automated system's results should not be very costly, otherwise the overall expenses of the patient may increase.
c. Minimize lawsuits : The medical diagnosis system should be legal.
(iv) Actuators : Keyboard and mouse, which are useful to enter symptoms, findings and the patient's answers to given questions; a scanner to scan reports; a camera to click pictures of patients.
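The PEAS description of the automated car driver above can be restated as a plain data structure, which makes it easy to inspect or extend programmatically. This is only a structured restatement of the lists in this section, not an executable agent.

```python
# PEAS descriptor for the automated car driver, taken from the section above
# and written as a dictionary for programmatic inspection.

peas_car_driver = {
    "Performance": ["Safety", "Optimum speed", "Comfortable journey",
                    "Maximize profits"],
    "Environment": ["Roads", "Traffic conditions", "Clients"],
    "Actuators":   ["Steering wheel", "Accelerator", "Gear",
                    "Light signal", "Horn"],
    "Sensors":     ["Cameras", "Sonar system", "Speedometer", "GPS",
                    "Engine sensors"],
}

# Example lookup: which devices give the agent its percepts?
car_sensors = peas_car_driver["Sensors"]
```

The same four-key shape works for any agent (part-picking robot, medical diagnosis system, soccer player), which is why PEAS is a convenient checklist when designing a new agent.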
Depending upon the degree of intelligence and ability to achieve the goal, agents are categorized into five basic types.
These five types of agents are depicted in the Fig. 1.12.1.
Fig. 1.12.1 : Types of agents
An agent which performs actions based on the current input only, ignoring all the previous inputs, is called a simple reflex agent.
It is a totally uncomplicated type of agent. The simple reflex agent's function is based on the situation and its corresponding action (the condition-action protocol). If the condition is true, then the matching action is taken without considering the percept history.
You can understand simple reflexes with the help of a real life example : if some object approaches your eye, you will blink. This type of simple reflex is called a natural/innate reflex.
Consider the example of the vacuum cleaner agent. It is a simple reflex agent, as its decision is based only on whether
the current location contains dirt. The agent function is tabulated in Table 1.12.1.
A few possible input sequences and outputs for the vacuum cleaner world with two locations are considered for simplicity.
Table 1.12.1

Percept        Action
{A, dirt}      Suck
{B, dirt}      Suck
:              :
In case of the above-mentioned vacuum agent, only one sensor is used, and that is a dirt sensor. This dirt sensor can detect whether there is dirt or not, so the possible inputs are 'dirt' and 'clean'.
The agent will also have to maintain a database of actions, which helps decide what output should be given by the agent. The database will contain conditions like : if there is dirt at the current location, the vacuum cleaner should suck that dirt; else, it should move to the next location (left or right) and check for dirt there. These actions are repeated till the entire assigned area is cleaned. Once the assigned area is fully covered, no other action should be taken until further instruction.
If the vacuum cleaner agent keeps searching for dirt in an already clean area, it will surely get trapped in an infinite loop. Infinite loops are unavoidable for simple reflex agents operating in partially observable environments. By randomizing its actions, the simple reflex agent can avoid these infinite loops. For example, on receiving {clean} as input, the vacuum cleaner agent should randomly move either left or right.
If the performance of an agent is of the right kind, then such randomized behaviour can be considered rational in a few multi-agent environments.
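The simple reflex vacuum agent for the two-location world, including the randomized move on clean squares, can be sketched in a few lines. The percept format and action names follow the example above; the randomization strategy is the one just discussed.

```python
import random

# Minimal simple reflex vacuum agent for the two-square (A/B) world.
# On a "Dirty" percept it always sucks; on "Clean" it picks a random move,
# which lets it escape the infinite loop a fixed rule could fall into.

def simple_reflex_vacuum(percept, rng=random):
    """percept is a (location, status) pair such as ("A", "Dirty")."""
    location, status = percept
    if status == "Dirty":
        return "Suck"                         # condition-action rule
    return rng.choice(["Left", "Right"])      # randomized move when clean

action = simple_reflex_vacuum(("A", "Dirty"))  # always "Suck"
```

Note that the function never consults the percept history: that is exactly what makes it a *simple* reflex agent, and why it needs randomization to cope with a partially observable world.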
1.12.2 Model-Based Reflex Agents
A partially observable environment cannot be handled well by a simple reflex agent, because it does not keep track of previous states. So one more type of agent was created : the model-based reflex agent.
An agent which performs actions based on the current input and the previous inputs is called a model-based agent. A partially observable environment can be handled well by a model-based agent.
From Fig. 1.12.3, it can be seen that once the sensor takes input from the environment, the agent checks the current state of the environment. After that, it checks the previous state, which shows how the world is developing and how the environment was affected by the action taken by the agent at an earlier stage. This is termed the model of the world.
Once this is verified, based on the condition-action protocol an action is decided. This decision is given to effectors and
the effectors give this output to the environment.
The knowledge about “how the world is changing” is called as a model of the world. Agent which uses such model
while working is called as the “model-based agent”.
Consider the simple example of an automated car driver system. Here, the world keeps changing all the time. You must have taken a wrong turn while driving on some day or the other; the same thing applies to an agent. Suppose some car "X" is overtaking our automated driver agent "A"; then the speed and the direction in which "X" and "A" are moving their steering wheels are important. Take a scenario where the agent missed a sign board as it was overtaking the other car : the world around that agent will be different in that case.
An internal model based on the percept history should be maintained by the model-based reflex agent, which can reflect at least some of the unobserved aspects of the current state. Once this is done, it chooses an action in the same way as the simple reflex agent.
1.12.3 Goal-Based Agents
Model-based agents are further developed based on "goal" information. This new type of agent is called a goal-based agent. As the name suggests, the goal information describes the situations that are desired. These agents are provided with goals along with the model of the world, and all the actions selected by the agent are with reference to the specified goals. Goal-based agents can only differentiate between goal states and non-goal states; hence their performance can be 100% or zero.
The limitation of the goal-based agent comes with its definition itself. Once the goal is fixed, all the actions are taken to fulfil it, and the agent loses the flexibility to change its actions according to the current situation.
You can take example of a vacuum cleaning robot agent whose goal is to keep the house clean all the time. This agent
will keep searching for dirt in house and will keep the house clean all the time. Remember M-O the cleaning robot
from Wall-E movie which keeps cleaning all the time no matter what is the environment or the Healthcare companion
robot Baymax from Big Hero 6 which does not deactivate until user says that he/she is satisfied with care.
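The all-or-nothing character of a goal-based agent can be sketched with a binary goal test: the goal "house is clean" is met only when every square is clean, and the agent's choices are driven entirely by that test. The world representation and the "pick any dirty square" policy are illustrative.

```python
# Sketch of a goal-based agent for the cleaning goal "every square is Clean".
# The goal test is binary, so performance is all-or-nothing (100% or zero).

def goal_test(world_state):
    """True only when every location is clean (the goal state)."""
    return all(status == "Clean" for status in world_state.values())

def goal_based_action(world_state):
    if goal_test(world_state):
        return "NoOp"                        # goal reached
    # otherwise pick any dirty square to clean next, in pursuit of the goal
    dirty = [loc for loc, status in world_state.items() if status == "Dirty"]
    return ("Clean", dirty[0])

act = goal_based_action({"A": "Dirty", "B": "Clean"})   # ("Clean", "A")
done = goal_based_action({"A": "Clean", "B": "Clean"})  # "NoOp"
```

Because `goal_test` returns only True or False, the agent has no way to prefer one non-goal state over another; that missing graded preference is exactly what the utility-based agent in the next section adds.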
Q. Explain utility-based agents with the help of neat diagram. (May 13, 10 Marks)
Fig. 1.12.5 : Utility-based agents
A utility function maps a state to a measure of the utility of that state. We can define a measure for determining how advantageous a particular state is for an agent; the utility function provides exactly this measure.
The term utility is used to depict how "happy" the agent is. To obtain a generalized performance measure, various world states are compared according to exactly how happy they would make the agent.
Take one example : you might have used Google Maps to find a route which can take you from a source location to your destination in the least possible time. The same logic is followed by a utility-based automatic car driving agent. The goals of such an agent can be to reach the given location safely, in the least possible time, and with the least fuel. So this car driving agent will check the possible routes and the traffic conditions on these routes, and will select the route which can take the car to the destination in the least possible time, safely, and without consuming much fuel.
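The route-selection idea above can be sketched as a utility function that trades off travel time, fuel and risk, with the agent picking the route of highest utility. The routes, their numbers, and the trade-off weights are all invented for illustration.

```python
# Sketch of utility-based route choice: each candidate route gets a utility
# (higher is better) penalizing time, fuel use and accident risk, and the
# agent picks the route that maximizes it. All numbers are illustrative.

def utility(route, w_time=1.0, w_fuel=0.5, w_risk=2.0):
    """Map a route (a state) to a single utility value."""
    return -(w_time * route["time"]
             + w_fuel * route["fuel"]
             + w_risk * route["risk"])

routes = [
    {"name": "highway", "time": 30, "fuel": 6.0, "risk": 1.0},
    {"name": "city",    "time": 45, "fuel": 4.0, "risk": 0.5},
]

best = max(routes, key=utility)  # the agent's chosen route
```

Unlike the binary goal test of a goal-based agent, this function ranks *every* state, so the agent can prefer a slightly slower but much safer route simply by adjusting the weights.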
Q. Explain the learning agent with the help of suitable diagram. (May 13, 10 Marks)
Q. Explain the structure of learning agent architecture. What is role of critic in learning ? (Dec. 13, May 15, 10 Marks)
Q. What are the basic building blocks of learning agent ? Explain each of them with a neat block diagram.
(Dec. 15, May 16, 8/10 Marks)
Why do you take mock tests? When you get fewer marks for some question, you come to know that you have made some mistake in your answer. Then you learn the correct answer, and when you get the same question in further examinations, you write the correct answer and avoid the mistakes made in the mock test. This same concept is followed by the learning agent.
A learning-based agent is advantageous in many cases because, with its basic knowledge, it can initially operate in an unknown environment, and then it can gain knowledge from the environment based on a few parameters and perform actions that give better results.
Following are the components of a learning agent :
1. Critic
2. Learning element
3. Performance element
4. Problem generator
1. Critic : It is the component which compares the sensors' input, specifying the effect of the agent's action on the environment, with the performance standards, and generates feedback for the learning element.
2. Learning element : This component is responsible for learning from the difference between the performance standards and the feedback from the critic. According to the current percept, it is supposed to understand the expected behaviour and enhance its standards.
3. Performance element : Based on the current percept received from the sensors and the input obtained from the learning element, the performance element is responsible for choosing the action to act upon the external environment.
4. Problem generator : Based on the new goals learnt by the learning agent, the problem generator suggests new or alternative actions which will lead to new and instructive understanding.
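The four components above can be sketched as one feedback loop. The "performance standard", the threshold parameter, and the update rule below are placeholders chosen only to show how critic feedback flows into the learning element and changes future behaviour.

```python
# Sketch of the learning-agent loop: critic -> learning element ->
# performance element, plus a problem generator for exploration.
# The standard, threshold and 0.1 learning rate are illustrative choices.

class LearningAgent:
    def __init__(self):
        self.threshold = 0.5          # the "knowledge" the learning element tunes

    def critic(self, outcome, standard=1.0):
        """Compare the observed outcome against the performance standard."""
        return standard - outcome     # feedback: how far we fell short

    def learning_element(self, feedback):
        """Adjust internal knowledge a little in the direction of feedback."""
        self.threshold += 0.1 * feedback

    def performance_element(self, percept):
        """Choose an action using the current knowledge."""
        return "act" if percept > self.threshold else "wait"

    def problem_generator(self):
        """Suggest an exploratory action that may yield new experience."""
        return "try_alternative"

agent = LearningAgent()
feedback = agent.critic(outcome=0.0)     # the outcome fell short by 1.0
agent.learning_element(feedback)         # threshold rises from 0.5 to 0.6
action = agent.performance_element(0.7)  # 0.7 > 0.6, so the agent acts
```

The essential point is the direction of data flow: the critic never chooses actions and the performance element never updates knowledge; they communicate only through the learning element, exactly as in the block diagram.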
Soft computing is a collection of techniques that help us construct computationally intelligent systems. It is now realized that real world problems are complex, pervasively imprecise and uncertain. To solve such problems, we require computationally intelligent systems that combine knowledge, techniques and methodologies from various sources.
Hard computing involves the traditional methods of computing that require precisely stated analytical models, and it often requires more computational time. Examples of hard computing are solving computational geometry problems, etc.
Unlike hard computing, soft computing techniques are tolerant of the imprecision, uncertainty, partial truth and approximation that are present in real world problems. Examples of soft computing techniques are neural networks, fuzzy logic, genetic algorithms, etc.
1. Hard computing is a conventional type of computing that requires a precisely stated analytical model. Soft computing techniques are tolerant of imprecision, approximation and uncertainty.
2. Hard computing requires programs to be written. Soft computing techniques are model-free; they can evolve their own models and programs.
3. Hard computing is deterministic and uses two-valued logic. Soft computing is stochastic and uses multi-valued logic such as fuzzy logic.
4. Hard computing needs exact data to solve a particular problem. Soft computing can deal with incomplete, uncertain and noisy data.
5. Hard computing techniques perform sequential computation. Soft computing allows parallel computation, e.g. neural networks.
6. The solution or output of hard computing is precise. Soft computing can generate an approximate output or solution.
7. Hard computing is based on crisp logic, binary logic and numerical analysis. Soft computing is based on neural networks, fuzzy logic, evolutionary computation, etc.
8. Hard computing techniques are not fault tolerant, because conventional programs and algorithms are built in such a way that errors have serious consequences unless enough redundancy is added into the system. Soft computing techniques are fault tolerant due to their redundancy, adaptability and reduced-precision characteristics.
Soft Computing is the fusion of different techniques that were designed to model and enable solutions to complex real
world problems.
These real world problems are the problems that are too difficult to model mathematically.
These problems result from the fact that our world seems to be imprecise, uncertain and difficult to categorize.
The soft computing techniques are capable of handling such uncertainty, impreciseness and vagueness present in the
real world data.
Most of the soft computing techniques are based on biologically inspired methodologies such as the human nervous
system, genetics, evolution, ants' behaviour, etc.
Soft computing consists of several computing paradigms, mainly :
o Neural Network
o Fuzzy Logic
o Evolutionary Algorithms such as Genetic algorithm
Every paradigm of soft computing mentioned above has its own strengths. In order to build a computationally
intelligent system, we may integrate multiple techniques or methodologies to take advantage of the strengths of each
of them. Such systems are called hybrid soft computing systems.
Table 1.15.1 summarizes the soft computing methodologies and their strengths.
The seamless integration of these methodologies forms the base of soft computing.
Neural networks have the capability of recognizing patterns and adapting themselves to cope with changing
environments.
The evolutionary algorithms such as Genetic Algorithms are search and optimization techniques based on biological
evolution that help us to optimize certain parameters in a given problem.
Fuzzy logic incorporates human knowledge and performs inference and decision making.
An Artificial Neural Network (ANN), inspired by the biological nervous system, basically tries to mimic the working of the
human brain.
An ANN is composed of a large number of highly interconnected processing elements called neurons. All these
neurons work in parallel to solve a specific problem.
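To make the idea of a processing element concrete, a single artificial neuron can be sketched as a weighted sum followed by an activation function. This is an illustrative sketch only; the input values, weights and bias below are arbitrary assumptions, not taken from the text.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs passed
    through a sigmoid activation function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1)

# Example with two inputs (arbitrary illustrative values)
out = neuron([1.0, 0.5], [0.4, -0.2], 0.1)   # z = 0.4, output ≈ 0.599
```

A full network connects many such neurons in layers, and learning consists of adjusting the weights.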
4. Parallel organization of neural networks permits solutions to problems where multiple constraints must be satisfied
simultaneously.
5. Because of its parallel nature, when an element of the neural network fails, the network can continue without any problem.
Applications of Neural Networks
Neural networks have been successfully applied to a broad spectrum of data-intensive applications. A few of them are
listed below.
(a) Forecasting
Neural networks can be used very effectively in forecasting exchange rates, predicting stock values, inflation and cash
forecasting, forecasting weather conditions, etc. Researchers have demonstrated the forecasting accuracy of NN
systems in these domains.
Digital images require a large amount of memory for storage. As a result, the transmission of image from one
computer to another can be very expensive in terms of time and bandwidth required.
With the explosion of the Internet, more sites are using images. Image compression is a technique that removes
some of the redundant information present in the image without affecting its perceptibility, thus, reducing the
storage size required to store the image.
NN can be effectively used to compress the image. Several NN techniques such as Kohonen’s self organizing
maps, Back propagation algorithm, Cellular neural network etc. can be used for image compression.
Neural networks have been applied successfully in industrial process control of dynamic systems.
Neural networks (especially multi-layer perceptrons) have proved to be a good choice for modelling non-linear
systems and implementing general-purpose non-linear controllers, due to their universal approximation
capabilities. For example, control and management of agricultural machinery.
A well-known application using image recognition is the Optical Character Recognition (OCR) tools that are
available with the standard scanning software for home computers.
Scansoft has had great success in combining NN with a rule based system for correctly recognizing both
characters and words, to get a high level of accuracy.
Customer Relationship Management requires key information to be derived from raw data collected for each
individual customer. This can be achieved by building models using historical data information.
Many companies are now using neural technology to help in their day to day business
processes. They are doing this to achieve better performance, greater insight, faster development and increased
productivity.
By using Neural Networks for data mining in the databases, patterns, however complex, can be identified for the
different types of customers, thus giving valuable customer information to the company.
Also, NN can be useful for important tasks related to CRM, such as forecasting call centre loading, demand and
sales levels, monitoring and analyzing the market, validating, completing and enhancing databases, and clustering and
profiling the client base.
One example is the airline reservation system AMT, which could predict sales of tickets in relation to destination,
time of year and ticket price.
Medicine is the field that has always taken benefits from the latest and advanced technologies.
Artificial Neural Networks (ANNs) are currently a promising area of interest in medical science.
It is believed that neural networks will have extensive application to biomedical problems in the next few years.
ANN has already been successfully applied in medical applications such as diagnostic systems, bio chemical
analysis, disease detection, image analysis and drug development.
Fuzzy logic is an approach to computing based on "degrees of truth" rather than the usual "true or false" (1 or 0)
Boolean logic on which the modern computer is based.
The idea of fuzzy logic was first proposed by Lotfi Zadeh of the University of California at Berkeley in the 1960s.
The human brain can interpret imprecise, vague and incomplete information provided by sensory organs.
Fuzzy logic is a powerful mathematical tool that can deal with such imprecise, incomplete and uncertain information
present in complex real world problems.
Using fuzzy logic, it is now possible to include vague human assessment in computing problems.
Also, it provides an effective means for conflict resolution of multiple criteria and better assessment of options.
Fuzzy logic can be used in the development of various applications such as pattern recognition, optimization, control
applications, identification and any intelligent system for decision making.
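As a small illustration of "degrees of truth", a triangular membership function can grade how strongly a crisp value belongs to a fuzzy set. The fuzzy set "warm" below (over 15-30 degrees, peaking at 25) is an assumed example, not from the text.

```python
def triangular(x, a, b, c):
    """Triangular membership function: the degree rises linearly
    from a to the peak b, then falls linearly back to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Degree to which 22 degrees is "warm" for the assumed set (15, 25, 30)
mu = triangular(22, 15, 25, 30)   # 0.7 : partially warm, not a crisp yes/no
```

Unlike Boolean logic, the answer is a degree in [0, 1] rather than a hard true/false.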
Fuzzy controllers cover a wider range of operating conditions and are more readily customizable in natural language terms.
Fuzzy logic can be used in applications where human like decision making with an ability to generate precise
solutions from certain or approximate information is required.
Fuzzy logic has been extensively used in design of controllers for home appliances such as washing machine,
vacuum cleaner, air conditioner etc.
Fuzzy logic can also be used for other applications such as facial pattern recognition, anti-skid braking systems,
transmission systems, control of subway systems and unmanned helicopters.
Another application area of fuzzy logic is the development of knowledge-based systems for multi-objective
optimization of power systems, weather forecasting systems, models for new product pricing or project risk
assessment, medical diagnosis and treatment plans, and stock trading.
Fuzzy logic has been successfully used in numerous fields such as control systems engineering, image processing,
power engineering, industrial automation, robotics, consumer electronics and optimization.
Genetic Algorithms (GAs) are search techniques based on the principles of natural selection and genetics.
GAs are often used to find optimal or near-optimal solutions to difficult problems which otherwise would take a
lifetime to solve.
GA can efficiently explore a large space of candidate designs and find optimum solutions.
GAs were developed by John Holland and his students and colleagues at the University of Michigan in the 1960s and 1970s.
In GAs, we select the initial pool or a population of possible solutions to the given problem.
These solutions then undergo various GA operations like recombination and mutation which in turn produce new
children.
Each individual or candidate solution is assigned a fitness value and the fitter individuals are given a higher chance to
mate and yield more “fitter” individuals. This is in line with the Darwinian Theory of “Survival of the Fittest”.
Thus GA keeps “evolving” better individuals or solutions over generations, till it reaches a stopping criterion.
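The selection-crossover-mutation loop described above can be sketched for a toy problem. The OneMax objective (maximize the number of 1-bits), the use of tournament selection, and all parameter values below are illustrative assumptions, not part of the text.

```python
import random

def fitness(bits):
    # Toy objective ("OneMax"): maximize the number of 1-bits
    return sum(bits)

def evolve(pop_size=20, length=16, generations=50, seed=1):
    rng = random.Random(seed)
    # Initial pool of random candidate solutions
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals wins
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, length)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(length):               # bit-flip mutation
                if rng.random() < 1.0 / length:
                    child[i] = 1 - child[i]
            next_pop.append(child)
        pop = next_pop                            # next generation
    return max(pop, key=fitness)

best = evolve()   # after 50 generations the best individual is near all-ones
```

The stopping criterion here is simply a fixed number of generations; real applications often stop when fitness plateaus.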
Genetic Algorithms are sufficiently randomized in nature, but they perform much better than random local search.
GAs are easy to understand since they do not demand the knowledge of complex mathematics.
They can solve multimodal, non-differentiable, non-continuous or even NP-complete problems.
Useful when the search space is very large and there are a large number of parameters involved.
1. Automotive design : GAs can be used in vehicle design, from everyday cars to race cars, to provide faster, lighter, more fuel-efficient and safer vehicles for all the things we use vehicles for.
2. Engineering design : GAs are most commonly used to optimize the structural and operational design of
buildings, factories, machines, etc. GAs are used for optimizing the design of robot gripping arms, satellite booms,
building trusses, turbines, flywheels or any other computer-aided engineering design application.
3. Robotics : GAs have found applications that span the range of architectures for intelligent robotics. GAs can be
used to design entirely new types of robots that can perform multiple tasks and have more general
application.
Review Questions
Q. 3 Explain the various artificial intelligence problems and artificial intelligence techniques.
Q. 14 What are various agent environments ? Give PEAS representation for an agent.
Q. 16 Explain various types of intelligent agents, state limitations of each and how it is overcome in other type of agent.
Q. 21 What are the constituents of Soft Computing ? Explain each in brief.
2.2 Uninformed Search Methods : Depth Limited Search, Depth First Iterative Deepening (DFID), Informed Search
Method : A* Search
2.3 Optimization Problems : Hill climbing Search, Simulated annealing, Genetic algorithm
Search is an indivisible part of intelligence. An intelligent agent is one who can search for and select the most
appropriate action in a given situation, from the available set of actions. When we play any game like chess, cards, tic-
tac-toe, etc., we know that we have multiple options for the next move, but the intelligent one who searches for the correct
move will definitely win the game. In the case of the travelling salesman problem, a medical diagnosis system or any expert system,
all they are required to do is carry out a search which will produce the optimal path: the shortest path with minimum cost and
effort. Hence, this chapter focuses on the searching techniques used in AI applications. These are known as un-informed
and informed search techniques.
Now let us see how searching plays a vital role in solving AI problems. Given a problem, we can generate all the
possible states it can have in real time, including the start state and end state. To generate a solution for the same is
nothing but searching for a path from the start state to the end state.
A problem solving agent is one who finds the goal state from the start state in an optimal way by following the shortest
path, thereby saving memory and time. It is supposed to maximize its performance by fulfilling all the performance
measures.
Searching techniques can be used in game playing like Tic-Tac-Toe or navigation problems like Travelling Salesman
Problem.
First, we will understand the representation of given problem so that appropriate searching techniques can be applied
to solve the problem.
Q. Explain how you will formulate search problem. (Dec. 12, 3 Marks)
Given a goal to achieve, problem formulation is the process of deciding what states are to be considered and what actions
are to be taken to achieve the goal. This is the first step taken by any problem solving agent.
State space : The state space of a problem is the set of all states reachable from the initial state by executing any
sequence of actions. A state is a representation of a possible problem configuration.
The state space specifies the relation among various problem states thereby, forming a directed network or graph in
which the nodes are states and the links between nodes represent actions.
State Space Search : Searching in a given space of states pertaining to a problem under consideration is called a state
space search.
Path : A path is a sequence of states connected by a sequence of actions, in a given state space.
1. Initial state : The initial state is the one in which the agent starts in.
2. Actions : The set of actions that can be executed or are applicable in all possible states, together with a description of what each
action does; the formal name for this is the transition model.
3. Successor function : It is a function that returns the state resulting from executing an action on the current state.
4. Goal test : It is a test to determine whether the current state is a goal state. In some problems the goal test can be
carried out just by comparing the current state with the defined goal state; this is called an explicit goal test. Whereas, in some
problems, the goal state cannot be defined explicitly but needs to be evaluated by carrying out some computations; this
is called an implicit goal test. For example, in the 8-puzzle the goal state is predefined and
can be compared explicitly, but in the case of the game of chess the goal state cannot be predefined; it is a
scenario called "checkmate", which has to be evaluated implicitly.
5. Path cost : It is simply the cost associated with each step taken to reach the goal state. To determine the cost
to reach each state, there is a cost function, which is chosen by the problem solving agent.
Problem solution : A well-defined problem is specified by the initial state, goal test, successor function, and path
cost. It can be represented as a data structure and used to implement a program which can search for the goal state.
A solution to a problem is a sequence of actions chosen by the problem solving agent that leads from the initial state
to a goal state. Solution quality is measured by the path cost function.
Optimal solution : An optimal solution is the solution with least path cost among all solutions.
A general sequence followed by a simple problem solving agent is, first it formulates the problem with the goal to be
achieved, then it searches for a sequence of actions that would solve the problem, and then executes the actions one at a
time.
Initial state :      Goal state :
1 2 3                1 2 3
4 8 –                4 5 6
7 6 5                7 8 –
3. Successor function : Moving a tile adjacent to the blank results in the tile and the blank switching their positions.
4. Goal test : {{1, 2, 3},{4, 5, 6},{7, 8, 0}}
5. Path cost : Number of steps to reach the final state.
Solution :
{{1, 2, 3},{4, 8, 0},{7, 6, 5}} → {{1, 2, 3},{4, 8, 5},{7, 6, 0}} → {{1, 2, 3},{4, 8, 5},{7, 0, 6}} →
{{1, 2, 3},{4, 0, 5},{7, 8, 6}} → {{1, 2, 3},{4, 5, 0},{7, 8, 6}} → {{1, 2, 3},{4, 5, 6},{7, 8, 0}}
Path cost = 5 steps
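The successor function for the 8-puzzle can be sketched in code; representing a state as a tuple of row-tuples (with 0 for the blank) is an assumption for illustration.

```python
def successors(state):
    """Generate 8-puzzle successor states by sliding a tile into the blank (0).
    `state` is a tuple of three row-tuples."""
    cells = [list(row) for row in state]
    # Locate the blank
    r, c = next((i, j) for i in range(3) for j in range(3) if cells[i][j] == 0)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            nxt = [row[:] for row in cells]
            # The adjacent tile and the blank switch positions
            nxt[r][c], nxt[nr][nc] = nxt[nr][nc], nxt[r][c]
            result.append(tuple(tuple(row) for row in nxt))
    return result

start = ((1, 2, 3), (4, 8, 0), (7, 6, 5))   # the initial state used above
```

From this initial state, three moves are possible; one of them is the first step of the 5-step solution shown above.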
The problem statement is as discussed in the previous section. Let us formulate the problem first.
States : In this problem, a state can be a data structure having a triplet (i, j, k), representing the number of missionaries
and cannibals on the starting bank and the position of the boat (k = 1 when the boat is at the starting bank).
Solution :
The sequence of actions within the path :
(3,3,1) → (2,2,0)→(3,2,1) →(3,0,0) →(3,1,1) →(1,1,0) →(2,2,1) →(0,2,0) →(0,3,1) →(0,1,0) → (0,2,1) →(0,0,0)
Cost = 11 crossings
3. Vacuum-cleaner problem
States : In the vacuum cleaner problem, a state can be represented as [<block>, clean] or [<block>, dirty]. The agent can be
in one of the two blocks, each of which can be either clean or dirty. Hence there are a total of 8 states in the vacuum cleaner world.
1. Initial State : Any state can be considered as initial state. For example, [A, dirty]
2. Actions : The possible actions for the vacuum cleaner machine are left, right, absorb, idle.
3. Successor function : Fig. 2.2.2 indicating all possible states with actions and the next state.
Fig. 2.2.2 : The state space for vacuum world
4. Goal Test : The aim of the vacuum cleaner is to clean both the blocks. Hence the goal test is whether both blocks are clean.
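The transition model of the vacuum world can be sketched as a successor function; the (agent_block, status_A, status_B) encoding below is an assumed representation for illustration.

```python
def vacuum_successors(state):
    """Successor function for the two-block vacuum world.
    A state is (agent_block, status_A, status_B), e.g. ('A', 'dirty', 'clean')."""
    block, a, b = state
    succ = {}
    succ['left'] = ('A', a, b)        # move to block A
    succ['right'] = ('B', a, b)       # move to block B
    if block == 'A':
        succ['absorb'] = ('A', 'clean', b)   # suck dirt in the current block
    else:
        succ['absorb'] = ('B', a, 'clean')
    succ['idle'] = state              # do nothing
    return succ

def goal_test(state):
    return state[1] == 'clean' and state[2] == 'clean'

s = ('A', 'dirty', 'dirty')
```

Enumerating these transitions for all 8 states reproduces the state space diagram of Fig. 2.2.2.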
There are varieties of real time problems that can be formulated and solved by searching. Robot navigation, the route
finding problem, the Travelling Salesman Problem (TSP), VLSI design, automatic assembly sequencing, etc. are a
few to name.
There are a number of applications for route finding algorithms : web sites and car navigation systems that provide driving
directions, routing video streams in computer networks, military operations planning, and airline travel-planning
systems, to name a few. All these systems involve detailed and complex specifications.
For now, let us consider a problem to be solved by a travel planning web site; the airline travel problem.
State : State is represented by airport location and current date and time. In order to calculate the path cost state may
also record more information about previous segments of flights, their fare bases and their status as domestic or
international.
1. Initial state : This is specified by the user's query, stating initial location, date and time.
2. Actions : Take any flight from the current location, select seat and class, leaving after the current time, leaving
enough time for within airport transfer if needed.
3. Successor function : After taking the action, i.e. selecting the flight, location, date and time, the next location, date
and time reached is denoted by the successor function. The location reached is considered the current location
and the flight's arrival time the current time.
5. Path cost : In this case path cost is a function of monetary cost, waiting time, flight time, customs and
immigration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage awards and so on.
3. m : It is the maximum depth of any path in the search tree.
The connectors are the indicators of which states are directly reachable from the current state, based on the successor
function. Thus the parent-child relation is built and the search tree can be generated. Fig. 2.4.1 shows the representation of a
search tree.
Q. Write short note on Uniform search. (MU - Dec. 14, 2.5 Marks)
Why is it called uninformed search? What is the search not informed about?
The term "uninformed" means these techniques have information only about the start state and the goal state, along with
the problem definition.
These techniques can generate successor states and can distinguish a goal state from a non-goal state.
All these search techniques are distinguished by the order in which nodes are expanded.
The uninformed search techniques are also called "blind search".
2.6.1 Concept
In depth-first search, the search tree is expanded depth wise; i.e. the deepest node in the current branch of the search
tree is expanded first. As a leaf node is reached, the search backtracks to the previous node. The progress of the search is
illustrated in Fig. 2.6.1.
The explored nodes are shown in light gray. Explored nodes with no descendants in the fringe are removed from
memory. Nodes at depth three have no successors and M is the only goal node.
Process
2.6.2 Implementation
DFS uses a LIFO fringe, i.e. a stack. The most recently generated node, which is on the top of the fringe, is chosen first for
expansion. As a node is expanded, it is dropped from the fringe and its successors are added. So when there are no more
successors to add to the fringe, the search "backtracks" to the next deepest node that is still unexplored. DFS can be
implemented in two ways, recursive and non-recursive. Following is the algorithm for the same.
2.6.3 Algorithm
(a) Non-recursive implementation of DFS
1. Push the root node onto the stack
2. while (stack is not empty)
(a) pop a node from the stack
(i) if node is a goal node then return success;
(ii) push all children of node onto the stack;
3. return failure
(b) Recursive implementation of DFS
DFS(node) :
1. if node is a goal node then return success;
2. for each child c of node : if (DFS(c)) return success;
3. return failure;
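A minimal runnable sketch of the stack-based implementation, expanding the alphabetically smaller child first; the adjacency-dict graph below is an assumed representation, not from the text.

```python
def dfs(graph, start, goal):
    """Iterative depth-first search; returns the path found, or None.
    `graph` is an adjacency dict mapping node -> list of children."""
    stack = [[start]]                  # LIFO fringe of partial paths
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push children in reverse order so the alphabetically
        # smaller child ends up on top and is expanded first
        for child in sorted(graph.get(node, []), reverse=True):
            stack.append(path + [child])
    return None                        # failure

graph = {'A': ['B', 'C'], 'B': ['D', 'G'], 'C': ['G'], 'D': []}
```

On this toy graph, DFS dives down the 'B' branch before ever considering 'C'.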
Starting from state A execute DFS. The goal node is G. Show the order in which the nodes are expanded.
Assume that the alphabetically smaller node is expanded first to break ties. MU - May 16, 10 Marks
Soln. :
Fig. P. 2.6.1
MU - May 14
The root node is expanded first, then all the successors of the root node are expanded, then their successors, and so
on.
In turn, all the nodes at a particular depth in the search tree are expanded first and then the search proceeds to
the next level of node expansion.
Thus, the shallowest unexpanded node will be chosen for expansion. The search process of BFS is illustrated in
Fig. 2.7.1.
2.7.2 Process
2.7.3 Implementation
In BFS we use a FIFO queue for the fringe, because of which newly inserted nodes in the fringe are automatically
placed after their parents.
Thus, the children nodes, which are deeper than their parents, go to the back of the queue, and old nodes, which are
shallower, get expanded first. Following is the algorithm for the same.
2.7.4 Algorithm
1. Put the root node on a queue
2. while (queue is not empty)
(a) remove a node from the queue
(i) if (node is a goal node) return success;
(ii) put all children of node onto the queue;
3. return failure;
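A minimal runnable sketch of the FIFO-queue implementation above; the adjacency-dict graph is an assumed representation, not from the text.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search; returns the shallowest path found, or None.
    `graph` is an adjacency dict mapping node -> list of children."""
    fringe = deque([[start]])          # FIFO queue of partial paths
    visited = {start}
    while fringe:
        path = fringe.popleft()        # shallowest node comes off first
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:   # avoid re-expanding old nodes
                visited.add(child)
                fringe.append(path + [child])
    return None                        # failure

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': ['G']}
```

Note that BFS returns the two-step route through 'C' even though a deeper route through 'B' and 'D' also reaches 'G'.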
Completeness : It is complete, provided the shallowest goal node is at some finite depth.
Optimality : It is optimal, as it always finds the shallowest solution.
Time complexity : O(b^d), the total number of nodes generated.
Space complexity : O(b^d), the number of nodes kept in the fringe.
Breadth first search is optimal when all paths have the same step cost. To make it work in real conditions, where
costs differ, we can make a simple extension to the basic implementation of BFS. This results in an algorithm that is
optimal with any path cost.
In BFS we always expand the shallowest node first; in uniform cost search, instead of expanding the shallowest
node, the node with the lowest path cost is expanded first. The implementation details are as follows.
2.8.2 Implementation
Uniform cost search can be achieved by implementing the fringe as a priority queue ordered by path cost. The
algorithm shown below is almost the same as BFS, except for the use of a priority queue and the addition of an extra
check in case a shorter path to any node is discovered.
n
The algorithm keeps track of the nodes inserted in the fringe for exploration by using a priority queue. The priority
queue used here contains the total cost from the root to each node; uniform cost search gives the minimum path
cost the maximum priority. The algorithm using this priority queue is the following.
2.8.3 Algorithm
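A minimal sketch of uniform cost search with a priority-queue fringe, including the extra check in case a shorter path to a node is discovered; the (neighbour, step-cost) adjacency format below is an assumed representation.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Uniform cost search; returns (cost, path), or None on failure.
    `graph` maps node -> list of (neighbour, step_cost) pairs."""
    fringe = [(0, start, [start])]     # priority queue ordered by path cost
    best = {}                          # cheapest cost found so far per node
    while fringe:
        cost, node, path = heapq.heappop(fringe)   # lowest-cost node first
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue                   # a shorter path was already expanded
        best[node] = cost
        for nbr, step in graph.get(node, []):
            heapq.heappush(fringe, (cost + step, nbr, path + [nbr]))
    return None                        # failure

graph = {'A': [('B', 1), ('C', 5)], 'B': [('C', 1)], 'C': [('G', 1)]}
```

On this toy graph the direct edge A→C costs 5, but UCS finds the cheaper route A→B→C→G of total cost 3.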
Completeness : Completeness is guaranteed provided the cost of every step exceeds some small positive constant.
Optimality : It produces optimal solution as nodes are expanded in order of their path cost.
Time complexity : Uniform-cost search considers path costs rather than depths, so its complexity does not depend
merely on b and d. Let C* be the cost of the optimal solution, and assume that every action costs at
least ε. Then the algorithm's worst-case time complexity is O(b^(C*/ε)), which can be much greater than O(b^d).
Space complexity : O(b^(C*/ε)), the number of nodes in memory at execution time.
In order to avoid the infinite loop condition arising in DFS, in the depth limited search technique depth-first search is
carried out with a predetermined depth limit.
The nodes at the specified depth limit are treated as if they do not have any successors. The depth limit solves the
infinite-path problem.
But as the search is carried out only till a certain depth in the search tree, it introduces the problem of incompleteness.
Depth-first search can be viewed as a special case of depth-limited search with depth limit equal to the depth of the
tree. The process of DLS is depicted in Fig. 2.9.1.
2.9.2 Process
If depth limit is fixed to 2, DLS carries out depth first search till second level in the search tree.
2.9.3 Implementation
As in the case of DFS, in DLS we can use the same fringe implemented as a stack.
Additionally the level of each node needs to be calculated to check whether it is within the specified depth limit.
Depth-limited search can terminate with two conditions : failure, indicating that no solution exists, and cutoff, indicating that no solution exists within the given depth limit.
2.9.4 Algorithm
Determine the start node and the search depth.
Check if the current node is the goal node
AI&SC (MU-Sem. 7-Comp) 2-11 Problem Solving
If not : Do nothing
If yes : return
Check if the current node is within the specified search depth
If not : Do nothing
If yes : Expand the node and save all of its successors in a stack.
Call DLS recursively for all nodes of the stack and go back to Step 2.
DLS(node, limit, depth) :
{
if (node is a goal node) return success;
if (depth == limit) return failure;
for each child of node
{
if (DLS(child, limit, depth + 1))
return success;
}
return failure;
}
Time complexity : Same as DFS, O(b^l), where l is the specified depth limit.
Space complexity : O(bl), where l is the specified depth limit.
Iterative deepening depth first search is a combination of BFS and DFS. In DFID, search happens depth wise, but the
depth limit is incremented by one at each iteration. Hence it iteratively deepens down into the search tree.
It eventually turns out to be the breadth-first search as it explores a complete layer of new nodes at each iteration
before going on to the next layer.
It does this by gradually increasing the depth limit-first 0, then 1, then 2, and so on-until a goal is found; and thus
guarantees the optimal solution. Iterative deepening combines the benefits of depth-first and breadth-first search.
Fig. 2.10.1 shows four iterations of iterative deepening search on a binary search tree, where the solution is found on the fourth iteration.
2.10.2 Process
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
for depth = 0 to ∞ do
result ← DEPTH-LIMITED-SEARCH(problem, depth)
if result ≠ cutoff then return result
Fig. 2.10.1 : The iterative deepening search algorithm, which repeatedly applies depth limited search with increasing
limits. It terminates when a solution is found or if the depth limited search returns failure, meaning that no solution exists.
2.10.3 Implementation
at
It has exactly the same implementation as that of DLS. Additionally, iterations are required to increment the depth
Pu ch
2.10.4 Algorithm
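A minimal runnable sketch of DFID: depth limited search is rerun with limits 0, 1, 2, and so on. The adjacency-dict graph and the max_depth cap below are assumed for illustration.

```python
def dls(graph, node, goal, limit, path):
    """Recursive depth-limited search used by iterative deepening."""
    if node == goal:
        return path
    if limit == 0:
        return None                    # cutoff: limit reached
    for child in graph.get(node, []):
        found = dls(graph, child, goal, limit - 1, path + [child])
        if found:
            return found
    return None

def iddfs(graph, start, goal, max_depth=20):
    """Iterative deepening DFS: run DLS with limits 0, 1, 2, ..."""
    for limit in range(max_depth + 1):
        found = dls(graph, start, goal, limit, [start])
        if found:
            return found               # shallowest solution, as in BFS
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['G'], 'D': ['G']}
```

Like BFS, this returns the shallowest path A→C→G, but with the modest memory footprint of DFS.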
Time complexity : O(b^d).
Space complexity : Memory requirements of DFID are modest, i.e. O(bd).
Note : As the performance evaluation is quite satisfactory on all the four parameters, DFID is the preferred uninformed
search method when the search space is large and the depth of the solution is not known.
In bidirectional search, two simultaneous searches are run. One search starts from the initial state, called the forward
search, and the other starts from the goal state, called the backward search. The search process terminates when the two
searches meet at a common node of the search tree. Fig. 2.11.1 shows the general search process in bidirectional search.
2.11.2 Process
2.11.3 Implementation
In bidirectional search, instead of checking for a goal node, one needs to check whether the fringes of the two searches
intersect; as soon as they do, a solution has been found.
When each node is generated or selected for expansion, the check can be done. It can be implemented with a hash
table, to guarantee constant time.
For example, consider a problem which has a solution at depth d = 6 and let b = 10. If we run breadth first search in each
direction, then in the worst case the two searches meet when they have generated all of the nodes at depth 3.
This requires a total of 2 × (10 + 100 + 1,000) = 2,220 node generations, as compared with 1,111,110 for a standard breadth-first search.
Completeness : Yes, if branching factor b is finite and both directions use breadth first search.
Optimality : Yes, if all costs are identical and both directions use breadth first search.
Time complexity : Time complexity of bidirectional search using breadth-first searches in both directions is O(b^(d/2)).
Space complexity : As at least one of the two fringes needs to be kept in memory to check for the common node, the
space complexity is O(b^(d/2)).
2.11.5 Pros of Bidirectional Search
o Suppose b = 10, d = 6. Breadth first search will examine 10^6 = 1,000,000 nodes.
o Bidirectional search will examine 2 × 10^3 = 2,000 nodes.
One can combine different search strategies in different directions to avail better performance.
Overhead of checking whether each new node appears in the other search is involved.
For large d, it is still impractical!
For two bidirectional breadth-first searches with branching factor b and solution depth d, the memory
requirement is O(b^(d/2)) for each search.
Q. Compare different uniformed search strategies. (May 13, Dec. 14, 10 Marks)
Table 2.12.1 depicts the comparison of all uninformed search techniques basis on their performance evaluation. As we
know, the algorithms are evaluated on four criteria viz. completeness, optimality, time complexity and space complexity.
The notations used are as follows :
b : Branching factor
d : Depth of the shallowest solution
m : Maximum depth of the search tree
l : Depth limit
1. Uninformed : These methods use the search tree, start node and goal node as input for starting the search.
   Informed : These methods have additional information about the search tree nodes, along with the start and goal node.
2. Uninformed : They use only the information from the problem definition.
   Informed : They incorporate an additional measure of the potential of a specific state to reach the goal.
3. Uninformed : Sometimes these methods use past explorations, e.g. the cost of the path generated so far.
   Informed : In all these methods, the potential of a state (node) to reach the goal is measured through a heuristic function.
4. Uninformed : All uninformed techniques are based on the pattern of exploration of nodes in the search tree.
   Informed : All informed search techniques depend on the evaluated value of each node.
5. Uninformed : These techniques are costly with respect to time and space.
   Informed : These techniques are cost effective with respect to time and space.
6. Uninformed : Comparatively more nodes are explored in these methods.
   Informed : As compared to uninformed techniques, fewer nodes are explored.
7. Uninformed : Examples : Breadth First Search, Depth First Search, Uniform Cost Search, Depth Limited Search, Iterative Deepening DFS.
   Informed : Examples : Hill Climbing search, Best First search, A* Search, IDA* search, SMA* search.
6. BFS is useful in testing graph connectivity. DFS is useful in finding spanning trees and forests.
7. BFS always provides the shallowest path solution. DFS does not guarantee the shallowest path solution.
8. No backtracking is required in BFS. Backtracking is implemented in DFS.
9. BFS is optimal and complete if the branching factor is finite. DFS is neither complete nor optimal even when the branching factor is finite.
Informed searching techniques are a further extension of the basic uninformed search techniques. The main idea is to generate additional information about the search state space using knowledge of the problem domain, so that the search becomes more intelligent and efficient. An evaluation function is developed for each state, which quantifies the desirability of expanding that state in order to reach the goal.
All the strategies use this evaluation function to select the next state under consideration; hence the name “Informed Search”. These techniques are much more efficient with respect to time and space requirements as compared to uninformed search techniques.
Q. Explain heuristic function with example. (Dec. 12, Dec. 14, May 15, 5 Marks)
Q. What is heuristics function ? How will you find suitable heuristic function ? Give suitable example. (Dec. 13, 10 Marks)
Q. Define heuristic function. (Dec. 15, 2 Marks)
A heuristic function is an evaluation function, to which a search state is given as input and which generates a numeric estimate of the desirability of that state as output.
It maps problem state descriptions to measures of desirability, usually represented as numeric weights. The value of the heuristic function at a given node in the search process gives a good estimate of that node being on the desired path to the solution.
It evaluates an individual problem state and determines how promising the state is. Heuristic functions are the most common way of imparting additional knowledge of the problem states to the search algorithm. Fig. 2.14.1 shows the general representation of a heuristic function.
The estimate may be the approximate cost of the path to the goal node, the number of hops required to reach the goal node, etc.
The heuristic function that we are considering in this syllabus, for a node n is, h(n) = estimated cost of the cheapest
path from the state at node n to a goal state.
Example : For the Travelling Salesman Problem, the sum of the distances traveled so far can be a simple heuristic
function.
A heuristic function can be of two types depending on the problem domain : it can be a maximization function or a minimization function of the path cost.
In a maximization type of heuristic, the greater the value of a node, the better the node; in a minimization heuristic, the lower the value, the better the node. There are heuristics of general applicability as well as domain-specific ones. The search strategies themselves are general-purpose heuristics.
In general, a heuristic is expected to lead to a faster and better solution, even though there is no guarantee that it will never lead in the wrong direction in the search tree.
The design of the heuristic plays a vital role in the performance of the search.
As the purpose of a heuristic function is to guide the search process along the most profitable path among all those available, a well-designed heuristic function can provide a fairly good estimate of whether a path is good or bad. However, in many problems the cost of computing the value of the heuristic function would be more than the effort saved in the search process. Hence there is generally a trade-off between the cost of evaluating a heuristic function and the savings in search that the function provides.
So, are you ready to think of your own heuristic function definitions? Here is a word of caution : see how the function definition impacts the search. The following examples demonstrate how the design of the heuristic function completely alters the searching process.
o h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot be moved diagonally, the distance counted is the sum of the horizontal and vertical distances. This is also known as the Manhattan Distance. In Fig. 2.14.2, the start state has h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18. Clearly, h2 is also an admissible heuristic, because any move can, at best, move one tile one step closer to the goal.
As expected, neither heuristic overestimates the true number of moves required to solve the puzzle, which is 26. Additionally, it is easy to see from the definitions of the heuristic functions that for any given state, h2 will always be greater than or equal to h1. Thus, we say that h2 dominates h1.
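The two heuristics can be sketched in a few lines of Python. The state encoding (a 9-tuple read row by row, with 0 for the blank) and the goal layout are assumptions for illustration, since Fig. 2.14.2 is not reproduced here.

```python
# 8-puzzle heuristics: h1 (misplaced tiles) and h2 (Manhattan distance).
# State: tuple of 9 entries read row by row; 0 denotes the blank tile.

GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # assumed goal layout

def h1(state, goal=GOAL):
    """Number of tiles (blank excluded) not in their goal position."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal=GOAL):
    """Sum of horizontal + vertical distances of each tile from its goal."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        gidx = goal.index(tile)
        total += abs(idx // 3 - gidx // 3) + abs(idx % 3 - gidx % 3)
    return total
```

For any state, h2(state) >= h1(state), since every misplaced tile contributes at least 1 to the Manhattan sum; this is exactly the dominance argument made above.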
2.14.2 Example of Block World Problem
MU - Dec. 15
Q. Give an example heuristic function for the block world problem. (Dec. 15, 3 Marks)
Q. Find the heuristic value for a particular state of the blocks world problem. (Dec. 15, 5 Marks)
Fig. 2.14.3 depicts a block world problem, where the letter bricks A, B, C, D are piled on one another and are required to be arranged as shown in the goal state, by moving one brick at a time. As shown, the goal state with the particular arrangement of blocks needs to be attained from the given start state. Now it’s time to scratch your head and define a heuristic function that will distinguish the start state from the goal state. Confused??
Let’s design a function which assigns +1 for a brick at the right position and –1 for one at a wrong position. Consider Fig. 2.14.4.
Fig. 2.14.5 : State evaluations using heuristic function “h1”
Fig. 2.14.5 shows the heuristic values generated by heuristic function “h1” for various states in the state space. Observe that this heuristic generates the same value for several different states.
Due to this kind of heuristic, the search may end up in limitless iterations, as the state showing the most promising heuristic value may not actually be the most promising; or the search may end up finding an undesirable goal state, as the state evaluation may lead in a wrong direction in the search tree.
Let’s have another heuristic design for the same problem. Fig. 2.14.6 depicts a new heuristic function “h2”, in which a brick with the correct support structure is given +1 for each brick in its support structure, and a brick with the wrong support structure is given –1 for each brick in its support structure.
As we observe in Fig. 2.14.7, the same states are considered again as in Fig. 2.14.5, but this time using h2, each state is assigned a unique value generated according to heuristic function h2.
Observing this example, one can easily understand that in the second part of the example the search will be carried out smoothly, as each unique state gets a unique value assigned to it.
This example makes it clear that the design of the heuristic plays a vital role in the search process, as the whole search is carried out by considering the heuristic values as the basis for selecting the next state to be explored.
The state having the most promising value to reach the goal state will be the first candidate for exploration; this continues till we find the goal state.
For each block that has the correct support structure : + 1 to every block in the support structure.
For each block that has the wrong support structure : – 1 to every block in the support structure.
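A minimal Python sketch of the two block-world heuristics, assuming a state is encoded as a list of piles with each pile listed bottom-to-top; the encoding and the four-brick instance in the usage below are illustrative assumptions, since Figs. 2.14.4 to 2.14.7 are not reproduced here.

```python
# Block-world heuristics: h1 scores each brick by what it rests on;
# h2 scores each brick by its whole support structure.

def h1(state, goal):
    """+1 for each brick resting on the correct thing, -1 otherwise.
    A brick at the bottom of a pile rests on the 'table'."""
    def under(piles):
        rel = {}
        for pile in piles:
            prev = "table"
            for brick in pile:
                rel[brick] = prev
                prev = brick
        return rel
    s, g = under(state), under(goal)
    return sum(1 if s[b] == g[b] else -1 for b in s)

def h2(state, goal):
    """+n for a brick whose whole n-brick support structure is correct,
    -n when it is wrong (bricks directly on the table score 0)."""
    def below(piles):
        rel = {}
        for pile in piles:
            for i, brick in enumerate(pile):
                rel[brick] = tuple(pile[:i])
        return rel
    s, g = below(state), below(goal)
    return sum(len(s[b]) if s[b] == g[b] else -len(s[b]) for b in s)
```

With goal = [["A", "B", "C", "D"]] and start = [["A", "B", "C"], ["D"]], h1 gives +1+1+1-1 = 2 while h2 gives 0+1+2+0 = 3, and only h2 reaches its maximum (6) exactly at the goal state.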
This leads to a discussion of a better heuristic function definition.
Is there any particular way of defining a heuristic function that will guarantee better performance in the search process??
2.14.3 Properties of Good Heuristic Function
1. It should generate a unique value for each unique state in the search space.
2. The values should be a logical indicator of the profitability of the state in order to reach the goal state.
3. It may not guarantee to find the best solution, but it should almost always find a very good solution.
4. It should reduce the search time, specifically for hard problems like the travelling salesman problem, where the time required is exponential.
The main objective of a heuristic is to produce, in a reasonable time frame, a solution that is good enough for solving the problem, as it is an extra task added to the basic search process.
The solution produced by using a heuristic may not be the best of all the actual solutions to the problem, or it may simply approximate the exact solution. But it is still valuable, because finding it does not require a prohibitively long time. So we invest some amount of time in generating heuristic values for each state in the search space, but reduce the total time involved in the actual searching process.
Do we need to design a heuristic for every problem in the real world? There are trade-off criteria for deciding whether to use a heuristic for solving a given problem. They are as follows.
o Optimality : Does the problem require finding the optimal solution when multiple solutions exist?
o Completeness : When multiple solutions of a problem exist, is there a need to find all of them? Many heuristics are only meant to find one solution.
o Accuracy and precision : Can the heuristic guarantee to find the solution within the precision limits? Is the error bar on the solution unreasonably large?
o Execution time : Is it going to affect the time required to find the solution? Some heuristics converge faster than others, whereas some are only marginally quicker than classic methods.
In many AI problems it is often hard to measure precisely the goodness of a particular solution, but it is still important to keep the performance question in mind while designing the algorithm. For real-world problems it is often useful to introduce heuristics based on relatively unstructured knowledge, even when it is impossible to define that knowledge in such a way that mathematical analysis can be performed.
Fig. 2.15.1 depicts the search process of best first search on an example search tree. The values noted below the nodes are the estimated heuristic values of the nodes.
2.15.2 Implementation
Best first search uses two lists to record the path, namely the OPEN list and the CLOSED list.
The OPEN list stores nodes that have been generated but not yet examined. It is organized as a priority queue, in which nodes are kept ordered by their heuristic value, so that the current best candidate for expansion can be selected efficiently.
The CLOSED list stores nodes that have already been examined; it contains all nodes that have been evaluated and will not be looked at again. Whenever a new node is generated, we check whether it has been generated before. If it has already been visited, we check its recorded value and change its parent if the new value is better than the previous one. This avoids any node being evaluated twice, and the search will never get stuck in an infinite loop.
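The OPEN/CLOSED bookkeeping above can be sketched with Python's heapq as the priority queue. This minimal version assumes a minimization heuristic (lower h is better) and omits the parent re-linking step for brevity; the graph and h-values in the usage below are made-up illustrations.

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Best first search: always expand the node with the best (lowest)
    heuristic value on the OPEN priority queue; CLOSED is a set of
    already-examined nodes."""
    open_list = [(h[start], start, [start])]   # (h-value, node, path)
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue                            # already examined
        closed.add(node)
        for succ in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(open_list, (h[succ], succ, path + [succ]))
    return None                                 # goal not reachable
```

For example, with graph = {'S': ['A', 'B'], 'A': ['C'], 'B': ['C', 'G'], 'C': ['G']} and h = {'S': 7, 'A': 6, 'B': 2, 'C': 1, 'G': 0}, the search expands S, then B (h = 2), and reaches G via ['S', 'B', 'G'].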
Completeness : Not complete; it may follow an infinite path if the heuristic rates each state on such a path as the best option. Most reasonable heuristics, however, will not cause this problem.
Optimality : Not optimal; it may not always produce the optimal solution.
Time Complexity : The worst-case time complexity is still O(b^m), where m is the maximum depth.
Space Complexity : Since it must maintain a queue of all unexpanded states, the space complexity is also O(b^m).
2.15.5 Greedy Best First Search
A greedy algorithm is an algorithm that follows the heuristic of making the locally optimal choice at each stage, with the hope of finding a global optimum.
When best first search uses a heuristic that leads towards the goal node, so that nodes which seem more promising are expanded first, this particular type of search is called greedy best-first search.
In the greedy best first search algorithm, the first successor of the parent is expanded. For the successor node, check the following :
1. If the successor’s heuristic is better than its parent’s, the successor is set at the front of the queue, with the parent reinserted directly behind it, and the loop restarts.
2. Else, the successor is inserted into the queue at a location determined by its heuristic value. The procedure then evaluates the remaining successors of the parent, if any.
In many cases greedy best first search may not produce an optimal solution, but the solution will be locally optimal and generated in a comparatively small amount of time. In mathematical optimization, greedy algorithms are used to solve combinatorial problems.
For example, consider the travelling salesman problem, which has a high computational complexity but works well with a greedy strategy, as follows. Refer to Fig. 2.15.2. The values written on the links are the straight line distances between the nodes. The aim is to visit all the cities A through F with the shortest distance travelled.
Let us apply a greedy strategy to this problem with the heuristic, “At each stage visit an unvisited city nearest to the current city”. Simple logic… isn’t it? This heuristic need not find the best solution, but it terminates in a reasonable number of steps, whereas finding an optimal solution typically requires unreasonably many steps. Let’s verify.
Being a greedy algorithm, it always makes the locally optimal choice. Hence it selects node C first, as it is the nearest unvisited city from node A, and the path generated will be A→C→D→B→E→F with total cost = 10 + 18 + 5 + 25 + 15 = 73. By observing the graph, one can find the optimal path and the optimal distance the salesman needs to travel : it turns out to be A→B→D→E→F→C, where the cost comes out to be 18 + 5 + 15 + 15 + 18 = 71.
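The “visit the nearest unvisited city” strategy can be sketched as follows. Since Fig. 2.15.2 is not reproduced here, the distance table in the usage below is an assumption chosen only to show the mechanics, not the book's actual figure.

```python
def greedy_tour(dist, start):
    """Nearest-neighbour greedy tour: at each stage visit the unvisited
    city nearest to the current city. dist is a nested dict where
    dist[x][y] gives the distance from city x to city y."""
    unvisited = set(dist) - {start}
    tour, current, cost = [start], start, 0
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist[current][c])  # nearest city
        cost += dist[current][nxt]
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour, cost
```

On a hypothetical four-city table the tour is built one locally optimal hop at a time; as the example above shows, this need not be the globally shortest tour.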
2. Optimality : It is not optimal, as it goes on selecting a single path and never checks other possibilities.
3. Time Complexity : O(b^m), but a good heuristic can give dramatic improvement.
4. Space Complexity : O(b^m), as it needs to keep all nodes in memory.
2.16 A* Search
Q. Explain A* Algorithm. What is the drawback of A* ? Also show that A* is optimally efficient. (May 13, 10 Marks)
Q. Describe A* algorithm with merits and demerits. (Dec. 13, 10 Marks)
Q. Explain A* algorithm with example. (May 14, Dec. 14, 10 Marks)
Q. Explain A* search with example. (May 15, 10 Marks)
2.16.1 Concept
A*, pronounced “A-star” (Hart, 1972), is a search method combining branch and bound and best first search with the dynamic programming principle.
It is a variation of best first search in which the evaluation of a state or node depends not only on the heuristic value of the node but also on its distance from the start state. It is the most widely known form of best-first search.
The A* algorithm is also called an OR graph / tree search algorithm.
In A* search, the value of a node n, represented as f(n), is a combination of g(n), the cost of the cheapest path from the root node to n, and h(n), the cost of the cheapest path from n to the goal node. Hence f(n) = g(n) + h(n).
As the heuristic can provide only an estimated cost from the node to the goal, we represent the estimate of h(n) as h*(n); similarly, g*(n) represents the approximation of g(n), the distance from the root node as observed by A*. The algorithm A* thus works with
f*(n) = g*(n) + h*(n)
The difference between A* and best first search is that in best first search only the heuristic estimate h(n) is considered, while A* accounts for both the distance travelled to reach a particular node and the estimated distance still to be travelled to reach the goal node; hence it always finds the cheapest solution.
A reasonable thing to try first is the node with the lowest value of g*(n) + h*(n). It turns out that this strategy is more than just reasonable, provided that the heuristic function h*(n) satisfies certain conditions, which are discussed further in the chapter. A* search is both complete and optimal.
2.16.2 Implementation
Initialize : put the start node on the OPEN list with f = h(start) and set Found = false. Repeat the following steps until Found = true or the OPEN list is empty.
1
{
i. Remove the node with the lowest value of f from OPEN to CLOSED and call it Best_Node.
ii. If Best_Node = Goal state then Found = true
iii. else
2
{
Generate the successors of Best_Node. For each successor Succ do :
3
{
If Succ ∈ OPEN then /* generated earlier but not yet processed */
4
{
a. Call the matched node OLD and add it in the list of Best_Node successors.
b. Ignore the Succ node and change the parent of OLD, if required.
- If g(Succ) < g(OLD) then make the parent of OLD to be Best_Node and change the values of g and f for OLD.
- If g(Succ) >= g(OLD) then ignore.
4
}
If Succ ∈ CLOSED then /* already processed */
5
{
i. Call the matched node OLD and add it in the list of Best_Node successors.
ii. Ignore the Succ node and change the parent of OLD, if required.
- If g(Succ) < g(OLD) then make the parent of OLD to be Best_Node and change the values of g and f for OLD.
- Propagate the change to OLD’s children using depth first search.
- If g(Succ) >= g(OLD) then do nothing.
5
}
If Succ ∉ OPEN and Succ ∉ CLOSED then /* a newly generated node */
6
{
Add Succ to the OPEN list as a successor of Best_Node and compute f(Succ) = g(Succ) + h(Succ).
6
}
3
}
2
}
1
}
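The listing above can be condensed into a short Python sketch. Rather than re-parenting OLD nodes explicitly, this version allows a node to be pushed again and re-expanded whenever a cheaper g is found, which has the same effect; the graph, step costs and h-values in the usage below are illustrative assumptions.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search with f(n) = g(n) + h(n).
    graph maps each node to a list of (successor, step_cost) pairs."""
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {}                                   # cheapest g seen per node
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue                              # a cheaper copy was expanded
        best_g[node] = g
        for succ, cost in graph.get(node, []):
            heapq.heappush(open_list,
                           (g + cost + h[succ], g + cost, succ, path + [succ]))
    return None, float('inf')                     # no path to the goal
```

With graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 12)], 'B': [('G', 3)]} and the admissible estimates h = {'S': 6, 'A': 5, 'B': 2, 'G': 0}, the search returns the cheapest path ['S', 'A', 'B', 'G'] with cost 6.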
As stated already, the success of A* depends entirely on the design of the heuristic function and how well it evaluates each node by estimating its distance from the goal node. Let us understand the effect of the heuristic function on the execution of the algorithm and how the optimality gets affected by it.
A. Underestimation
If we can guarantee that the heuristic function ‘h’ never overestimates the actual cost from the current node to the goal, i.e. the value generated by h is always less than or equal to the actual cost or the actual number of hops required to reach the goal state, then the A* algorithm is guaranteed to find an optimal path to a goal, if one exists.
Example :
f = g + h. Here h is underestimated.
Fig. 2.16.1
Consider the cost of all arcs to be 1. A is expanded to B, C and D, and the ‘f’ value for each node is computed. B is chosen and expanded to E. We notice that f(E) = f(C) = 5. Suppose we resolve in favour of E, the path we are currently expanding. E is expanded to F. Expansion of node F is stopped, as f(F) = 6, and we now expand node C.
Hence, by underestimating h(B), we have wasted some effort, but eventually discovered that B was farther away than we thought. We then go back, try another path, and find the optimal path.
B. Overestimation
Here h is overestimated; that is, the value generated for each node is greater than the actual number of steps required to reach the goal node.
Example :
Fig. 2.16.2
As shown in the example, A is expanded to B, C and D. Now B is expanded to E, E to F and F to G, for a solution path of length 4. Consider a scenario where there is a direct path from D to G giving a shorter solution path. That path will never be found, because of the overestimation of h(D).
Thus, some other, worse solution might be found without ever expanding D. So by overestimating h, one cannot guarantee finding the cheapest path solution.
2.16.5 Admissibility of A*
MU - Dec. 12, May 14
Q. What do you mean an admissible heuristics function ? Discuss with suitable example. (Dec. 12, 5 Marks)
Q. Write short note on admissibility of A*. (May 14, 5 Marks)
A search algorithm is admissible if, for any graph, it always terminates with an optimal path from the initial state to the goal state, if such a path exists. A heuristic is admissible if it never overestimates the actual cost from the current state to the goal state. Alternatively, we can say that A* always terminates with the optimal path when h(n) is an admissible heuristic function.
A heuristic h(n) is admissible if for every node n, h(n) ≤ h*(n), where h*(n) is the true cost to reach the goal state from n. An admissible heuristic never overestimates the cost to reach the goal. Admissible heuristics are by nature optimistic, because they assume the cost of solving the problem is less than it actually is.
An obvious example of an admissible heuristic is the straight line distance. Straight line distance is admissible because
the shortest path between any two points is a straight line, so the straight line cannot overestimate the actual road
distance.
o Theorem : If h(n) is admissible, tree search using A* is optimal.
o Proof : Optimality of A* with an admissible heuristic.
Suppose some suboptimal goal G2 has been generated and is in the fringe. Let n be an unexpanded node in the fringe
such that n is on a shortest path to an optimal goal G.
f(G2) = g(G2), since h(G2) = 0 (G2 is a goal)
f(G) = g(G), since h(G) = 0
g(G2) > g(G), since G2 is suboptimal; hence f(G2) > f(G)
h(n) ≤ h*(n), since h is admissible
g(n) + h(n) ≤ g(n) + h*(n) = f(G), since n lies on an optimal path to G
f(n) ≤ f(G)
Hence f(G2) > f(G) ≥ f(n), and A* will never select G2 for expansion.
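The admissibility condition h(n) ≤ h*(n) can be checked mechanically on a small graph by computing the true costs h*(n) with Dijkstra's algorithm on the reversed graph; the example graph and heuristics below are assumptions for illustration.

```python
import heapq

def true_costs_to_goal(graph, goal):
    """h*(n) for every node: cheapest cost to reach the goal, computed by
    Dijkstra on the reversed graph. graph maps node -> [(succ, cost)]."""
    rev = {}
    for u, edges in graph.items():
        for v, c in edges:
            rev.setdefault(v, []).append((u, c))
    dist = {goal: 0}
    pq = [(0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, c in rev.get(u, []):
            if d + c < dist.get(v, float('inf')):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    return dist

def is_admissible(graph, h, goal):
    """True iff h(n) <= h*(n) for every node with an h-value."""
    hstar = true_costs_to_goal(graph, goal)
    return all(h[n] <= hstar.get(n, float('inf')) for n in h)
```

Raising a single estimate above its true cost is enough to make the heuristic inadmissible, which is exactly the overestimation scenario discussed earlier.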
2.16.6 Monotonicity
MU -May 16
2.16.7 Properties of A*
h(X) = the number of tiles not in their goal position in a given state X
g(X) = depth of node X in the search tree
For the initial node, f(initial_node) = 4
Ex. 2.16.1 : Consider the graph given in Fig. P. 2.16.1 below. Assume that the initial state is S and the goal state is 7. Find a path from the initial state to the goal state using A* search, and report the solution cost. The straight line distance heuristic estimates for the nodes are as follows : h(1) = 14, h(2) = 10, h(3) = 8, h(4) = 12, h(5) = 10, h(6) = 10, h(S) = 15. MU - Dec. 15, 5 Marks
Fig. P. 2.16.1
Soln. :
Open : 4(12 + 4), 1(14 + 3)
Closed : S(15)
Open : 3(8 + 11), 6(10 + 10)
Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7)
Open : 6(10 + 10)
Closed : S(15), 4(12 + 4), 5(10 + 6), 1(14 + 3), 2(10 + 7), 3(8 + 11)
2.16.9 Comparison among Best First Search, A* Search and Greedy Best First Search
MU - May 16
Q. Compare the following informed searching algorithms based on performance measures with justification : Complete, Optimal, Time complexity and Space complexity.
(a) Greedy best-first (b) A* (c) Recursive best-first (RBFS) (May 16, 10 Marks)
2.17 Hill Climbing
Hill climbing is simply a combination of depth first search with generate and test, where feedback is used to decide the direction of motion in the search space.
The hill climbing technique is widely used in artificial intelligence to solve computationally hard problems which have multiple possible solutions.
In depth-first search, the test function merely accepts or rejects a solution. But in hill climbing, the test function is provided with a heuristic function which gives an estimate of how close a given state is to the goal state.
In hill climbing, each state is provided with the additional information needed to find the solution, i.e. the heuristic value. The algorithm is memory efficient, since it does not maintain the complete search tree; rather, it looks only at the current state and the states at the immediate next level.
For example, suppose you want to find a mall from your current location, and there are n possible paths with different directions to reach the mall. The heuristic function will give you the distance of each path that reaches the mall, so that it becomes very simple and time efficient for you to reach it.
Hill climbing attempts to iteratively improve the current state by means of an evaluation function. “Consider all the
possible states laid out on the surface of a landscape. The height of any point on the landscape corresponds to the
evaluation function of the state at that point” (Russell and Norvig, 2003). Fig. 2.17.1 depicts the typical hill climbing
scenario, where multiple paths are available to reach to the hill top from ground level.
Fig. 2.17.1 : Hill Climbing Scenario
Hill climbing always attempts to make changes that improve the current state. In other words, hill climbing can only advance if there is a higher point in the adjacent landscape.
Hill climbing is a type of local search technique. It is relatively simple to implement. In many cases, where the state space is of moderate size, hill climbing works even better than many advanced techniques.
For example, when hill climbing is applied to the travelling salesman problem, it initially produces a random combination of the cities as a solution visiting all of them; it then improves the route by switching the order of cities, so that all the cities are visited at minimum cost.
There are two variations of hill climbing, as discussed below.
Simple hill climbing is the simplest way to implement hill climbing; the algorithm for it follows. Overall the procedure looks similar to generate and test, but the main difference between the two is the use of a heuristic function for state evaluation in hill climbing. The goodness of any state is decided by its heuristic value, which may be either a maximization or a minimization measure.
Algorithm
1. Evaluate the initial state. If it is a goal state, then return and quit; otherwise make it the current state and go to Step 2.
2. Loop until a solution is found or there are no new operators left to be applied (i.e. no new child nodes left to be explored) :
a. Select and apply a new operator (i.e. generate a new child node).
b. Evaluate the new state :
(i) If it is a goal state, then return and quit.
(ii) If it is better than the current state, then make it the new current state.
(iii) If it is not better than the current state, then continue the loop at Step 2.
Studying the algorithm, we observe that in every pass the first node / state found to be better than the current state is taken up for further exploration. This strategy may not guarantee the most optimal solution to the problem, but it saves on execution time.
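The loop above can be sketched on a toy maximization problem. The objective (maximize –(x – 3)² over the integers) and the two-neighbour move set are assumptions used only to show the “accept the first better child” behaviour.

```python
def value(x):
    return -(x - 3) ** 2           # toy objective: single peak at x = 3

def simple_hill_climb(start):
    """Simple hill climbing: move to the FIRST successor that is better
    than the current state; stop when no successor improves on it."""
    current = start
    while True:
        for succ in (current - 1, current + 1):   # generate child nodes
            if value(succ) > value(current):      # first better child wins
                current = succ
                break
        else:
            return current     # no better child: goal or a local maximum
```

On this single-peaked objective the climb always reaches the top at x = 3; on a landscape with several peaks the same loop would stop at whichever local maximum it reaches first.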
As the name suggests, steepest ascent hill climbing always takes the steepest path to the hill top. It does so by selecting the best node among all children of the current node / state, all of which are evaluated using the heuristic function. Obviously, the time requirement of this strategy is more than that of the previous one. The algorithm for steepest ascent hill climbing is as follows.
Algorithm
1. Evaluate the initial state. If it is a goal state, return and quit; otherwise make it the current state.
2. Loop until a solution is found or a complete iteration produces no change to the current state :
a. Let SUCC be a state such that any possible successor of the current state will be better than SUCC.
b. For each operator that applies to the current state, evaluate the new state :
(i) If it is the goal, then return and quit.
(ii) If it is better than SUCC, then set SUCC to this state.
c. If SUCC is better than the current state, set the current state to SUCC.
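For contrast, a sketch of the steepest ascent variant on the same toy objective (an assumption used only for illustration): all successors are evaluated, and the best one (SUCC) replaces the current state only if it is an improvement.

```python
def value(x):
    return -(x - 3) ** 2           # same toy objective, peak at x = 3

def steepest_ascent(start):
    """Steepest ascent hill climbing: evaluate ALL successors, keep the
    best one as SUCC, and move only when SUCC beats the current state."""
    current = start
    while True:
        succ = max((current - 1, current + 1), key=value)   # best child
        if value(succ) <= value(current):
            return current         # a full iteration produced no change
        current = succ
```

The difference from simple hill climbing is the `max` over the whole neighbourhood instead of taking the first improving child.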
Comparing simple hill climbing with steepest ascent, we find that there is a trade-off between the time requirement and the accuracy or optimality of the solution.
In the case of the simple hill climbing technique, as we go for the first better successor, time is saved because not all the successors are evaluated; but this may lead to more nodes and branches getting explored, and in turn the solution found may not be the optimal one. In steepest ascent hill climbing, as only the best successor is considered for further expansion, more time is spent in evaluating all the successors at the earlier stages, but the solution found will usually be the better one, as only the states leading towards the hill top are explored. This also makes it clear that the evaluation function, i.e. the heuristic function definition, plays a vital role in deciding the performance of the algorithm.
Now let us see the impact of an incorrect design of the heuristic function on the hill climbing techniques. The following are the problems that may arise in the hill climbing strategy. Sometimes the algorithm may reach a position which is not a solution, but from which there is no move that leads to a better place on the hill, i.e. no further state that is closer to the solution. This happens when we have reached one of the following three situations.
1. Local Maximum : A “local maximum” is a location on the hill which is higher than its surrounding parts but is not the actual hill top. In the search tree, it is a state better than all its neighbours, but with no next better state which can be chosen for further expansion. Local maxima sometimes occur within sight of a solution; in such cases they are called “foothills”.
2. Plateau : A “plateau” is a flat area of the search space in which all the neighbouring states have the same value. On a plateau, it is not possible to determine the best direction in which to move by making local comparisons.
Fig. 2.17.5
3. Ridge : A “ridge” is an area on the hill which is higher than the surrounding areas, but from which there is no further uphill path. In the search tree, it is the situation where all successors are either of the same or lesser value; the suitable successor cannot be reached in a single move.
Fig. 2.17.7
Fig. 2.17.8 depicts all the different situations together in hill climbing.
In order to overcome these problems we can try the following techniques. At times, a combination of two techniques provides a better solution.
1. A good way to deal with a local maximum is to backtrack to some earlier node and try a different direction.
2. In the case of plateaus and ridges, make a big jump in some direction to a new area of the search space. This can be done by applying two or more rules, or the same rule several times, before testing. This is a good strategy for dealing with plateaus and ridges.
The idea is to use simulated annealing to search for feasible solutions and converge to an optimal solution. To achieve that, some downhill moves may be made at the beginning of the process. These downhill moves are made purposely, to do enough exploration of the whole space early on, so that the final solution is relatively insensitive to the starting state. This reduces the chances of getting caught at a local maximum, a plateau, or a ridge.
Algorithm
1. Evaluate the initial state.
2. Loop until a solution is found or there are no new operators left to be applied :
a. Set T according to the annealing schedule.
b. Select and apply a new operator.
c. Evaluate the new state :
(i) If it is a goal state, then quit.
(ii) ΔE = Val(current state) – Val(new state)
(iii) If ΔE < 0, then make it the new current state.
(iv) Else make it the new current state with probability e^(–ΔE/kT).
We observe in the algorithm that if the next state is better than the current one, it is readily accepted as the new current state. But even when the next state does not have a desirable value, it is accepted with probability e^(–ΔE/kT), where ΔE is the positive change in the energy level, T is the temperature and k is Boltzmann’s constant.
Thus, in simulated annealing, large uphill moves are much less likely than small ones. Also, the probability of uphill moves decreases as the temperature decreases. Hence uphill moves are more likely at the beginning of the annealing process, when the temperature is high; as the cooling proceeds and the temperature comes down, so do the uphill moves. Downhill moves are allowed at any time in the whole process. In this way, only comparatively small upward moves are allowed until finally the process converges to a local minimum configuration, i.e. the final (frozen) state.
A hill climbing algorithm never makes “downhill” moves towards states with lower value, and it can be incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk, i.e. moving to a successor chosen at random from the set of successors, independent of whether it is better than the current state, is complete but extremely inefficient. Therefore, it is reasonable to try a combination of hill climbing with a random walk in some way that yields both efficiency and completeness. Simulated annealing is the answer…!!
As we know, hill climbing can get stuck at local minima or maxima, thereby halting the algorithm abruptly, so it may not guarantee an optimal solution. A few attempts were made to solve this problem by trying hill climbing from multiple start points or by increasing the size of the neighbourhood, but none produced satisfactory results. Simulated annealing solves the problem by performing some downhill moves at the beginning of the search, so that local maxima can be avoided at a later stage.
The hill climbing procedure chooses for further expansion the best state from those available, or at least one better than the current state. Unlike hill climbing, simulated annealing chooses a random move from the neighbourhood. If the successor state turns out to be better than the current state, simulated annealing accepts it for further expansion. If the successor state is worse, it is accepted with some probability.
AI&SC (MU-Sem. 7-Comp) 2-38 Problem Solving
In all the variations of hill climbing so far, we have considered only one node getting selected at a time for the further search process. These algorithms are memory efficient in that sense. But when an unfruitful branch gets explored even for some amount of time, it is a complete waste of time and memory. Also, the solution produced may not be the optimal one.
The local beam search algorithm keeps track of the k best states by performing k parallel searches. At each step it generates successor nodes and selects the k best nodes for the next level of search. Thus, rather than focusing on only one branch, it concentrates on the k paths which seem most promising. If any of the successors is found to be the goal, the search process stops.
In parallel local beam search, the parallel threads communicate to each other, hence useful information is passed
among the parallel search threads.
In turn, the states that generate the best successors say to the others, “Come over here, the grass is greener!” The algorithm quickly terminates the exploration of unfruitful branches and moves its resources to where the path seems most promising. In stochastic beam search, the maintained successor states are chosen with a probability based on their goodness.
OPEN list;
Step 4 : While (Found = false and able to proceed further)
{
    Sort OPEN list;
    Select top W elements from OPEN list, put them in W_OPEN list and empty OPEN list;
    While (W_OPEN ≠ ∅ and Found = false)
    {
        Get NODE from W_OPEN;
        if NODE = Goal state then Found = true
        else
        {
            Find SUCCs of NODE, if any, with their estimated costs and
            store in OPEN list;
        }
    } // end inner while
} // end outer while
Step 5 : If Found = true then return Yes otherwise return No and Stop
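The Step 4 / Step 5 loop can be sketched in Python. This is a minimal illustration under our own assumptions: `successors` and `cost` are caller-supplied functions (not named in the text), and the frontier is a flat list of the k best states.

```python
import heapq

def local_beam_search(start_states, successors, cost, is_goal, k=2, max_steps=100):
    """Minimal local beam search sketch: at every level, pool the
    successors of the current k states and keep only the k best
    (lowest estimated cost)."""
    frontier = list(start_states)
    for _ in range(max_steps):
        for s in frontier:
            if is_goal(s):                       # Found = true
                return s
        pool = [t for s in frontier for t in successors(s)]
        if not pool:                             # not able to proceed further
            return None
        frontier = heapq.nsmallest(k, pool, key=cost)
    return None
```

Because all k states feed one shared pool, a promising branch can "steal" all k slots from an unfruitful one, which is exactly the coordination that distinguishes beam search from k independent searches.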
As shown in Fig. 2.17.9, here k = 2, hence two better successors are selected for expansion at first level of search and
at each next level, two better successors will be selected by both searches. They do exchange their information with
each other throughout the search process. The search will continue till goal state is found or no further search is
possible.
It may seem that local beam search is the same as running k searches in parallel, but it is not so. In the case of independent parallel searches, all searches run independent of each other, while in local beam search the parallel running threads continuously coordinate with one another to decide the fruitful region of the search tree.
Local beam search can suffer from a lack of diversity among the k states by quickly concentrating in a small region of the state space.
and pitfalls for the prediction of the performance on independent test data. Unlike most other learning systems that
have been previously discussed, there are far more choices to be made in applying the gradient descent method.
The key variations of these choices are : the learning rate and local minima. The selection of a learning rate is of critical importance in finding the true global minimum of the error distance.
Back propagation training with too small a learning rate will make agonizingly slow progress. Too large a learning rate
will proceed much faster, but may simply produce oscillations between relatively poor solutions.
Both of these conditions are generally detectable through experimentation and sampling of results after a fixed
number of training epochs.
Typical values for the learning rate parameter are numbers between 0 and 1 : 0.05 < η < 0.75.
One would like to use the largest learning rate that still converges to the minimum solution. Momentum : empirical evidence shows that the use of a term called momentum in the back propagation algorithm can be helpful in speeding up the convergence and avoiding local minima.
The idea of using momentum is to stabilize the weight change by making non-radical revisions using a combination of the gradient-descent term with a fraction of the previous weight change :
Δw(t) = − η ∂E/∂w(t) + α Δw(t − 1)
where α is taken as 0 ≤ α ≤ 0.9, η is the learning rate, and t is the index of the current weight change.
This gives the system a certain amount of inertia since the weight vector will tend to continue moving in the same
direction unless opposed by the gradient term.
The momentum has the following effects :
o It smooths the weight changes and suppresses cross-stitching, that is, it cancels side-to-side oscillations across the error valley;
o When the weight changes are all in the same direction the momentum amplifies the learning rate, causing a faster
convergence;
o It enables escape from small local minima on the error surface.
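The momentum update rule above can be sketched in plain Python. This is an illustration only: the quadratic error function, the parameter values (η = 0.1, α = 0.9) and the function name are our own choices, not from the text.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """One weight update with momentum:
    Δw(t) = −η·∂E/∂w(t) + α·Δw(t−1)."""
    delta = [-eta * g + alpha * d for g, d in zip(grad, prev_delta)]
    w = [wi + di for wi, di in zip(w, delta)]
    return w, delta

# toy problem: minimise E(w) = w0² + w1², whose gradient is (2·w0, 2·w1)
w, prev = [1.0, -1.0], [0.0, 0.0]
for _ in range(100):
    w, prev = momentum_step(w, [2 * w[0], 2 * w[1]], prev)
```

Because each step carries a fraction α of the previous step, consecutive updates in the same direction accumulate (faster convergence), while alternating updates partially cancel (damped oscillation) — the "inertia" described above.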
The hope is that the momentum will allow a larger learning rate and that this will speed convergence and avoid local minima. On the other hand, a learning rate of 1 with no momentum will be much faster when no problem with local minima or non-convergence is encountered.
Sequential or random presentation : the epoch is the fundamental unit for training, and the length of training is often measured in terms of epochs. During a training epoch with revision after each particular example, the examples can be presented in the same sequential order, or in a different random order for each epoch. Random presentation usually yields better results.
The randomness has advantages and disadvantages :
o Advantages : It gives the algorithm some stochastic search properties. The weight state tends to jitter around its equilibrium, and may occasionally visit nearby points. Thus it may escape being trapped in suboptimal weight configurations. On-line learning may have a better chance of finding a global minimum than the true gradient descent technique.
o Disadvantages : The weight vector never settles to a stable configuration. Having found a good minimum, it may then continue to wander around it.
Random initial state : unlike many other learning systems, the neural network begins in a random state. The network weights are initialized to random numbers, typically in the range between −0.5 and 0.5 (the inputs are usually normalized to numbers between 0 and 1). Even with identical learning conditions, the random initial weights can lead to results that differ from one training session to another. The training sessions may be repeated until the best results are obtained.
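The initialization just described can be sketched as follows. The function names and min-max normalisation are our own illustrative choices; only the ranges ([−0.5, 0.5] for weights, [0, 1] for inputs) come from the text.

```python
import random

def init_weights(n, low=-0.5, high=0.5, seed=None):
    """Random initial weights in [-0.5, 0.5]; a different seed gives a
    different starting state, hence possibly different training results."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]

def normalize(xs):
    """Min-max normalisation of inputs to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]
```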
GAs are adaptive heuristic search algorithms based on the evolutionary ideas of natural selection and genetics. As such, they represent an intelligent exploitation of random search used to solve optimization problems. Although randomized, GAs are by no means random; instead they exploit historical information to direct the search toward better performance within the search space. The basic techniques of GAs are designed to simulate processes in natural systems necessary for evolution, especially those following the principle of “survival of the fittest” laid down by Charles Darwin.
Genetic algorithms are implemented as a computer simulation in which a population of abstract representations
(called chromosomes or the genotype or the genome) of candidate solutions (called individuals, creatures, or
phenotypes) to an optimization problem evolves towards better solutions. The solutions are represented in binary as
strings of 0s and 1s, but other encodings are also possible. The evolution usually starts from a population of randomly
generated individuals and occurs in generations. In each generation, the fitness of every individual in the population is
evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), and
modified to form a new population. The new population is then used in the next iteration of the algorithm.
2.17.4(A) Terminologies of GA
Gene
Gene is the smallest unit in genetic algorithm. The gene represents the smallest unit of information in the problem
domain and can be thought of as the basic building block for a possible solution. If the problem context were, for
example, the creation of a well-balanced investment portfolio, a gene might represent the number of shares of a
particular security to purchase.
Chromosome
Chromosome is a series of genes that represent the components of one possible solution to the problem. The
chromosome is represented in computer memory as a bit string of binary digits that can be “decoded” by the genetic
algorithm to determine how good a particular chromosome’s gene pool solution is for a given problem. The decoding
process simply informs the genetic algorithm what the various genes within the chromosome represent.
Encoding
Encoding of chromosomes is one of the problems, to start solving problem with GA. Encoding depends on the type of
the problem. There are various types of encoding techniques like binary encoding, permutation encoding, value
encoding, etc.
Population
A population is a pool of individuals (chromosomes) that will be sampled for selection and evaluation. The
performance of each individual will be computed and a new population will be reproduced using standard genetic
operators.
Reproduction
Reproduction is the process of creating new individuals called off-springs from the parents population. This new
population will be evaluated again to select the desired results. Reproduction is done basically using two genetic
operators : crossover and mutation. Although the genetic operators used can vary from model to model, there are a
few standard or canonical operators : crossover and recombination of genetic material contained in different parent
chromosomes, random mutation of data in individual chromosomes, and domain specific operations, such as
migration of genes.
Mutation
Mutation is another refinement step that randomly changes the value of a gene from its current setting to a completely different one. The majority of the mutations produced by this process are, as is often the case in nature, less fit rather than more fit. Occasionally, however, a highly superior and beneficial mutation will occur. Mutation provides the genetic algorithm with the opportunity to create chromosomes and gene information that can explore previously uncharted areas of the solution space, thus increasing the chances of discovering an optimal solution. There are various types of mutation techniques, like bit inversion, order changing and value encoding.
1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating the following steps until the new population is complete
4. [Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected)
5. [Crossover] With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring is an exact copy of the parents.
6. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in chromosome).
7. [Accepting] Place the new offspring in the new population
8. [Replace] Use the newly generated population for a further run of the algorithm
9. [Test] If the end condition is satisfied, stop, and return the best solution in the current population
10. [Loop] Go to step 2
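The steps above can be sketched as a short program over binary chromosomes. This is an illustration, not the book's code: roulette-wheel (fitness-proportionate) selection, one-point crossover, and all parameter values are our own common choices.

```python
import random

def genetic_algorithm(fitness, n=20, length=8, generations=50,
                      p_cross=0.7, p_mut=0.01, seed=0):
    """Minimal GA sketch over binary chromosomes, following steps 1-10."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]  # [Start]

    def select(pop):
        # fitness-proportionate ("roulette wheel") selection
        total = sum(fitness(c) for c in pop)
        r, acc = rng.uniform(0, total), 0.0
        for c in pop:
            acc += fitness(c)
            if acc >= r:
                return c
        return pop[-1]

    for _ in range(generations):                                          # [Loop]
        new_pop = []
        while len(new_pop) < n:                                           # [New population]
            a, b = select(pop), select(pop)                               # [Selection]
            if rng.random() < p_cross:                                    # [Crossover]
                point = rng.randrange(1, length)
                child = a[:point] + b[point:]
            else:
                child = a[:]
            child = [1 - g if rng.random() < p_mut else g for g in child] # [Mutation]
            new_pop.append(child)                                         # [Accepting]
        pop = new_pop                                                     # [Replace]
    return max(pop, key=fitness)                                          # [Test]

# "one-max" toy problem: fitness = number of 1-bits in the chromosome
best = genetic_algorithm(fitness=sum)
```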
Let’s consider the following pair of chromosomes, encoded using the permutation encoding technique, undergoing the complete process of GA. Assume that they are selected using the rank selection method, and that arithmetic crossover and value encoding mutation techniques are applied.
a. Parent chromosome
Chromosome A 153264798
Chromosome B 856723149
Child chromosome after arithmetic crossover, i.e. adding the digits of both chromosomes.
b. Child chromosome
After applying value encoding mutation, i.e. adding or subtracting a small value at selected positions, e.g. subtracting 1 from the 3rd and 4th digits.
c. Child chromosome
It can be observed that the child produced is much better than both parents.
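The book's figures for the child chromosomes are not reproduced above. Purely as an illustration of the two operators on the given parents, and assuming digit-wise addition modulo 10 for the arithmetic crossover (the text does not specify how overflow is handled):

```python
A = [1, 5, 3, 2, 6, 4, 7, 9, 8]   # Chromosome A
B = [8, 5, 6, 7, 2, 3, 1, 4, 9]   # Chromosome B

# arithmetic crossover: add corresponding digits (mod 10 is our assumption)
child = [(a + b) % 10 for a, b in zip(A, B)]

# value encoding mutation: subtract 1 at the 3rd and 4th positions
child[2] -= 1
child[3] -= 1
```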
There can be two types of environments in case of multi-agent : Competitive and cooperative.
1. Competitive environment
In this type of environment, every agent makes an effort to win the game by defeating the other agents, or by establishing superiority over them, while they too are trying to win the game.
2. Cooperative environment
In this type of environment all the agents jointly perform activities in order to achieve same goal.
Car driving agent is an example of a cooperative environment.
Under the artificial intelligence category, there are a few special features, shown in Table 2.18.1, which make games more interesting.
Table 2.18.1 : AI game features
amount of time to take an action.
Unpredictable opponent : In AI games, the action of the opponent agent is fuzzy, which makes the game challenging and unpredictable. Players are called unpredictable when the next step depends upon an input set generated by the other player. Example : a multiplayer card game.
2.18.2(A) Zero Sum Game
The “zero sum game” concept is associated with payoffs which are assigned to each player when the instance of the game is over. It is a mathematical representation of circumstances where the game is in a neutral state (i.e. an agent's winning or losing is exactly balanced by the losing or winning of the other agents).
For example, if player 1 wins a chess game it is marked with, say, +1 point, and at the same time the loss of player 2 is marked with −1 point; thus the sum is zero. Another condition is when the game is a draw; in that case players 1 and 2 are both marked with zero points. (Here +1, −1 and 0 are called payoffs.)
Non-zero sum games do not have an algebraic sum of payoffs equal to zero. In this type of game, one player winning does not necessarily mean that the other player has lost.
There are two types of non-zero sum games :
1. Positive sum game
2. Negative sum game
1. Positive sum game : It is also called a cooperative game. Here, all players have the same goal and they contribute together to play the game. For example, educational games.
2. Negative sum game : It is also called a competitive game. Here, every player has a different goal, so no one really wins the game; everybody loses. A real-world example of a war fits best.
To understand game playing, we will first take a look at all the relevant aspects of a game, which give an overview of the stages in a game play. See Fig. 2.19.1.
Accessible environments : Games with accessible environments have all the necessary information handy. For
example : Chess.
Search : There are also games which require search functionality, which illustrates how players have to search through possible game positions to play the game. For example : Minesweeper, Battleships.
Unpredictable opponent : In AI games opponents can be unpredictable, this introduces uncertainty in game play and
thus game-playing has to deal with contingency/ probability problems. For example : Scrabble.
Fig. 2.20.1 shows examples of the two main varieties of problems faced in artificial intelligence games : the first type is “toy problems” and the second is “real-world problems”.
Game play follows some strategies in order to mathematically analyse the game and generate possible outcomes. A
two player strategy table can be seen in Table 2.20.1.
…
attention to the single payoff function of Player I, which we call here A.
The strategic form, or normal form, of a two-person zero-sum game is given by a triplet (X, Y,A), where
(1) X is a nonempty set, the set of strategies of Player I
(2) Y is a nonempty set, the set of strategies of Player II
(3) A is a real-valued function defined on X ×Y. (Thus, A(x, y) is a real number for every x ∈ X and every y ∈ Y .)
The interpretation is as follows. Simultaneously, Player I chooses x ∈ X and Player II chooses y ∈ Y , each unaware of
the choice of the other. Then their choices are made known and I wins the amount A(x, y) from II.
This is a very simple definition of a game; yet it is broad enough to encompass the finite combinatorial games and
games such as tic-tac-toe and chess.
On the basis of how many times Player I or Player II is winning the game, following strategies can be discussed.
1. Equalizing Strategy : A strategy that produces the same average winnings no matter what the opponent does is called
an equalizing strategy.
2. Optimal Strategy : If there is a number V such that Player I has a procedure guaranteeing him at least V on the average, and Player II has a procedure that keeps her average loss to at most V, then V is called the value of the game, and the procedure each uses to ensure this return is called an optimal strategy or a minimax strategy.
3. Pure Strategies and Mixed Strategies : It is useful to make a distinction between a pure strategy and a mixed strategy.
We refer to elements of X or Y as pure strategies. The more complex entity that chooses among the pure strategies at
random in various proportions is called a mixed strategy.
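For a finite game in strategic form given by a payoff matrix, the two bounds behind the value of the game can be computed directly over pure strategies. A small sketch (the payoff matrix `A` is an illustrative example of ours, not from the text):

```python
def pure_maximin(A):
    """Player I's best guaranteed payoff using pure strategies:
    max over rows x of the min over columns y of A(x, y)."""
    return max(min(row) for row in A)

def pure_minimax(A):
    """Player II's best cap on her average loss using pure strategies:
    min over columns y of the max over rows x of A(x, y)."""
    return min(max(row[j] for row in A) for j in range(len(A[0])))

# a game with a saddle point: both bounds coincide, giving the value
A = [[4, 2, 5],
     [3, 1, 0]]
```

When the two bounds differ, no pure strategy achieves the value and the players must use mixed strategies, which is exactly the distinction drawn above.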
Games can be classified under the deterministic or probabilistic category. Let's see what we mean by deterministic and probabilistic.
(a) Deterministic
It is a fully observable environment. When two agents play the game alternately and the final results of the game are equal and opposite, the game is called deterministic. Take the example of tic-tac-toe, where two players play alternately, and when one player wins the game the other player loses it.
(b) Probabilistic
Probabilistic games are also called non-deterministic. They are the opposite of deterministic games: there can be multiple players and you cannot determine the next action of a player.
You can only predict the probability of the next action. To understand probabilistic type you can take example of
card games.
Another way of classification for games can be based on exact/perfect information or based on inexact /
approximate information. Now, let us understand these terms.
1. Exact/perfect information : Games in which all the actions are known to the other players are called games of exact or perfect information. For example, tic-tac-toe or board games like chess and checkers.
2. Inexact/approximate information : Games in which all the actions are not known to the other players (or the actions are unpredictable) are called games of inexact or approximate information. In this type of game, a player's next action depends upon who played last, who won the last hand, etc. For example, card games like Hearts.
Consider the following games and see how they are classified into various types based on the parameters which we have learnt in the above sections :
Table 2.20.2 : Types of game
Now, let us learn about a few of the games mentioned in Table 2.20.2.
2.20.2(A) Chess
Chess comes under deterministic and exact/perfect information category. This game is a two person, zero-sum game.
In chess both players can see the board positions, so there is no secrecy, and the players do not play at the same time; they play one after the other, in order.
Thus this game provides a perfect environment to test artificial intelligence techniques. In 1997 the computer Deep Blue defeated Garry Kasparov, who was the world champion at that time. This example shows how artificial intelligence can be used in decision making.
Fig. 2.20.2
2.20.2(B) Checkers
Checkers comes under deterministic and exact/perfect information category. This game is a two person game where
both players can see board positions, so there is no secrecy and players play one after the other in an order.
In the 1990s a computer program named Chinook was developed, which defeated the human world champion at checkers (also called draughts).
single next move is shown.
Possible moves are represented with the help of lines. The last level shows terminal board positions, which illustrate the end of the game instance. Here, we get zero as the sum of all the payoffs. A terminal state gives a winning board position. Utility indicates the payoff points gained by the player (−1, 0, +1).
In a similar way we can draw a game tree for any artificial intelligence based games.
Terminal state : Indicates that all instances of the game are over.
Utility : Displays a number which indicates whether the game was won, lost or drawn.
From the Tic-Tac-Toe example you can understand that even for a 3 × 3 grid, two-player game, where the game tree is relatively small (it has at most 9! terminal nodes), we cannot draw the tree completely on one single page.
Imagine how difficult it would be to create a game tree for multi-player games or for the games with bigger grid size.
Many games have huge search space complexity. Games have limitations on the time and the amount of memory space they can consume. Finding an optimal solution is not feasible most of the time, so there is a need for approximation.
Therefore there is a need for an algorithm which will reduce the tree size and eventually will help in reducing the
processing time and in saving memory space of the machine.
One method is pruning, where only the required parts of the tree, which improve the quality of the output, are kept and the remaining parts are removed.
Another method is the heuristic method (it makes use of an evaluation function), which does not require exhaustive search. This method depends upon readily available information which can be used to control problem solving.
2.21 MiniMax Algorithm
The minimax algorithm evaluates decisions based on the present status of the game. This algorithm needs a deterministic environment with perfect/exact information.
The minimax algorithm directly implements the defining equation: the minimax value of each state is computed recursively from the minimax values of its successor states.
Let us take the tic-tac-toe game as an example to understand the minimax algorithm. We will take a random stage.
Step 1 : Create an entire game tree including all the terminal states.
Fig. 2.21.1
Fig. 2.21.2
Next action : 'O'
Fig. 2.21.3
Step 2 : For every terminal state, find the utility (the payoff points gained at that terminal position), where 1 means a win and 0 means a draw.
Fig. 2.21.4
Step 3 : Apply the MIN and MAX operators on the nodes of the present stage and propagate the utility values upward in the tree.
Fig. 2.21.5
Step 4 : With the max (of the min) utility value (payoff value) select the action at the root node using minimax decision.
Fig. 2.21.6
Fig. 2.21.7
(In Steps 2 and 3 we are assuming that the opponent will play perfectly, as per our expectation.)
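The recursive computation used in Steps 1 to 4 can be sketched as follows. This is a generic illustration of ours: the nested-list tree encoding and the `children`/`utility` callables are assumptions, not from the text.

```python
def minimax(node, maximizing, utility, children):
    """Minimax value of `node`: at a terminal state return its utility,
    otherwise take the max (MAX's turn) or min (MIN's turn) over the
    minimax values of the successor states."""
    succ = children(node)
    if not succ:                       # terminal state: return the payoff
        return utility(node)
    values = [minimax(c, not maximizing, utility, children) for c in succ]
    return max(values) if maximizing else min(values)

# tiny game tree as nested lists: internal nodes are lists, leaves are payoffs
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
children = lambda n: n if isinstance(n, list) else []
utility = lambda n: n
```

With MAX to move at the root, the MIN values of the three branches are 3, 2 and 2, so the minimax decision picks the first branch.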
The time complexity of the minimax algorithm is O(b^m). Its space complexity is O(bm) (using the depth-first exploration approach), where b is the branching factor and m is the maximum depth of the tree.
Pruning means cutting off. In game search it resembles clipping a branch of the search tree which is probably not fruitful.
At any choice point along the path for MAX, α is the value of the best choice found so far, i.e. the highest value. For each node X, if X is worse, i.e. has a lower value than α, then MAX will avoid it. Similarly, we can define the β value for MIN.
α-β pruning is an extension of the minimax algorithm where the decision-making process need not consider each and every node of the game tree.
Only the nodes important for a quality output are considered in decision making. Pruning helps in making the search more efficient. It keeps only those parts of the tree which contribute to improving the quality of the result; the remaining parts of the tree are removed.
Fig. 2.22.1
= Max(4, C, 2) = 4, since C ≤ 3
Let us see how to check this step by step.
So in this example we have pruned 2 β-branches and 0 α-branches. As the tree is very small, you may not appreciate the effect of branch pruning; but in any real game tree, pruning has a significant impact on the search as far as time and space are concerned.
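The same tree evaluation with α-β cuts can be sketched as follows (an illustration of ours; the nested-list tree encoding is an assumption, not the book's notation):

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with α-β pruning over a nested-list game tree
    (internal nodes are lists, leaves are payoff numbers)."""
    if not isinstance(node, list):          # terminal state
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # β-cut: MIN will never allow this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:               # α-cut: MAX will never choose this branch
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

On this tree, as soon as the second branch yields a leaf of 2 (below α = 3), the rest of that branch is cut without being examined, yet the root value is identical to plain minimax.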
Ex. 2.22.1 : Explain the Min-Max and Alpha-Beta pruning algorithms with the following example.
Fig. P. 2.22.1
Ex. 2.22.2 : Perform α-β cutoff on the following.
Fig. P. 2.22.2
Soln. :
Fig. P. 2.22.2(a)
No. of α-cuts = 1
No. of β-cuts = 2
Ex. 2.22.3 : Apply alpha-beta pruning on example given in Fig. P.2.22.3 considering first node as max.
Fig. P.2.22.3
Soln. :
If there is exact/perfect move ordering, then the time complexity reduces to O(b^(m/2)). The depth of search is effectively doubled with pruning.
Review Questions
Q. 9 Write the algorithm of steepest-ascent hill climbing and compare it with simple hill climbing.
Q. 10 What are the limitations of hill climbing? How can we solve them?
Q. 11 Write algorithm for Best first search and specify its properties.
Q. 12 What is the difference between best first and greedy best first search? Explain with example.
Q. 16 How does the definition of the heuristic affect the search process? Explain with a suitable example.
Q. 19 Explain the SMA* algorithm with an example. When should we choose SMA* over the other options?
Q. 22 Give the α-β pruning algorithm with an example and its properties; also explain why it is called α-β pruning.
Q. 24 Apply alpha-beta pruning on example given in Fig. Q. 24 considering first node as max.
Fig. Q. 24
3 Knowledge, Reasoning and Planning
Unit III
Syllabus
3.2 First order logic : Syntax and Semantics, Knowledge Engineering in FOL, Inference in FOL : Unification, Forward Chaining, Backward Chaining and Resolution
3.3 Planning Agent, Types of Planning: Partial Order, Hierarchical Order, Conditional Order
Understanding the theoretical or practical aspects of a subject is called knowledge. We can gain knowledge through experience acquired from facts, information, etc. about the subject.
After gaining knowledge about some subject we can apply that knowledge to derive conclusions about various
problems related to that subject based on some reasoning.
We have studied various types of agents in Chapter 1. In this chapter we are going to see what a “knowledge based agent” is, with a very interesting game example.
We are also going to study how such agents store knowledge and how they infer the next level of knowledge from the existing set. In turn, we will study various knowledge representation and inference methods in this chapter.
As shown in Fig. 3.1.1, a knowledge based agent consists of two parts, a Knowledge Base (KB) and an Inference Engine, and can be described at different levels :
1. Knowledge level:
The knowledge level is the base level of an agent, which consists of domain-specific content. At this level the agent has facts/information about the surrounding environment in which it is working; it does not consider the actual implementation.
2. Implementation level:
The implementation level consists of domain-independent algorithms. At this level, agents can recognize the data structures used in the knowledge base and the algorithms which use them, for example, propositional logic and resolution. (We will learn about logic and resolution in this chapter.)
Knowledge based agents are especially useful in partially observable environments. Before choosing any action, knowledge based agents make use of their existing knowledge along with the current inputs from the environment in order to infer hidden aspects of the current state.
As we have learnt, a knowledge base is a set of representations of facts/information about the surrounding environment (the real world). Every single representation in the set is called a sentence, and sentences are expressed with the help of a formal representation language. We can say that a sentence is a statement, i.e. a set of words, that expresses some truth about the real world with the help of a knowledge representation language.
Declarative approach of building an agent makes use of TELL and ASK mechanism.
o TELL the agent, about surrounding environment (what it needs to know in order to perform some action).
TELL mechanism is similar to taking input for a system.
o Then the agent can ASK itself what action should be carried out to get desired output. ASK mechanism is
similar to producing output for a system. However, ASK mechanism makes use of the knowledge base to
decide what it should do.
TELL and ASK mechanism involve inference. When you run ASK function, the answer is generated with the help of
knowledge base, based on the knowledge which was added with TELL function previously.
o TELL(K) : A function that adds knowledge K to the knowledge base.
o ASK(K) : A function that queries the agent about the truth of K.
An agent carries out the following operations : First, it TELLs the knowledge base the facts/information it perceives with the help of its sensors. Then, it ASKs the knowledge base what action should be carried out based on the input it has received. Lastly, it performs the selected action with the help of its effectors.
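The TELL/ASK mechanism can be sketched as follows. This is a deliberately minimal illustration of ours: ASK here only checks whether a sentence was stored verbatim, whereas a real knowledge-based agent would run logical inference over the knowledge base.

```python
class KnowledgeBase:
    """Minimal TELL/ASK sketch: sentences are stored as strings and ASK
    checks membership (a stand-in for genuine inference)."""
    def __init__(self):
        self.sentences = set()

    def tell(self, k):
        """TELL(K): add knowledge K to the knowledge base."""
        self.sentences.add(k)

    def ask(self, k):
        """ASK(K): query the agent about the truth of K."""
        return k in self.sentences

kb = KnowledgeBase()
kb.tell("Links(GoldenGateBridge, SanFrancisco, MarinCounty)")
```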
Knowledge based agents can be implemented at three levels, namely the knowledge level, the logical level and the implementation level.
1. Knowledge level 2. Logical level
3. Implementation level
1. Knowledge level :
It is the most abstract level of agent implementation. The knowledge level describes the agent by saying what it knows, that is, what knowledge the agent has as its initial knowledge.
Basic data structures and procedures to access that knowledge are defined at this level. The initial knowledge of the
knowledge base is called as background knowledge.
Agents at the knowledge level can be viewed as agents for which one only needs to specify what the agent knows and what its goals are in order to specify its behaviour, regardless of how it is implemented.
For example : A taxi driving agent might know that the Golden Gate Bridge connects San Francisco with Marin County.
2. Logical level :
At the logical level, the knowledge is encoded into sentences. This level uses some formal language to represent
the knowledge the agent has. The two types of representations we have are propositional logic and first order or
predicate logic.
Both these representation techniques are discussed in detail in the further sections.
For example: Links(Golden Gate Bridge, San Francisco, Marin County)
3. Implementation level :
At the implementation level, the physical representation of logical-level sentences is done. This level also describes the data structures used in the knowledge base and the algorithms used for data manipulation.
Fig. 3.1.2 is the general implementation of knowledge based agent. TELL and ASK are the sub procedures
implemented to perform the respective actions.
The knowledge base agent must be able to perform following tasks :
o Represent states, actions, etc.
o Incorporate new precepts.
o Update internal representations of the world.
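The TELL/ASK cycle described above can be sketched as follows. All class and method names here are our own illustration, not from the text; the knowledge base is a trivial fact store and the action-selection rule is a stub.

```python
# Minimal knowledge-based agent skeleton: TELL the KB the percept,
# ASK it (here, via a stub rule) for an action, then TELL it the action.
class SimpleKB:
    def __init__(self):
        self.facts = set()          # background + acquired knowledge

    def tell(self, sentence):       # store a new sentence
        self.facts.add(sentence)

    def ask(self, query):           # trivial lookup-based inference
        return query in self.facts

class KnowledgeBasedAgent:
    def __init__(self, kb):
        self.kb = kb
        self.t = 0                  # time-step counter

    def agent_program(self, percept):
        self.kb.tell(("percept", percept, self.t))
        # Stub decision rule: grab if we perceive glitter, else move.
        action = "Grab" if percept == "glitter" else "Move"
        self.kb.tell(("action", action, self.t))
        self.t += 1
        return action

agent = KnowledgeBasedAgent(SimpleKB())
print(agent.agent_program("glitter"))   # -> Grab
print(agent.agent_program("breeze"))    # -> Move
```

A real agent would replace the stub rule with logical inference over the knowledge base, as the following sections develop.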
The WUMPUS world is based on the game "Hunt the Wumpus", an early computer game from 1972/1973. It was originally written in BASIC (Beginner's All-purpose Symbolic Instruction Code).
WUMPUS is a map-based game. Let's understand the game :
o The WUMPUS world is like a cave consisting of a number of rooms connected by passageways. We will take a 4 × 4 grid to understand the game.
o WUMPUS is a monster who lives in one of the rooms of the cave. The WUMPUS eats the player (agent) if the player comes into the same room. Fig. 3.2.1 shows the WUMPUS staying in room (3,1).
o Player (agent) starts from any random position in cave and has to explore the cave. We are starting from (1, 1)
position.
There are various sprites in the game like pit, stench, breeze, gold, and arrow. Every sprite has some feature. Let's
understand this one-by-one :
o Few rooms have bottomless pits, which trap the player (agent) if he comes to that room. You can see in the
Fig. 3.2.1 that room (1,3), (3,3) and (4,4) have bottomless pit. Note that even WUMPUS can fall into a pit.
o A stench is experienced in a room which has the WUMPUS in a neighbouring room. See Fig. 3.2.1, where rooms (2,1), (3,2) and (4,1) have a stench.
o A breeze is experienced in a room which has a pit in a neighbouring room. Fig. 3.2.1 shows that rooms (1,2), (1,4), (2,3), (3,2), (3,4) and (4,3) have a breeze.
o Player (Agent) has arrows and he can shoot these arrows in straight line to kill WUMPUS.
o One of the rooms contains the gold; this room glitters. Fig. 3.2.1 shows that room (3,2) has the gold.
Apart from the above features, the player (agent) can receive two further types of percepts: bump and scream. A bump is generated if the player (agent) walks into a wall, while a scream is heard everywhere in the cave when the WUMPUS is killed.
An agent receives percepts while exploring the rooms of the cave. Every percept can be represented as a five-element list : [stench, breeze, glitter, bump, scream]. Note that the player (agent) cannot perceive its own location.
If the player (agent) gets the percept [Stench, Breeze, None, None, None], it means that there is a stench and a breeze, but no glitter, no bump, and no scream at that position in the game.
Let's take a look at the actions which can be performed by the player(agent) in WUMPUS World :
o Move : To move in forward direction,
o Turn : To turn right by 90 degrees or left by 90 degrees,
o Grab : To pick up gold if it is in the same room as the player(agent),
o Shoot : To shoot an arrow in a straight line in the direction faced by the player (agent).
These actions are repeated till the player (agent) kills the WUMPUS or if the player (agent) is killed. If the WUMPUS is
killed then it is a winning condition, else if the player(agent) is killed then it is a losing condition and the game is over.
Game developer can keep a restriction on the number of arrows which can be used by the player(agent). So if we
allow agent to have only one arrow, then only the first shoot action will have some effect. If this shoot action kills the
WUMPUS then you win the game, otherwise it reduces the probability of winning the game.
Lastly there is a die action : it takes place automatically if the agent enters a room with a bottomless pit or the room with the WUMPUS. The die action is irreversible.
Main aim of the game is that player (agent) should grab the gold and return to starting room (here its (1,1)) without
being killed by the monster (WUMPUS).
Award and punishment points are assigned to a player (Agent) based on the actions it performs. Points can be given
as follows:
o 100 points are awarded if player (agent) comes out of the cave with the gold.
o 1 point is taken away for every action taken.
o 10 points are taken away if the arrow is used.
o 200 points are taken away if the player (agent) gets killed.
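The reward scheme above can be captured in a small scoring function. This is only a sketch; the function name and signature are our own, not from the text.

```python
# Score an episode of the WUMPUS world under the point scheme above.
def wumpus_score(actions_taken, got_gold_home, used_arrow, died):
    score = 0
    score += 100 if got_gold_home else 0   # +100 for exiting with the gold
    score -= 1 * actions_taken             # -1 per action taken
    score -= 10 if used_arrow else 0       # -10 for using the arrow
    score -= 200 if died else 0            # -200 if the agent is killed
    return score

# e.g. 12 actions, gold brought home, no arrow fired, agent survived:
print(wumpus_score(12, True, False, False))   # -> 88
```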
Q. Give PEAS descriptors for WUMPUS world (May 13, Dec. 14, 3 Marks)
1. Performance measure
+ 100 for grabbing the gold and coming back to the starting position,
– 200 if the player (agent) is killed,
– 1 per action,
– 10 for using the arrow.
2. Environment
A 4 × 4 grid of rooms, which may be empty or contain pits, the WUMPUS, or the gold.
3. Actuators
Move forward, turn left by 90 degrees, turn right by 90 degrees, grab, shoot.
4. Sensors
Stench, breeze, glitter, bump, scream.
Let's try to understand the WUMPUS world problem in step by step manner. Keep Fig. 3.2.2 as a reference figure.
Legend : A – Agent, B – Breeze, G – Glitter/Gold, OK – Safe square, P – Pit, S – Stench, V – Visited, W – Wumpus
Fig. 3.2.2(a) : WUMPUS world with player in room (1,1)
The knowledge base initially contains only the rules (facts) of the WUMPUS world environment.
io led
Step 1 : Initially the player(agent) is in the room (1,1). See Fig. 3.2.2(a).
The first percept received by the player is [none, none, none, none, none] (remember, a percept consists of [stench, breeze, glitter, bump, scream]). This means the neighbouring rooms (1,2) and (2,1) are safe, so they are marked "OK".
Step 2 : The player moves to room (1,2). See Fig. 3.2.2(b).
As room (1,1) is visited, you can see the "V" mark in that room. The player receives the following percept : [none, breeze, none, none, none].
As the breeze percept is received, room (1,2) is marked with "B", and it can be predicted that there is a bottomless pit in a neighbouring room.
You can see that rooms (1,3) and (2,2) are marked with "P?", so they are not safe to move into. Thus the player should return to room (1,1) and try to find another safe room to move to.
Step 3 :
Fig. 3.2.2(c) WUMPUS world with player moving back to room (1,1) and then moves to other safe room (2,1).
As seen in Fig. 3.2.2(c), the player is now in room (2,1), where it receives the percept [stench, none, none, none, none], which means that there is a WUMPUS in a neighbouring room (i.e. either room (2,2) or (3,1) has the WUMPUS).
As we did not get a breeze percept in this room, we can understand that room (2,2) cannot have a pit, and from step 2 we can understand that room (2,2) cannot have the WUMPUS, because room (1,2) did not show a stench percept. Therefore the WUMPUS must be in room (3,1).
Step 4 : The player moves to room (2,2). Here it receives the percept [none, none, none, none, none]; from this we can understand that rooms (2,3) and (3,2) are safe to move into.
Step 5: Let's move to room (3,2). Here, player receives [stench, breeze, glitter, none, none] percept. See Fig. 3.2.2(e).
Field 1 of the percept shows that rooms (3,1), (3,3) and (4,2) can have the WUMPUS. Field 2 shows that rooms (3,1), (3,3), (2,2) and (4,2) can have a bottomless pit. Field 3 shows that room (3,2) has the gold. So, the player grabs the gold first, as the aim of this game is to grab the gold and go back to the starting position without being killed by the WUMPUS.
Fig. 3.2.2(e) : WUMPUS world with player moving to room (3,2)
Now, we have to go back to the starting position, i.e. room (1,1), without getting killed by the WUMPUS. From steps 1, 2, 3 and 4 we know that rooms (1,1), (1,2), (2,1) and (2,2) are safe. So, we can go back to room (1,1) by following either of two paths: (2,2), (2,1), (1,1) or (2,2), (1,2), (1,1).
Step 6: As can be seen in Fig. 3.2.2(f). We will go from room (2,2) to room (2,1) and from room (2,1) to room (1,1).
Fig.3.2.2(f) : WUMPUS world with player moving back to room (1,1) with gold
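The safe-room bookkeeping used in the walkthrough above can be sketched in a few lines. The percept representation and the function names are our own illustration: a room adjacent to a visited room that reported neither breeze nor stench can be marked OK.

```python
# Infer safe rooms on the 4x4 grid from percepts in visited rooms.
def neighbours(room):
    r, c = room
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(a, b) for a, b in cand if 1 <= a <= 4 and 1 <= b <= 4]

def safe_rooms(percepts):
    """percepts: {room: (stench, breeze)} for visited rooms."""
    ok = set(percepts)                      # visited rooms are known safe
    for room, (stench, breeze) in percepts.items():
        if not stench and not breeze:       # neighbours are pit/WUMPUS free
            ok.update(neighbours(room))
    return ok

# Steps 1-2 of the walkthrough: (1,1) is quiet, (1,2) has a breeze.
print(sorted(safe_rooms({(1, 1): (False, False), (1, 2): (False, True)})))
# -> [(1, 1), (1, 2), (2, 1)]
```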
3.3 Logic
Logic is reasoning carried out according to strict rules of validity in order to perform a specified task.
In the case of intelligent systems, logical representation and reasoning are not bound to any one particular form of logic; they are independent of any particular form.
Note that logic is most beneficial when the knowledge to be represented is small in extent; when knowledge is represented in large quantities, pure logic becomes less practical.
Fig. 3.3.1 depicts that sentences are physical configurations of an agent, and that reasoning is a process of forming new physical configurations from old ones.
Logical reasoning should make sure that the new configurations represent features of the world that actually follow from the features of the world that the old configurations represented.
The simplest and most fundamental type of logic is propositional logic.
Propositional logic can be extended to the fuzzy logic level, where truth values range between 0 and 1. The next level is also called the probabilistic logic level, on top of which first-order predicate logic is implemented.
In Fig. 3.3.2, there are two more levels above higher-order logic, namely the multi-valued and non-monotonic logic levels, which consist of modal logic and temporal logic respectively. All these types of logic are basic building blocks of intelligent systems, and they all use reasoning to represent sentences. Hence reasoning plays a very important role in AI.
Q. Explain various methods of knowledge representation with example. (May 13, Dec. 14, May 15, 10 Marks)
1. Propositional Logic : Here each sentence (proposition) has a truth value, either TRUE or FALSE.
2. First Order Predicate Logic : This is much more expressive and makes use of variables, constants, predicates, functions and quantifiers, along with the connectives explained in the previous section.
3. Higher Order Predicate Logic : Higher-order predicate logic is distinguished from first-order predicate logic by using additional quantifiers and stronger semantics.
4. Fuzzy Logic : This admits truth values in between TRUE and FALSE, i.e. fuzziness, in logic.
5. Other Logic : These include multiple valued logic, modal logics and temporal logics.
One of the most widely used methods to represent knowledge is production rules, also known as IF-THEN rules.
Syntax :
IF condition THEN action
IF premise THEN conclusion
IF proposition p1 and proposition p2 are TRUE
THEN proposition p3 is TRUE
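A minimal interpreter for such IF-THEN rules might look as follows. This is only a sketch; the rule representation (lists of premise symbols paired with a conclusion) is our own choice.

```python
# Fire IF-THEN production rules until no more rules can be applied.
def run_rules(facts, rules, max_cycles=100):
    """rules: list of (premises, conclusion) pairs."""
    facts = set(facts)
    for _ in range(max_cycles):
        fired = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)       # THEN-part: assert the conclusion
                fired = True
        if not fired:                       # no applicable rule -> stop
            break
    return facts

rules = [
    (["p1", "p2"], "p3"),                   # IF p1 and p2 THEN p3
    (["p3"], "p4"),                         # IF p3 THEN p4
]
print(sorted(run_rules(["p1", "p2"], rules)))   # -> ['p1', 'p2', 'p3', 'p4']
```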
Example :
o Cats love Milk.
o All mammals are animals.
Fig. 3.4.1
Conceptual Graph : A more recent scheme for semantic networks, introduced by John Sowa, uses a finite, connected, bipartite graph. The nodes represent either concepts or conceptual relations. It differs from the previous method in that it does not use labelled arcs. For example, "Ram, Laxman and Bharat are brothers" or "the cat's colour is grey" can be represented as shown.
Fig. 3.4.2
o (Ram
o    (PROFESSION (VALUE professor))
o    (AGE (VALUE 50))
o    (WIFE (VALUE sita))
o    (CHILDREN (VALUE luv kush))
o    (ADDRESS (STREET (VALUE 4C gb road))
o       (CITY (VALUE banaras))
o       (STATE (VALUE mh))
o       (ZIP (VALUE 400615))))
3.4.1 Ontology
Ontology is the study of what kinds of things or entities exist in the universe. In AI, an ontology is a specification of conceptualizations, used to help programs and humans share knowledge about a particular domain. In turn, an ontology is a set of concepts, such as entities, relationships among entities, and events, expressed in a uniform way in order to create a vocabulary for information exchange.
An ontology should also enable a person to verify what a symbol means. That is, given a concept, they want to be able
to find the symbol, and, given the symbol, they want to be able to determine what it means.
Typically, an ontology specifies what types of individuals will be modelled, specifies what properties will be used, and gives some axioms that restrict the use of that vocabulary. Ontologies are usually written independently of a particular application and often involve a community agreeing on the meanings of symbols.
For example : Consider a map showing hotels, railway stations, buildings, schools and hospitals in a particular locality. In this map, the symbols used to indicate these entities are enough to describe them. Hence the community who knows the meaning of these symbols can easily recognize them; that becomes the ontology of the map. This ontology may define a building as a human-constructed artifact.
It may give some restriction on the size of buildings so that shoeboxes cannot be buildings or that cities cannot be
buildings. It may also state that a building cannot be at two geographically dispersed locations at the same time.
3.5.1 Syntax
Connective symbol : Name of the connective : Relationship between propositional symbols : Name of the relationship
∧ : And : A ∧ B : Conjunction
∨ : Or : A ∨ B : Disjunction
¬ : Not : ¬A : Negation
→ : Implies : A → B : Implication / conditional
↔ : Is equivalent / if and only if : A ↔ B : Biconditional
To define logical connectives truth tables are used. Truth table 3.5.2 shows five logical connectives.
Table 3.5.2
Take an example, A → B : find the value of A → B where A is true and B is false. The third row of Table 3.5.2 shows this condition; now see the third column of that row, where A → B shows the result false. Similarly, the other logical connectives can be mapped in the truth table.
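The five connectives can also be checked programmatically. The sketch below (names ours) defines each connective as a function and prints the rows of the truth table.

```python
# The five logical connectives of Table 3.5.2 as Python functions.
from itertools import product

NOT     = lambda a: not a
AND     = lambda a, b: a and b
OR      = lambda a, b: a or b
IMPLIES = lambda a, b: (not a) or b     # A -> B is false only for (T, F)
IFF     = lambda a, b: a == b           # A <-> B: same truth value

# Print one truth-table row per assignment of A and B.
for a, b in product([True, False], repeat=2):
    print(a, b, AND(a, b), OR(a, b), IMPLIES(a, b), IFF(a, b))
```

Note that `IMPLIES(True, False)` is the only false row of the implication, matching the third row of the table.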
3.5.2 Semantics
The world is a set of facts which we want to represent in propositional logic. To represent these facts, propositional symbols can be used, where each propositional symbol's interpretation is mapped to a real-world feature.
The semantics of a sentence is its meaning; semantics determine the interpretation of a sentence. For example, you can define the semantics of each propositional symbol in the following manner :
1. A means “It is hot”
2. B means “It is humid”, etc.
A sentence is considered true when its interpretation in the real world is true. Every sentence results from a finite number of applications of the rules. For example, if A and B are sentences then (A ∧ B), (A ∨ B), (B → A) and (A ↔ B) are sentences. The knowledge base is a set of sentences, as we have seen in the previous section.
Thus we can say that real world is a model of the knowledge base when the knowledge base is true for that world. In
other words a model can be thought of as a truth assignment to the symbols.
If truth values of all symbols in a sentence are given then it can be evaluated for determining its truth value (i.e. we
can say if it is true or false).
3.5.3 What is Propositional Logic ?
A ∧ B and B ∧ A should have the same meaning, but in natural language the same words and sentences may have different meanings.
Say for an example,
1. Radha started feeling feverish and Radha went to the doctor.
2. Radha went to the doctor and Radha started feeling feverish.
Here, sentence 1 and sentence 2 have different meanings.
In artificial intelligence, propositional logic relates the truth value of one statement to the truth value of another statement.
3.5.4 PL Sentence - Example
Take example of a weather problem.
Semantics of each propositional symbol can be defined as follows:
o Symbol A is a sentence, which means “It is hot”.
o Symbol B is a sentence, which means “It is humid”.
Instead of single letters, we can use mnemonic symbols :
o HT for "It is hot".
o HM for "It is humid".
o RN for "It is raining".
If you have HM → HT, then that means "If it is humid, then it is hot".
If you have (HT ∧ HM) → RN, then it means "If it is hot and humid, then it is raining", and so on.
First we have to create the possible models for the knowledge base. To do this we need to consider all possible assignments of true or false values to HT, HM and RN, and then verify the truth table for validity. There are 2³ = 8 possibilities in total.
Now, if the knowledge base is [HM, HM → HT, (HT ∧ HM) → RN] (i.e. ["It is humid", "If it is humid, then it is hot", "If it is hot and humid, then it is raining"]), then "True - True - True" is the only possible valid model.
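The eight-model check described above can be reproduced directly. This sketch (helper name ours) enumerates every assignment to HT, HM and RN and keeps only those in which all three knowledge-base sentences hold.

```python
# Enumerate all 8 models of HT, HM, RN and keep those satisfying the KB.
from itertools import product

def implies(a, b):
    return (not a) or b

valid_models = [
    (ht, hm, rn)
    for ht, hm, rn in product([True, False], repeat=3)
    if hm                                  # "It is humid"
    and implies(hm, ht)                    # "If humid, then hot"
    and implies(ht and hm, rn)             # "If hot and humid, then raining"
]
print(valid_models)   # -> [(True, True, True)]
```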
Inference rules are used to derive new sentences from the existing knowledge base.
Table 3.5.3 : Inference Rules
Inference Rule : Premise (KB) ⟹ Conclusion
Modus Ponens : X, X → Y ⟹ Y
Transposition : X → Y ⟹ ¬Y → ¬X
1. Sound inference
The soundness property of inference says that if "X is derived from the knowledge base" using a given set of inference rules, then "X is entailed by the knowledge base". The soundness property can be represented as : "If KB |- X then KB |= X".
For the Modus Ponens (MP) rule, assume the knowledge base contains [A, A → B]; from this we can conclude that the knowledge base can contain B. See the following truth table :
A B A→B Valid?
TRUE TRUE TRUE Yes
TRUE FALSE FALSE Yes
FALSE TRUE TRUE Yes
FALSE FALSE TRUE Yes
In general,
For atomic sentences pi, pi′, and q, where there is a substitution Θ such that SUBST(Θ, pi) = SUBST(Θ, pi′) for all i :
p1′, p2′, ..., pn′, (p1 ∧ p2 ∧ ... ∧ pn → q) ⟹ SUBST(Θ, q)
There are N + 1 premises : N atomic sentences plus one implication.
Example :
A : It is rainy.
B : I will stay at home.
A→B : If it is rainy, I will stay at home.
Modus Tollens
When B is known to be false, and if there is a rule “if A, then B,” it is valid to conclude that A is also false.
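The soundness of both Modus Ponens and Modus Tollens can be verified semantically by enumerating models: in every model where the premises hold, the conclusion must hold too. This is a sketch; the helper names are ours.

```python
# Semantic soundness check for Modus Ponens and Modus Tollens.
from itertools import product

def implies(p, q):
    return (not p) or q

# Modus Ponens: premises A and A -> B; conclusion B must hold.
mp_sound = all(b for a, b in product([True, False], repeat=2)
               if a and implies(a, b))

# Modus Tollens: premises A -> B and not B; conclusion not A must hold.
mt_sound = all(not a for a, b in product([True, False], repeat=2)
               if implies(a, b) and not b)

print(mp_sound, mt_sound)   # -> True True
```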
2. Complete inference
Complete inference is the converse of soundness. The completeness property of inference says that if "X is entailed by the knowledge base" then "X can be derived from the knowledge base" using the inference rules. The completeness property can be represented as : "If KB |= X then KB |- X".
3.5.6 Horn Clause
Clauses are generally written as sets of literals. A Horn clause is also called a Horn sentence. In a Horn clause, a conjunction of 0 or more symbols is to the left of "→" and 0 or 1 symbols to the right. See the following formula :
A1 ∧ A2 ∧ A3 ∧ ... ∧ An → B, where n ≥ 0 and the number of symbols on the right is 0 or 1.
There can be the following special cases of the above formula :
o For n = 0 and m = 1 : A (this asserts that A is true)
o For n > 0 and m = 0 : A ∧ B → (this constraint shows that A and B cannot both be true)
o For n = 0 and m = 0 : the empty clause
Conjunctive normal form is a conjunction of clauses, and it is determined up to equivalence by its set of clauses. A Horn clause can be written in conjunctive normal form, where each clause is a disjunction of literals with at most one non-negated literal, as in the following formula : ¬A1 ∨ ¬A2 ∨ ¬A3 ∨ ... ∨ ¬An ∨ B
This can also be represented as : (A → B) ≡ (¬A ∨ B)
For example, consider the Horn formula with the clauses X, Y, X ∧ Y → Z, Z → W, and Z ∧ W → false. The clauses X and Y state that X and Y are true, so we assign X and Y the value true. Then all premises of X ∧ Y → Z are true, so we assign Z true. After that, all premises of Z → W are true, so we assign W true.
Now all premises of Z ∧ W → false are true, from which we entail the query atom false. Therefore, the Horn formula is not satisfiable.
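The forward-chaining argument above can be sketched as a small Horn-clause satisfiability test. The clause encoding is our own choice: each clause is a pair (premises, head), where a head of `None` stands for "false" (an integrity constraint).

```python
# Forward chaining over Horn clauses, as in the X, Y, Z, W example above.
def horn_satisfiable(clauses):
    true_atoms, changed = set(), True
    while changed:
        changed = False
        for premises, head in clauses:
            if set(premises) <= true_atoms:
                if head is None:          # premises of "... -> false" all hold
                    return False          # the formula is unsatisfiable
                if head not in true_atoms:
                    true_atoms.add(head)
                    changed = True
    return True

clauses = [([], "X"), ([], "Y"),          # facts X and Y
           (["X", "Y"], "Z"),             # X ∧ Y -> Z
           (["Z"], "W"),                  # Z -> W
           (["Z", "W"], None)]            # Z ∧ W -> false
print(horn_satisfiable(clauses))          # -> False
```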
Consider again the symbols :
o HT for "It is hot".
o HM for "It is humid".
o RN for "It is raining".
1. HM — Premise (initial sentence) : "It's humid"
2. HM → HT — Premise (initial sentence) : "If it's humid, it's hot"
3. HT — Modus Ponens applied to 1 and 2 : "It's hot"
4. (HT ∧ HM) → RN — Premise (initial sentence) : "If it's hot and humid, it's raining"
5. RN — Modus Ponens applied to 3, 1 and 4 : "It's raining"
o PL cannot directly represent properties of individual entities or relations between individual entities. For
example, Pooja is tall.
o PL cannot express specialization, generalizations, or patterns, etc. For example: All rectangles have 4 sides.
Because of the inadequacies of PL discussed above, there was a need for a more expressive type of logic. Thus First-Order Logic (FOL) was developed. FOL is more expressive than PL: it can represent information using relations, variables and quantifiers, which was not possible with propositional logic.
o “Gorilla is Black” can be represented as :
Gorilla(x) → Black(x)
o “It is Sunday today” can be represented as :
today(Sunday)
First Order Logic (FOL) is also called First Order Predicate Logic (FOPL). Since FOPL is much more expressive as a knowledge representation language than PL, it is more commonly used in artificial intelligence.
Assuming that “X” is a domain of values, we can define a term with following rules :
1. Constant term : a term with a fixed value belonging to the domain.
2. Variable term : a term that can take any value from the domain.
3. Function term : a function applied to terms, e.g. f(t1, ..., tn), is itself a term.
All terms are generated by applying the above three rules.
First order predicate logic makes use of propositional logic as a base logic, so the connectives used in PL and FOPL are
common. Hence, it also supports conjunction, disjunction, negation, implication and double implication.
Ground term : If a term does not have any variables, it is called a ground term. A sentence in which all the variables are quantified is called a "well-formed formula".
o Every ground term is mapped with an object.
o Every condition (predicate) is mapped to a relation.
o A ground atom is considered as true if the predicate’s relation holds between the terms’ objects.
o Rules in FOL : In predicate logic, a rule has two parts, the antecedent and the consequent. If the antecedent evaluates to TRUE, the consequent will be true. It uses the implication symbol, and represents if-then types of sentences.
Example : The sentence "If the bag is of blue colour, I will buy it" will be represented as colour(bag, blue) → buy(bag).
Quantifiers
Apart from these connectives FOPL makes use of quantifiers. As the name suggests they quantify the number of
variables taking part in the relation or obeying the rule.
1. Universal Quantifier ' ∀ '
Pronounced as "for all".
"∀x A" means A is true for every replacement of x.
Example : "All dogs are mammals" can be represented as,
∀x (Dog(x) → Mammal(x))
2. Existential Quantifier ' ∃ '
Pronounced as “there exists”
“∃x A” means A is true for at least one replacement of x.
Example: “There is a white dog” can be represented as,
∃x (Dog(x) ∧ White(x))
Note :
1. Typically, → is the main connective with ∀.
Example : "Everyone at MU is smart" is represented as
∀x At(x, MU) → Smart(x)
2. Typically, ∧ is the main connective with ∃.
Example : "Someone at MU is smart" is represented as
∃x At(x, MU) ∧ Smart(x)
Forward chaining, or data-driven inference, works from an initial state: by looking at the premises of the rules (IF-part), it performs the corresponding actions (THEN-part), possibly updating the knowledge base or working memory. This continues until no more rules can be applied or some cycle limit is met.
For example, "If it is raining, then we will take an umbrella". Here, "it is raining" is the data and "we will take an umbrella" is the decision. It was already known that it is raining, and that is why it was decided to take the umbrella. This process is forward chaining.
Given :
Rule : human(A) → mortal(A)
Data : human(Mandela)
To prove : mortal(Mandela)
Forward Chaining Solution
human(Mandela) matches the left-hand side of the rule, so we get A = Mandela.
Based on the rule statement we can derive : mortal(Mandela)
Forward chaining is used by the “design expert systems”, as it performs operation in a forward direction (i.e. from
start to the end).
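The Mandela derivation can be sketched in code. This toy forward chainer (names and fact encoding are our own illustration) handles only single-premise, single-variable rules, binding the rule variable to whatever constant appears in a matching fact.

```python
# Toy forward chaining: facts are (predicate, argument) pairs, and a rule
# (pred_in, pred_out) stands for pred_in(A) -> pred_out(A).
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for pred_in, pred_out in rules:
            for pred, arg in list(facts):
                if pred == pred_in and (pred_out, arg) not in facts:
                    facts.add((pred_out, arg))   # bind A to arg and fire
                    changed = True
    return facts

facts = {("human", "Mandela")}
rules = [("human", "mortal")]
print(forward_chain(facts, rules))   # contains ('mortal', 'Mandela')
```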
Example
Consider the following example. Let us understand how the same example can be solved using both forward and backward chaining.
Given facts are as follows:
1. It is a crime for an American to sell weapons to the enemy of America.
2. Country Nono is an enemy of America.
3. Nono has some missiles.
4. All the missiles were sold to Nono by Colonel West.
Missile(x) ∧ Owns(Nono, x) ⟹ Sell(West, x, Nono)
5. Missile is a weapon.
Missile(x) ⟹ Weapon(x)
The proof will start from the given facts. And as we can derive other facts from those, it will lead us to the solution.
Please refer to Fig. 3.8.2. As we observe, from the given facts we can reach the predicate Criminal(West).
If the initial data is inferred based on the decision, it is called backward chaining. Backward chaining, or goal-driven inference, works backwards from a final state: it first looks at the working memory to see if the goal is already there. If not, it looks at the actions (THEN-parts) of rules that would establish the goal, and sets up the premises of those rules (IF-parts) as sub-goals. This continues until some rule can be applied to achieve the goal state.
For example, if while going out one has taken an umbrella, then based on this decision it can be guessed that it is raining. Here, "taking the umbrella" is the decision, based on which the data "it is raining" is inferred. This process is backward chaining. Backward chaining is called a decision-driven or goal-driven inference technique.
Given :
o Rule : human(A) → mortal(A)
o Data : human(Mandela)
To prove : mortal(Mandela)
Backward Chaining Solution
mortal(Mandela) will be matched with mortal(A), which gives human(A), i.e. human(Mandela), which is a given fact. Hence proved.
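The same toy setting can be solved goal-first. The sketch below (names ours) handles only single-premise, single-variable rules and does no cycle detection; it matches the goal against rule right-hand sides and recursively proves the premise.

```python
# Toy backward chaining: prove a goal (predicate, argument) from facts and
# rules of the form pred_in(A) -> pred_out(A).
def backward_chain(goal, facts, rules):
    if goal in facts:                            # goal already known
        return True
    pred, arg = goal
    for pred_in, pred_out in rules:
        if pred_out == pred:                     # RHS matches the goal
            if backward_chain((pred_in, arg), facts, rules):
                return True                      # premise proved as sub-goal
    return False

facts = {("human", "Mandela")}
rules = [("human", "mortal")]
print(backward_chain(("mortal", "Mandela"), facts, rules))   # -> True
```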
It makes use of right-hand-side matching. Backward chaining is used by "diagnostic expert systems", because it performs its operation in a backward direction (i.e. from the goal back to the facts).
The proof will start from the fact to be proved. And as we can map it with given facts, it will lead us to the solution.
Please refer to Fig. 3.8.4. As we observe, all leaf nodes of the proof are given facts that means “West is Criminal”.
Backward chaining vs. forward chaining :
o Appropriate for : diagnostic, prescription and debugging applications (backward); planning, monitoring, control and interpretation applications (forward).
o Reasoning : top-down (backward); bottom-up (forward).
o Type of search : depth-first search (backward); breadth-first search (forward).
o What determines the search : consequents (backward); antecedents (forward).
o Flow : consequent to antecedent (backward); antecedent to consequent (forward).
Ex. 3.8.1 : Using predicate logic, find which course Anish likes, given the following :
(i) Anish only likes easy courses.
(ii) Computer courses are hard.
(iii) All electronics courses are easy
(iv) DSP is an electronics course.
Soln. :
As we have to find out which course Anish likes, we will start the proof from that fact.
Fig. P. 3.8.1
2. Assemble the relevant knowledge : This is the process of knowledge acquisition. The knowledge engineer needs to extract the domain knowledge either by himself, provided he is the domain expert, or by working with real experts of the domain. In this process the knowledge engineer learns how the domain actually works and can determine the scope of the knowledge base as per the identified tasks.
3. Defining vocabulary : Defining a complete vocabulary, including predicates, functions and constants, is a very important step of knowledge engineering. This process transforms domain-level concepts into logic-level symbols. It should be exhaustive and precise. This vocabulary is called the ontology of the domain. Once the ontology is defined, the existence of the domain is defined; that is, what kinds of things exist in the domain has been decided.
4. Encoding of general knowledge about the domain : In this step the knowledge engineer defines axioms for all the vocabulary terms, defining the meaning of each term. This enables the expert to cross-check the vocabulary and the contents. If he finds any misinterpretations or gaps, they can be fixed at this point by redoing step 3.
5. Encode the problem : In this step, the specific problem instance is encoded using the defined ontology. This step will be very easy if the ontology is defined properly. Encoding means writing atomic sentences about the problem instance using symbols that are already part of the ontology. It is analogous to the input data of a computer program.
6. Query the knowledge base : Once all the above steps are done, all input for the system is set, and it is time to generate some output from the system. In order to get interesting facts inferred from the provided knowledge, we can query the knowledge base. The inference procedure will operate on the axioms and facts to derive new inferences. This lessens the task of the programmer, who no longer needs to write application-specific programs.
7. Debug the knowledge base : This is the step in which one can check the robustness of the knowledge base. Whether the inference procedure gives appropriate answers to all the queries asked, or stops midway because of incomplete axioms, is easily identified by the debugging process. If one observes the reasoning chain stopping in between, or some queries that cannot be answered, it is an indication of missing or weak axioms. Corrective measures can then be taken by repeating the required steps, after which the system can be claimed to have a complete and precise knowledge base.
Procedure Unify(t1, t2)
Inputs :
    t1, t2 : atoms
Output :
    most general unifier of t1 and t2 if it exists, or ⊥ otherwise
Local :
    E : a set of equality statements
    S : substitution
E ← {t1 = t2}
S ← {}
while (E ≠ {})
    select and remove x = y from E
    if (y is not identical to x) then
        if (x is a variable) then
            replace x with y everywhere in E and S
            S ← {x/y} ∪ S
        else if (y is a variable) then
            replace y with x everywhere in E and S
            S ← {y/x} ∪ S
        else if (x is f(x1, ..., xn) and y is f(y1, ..., yn)) then
            E ← E ∪ {x1 = y1, ..., xn = yn}
        else
            return ⊥
return S
Unification algorithm for Datalog
For example, suppose the knowledge base contains the facts :
King(Ram)
Brave(Ram)
and the rule "if x is a king and x is brave, then x is noble", i.e. King(x) ∧ Brave(x) → Noble(x). To apply the rule, ideally what we want is the substitution set Θ that unifies the rule's premises with the facts, i.e. Θ = {x/Ram}.
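The Unify procedure above can be transcribed almost line for line. This is a sketch: the term encoding is our own choice (variables are strings starting with `?`, compound terms are tuples, everything else is a constant), and the occurs check is omitted for brevity.

```python
# Unification following the textbook procedure. Returns a substitution
# dict, or None in place of the "bottom" symbol for non-unifiable terms.
def unify(t1, t2):
    E = [(t1, t2)]                       # set of equality statements
    S = {}                               # substitution built so far

    def subst(term):                     # apply S to a term (one level deep)
        if isinstance(term, str) and term in S:
            return S[term]
        if isinstance(term, tuple):
            return tuple(subst(a) for a in term)
        return term

    while E:
        x, y = E.pop()
        x, y = subst(x), subst(y)
        if x == y:
            continue                     # already identical
        if isinstance(x, str) and x.startswith("?"):
            S[x] = y                     # bind variable x to y
        elif isinstance(y, str) and y.startswith("?"):
            S[y] = x                     # bind variable y to x
        elif (isinstance(x, tuple) and isinstance(y, tuple)
              and len(x) == len(y) and x[0] == y[0]):
            E.extend(zip(x[1:], y[1:]))  # same functor: unify args pairwise
        else:
            return None                  # clash: not unifiable
    return S

# King(?x) against King(Ram):
print(unify(("King", "?x"), ("King", "Ram")))   # -> {'?x': 'Ram'}
```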
3.10.2 Lifting
(iv) There is a barber who shaves all men in town who do not shave themselves.
Soln. :
(i) ∀x ∀y : (person(x) ∧ policy(y) ∧ buys(x, y)) → smart(x)
3.11 Resolution
Resolution is a valid inference rule. It produces a new clause, implied by two clauses containing complementary literals. The resolution rule was discovered by Alan Robinson in the mid-1960s.
We have seen that a literal is an atomic symbol or a negation of the atomic symbol (i.e.A, ¬A).
Resolution is the only inference rule you need in order to build a sound (soundness means that every sentence produced by the procedure is "true") and complete (completeness means every "true" sentence can be produced by the procedure) theorem prover.
Take an example where we are given that :
o A clause X containing the literal : Z
o A clause Y containing the literal : ¬Z
Based on resolution and the information given above, we can conclude the resolvent :
(X – {Z}) ∪ (Y – {¬Z})
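The resolvent formula (X – {Z}) ∪ (Y – {¬Z}) can be sketched directly in code. The clause encoding is ours: a clause is a frozenset of literals, a literal is a string, and negation is marked by a leading `~`.

```python
# Propositional resolution: resolve two clauses on complementary literals.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(X, Y):
    """Return all resolvents of clauses X and Y."""
    resolvents = []
    for z in X:
        if negate(z) in Y:               # complementary pair z, ~z found
            resolvents.append(frozenset(X - {z}) | frozenset(Y - {negate(z)}))
    return resolvents

X = frozenset({"~H", "Win"})     # clause form of H -> Win
Y = frozenset({"H", "T"})        # H ∨ T
print(resolve(X, Y))             # -> [frozenset({'Win', 'T'})]
```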
Take a generalized version of the above problem :
Given:
Let the knowledge base be a set of true sentences which do not have any contradictions, and let Z be a sentence that we want to prove.
The idea is based on proof by negation: we assume ¬Z and then try to find a contradiction (you may have followed such methods while solving geometry proofs). The intuition is that if all the knowledge base sentences are true, and assuming ¬Z creates a contradiction, then Z must be entailed by the knowledge base. We then need to convert KB ∪ {¬Z} to clause form.
If there is a contradiction in the knowledge base, Z is proved; terminate the process. Otherwise, select two clauses and add their resolvents to the current knowledge base. If we do not find any resolvable clauses, the procedure fails and we terminate. Else, we again check for a contradiction in the knowledge base, and so on.
Q. Explain the steps involved in converting the propositional logic statement into CNF with a suitable example.
(May 16, 10 Marks)
FOL :
Converting to CNF, e.g. by distributing ∨ over ∧ :
A ∨ (B ∧ C) ≡ (A ∨ B) ∧ (A ∨ C)
o Our goal is to show that X always wins with the help of resolution.
1. H → Win(X)
2. T → Loose(Y)
3. H ∨ T
4. Loose(Y) → Win(X)
In clause form :
1. {¬H, Win(X)}
2. {¬T, Loose(Y)}
3. {H, T}
4. {¬Loose(Y), Win(X)}
5. {T, Win(X)} ….. (from 1 and 3)
6. {¬T, Win(X)} ….. (from 2 and 4)
7. {Win(X)} ….. (from 5 and 6)
3.11.4 Example
Let’s take the same example of forward and backward chaining to learn how to write proofs for resolution.
Step 1 : Write the given facts in FOL.
1. It is a crime for an American to sell weapons to an enemy of America.
American(x) ∧ Weapon(y) ∧ Sell(x, y, z) ∧ Enemy(z, America) ⟹ Criminal(x)
2. Country Nono is an enemy of America.
Enemy(Nono, America)
3. Nono has some missiles.
o Owns(Nono, x)
o Missile(x)
4. All the missiles were sold to Nono by Colonel West.
Missile(x) ∧ Owns(Nono, x) ⟹ Sell(West, x, Nono)
5. Missile is a weapon.
Missile(x) ⟹ Weapon(x)
6. West is an American.
American(West)
Step 2 : Convert to clause form.
1. ¬American(x) ∨ ¬Weapon(y) ∨ ¬Sell(x, y, z) ∨ ¬Enemy(z, America) ∨ Criminal(x)
2. Enemy(Nono, America)
3. Owns(Nono, x)
4. Missile(x)
5. ¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sell(West, x, Nono)
6. ¬Missile(x) ∨ Weapon(x)
7. American(West)
Step 3 : To prove that West is a criminal using resolution, negate the goal : ¬Criminal(West).
Resolving ¬Criminal(West) with ¬American(x) ∨ ¬Weapon(y) ∨ ¬Sell(x, y, z) ∨ ¬Enemy(z, America) ∨ Criminal(x) using {x/West} gives :
¬American(West) ∨ ¬Weapon(y) ∨ ¬Sell(West, y, z) ∨ ¬Enemy(z, America)
Resolving with American(West) and Enemy(Nono, America) using {z/Nono} gives :
¬Weapon(y) ∨ ¬Sell(West, y, Nono)
Resolving with ¬Missile(x) ∨ Weapon(x) using {y/x} gives :
¬Missile(x) ∨ ¬Sell(West, x, Nono)
Resolving with ¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sell(West, x, Nono) gives :
¬Missile(x) ∨ ¬Owns(Nono, x)
Resolving with Missile(x) and Owns(Nono, x) gives :
NIL
Hence our assumption was wrong. Hence it is proved that West is a criminal.
Ex. 3.11.2 : Consider the following statements :
(a) Ravi likes all kinds of food.
(b) Apple and chicken are food.
(c) Anything anyone eats and is not killed by is food.
(d) Ajay eats peanuts and is still alive.
(e) Rita eats everything that Ajay eats.
Prove that Ravi likes peanuts using resolution. What food does Rita eat?
Soln. :
(A) Proof by Resolution
Step 1 : Represent the facts in FOL.
(a) ∀x : food(x) → likes(Ravi, x)
(b) food(Apple)
(c) food(Chicken)
(d) ∀x ∀y : eats(x, y) ∧ ¬killed(x) → food(y)
(e) eats(Ajay, Peanuts) ∧ alive(Ajay)
(f) ∀x : eats(Ajay, x) → eats(Rita, x)
In this case we have to add a few common-sense predicates which are always true :
(g) ∀x : killed(x) → ¬alive(x)
(h) ∀x : ¬killed(x) → alive(x)
Step 2 : Negate the statement to be proved : ¬likes(Ravi, Peanuts)
Step 3 : Converting the FOL statements to CNF :
(a) ¬food(x) ∨ likes(Ravi, x)
(b) food(Apple)
(c) food(Chicken)
(d) ¬eats(x, y) ∨ killed(x) ∨ food(y)
(e) eats(Ajay, Peanuts), alive(Ajay)
(f) ¬eats(Ajay, x) ∨ eats(Rita, x)
(g) ¬killed(x) ∨ ¬alive(x)
(h) killed(x) ∨ alive(x)
Resolving ¬likes(Ravi, Peanuts) with (a) gives ¬food(Peanuts); resolving with (d) gives ¬eats(x, Peanuts) ∨ killed(x); resolving with eats(Ajay, Peanuts) gives killed(Ajay); resolving with (g) gives ¬alive(Ajay), which contradicts alive(Ajay), giving NIL.
As the result of this resolution is NIL, our assumption is wrong. Hence it is proved that "Ravi likes Peanuts".
Consider the following statements and prove that the assumption "Ram did not jump" is false.
(a) Ram went to the temple.
(b) The way to the temple is : walk till the post box and take the left or the right road.
(c) The left road has a ditch.
(d) The way to cross the ditch is to jump.
(e) A log is across the right road.
(f) The way to cross the log is to jump.
Soln. :
Step 1 : Represent the facts in FOL.
(a) At(Ram, temple)
(b1) ∀x : At(x, temple) → At(x, PostBox)
(b2) ∀x : At(x, PostBox) → takeleft(x) ∨ takeright(x)
(c) ∀x : takeleft(x) → cross(x, ditch)
(d) ∀x : cross(x, ditch) → jump(x)
(e) ∀x : takeright(x) → at(x, log)
(f) ∀x : at(x, log) → jump(x)
Step 2 : Negate the statement to be proved : ¬jump(Ram)
Step 3 : Convert the FOL statements to CNF and resolve.
Resolving ¬jump(Ram) with (d) and (f) under the substitution x/Ram gives ¬cross(Ram, ditch) and ¬at(Ram, log); with (c) and (e) these give ¬takeleft(Ram) and ¬takeright(Ram); resolving with (b1), (b2) and At(Ram, temple) yields NIL.
Hence our assumption was wrong, and it is proved that Ram jumped.
In the resolution derivation, the complementary pairs Barks(Rimi) / ¬Barks(Rimi) and hungry(Rimi) / ¬hungry(Rimi) resolve away, leading to NIL.
This shows that our assumption is wrong. Hence it is proved that Raja is angry.
Ex. 3.11.5 : Consider following facts.
1. If the maid stole the jewellery then the butler was not guilty.
Hence proved.
Ex. 3.11.6 : Consider the following axioms.
1. All people who are graduating are happy.
2. All happy people smile.
3. Someone is graduating.
(i) Represent these axioms in FOL.
(ii) Prove that someone smiles, using resolution.
Soln. :
(i) FOL representation :
1. ∀x : graduating(x) → happy(x)
2. ∀x : happy(x) → smile(x)
3. ∃x : graduating(x)
(ii) Proof by resolution :
Converting to CNF (R1 is a Skolem constant for the existential) :
1. ¬graduating(x1) ∨ happy(x1)
2. ¬happy(x3) ∨ smile(x3)
3. graduating(R1)
Negate the goal : ¬smile(x2)
Resolving ¬smile(x2) with clause 2 under the substitution x3 | x2 gives ¬happy(x2).
Resolving with clause 1 under the substitution x1 | x2 gives ¬graduating(x2).
Resolving with clause 3 under the substitution x2 | R1 gives NIL.
Hence our assumption is wrong.
Hence it is proved that someone smiles.
3.12 Planning
Planning various tasks is a part of day-to-day activities in the real world. Say you have tests in two different subjects on one day; then you will plan your study timetable as per your strengths and weaknesses in those two subjects.
Also, you must have learnt about various scheduling algorithms (e.g. first in first out) in the Operating Systems subject, and how printers plan/schedule their printing tasks based on task importance.
These examples illustrate how important planning is. We have seen that artificially intelligent systems are rational systems, so devising a plan of actions becomes a part of creating an artificially intelligent agent. When we think about giving intelligence to a system or device, we have to make sure that it prioritizes between given activities or tasks.
In this section we are going to learn how machines can become more intelligent by using planning while performing various actions.
Planning in Artificial Intelligence can be defined as a problem that needs decision making by an intelligent system to accomplish a given target.
The intelligent system can be a robot or a computer program.
Take the example of a driver who has to pick up and drop people from one place to another. Say he has to pick up two people from two different places; then he has to follow some sequence, as he cannot pick up both passengers at the same time.
There is one more definition of planning, which says that planning is an activity where an agent has to come up with a sequence of actions to accomplish a target.
Now, let us see what information is available while formulating a planning problem and what results are expected.
We have information about the initial status of the agent, the goal conditions of the agent, and the set of actions the agent can take.
The aim of an agent is to find a proper sequence of actions which will lead from the starting state to the goal state and produce an efficient solution.
An agent interacts with its environment through sensors and effectors/actuators. When a task comes to this agent, it has to decide the sequence of actions to be taken and then execute these actions accordingly.
We have seen in the above section what information is available while formulating a planning problem and what results are expected. It is also understandable here that the states of an agent correspond to the probable surrounding environments, while the actions and goals of an agent are specified based on logical formalization.
We have also learnt about various types of intelligent agents in Chapter 1, which shows that to achieve any goal an agent has to answer a few questions like "what will be the effect of its actions", "how will it affect the upcoming actions", etc. This illustrates that an agent must be able to reason properly about its future actions, the states of the surrounding environment, etc.
Consider the simple Tic-Tac-Toe game. A player cannot win the game in one step; he/she has to follow a sequence of actions to win. While taking every next step, he/she has to consider the old steps, imagine the probable future actions of the opponent, and accordingly make the next move, while also considering the consequences of his/her own actions.
A classical planning problem has the following assumptions about the task environment :
o Fully Observable : The agent can observe the current state of the environment.
o Deterministic : The agent can determine the consequences of its actions.
o Finite : There is a finite set of actions which can be carried out by the agent at every state in order to achieve the goal.
o Static : Events are steady; external events which cannot be handled by the agent are not considered.
o Discrete : Events of the agent are distinct, from the starting state to the ending (goal) state, in terms of time.
So basically, a planning problem finds the sequence of actions to accomplish the goal based on the above assumptions.
Also, a goal can be specified as a union of sub-goals.
Take the example of a ping-pong game, where points are assigned to the opponent when a player fails to return the ball within the rules of the game. There can be a best-of-5 match where, to win the match, you have to win 3 games, and in every game you have to win by a minimum margin of 2 points.
Q. How does a planning problem differ from a search problem ? (Dec. 12, 2.5 Marks)
Q. How does planning differ from searching ? (Dec. 13, 5 Marks)
Q. Compare and contrast problem solving agent and planning agent. (Dec. 15, 5 Marks)
Generally, problem-solving and planning methodologies can solve similar types of problems. The main difference between problem solving and planning is that planning is a more open process, and planning agents follow a logic-based representation. Planning is considered more powerful than problem solving for these two reasons.
A planning agent has situations (i.e. states), goals (i.e. target end conditions) and operations (i.e. actions performed). All these parameters are decomposed into sets of sentences, and further into sets of words, depending on the need of the system.
Planning agents can deal with situations/states more efficiently because of their explicit reasoning capability; they can also communicate with the world. Agents can reflect on their targets, and we can minimize the complexity of the planning problem by independently planning for the sub-goals of an agent. Agents have information about past actions and present actions, and the important point is that they can predict the effect of actions by inspecting the operations.
Planning is a logical representation, based on situations, goals and operations, of problem solving.
Planning = Problem solving + Logical representation
We plan activities in order to achieve some goals. The main goal can be divided into sub-goals to make planning more efficient.
Take the example of grocery shopping at a supermarket. Suppose you want to buy milk, bread and eggs from the supermarket; then your initial state will be "at home" and the goal state will be "have milk, bread and eggs".
Now if you look at Fig. 3.14.1, you will understand that the branching factor can be enormous depending upon the set of actions (e.g. watch TV, read a book, etc.) available at that point of time.
Fig. 3.14.1 : Supermarket examples to understand need of planning
Thus the branching factor can be defined as the set of all probable actions at any state. This set can be very large, as in the supermarket example or the block problem. If the domain of probable actions increases, the branching factor also increases, as they are directly proportional to each other; this results in an increase of the search space.
To reach the goal state you have to follow many steps. If you consider using heuristic functions, remember that they will not be able to eliminate states; these functions are helpful only for guiding the search of states.
So it becomes difficult to choose the best actions (i.e. even if we go to the supermarket, we need to make sure that all three listed items are picked; only then is the goal state achieved).
As there are many possible actions, it is difficult to describe every state, and there can be combined goals (as seen in the supermarket example), so searching alone is inadequate to achieve goals efficiently. In order to be more efficient, planning is required.
In the above sections we discussed that planning requires explicit knowledge; that means in the case of planning we need to know the exact sequence of actions that will be useful to achieve the goal.
An advantage of planning is that the order of planning and the order of execution need not be the same. For example, you can plan how to pay the grocery bill before planning to go to the supermarket.
Another advantage of planning is that you can make use of a divide-and-conquer policy by dividing/decomposing the goal into sub-goals.
There are many approaches to solving planning problems. The following are a few major approaches used for planning :
o Planning with state space search.
o Partial ordered planning.
o Hierarchical planning / hierarchical decomposition (HTN planning).
o Planning with situation calculus / planning with operators.
o Conditional planning.
GRAPHPLAN can be used to extract a solution directly.
Planning graphs work only for propositional problems, without variables. You have learnt Gantt charts; similarly, a planning graph has a series of levels which correspond to time steps in the plan. Every level has a set of literals and a set of actions. Level 0 is the initial state of the planning graph.
Fig. 3.15.2
Start at level L0 and determine action level A0 and the next level L1.
o A0 contains all actions whose preconditions are satisfied at the previous level.
o Connect the preconditions and effects of actions between L0 and L1.
o Inaction is represented by persistence actions.
o Level A0 contains all the possible actions.
o Conflicts between actions are shown by mutual exclusion (mutex) links.
o Level L1 contains all literals that could result from picking any subset of the actions in level A0.
o Conflicts between literals which cannot occur together (as the effect of the selected actions) are represented by mutual exclusion links.
o L1 defines multiple states, and the mutual exclusion links are the constraints that define this set of states.
o Continue until two consecutive levels are identical, i.e. the graph has levelled off.
A mutual exclusion relation holds between two actions when :
o One action cancels out the effect of the other action, OR
o One of the effects of one action is the negation of a precondition of the other action.
A mutual exclusion relation holds between two literals when :
o One literal is the negation of the other literal, OR
o Each possible pair of actions that could achieve the two literals is mutually exclusive.
The GRAPHPLAN algorithm can be outlined as follows :
function GRAPHPLAN(problem) returns solution or failure
    graph ← INITIAL-PLANNING-GRAPH(problem)
    goals ← GOALS[problem]
    loop do
        if goals are all non-mutex in the last level of graph then do
            solution ← EXTRACT-SOLUTION(graph, goals, LENGTH(graph))
            if solution ≠ failure then return solution
        else if NO-SOLUTION-POSSIBLE(graph) then return failure
        graph ← EXPAND-GRAPH(graph, problem)
Properties of a planning graph
o If the goal is absent from the last level, then the goal cannot be achieved.
o If there exists a path to the goal, then the goal is present in the last level.
o If the goal is present in the last level, there may still not exist any path to it.
You can understand from Fig. 3.16.1 that the office agent is at location 250 on the state-space grid. When it gets a task, it has to decide which task can be performed more efficiently in less time.
If it finds some input and output locations nearer on the state-space grid, for example in the case of a printing task, then the probability of performing that task will increase.
But to do this it should be aware of its own current location, the locations of the people who are assigning tasks, and the locations of the required devices.
State-space search is unfavourable for solving real-world problems because it requires a complete description of every searched state; also, the search has to be carried out locally.
There can be two ways of representing a state :
1. Complete world description
2. Path from an initial state
The drawback of these representations is that they do not explicitly specify "what holds in every state". Because of this, it can be difficult to determine whether two states are the same.
Let us take the example of the water jug problem.
We have two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring markers on it. There is a pump that can be used to fill the jugs with water. How can you get exactly 2 gallons of water into the 4-gallon jug?
The state space for this problem can be described as the set of ordered pairs of integers (x, y), such that x = 0, 1, 2, 3 or 4, representing the number of gallons of water in the 4-gallon jug, and y = 0, 1, 2 or 3, representing the quantity of water in the 3-gallon jug. The start state is (0, 0). The goal state is (2, n) for any value of n, since the problem does not specify how much water should be in the 3-gallon jug.
Rule set
1. (x, y) → (4, y) if x < 4 : fill the 4-gallon jug
2. (x, y) → (x, 3) if y < 3 : fill the 3-gallon jug
3. (x, y) → (x − d, y) if x > 0 : pour some water out of the 4-gallon jug
4. (x, y) → (x, y − d) if y > 0 : pour some water out of the 3-gallon jug
5. (x, y) → (0, y) if x > 0 : empty the 4-gallon jug on the ground
6. (x, y) → (x, 0) if y > 0 : empty the 3-gallon jug on the ground
7. (x, y) → (4, y − (4 − x)) if x + y ≥ 4 and y > 0 : pour water from the 3-gallon jug into the 4-gallon jug until it is full
9. (x, y) → (x + y, 0) if x + y ≤ 4 and y > 0 : pour all the water from the 3-gallon jug into the 4-gallon jug
One solution path, shown as the state reached and the rule that produced it :

Gallons in the     Gallons in the     Rule applied
4-gallon jug       3-gallon jug
0                  0                  –
0                  3                  2
3                  0                  9
3                  3                  2
4                  2                  7
0                  2                  5 or 12
2                  0                  9 or 11
Fig. 3.16.3
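The search through this state space can be sketched with a breadth-first search. This is a sketch under stated assumptions: only the fill, empty and pour moves are modelled, since the "pour some water out" rules are never needed on a shortest path.

```python
from collections import deque

def water_jug(goal_x=2):
    """Breadth-first search over states (x, y): x gallons in the 4-gallon
    jug, y in the 3-gallon jug. Returns the shortest state sequence from
    (0, 0) to a state with goal_x gallons in the 4-gallon jug."""
    def successors(x, y):
        return {
            (4, y), (x, 3),                      # fill a jug
            (0, y), (x, 0),                      # empty a jug
            (min(4, x + y), max(0, x + y - 4)),  # pour 3-gal into 4-gal
            (max(0, x + y - 3), min(3, x + y)),  # pour 4-gal into 3-gal
        } - {(x, y)}

    start = (0, 0)
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[0] == goal_x:                   # goal test: (2, n)
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(*state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None

print(water_jug())  # a shortest solution: 6 pours from (0, 0) to x == 2
```

Because BFS explores states level by level, the returned sequence has the minimum number of moves; both classic 6-move solutions of the puzzle are valid answers.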
3.18 Progression Planners
“Forward state-space search” is also called a “progression planner”. It is a deterministic planning technique, as we plan the sequence of actions starting from the initial state in order to attain the goal.
With the forward state-space search method we start from the initial state and move towards the goal state. While doing this we need to consider the probable effects of the actions taken at every state.
Thus the prerequisites for this type of planning are the initial world-state information, details of the available actions of the agent, and a description of the goal state.
Remember that the details of the available actions include the preconditions and effects of each action.
See Fig. 3.18.1; it gives a state-space graph of a progression planner for a simple example where flight 1 is at location A and flight 2 is also at location A. These flights are moving from location A to location B. In the 1st case only flight 1 moves from location A to location B, so the resulting state shows that after performing that action flight 1 is at location B whereas flight 2 is at its original location, A. Similarly, in the 2nd case only flight 2 moves from location A to location B, and the resulting state shows that after performing that action flight 2 is at location B while flight 1 is at its original location, A.
1. If the preconditions of an action are satisfied, the action is applicable : its positive effect literals are added to the state and its negative effect literals are deleted from the state.
Perform goal testing by checking whether the state satisfies the goal.
Lastly, keep the step cost of each action as 1.
2. Consider the example of the A* algorithm : any complete graph-search algorithm gives a complete planning algorithm. Heuristic functions are not used by default.
3. The progression planner algorithm is considered inefficient because of the irrelevant-action problem and the requirement of good heuristics for an efficient search.
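The progression (forward state-space) planner described above can be sketched for the two-flight example of Fig. 3.18.1. This is only a sketch: the dictionary action encoding and names such as Fly(F1,A,B) are illustrative assumptions.

```python
from collections import deque

# Hypothetical STRIPS-style encoding of the two-flight domain: Fly(f,src,dst)
def make_actions(flights=('F1', 'F2'), locs=('A', 'B')):
    acts = []
    for f in flights:
        for src in locs:
            for dst in locs:
                if src != dst:
                    acts.append({'name': f'Fly({f},{src},{dst})',
                                 'pre': {f'At({f},{src})'},
                                 'add': {f'At({f},{dst})'},
                                 'del': {f'At({f},{src})'}})
    return acts

def progression_plan(init, goal, actions):
    """Forward state-space search: apply any action whose preconditions
    hold, add its positive effects, delete its negative effects; with
    unit step costs, BFS finds a shortest plan."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:                        # goal test
            return plan
        for a in actions:
            if a['pre'] <= state:                # preconditions satisfied
                nxt = frozenset((state - a['del']) | a['add'])
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a['name']]))
    return None

plan = progression_plan({'At(F1,A)', 'At(F2,A)'},
                        {'At(F1,B)', 'At(F2,B)'}, make_actions())
print(plan)  # ['Fly(F1,A,B)', 'Fly(F2,A,B)']
```

The search enumerates every applicable action at every state, which is exactly the irrelevant-action problem mentioned above: in larger domains most expansions do nothing towards the goal.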
“Backward state-space search” is also called a “regression planner”. From the name of this method you can make out that the processing starts from the goal state, and then you go backwards to the initial state.
So basically we try to backtrack the scenario and find the best possibility in order to achieve the goal; to do this, we have to see what might have been the correct action at the previous state.
In forward state-space search we needed information about the successors of the current state; for backward state-space search we need information about the predecessors of the current state.
Here the problem is that there can be many possible goal states which are equally acceptable. That is why this approach is not considered practical when a large number of states satisfy the goal.
Let us see the flight example. Here the goal state is that flight 1 is at location B and flight 2 is also at location B. We can see in Fig. 3.19.1 that if this state is checked backwards, we have two acceptable predecessor states : in one state only flight 2 is at location B while flight 1 is at location A, and in the 2nd possible state flight 1 is already at location B while flight 2 is at location A.
As we search backwards from the goal state to the initial state, we have to deal with partial information about the state, since we do not yet know which actions will get us to the goal. This method is complex because we have to achieve a conjunction of goals.
In Fig. 3.19.1, rectangles are goals that must be achieved and lines show the corresponding actions.
Regression algorithm
2. Actions must be consistent : an action should not undo desired literals. The positive effects of the action which appear in the goal are deleted, and each precondition literal of the action is added, unless it already appears.
3. The main advantage of this method is that only relevant actions are taken into consideration. Compared to forward search, the backward search method has a much lower branching factor.
Progression and regression are not very efficient on complex problems; they need good heuristics to achieve better efficiency. Finding the best solution is NP-hard (NP stands for Nondeterministic Polynomial time).
There are two ways to make state-space search efficient :
o Use a linear method : add steps which build on their immediate successors or predecessors.
o Use a partial planning method : ordering constraints are imposed on the agent as per the requirement at execution time.
3.20 Total Order Planning (TOP)
We have seen in the above section that forward and regression planners impose a total ordering on actions at all stages of planning.
In the case of Total Order Planning (TOP), we have to follow a sequence of actions for the entire task at once, and to do this we can have multiple combinations of the required actions. Here we need to remember one important thing : TOP should take care of preconditions while creating the sequence of actions.
For example, we cannot wear the left shoe without wearing the left sock, and we cannot wear the right shoe without wearing the right sock. So while creating the sequence of actions in total order planning, the wear-left-sock action should be executed before wearing the left shoe, and the wear-right-sock action should be executed before wearing the right shoe, as you can see in Fig. 3.20.1.
If there is a cycle of constraints, then total order planning cannot give good results. TOP can also fail in non-cooperative environments. So we have the Partial Order Planning method.
In the case of Partial Order Planning (POP), the ordering of the actions is partial. Partial order planning does not specify which of two actions placed in the plan will come first.
With partial order planning the problem can be decomposed, so it can work well even when the environment is non-cooperative.
Take the same example of wearing shoes to understand partial order planning.
o In this case, to wear the left shoe, wearing the left sock is the precondition.
o The second branch covers the right sock and right shoe.
o Here, wearing the right sock is the precondition for wearing the right shoe.
Once these actions are taken we achieve our goal and reach the finish state.
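The sock/shoe partial order above can be checked mechanically: enumerating all total orders consistent with the two ordering constraints shows there are exactly six valid linearizations. This is only a sketch; the action names are illustrative.

```python
from itertools import permutations

actions = ['LeftSock', 'LeftShoe', 'RightSock', 'RightShoe']
# Ordering constraints: each sock must go on before its own shoe
constraints = [('LeftSock', 'LeftShoe'), ('RightSock', 'RightShoe')]

def consistent(order):
    """A total order is a valid linearization of the partial order if
    every constraint (a, b) has a placed before b."""
    return all(order.index(a) < order.index(b) for a, b in constraints)

linearizations = [p for p in permutations(actions) if consistent(p)]
print(len(linearizations))  # 6 of the 24 total orders satisfy the partial order
```

This illustrates why POP is less committed than TOP: one partial plan stands for all six execution sequences at once.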
Q. Define partial order planner. (Dec. 14, May 15, Dec. 15, 5/6 Marks)
If we consider POP as a search problem, then the states are themselves small (partial) plans, which are generally unfinished. An empty plan consists of only the Start and Finish actions.
Every plan has four main components, which can be given as follows :
1. Set of actions
These are the steps of the plan. Actions which can be performed in order to achieve the goal are stored in the set-of-actions component.
For example : Set of Actions = {Start, Right-sock, Right-shoe, Left-sock, Left-shoe, Finish}
Here, wearing the left sock, wearing the left shoe, wearing the right sock and wearing the right shoe are the actions.
2. Set of ordering constraints (action "x" must occur before action "y", written x < y)
For example : Set of Orderings = {Right-sock < Right-shoe ; Left-sock < Left-shoe}; that is, in order to wear a shoe, first we should wear the sock.
So the ordering constraint can be : wear Left-sock < wear Left-shoe (the wear-left-sock action should be taken before wearing the left shoe), or wear Right-sock < wear Right-shoe (the wear-right-sock action should be taken before wearing the right shoe).
If the constraints are cyclic, then the plan is inconsistent.
If we want a consistent plan, then there should not be any cycle of ordering constraints.
3. Set of causal links
A causal link records that an action A achieves an effect E which is a precondition of an action B.
Fig. 3.21.2 : (a) Causal Link in Partial Order Planning (b) Causal Link Example
From Fig. 3.21.2(b) you can understand that if you buy an apple, its effect can be eating the apple, and the precondition of eating the apple is cutting the apple.
There can be a conflict if there is an action C that has an effect ¬E and, according to the ordering constraints, it comes after action A and before action B.
Say we do not want to eat the apple; instead we want to make a decorative apple swan. This action can come between A and B, and it does not have the effect E.
o For example : Set of Causal Links = {Right-sock achieves Right-sock-on for Right-shoe, Left-sock achieves Left-sock-on for Left-shoe, Right-shoe achieves Right-shoe-on for Finish, Left-shoe achieves Left-shoe-on for Finish}.
o To have a consistent plan there should not be any conflicts with the causal links.
4. Set of open preconditions
A precondition is called open if it is not yet achieved by some action in the plan. A least-commitment strategy can be used by delaying choices during the search.
To have a consistent plan there should not be any open preconditions.
A consistent plan does not have a cycle of constraints, does not have conflicts in the causal links, and does not have open preconditions, so it provides a solution for the POP problem.
While solving a POP problem, operators can add links and steps from the existing plan to open preconditions in order to fulfil them, and then steps can be ordered with respect to the other steps to remove potential conflicts. If an open precondition is unattainable, then backtrack the steps and try solving the problem again with POP.
Partial order planning is a more efficient method, because with the help of POP we can progress from a vague plan to a complete and correct solution faster. Also, we can solve a huge state-space plan in fewer steps, because search takes place only where sub-plans interact.
Hierarchical planning is also called plan decomposition. Generally, plans are organized in a hierarchical format.
Complex actions can be decomposed into more primitive actions, denoted with the help of links between various states at different levels of the hierarchy. This is called operator expansion.
For example :
Fig. 3.22.1 : Operator expansion
Fig. 3.22.2 shows how to create a hierarchical plan to travel from a source to a destination. You can also observe how, at every level, we follow some sequence of actions.
The hierarchy of actions can be decided in terms of major and minor actions. Minor activities cover the more precise activities needed to accomplish the major activities. In the above example, railway ticket booking, hotel booking, reaching Rajasthan, staying and enjoying there, and coming back are the major activities, while taking a taxi to reach the railway station, having a candle-light dinner in a palace, taking photos, etc. are the minor activities.
3.22.3 Planner
1. First identify a hierarchy of major conditions.
2. Construct the plan in levels (major steps, then minor steps), postponing the details to the next level.
3. Patch the major levels as detailed actions become visible.
4. Finally, demonstrate the complete plan.
The actions with their costs (as shown in the figure) are : Get taxi (2), Reach railway station (3), Pay driver (1), Check in (1), Boarding train (2), Reach Rajasthan (3).
1st level plan :
Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Boarding train (2), Reach Rajasthan (3).
3rd level plan (final) :
Opening [Link] (1), Finding train (2), Buy ticket (3), Get taxi (2), Reach railway station (3), Pay driver (1), Check in (1), Boarding train (2), Reach Rajasthan (3).
3.23 Planning Languages
MU - May 13, Dec. 14
Q. Explain STRIPS representation of planning problem. (May 13, Dec. 14, 5 Marks)
A planning language should be expressive enough to describe a wide variety of problems and restrictive enough to allow efficient algorithms to operate on it.
STRIPS (Stanford Research Institute Problem Solver) was originally an automated planner; later on, this name was given to the formal planning language it used. STRIPS is the foundation for most of the languages used to express automated planning problem instances in current use.
Action Description Language (ADL)
ADL is an advancement of STRIPS. Pednault proposed ADL in 1987.
Fig. 3.23.1
4. Then stack Y on Z.
5. Grab X and pick up X.
6. Stack X on Y.
The elementary problem here is the frame problem in AI, which is concerned with the question of what piece of knowledge or information is pertinent to the situation.
To solve this problem we have to make an elementary assumption, the closed-world assumption (i.e. if something is not asserted in the knowledge base then it is assumed to be false; this is also called "negation by failure").
The start and goal states for the block world problem can be given as follows :
Start state : on(Y, table), on(X, table), on(Z, X), clear(Z), clear(Y), hand empty
Goal state : on(Z, table), on(Y, Z), on(X, Y), clear(X), hand empty
Fig. 3.23.2
We can write 4 main rules for the block world problem as follows :

Rule                    Precondition and Deletion List          Add List
Rule 1 pickup(X)        hand empty, on(X, table), clear(X)      holding(X)
Rule 2 putdown(X)       holding(X)                              hand empty, on(X, table), clear(X)
Rule 3 stack(X, Y)      holding(X), clear(Y)                    on(X, Y), clear(X), hand empty
Rule 4 unstack(X, Y)    on(X, Y), clear(X), hand empty          holding(X), clear(Y)
Based on the above rules, the plan for the block world problem (start state → goal state) can be specified as follows :
1. unstack(Z, X) 2. putdown(Z)
3. pickup(Y) 4. stack(Y, Z)
5. pickup(X) 6. stack(X, Y)
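The plan above can be executed mechanically against the four rules. This is only a sketch: "handempty" is included in the stack/unstack rules (matching the start and goal states of Fig. 3.23.2), and the string-based state encoding is an illustrative choice.

```python
def rules(x, y=None):
    """The four block-world rules as (precondition-and-deletion, add) pairs.
    'handempty' appears in stack/unstack, per the standard formulation."""
    return {
        'pickup':  ({'handempty', f'on({x},table)', f'clear({x})'},
                    {f'holding({x})'}),
        'putdown': ({f'holding({x})'},
                    {'handempty', f'on({x},table)', f'clear({x})'}),
        'stack':   ({f'holding({x})', f'clear({y})'},
                    {f'on({x},{y})', f'clear({x})', 'handempty'}),
        'unstack': ({f'on({x},{y})', f'clear({x})', 'handempty'},
                    {f'holding({x})', f'clear({y})'}),
    }

def apply_rule(state, op, *args):
    pre_del, add = rules(*args)[op]
    assert pre_del <= state, f'{op}{args}: precondition not satisfied'
    return (state - pre_del) | add   # deletion list equals precondition list

state = {'on(Y,table)', 'on(X,table)', 'on(Z,X)',
         'clear(Z)', 'clear(Y)', 'handempty'}
plan = [('unstack', 'Z', 'X'), ('putdown', 'Z'), ('pickup', 'Y'),
        ('stack', 'Y', 'Z'), ('pickup', 'X'), ('stack', 'X', 'Y')]
for step in plan:
    state = apply_rule(state, *step)
print(sorted(state))  # ['clear(X)', 'handempty', 'on(X,Y)', 'on(Y,Z)', 'on(Z,table)']
```

Running the six steps transforms the start state of Fig. 3.23.2 into exactly the goal state, confirming the plan is valid.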
Execution of this plan can be done by making use of a data structure called a "Triangular Table". For example, the first row of a triangular table holds the initial conditions on(C, A), clear(C) and handempty, which feed the first operator, unstack(C, A).
Fig. 3.23.3
In a triangular table there are N + 1 rows and columns. It can be seen from Fig. 3.23.4 that the rows are numbered 1 to n + 1 and the columns 0 to n. The first column of the triangular table indicates the starting state, and the last row indicates the goal state.
With the help of the triangular table, a tree is formed as shown below to achieve the goal state :
Fig. 3.23.4
An agent (in this case a robotic arm) can have some amount of fault tolerance. Fig. 3.23.5 shows one such example.
Fig. 3.23.5
The ADL description of the problem is shown below. Notice that it is purely propositional. It goes beyond STRIPS in that it uses a negated precondition, ¬At(Flat, Axle), for the PutOn(Spare, Axle) action. This could be avoided by using Clear(Axle) instead, as we will see in the next example.
Action(Remove(Flat, Axle),
PRECOND: At(Flat, Axle)
EFFECT: ¬ At(Flat, Axle) ∧ At(Flat, Ground))
Action(PutOn(Spare, Axle),
PRECOND: At(Spare, Ground) ∧ ¬ At(Flat, Axle)
EFFECT: ¬ At(Spare, Ground) ∧ At(Spare, Axle))
Action(LeaveOvernight,
PRECOND:
EFFECT: ¬ At(Spare, Ground) ∧ ¬ At(Spare, Axle) ∧ ¬ At(Spare, Trunk) ∧ ¬ At(Flat, Ground) ∧ ¬ At(Flat, Axle))
It should be noted that the real world itself is not uncertain, but human perception of the world is uncertain. In artificial intelligence we try to give human perception ability to machines, and hence the machine also receives an uncertain perception of the real world. So the machine has to deal with incomplete and incorrect information, as humans do.
Determining the condition of a state depends on the available knowledge. In the real world, knowledge availability is always limited, so most of the time conditions are non-deterministic.
The amount or degree of indeterminacy depends upon the knowledge available. The indeterminacy is called "bounded indeterminacy" when actions can have unpredictable effects, but those possible effects can be listed in advance.
There are four planning strategies for handling indeterminacy :
(i) Sensorless planning
(ii) Conditional planning
(iii) Execution monitoring and replanning
(iv) Continuous planning
Conditional planning is sometimes termed contingency planning; it deals with the bounded indeterminacy discussed earlier. The agent makes a plan, evaluates the plan, and then executes it fully or partly depending on the conditions.
In execution monitoring and replanning, the agent can employ any of the planning strategies discussed earlier. Additionally, it observes the plan execution and, if needed, replans, then executes and observes again.
Continuous planning does not stop after performing an action. It persists over time and keeps on planning on certain predefined events. These events include any type of unexpected circumstance in the environment.
(i) Co-operation
In the co-operation strategy, agents have joint goals and plans. Goals can be divided into sub-goals, which are ultimately combined to achieve the final goal.
(ii) Multibody planning
Multibody planning is the strategy of constructing a correct joint plan.
(iii) Co-ordination mechanisms
These strategies specify the co-ordination between co-operating agents. Co-ordination mechanisms are used in several co-operative planning schemes.
(iv) Competition
Competition strategies are used when agents are not co-operating but competing with each other. Every agent wants to achieve the goal first.
3.26 Conditional Planning
Conditional planning has to work regardless of the outcome of an action.
Conditional planning can take place in Fully Observable Environments (FOE), where the environment is fully
observable and the current state of the agent is known. The outcomes of actions cannot be determined, so the
environment is said to be nondeterministic.
In the vacuum-agent example, suppose the dirt is at Right and the agent knows about Right but not about Left. In such
cases dirt might be left behind when the agent leaves a clean square. The initial state is also called a state set or a
belief state.
Sensors play important role in Conditional Planning for partially observable environments. Automatic sensing can be
useful; with automatic sensing an agent gets all the available percepts at every step. Another method is Active
sensing, with which percepts are obtained only by executing specific sensory actions.
Review Questions
Q. 2 Describe WUMPUS WORLD Environment. Specify PEAS properties and type of environment for the same.
Q. 4 What is propositional logic? Write syntax and semantics and example sentences for propositional logic.
Q. 5 Explain the inference process in case of propositional logic with suitable examples.
Q. 7 What is first order logic? Write syntax and semantics of FOL with example.
Q. 14 What are the major approaches of planning? Explain conditional planning with example.
Q. 20 Explain partial order planning with example.
4 Fuzzy Logic
Unit IV
Syllabus
4.1 Introduction to Fuzzy Set: Fuzzy set theory, Fuzzy set versus crisp set, Crisp relation & fuzzy relations,
membership functions,
4.2 Fuzzy Logic: Fuzzy Logic basics, Fuzzy Rules and Fuzzy Reasoning
4.3 Fuzzy inference systems: Fuzzification of input variables, defuzzification and fuzzy controllers.
4.1 Introduction to Fuzzy Set
Fuzzy logic was introduced by Prof. Lotfi A. Zadeh in 1965.
The word fuzzy means “Vagueness”.
Most of our traditional tools for formal modelling, reasoning and computing are crisp, deterministic and precise.
While designing the system using classical set, we assume that the structures and parameters of the model are
definitely known and there are no doubts about their values or their occurrence.
But in real world there exists much fuzzy knowledge; knowledge that is vague, imprecise, uncertain, ambiguous,
inexact or probabilistic in nature.
There are two facts ;
1. Real situations are very often not crisp and deterministic and they cannot be described precisely.
2. The complete description of a real system often would require more detailed data than a human being could ever
recognize simultaneously, process and understand.
Because of these facts, modelling a real system using classical sets often does not reflect the nature of human
concepts and thoughts, which are abstract, imprecise and ambiguous.
The classical (crisp) sets are unable to cope with such unreliable and incomplete information.
We want our systems to also be able to cope with unreliable and incomplete information and give expert opinions.
Fuzzy set theory has been introduced to deal with such unreliable, incomplete, vague and imprecise information.
Fuzzy set theory is an extension of classical set theory in which elements have degrees of membership.
Fuzzy logic uses the whole interval between 0 (false) and 1 (true) to describe human reasoning.
A classical set (or conventional or crisp set) is a set with a crisp boundary.
AI&SC (MU-Sem. 7-Comp) 4-2 Fuzzy Logic
For example, a classical set A of real numbers greater than 6 can be expressed as
A = {x | x > 6}
Where there is a clear, unambiguous boundary ‘6’ such that if x is greater than this number, then x belongs to the set
A, otherwise x does not belong to the set.
Although classical sets are suitable for various approximations and have proven to be an important tool for
mathematics and computer science, they do not reflect the nature of human concepts and thoughts, which are
abstract, imprecise and ambiguous.
For example, mathematically we can express the set of all tall persons as a collection of persons whose height is more
than 6 ft.
A = {x | x > 6}
Where A = “tall person” and x = “height”.
The problem with the classical set is that it would classify a person 6.001 ft. tall as a tall person, but a person 5.999 ft.
tall as "not tall". This distinction is intuitively unreasonable.
The flaw comes from the sharp transition between inclusion and exclusion in a set (Fig. 4.2.1).
Fuzzy logic uses the "degrees of truth" rather than the usual "true or false" (1 or 0) Boolean logic.
Fuzzy logic includes 0 and 1 as extreme cases of truth but also includes the various states of truth in between so that,
for example, the result of a comparison between two things could be not "tall" or "short" but "0.38 of tallness."
As shown in Fig. 4.2.2, fuzzy logic defines a smooth transition from 'Not tall' to 'tall'. A person's height may now
belong to both the groups 'tall' and 'Not tall', but with a degree of membership associated with each
group.
A person has 0.30 membership in ‘Not tall’ group and ‘0.95’ membership in ‘tall’ group, so definitely the person is
categorized as a tall person.
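This contrast can be sketched in code. The sigmoid parameters below are illustrative assumptions, not values from the text; the point is only that the crisp set flips abruptly at the boundary while the fuzzy membership changes smoothly.

```python
# Hypothetical "tall" membership: crisp threshold vs fuzzy sigmoid.
import math

def crisp_tall(height_ft):
    # Classical set: sharp boundary at 6 ft.
    return 1 if height_ft > 6 else 0

def fuzzy_tall(height_ft, a=8.0, c=6.0):
    # Sigmoid MF: smooth transition around c = 6 ft (a controls steepness).
    return 1 / (1 + math.exp(-a * (height_ft - c)))

# 5.999 ft vs 6.001 ft: the crisp set flips from 0 to 1,
# while the fuzzy membership barely changes.
print(crisp_tall(5.999), crisp_tall(6.001))
print(fuzzy_tall(5.999), fuzzy_tall(6.001))
```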
1. A fuzzy set follows infinite-valued logic, whereas a crisp set is based on bi-valued logic.
2. A fuzzy set is determined by its indeterminate boundaries; there exists an uncertainty about the set boundaries. A crisp set is defined by crisp boundaries that give the precise location of the set.
3. Fuzzy set elements are permitted to be partly accommodated by the set (exhibiting gradual membership degrees), whereas crisp set elements can only have total membership or non-membership.
4. Fuzzy sets are capable of handling uncertainty and imprecision, whereas a crisp set requires precise, complete and finite information.
If X is a collection of objects denoted generally by x, then a fuzzy set A in X is defined as a set of ordered pairs :
A = {(x, μA(x)) | x ∈ X}
where μA(x) is called the Membership Function (MF) for the fuzzy set A.
Note : Classical sets are also called ordinary sets, crisp sets, non fuzzy sets or just sets.
Here, X is referred to as the Universe of discourse or simply the Universe and it may consist of discrete objects or
continuous space.
For example,
Let X = {San Francisco, Boston, Los Angeles} be the set of cities one may choose to live in.
The fuzzy set “desirable city to live in” may be described as follows:
A = {(San Francisco, 0.9), (Boston, 0.8), (Los Angeles, 0.6)}
Here, the universe of discourse X is discrete and it contains non-ordered objects, in this case three big cities in
United States.
The MF for this discrete universe is shown in Fig. 4.3.1.
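A discrete fuzzy set like this maps naturally onto a dictionary from elements to membership grades. The sketch below uses the "desirable city to live in" set from the text; the lookup convention (0 for elements outside the support) is a standard assumption.

```python
# The "desirable city to live in" fuzzy set from the text, as a dict
# mapping each element of the universe X to its membership grade.
desirable = {"San Francisco": 0.9, "Boston": 0.8, "Los Angeles": 0.6}

def membership(fuzzy_set, x):
    # Elements outside the support have membership 0.
    return fuzzy_set.get(x, 0.0)

print(membership(desirable, "Boston"))   # 0.8
print(membership(desirable, "Chicago"))  # 0.0
```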
Note : Membership grades of the fuzzy set are subjective measures. (For example, the height 5’5” may be
considered tall in Japan, but in Australia, it may be considered medium).
Fig. 4.3.2 : MF for "about 50 years old"
x      40    42    45    48    50    52    53    55    56    58    60
μB(x)  0.5   0.71  0.94  0.99  1     0.99  0.99  0.94  0.89  0.71  0.5
Membership grades specified for the same concept by different persons may vary considerably. Therefore, the subjectivity
and non-randomness of fuzzy sets is the primary difference between fuzzy sets and probability
theory.
We define,
A = {(x, μA(x)) | x ∈ X},
where
μA(x) = 0 ; x ≤ 10
μA(x) = 1 / (1 + (x − 10)^−2) ; x > 10
3) Using + Notation :
Fuzzy set for "comfortable type of house for a four person family" may be described as,
A = 0.2/1 + 0.5/2 + 0.8/3 + 1/4 + 0.7/5 + 0.3/6
i.e. we define A as
A = μA(x1)/x1 + μA(x2)/x2 + ….. = Σ (i = 1 to n) μA(xi)/xi
4) Using Venn diagrams :
Sometimes it is more convenient to give the graph that represents membership function.
5) Other notations :
A = {(3, 0.1), (4, 0.3), …..}
or
A = {0.1/3, 0.3/4, 0.6/5, …..}
Suppose that X = “age”. Then, we can define fuzzy sets “young”, “middle aged” and “old” that are characterized by
MFs μyoung(x), μmiddle aged(x) and μold(x).
A linguistic variable (“age”) can assume different linguistic values such as “young”, “middle aged” and “old” in this
case.
Note that, the universe of discourse is totally covered by these MFs (MFs for young, middle aged and old) and
transition from one MF to another is smooth and gradual.
Q. Define supports, core, normality, crossover points and – cut for fuzzy set. (Dec. 11, 5 Marks)
1. Support :
The support of a fuzzy set A is the set of all points x in X such that μA(x) > 0.
Support (A) = {x | μA(x) > 0}
2. Core / Nucleus :
The core of a fuzzy set A is the set of all points x in X such that μA(x) = 1.
Core (A) = {x | μA(x) = 1}
3. Normality :
A fuzzy set A is normal if its core is non-empty. In other words, there must be at least one point x ∈ X such that
μA(x) = 1.
4. Crossover points :
A crossover point of a fuzzy set A is a point x ∈ X at which μA(x) = 0.5.
Crossover (A) = {x | μA(x) = 0.5}
5. Fuzzy singleton :
A fuzzy set whose support is a single point in X with μA(x) = 1 is called a fuzzy singleton.
6. α-cut :
The α-cut of a fuzzy set A is the crisp set
Aα = {x | μA(x) ≥ α}
AI&SC (MU-Sem. 7-Comp) 4-8 Fuzzy Logic
7. Strong α-cut :
The strong α-cut of a fuzzy set A is the crisp set
A′α = {x | μA(x) > α}
Using the above notations, we can express support and core of a fuzzy set A as,
Support (A) = A′0 (strong α-cut with α = 0)
Core (A) = A1 (α-cut with α = 1)
8. Convexity :
A fuzzy set A is convex if and only if, for any x1, x2 ∈ X and any λ ∈ [0, 1],
μA(λx1 + (1 − λ)x2) ≥ min (μA(x1), μA(x2))
or
A is convex if all its α-level sets are convex.
9. Fuzzy numbers :
A fuzzy number A is a fuzzy set in the real line (R) that satisfies the conditions for normality and convexity.
10. Bandwidth :
For a normal and convex fuzzy set, the bandwidth is the distance between the two unique crossover points,
width (A) = |x2 − x1|, where μA(x1) = μA(x2) = 0.5
11. Symmetry :
A fuzzy set A is symmetric if its MF is symmetric around a certain point x = c, i.e. μA(c + x) = μA(c − x) for all x ∈ X.
A fuzzy set A is open right if,
lim (x → −∞) μA(x) = 0 and lim (x → +∞) μA(x) = 1
A fuzzy set A is closed if,
lim (x → −∞) μA(x) = 0 and lim (x → +∞) μA(x) = 0
13. Cardinality :
Cardinality of a fuzzy set A is defined as
|A| = Σ (x ∈ X) μA(x)
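For a discrete fuzzy set, the features above reduce to simple set comprehensions and a sum. The set A below is an illustrative example, not one from the text.

```python
# Support, core, normality, crossover points and cardinality for a
# discrete fuzzy set, following the definitions above.
A = {1: 0.2, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}

support   = {x for x, mu in A.items() if mu > 0}     # mu(x) > 0
core      = {x for x, mu in A.items() if mu == 1}    # mu(x) = 1
crossover = {x for x, mu in A.items() if mu == 0.5}  # mu(x) = 0.5
is_normal = len(core) > 0                            # some x with mu = 1
cardinality = sum(A.values())                        # |A| = sum of memberships

print(support, core, crossover, is_normal, cardinality)
```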
Fuzzy sets follow the same properties as crisp set except for the law of excluded middle and law of contradiction.
That is, for a fuzzy set A,
A ∪ Ā ≠ U ; A ∩ Ā ≠ ∅
The following are the properties of fuzzy sets,
1. Commutativity :
A ∪B = B ∪A and
A ∩B = B ∩A
2. Associativity :
A ∪ (B ∪C ) = (A ∪B ) ∪C
A ∩ (B ∩C ) = (A ∩B ) ∩C
3. Distributivity :
A ∪ (B ∩C ) = (A ∪B ) ∩ (A ∪C )
A ∩ (B ∪C ) = (A ∩B ) ∪ (A ∩C )
4. Identity :
A ∪ ∅ = A ; A ∪ U = U
A ∩ ∅ = ∅ ; A ∩ U = A
5. Involution :
The complement of the complement of A is A itself.
6. Transitivity :
If A ⊂B ⊂C , then A ⊂ C
7. De Morgan’s law :
complement of (A ∪ B) = Ā ∩ B̄
complement of (A ∩ B) = Ā ∪ B̄
n
4.3.7 Operations on Fuzzy Sets
1. Containment or Subset :
Fuzzy set A is contained in fuzzy set B if and only if μA(x) ≤ μB(x) for all x.
A ⊆ B ⟺ μA(x) ≤ μB(x)
2. Union :
The union of A and B, denoted by A ∪ B (or "A or B"), is defined as,
μA∪B(x) = max (μA(x), μB(x))
3. Complement :
The complement of a fuzzy set A, denoted by Ā, is defined as,
μĀ(x) = 1 − μA(x)
1. Algebraic sum :
μA+B(x) = μA(x) + μB(x) − μA(x) · μB(x)
2. Algebraic product :
μA·B(x) = μA(x) · μB(x)
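These pointwise operations are easy to check on small discrete sets. The values of A and B below are illustrative, not from the text.

```python
# Standard and algebraic operations on two discrete fuzzy sets defined
# over the same universe.
A = {1: 0.2, 2: 0.7, 3: 1.0}
B = {1: 0.5, 2: 0.3, 3: 0.8}

union        = {x: max(A[x], B[x]) for x in A}            # max operator
intersection = {x: min(A[x], B[x]) for x in A}            # min operator
complement_A = {x: 1 - A[x] for x in A}                   # 1 - mu
alg_sum      = {x: A[x] + B[x] - A[x] * B[x] for x in A}  # algebraic sum
alg_product  = {x: A[x] * B[x] for x in A}                # algebraic product

print(union)  # {1: 0.5, 2: 0.7, 3: 1.0}
```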
We can represent the relation in the form of matrix.
An n-dimensional relation matrix represents an n-ary relation; thus, a binary relation is represented by a
2-dimensional matrix.
Ex. : Consider the two universes X = {a, b, c} and Y = {1, 2, 3}. Then,
X × Y = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3), (c, 1), (c, 2), (c, 3)}
R = {(a, 1), (b, 2), (b, 3), (c, 1), (c, 3)}
R 1 2 3
a 1 0 0
b 0 1 1
c 1 0 1
The relation between set X and Y can also be represented as coordinate diagram as shown in Fig. 4.4.1.
The relation R can also be expressed by mapping representation as shown in Fig. 4.4.2.
A characteristic function is used to assign binary values of relationship in the mapping of X × Y, and is
given by,
fR(x, y) = 1, (x, y) ∈ R
         = 0, (x, y) ∉ R
Let X and Y be two universe and n elements of X are related to m elements of y.
Let the cardinality of X be nX and the cardinality of Y be nY. Then the cardinality of the relation R between X and Y is,
nX×Y = nX × nY
The cardinality of the power set P(X × Y) is given as,
nP(X×Y) = 2^(nX × nY)
4.4.1(B) Operations on Classical Relations
The null relation is defined as,
∅A =  0 0 0
      0 0 0
      0 0 0
And the complete relation is defined as,
EA =  1 1 1
      1 1 1
      1 1 1
The following operations can be performed on two relations A and B.
1. Union
2. Intersection
3. Complement
Ā → fĀ(x, y) = 1 − fA(x, y)
4. Containment
5. Identity
∅ → ∅A and X × Y → EA
The properties of classical set such as commutativity, associativity, involution, distributivity and idempotency hold
good for classical relation also.
Also De Morgan’s law and excluded middle laws hold good for crisp relations.
Composition is a process of combining two compatible binary relations to get a single relation.
Let A be a relation that maps elements from universe X to universe Y.
The two binary relations A and B are said to be compatible if they share a common universe, i.e. A ⊆ X × Y and B ⊆ Y × Z.
1. The max-min composition is defined as,
T = A ∘ B
fT(x, z) = ∨ (y ∈ Y) [fA(x, y) ∧ fB(y, z)]
2. The max-product composition is defined as,
T = A ∘ B
fT(x, z) = ∨ (y ∈ Y) [fA(x, y) · fB(y, z)]
Note : In the above equations ∨ represents the max operation, ∧ represents the min operation and · represents the
product operation.
The composition operation has the following properties :
1. Associative : (A ∘ B) ∘ C = A ∘ (B ∘ C)
2. Commutative : A ∘ B ≠ B ∘ A
3. Inverse : (A ∘ B)⁻¹ = B⁻¹ ∘ A⁻¹
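The max-min composition defined above can be sketched directly over relation matrices. The matrices A and B below are illustrative; A maps X (rows) to Y (columns) and B maps Y to Z.

```python
# Max-min composition T = A o B of two relation matrices:
# T[i][j] = max over k of min(A[i][k], B[k][j]).
def max_min_composition(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[max(min(A[i][k], B[k][j]) for k in range(m))
             for j in range(p)] for i in range(n)]

A = [[0.6, 0.3],
     [0.2, 0.9]]
B = [[1.0, 0.5, 0.3],
     [0.8, 0.4, 0.7]]

print(max_min_composition(A, B))  # [[0.6, 0.5, 0.3], [0.8, 0.4, 0.7]]
```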
Definition :
Let U and V be continuous universes and μR : U × V → [0, 1]; then
R = ∫ (U×V) μR(u, v)/(u, v)
is a binary fuzzy relation on U × V.
If U and V are discrete universes, then
R = Σ (U×V) μR(u, v)/(u, v)
Ex. : Let X = Y = {1, 2, 3}, and let the fuzzy relation R be,
R = 1/(1,1) + 1/(2,2) + 1/(3,3) + 0.8/(1,2) + 0.8/(2,1) + 0.8/(2,3) + 0.8/(3,2) + 0.3/(1,3) + 0.3/(3,1)
Soln. : The membership function μR of this relation can be described as,
μR(x, y) = 1 when x = y
         = 0.8 when |x − y| = 1
         = 0.3 when |x − y| = 2
The matrix notation is,
           Y
        1     2     3
X  1    1    0.8   0.3
   2   0.8    1    0.8
   3   0.3   0.8    1
Q. Explain cylindrical extension and projection operations on fuzzy relation with example. (May 13, 5 Marks)
Fuzzy relations are very important in fuzzy controller because they can describe interaction between variables.
Four types of operations can be performed on fuzzy relation.
1. Intersection
Let R and S be binary relations defined on X × Y. The intersection of R and S is defined by,
∀ (x, y) ∈ X × Y : μR∩S(x, y) = min (μR(x, y), μS(x, y))
Instead of the minimum, any T - Norm can be used.
2. Union
The union of R and S is defined as,
∀ (x, y) ∈ X × Y : μR∪S(x, y) = max (μR(x, y), μS(x, y))
Instead of maximum, any S – norm can be used.
Given two relations R and S,
        y1   y2   y3             y1   y2   y3
R = x1  0.3  0.2  0.1    S = x1  0.4  0    0.1
    x2  0.4  0.6  0.1        x2  1    0.2  0.8
    x3  0.2  0.3  0.5        x3  0.3  0.2  0.4
R ∪ S =  0.4  0.2  0.1
         1    0.6  0.8
         0.3  0.3  0.5
Suppose a simple S-norm S(a, b) = a + b − ab (the probabilistic sum) is used; then,
R ∪ S =  0.58  0.20  0.19
         1     0.68  0.82
         0.44  0.44  0.70
This operation is more optimistic than the max operation. All the membership degrees are at least as high as in
the max operation.
Now, using min operation
R ∩ S =  0.3  0    0.1
         0.4  0.2  0.1
         0.2  0.2  0.4
Suppose a simple T-norm T(a, b) = ab / (a + b − ab) is used; then,
R ∩ S =  0.20  0     0.05
         0.40  0.17  0.10
         0.13  0.13  0.28
The above operation is more pessimistic than the min operation: all the membership degrees are less than or equal to
those in the min operation.
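The four element-wise combinations above can be reproduced in a few lines, using the same R and S matrices as in the example.

```python
# Element-wise union/intersection of the relations R and S from the text,
# with max/min and with the probabilistic sum and T(a,b) = ab/(a+b-ab).
R = [[0.3, 0.2, 0.1], [0.4, 0.6, 0.1], [0.2, 0.3, 0.5]]
S = [[0.4, 0.0, 0.1], [1.0, 0.2, 0.8], [0.3, 0.2, 0.4]]

def apply(op, R, S):
    # Apply a binary operator element-wise to two equally sized matrices.
    return [[op(a, b) for a, b in zip(r, s)] for r, s in zip(R, S)]

union_max = apply(max, R, S)
inter_min = apply(min, R, S)
prob_sum  = apply(lambda a, b: a + b - a * b, R, S)   # an S-norm
t_norm    = apply(lambda a, b: a * b / (a + b - a * b) if a + b else 0.0, R, S)

print(union_max[1])  # [1.0, 0.6, 0.8]
```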
3. Projection
The projection relation brings a ternary relation back to a binary relation, or a binary relation to a fuzzy set, or a
fuzzy set to a single crisp value.
Each element xi of the projection on X is assigned the maximum membership value of the i-th row; e.g. x3 is assigned
the maximum of the third row. Thus,
Proj. R on X = 1/x1 + 0.8/x2 + 1/x3
Similarly, Proj. R on Y is obtained by taking the maximum of each column.
4. Cylindrical Extension
The projection operation is almost always used in combination with cylindrical extension.
Cylindrical extension is more or less opposite of projection. It converts fuzzy set to a relation.
Ex. : Consider the fuzzy set A = 1/x1 + 0.8/x2 + 1/x3. Its cylindrical extension is,
ce (A) =
y1 y2 y3 y4
x1 1 1 1 1
x2 0.8 0.8 0.8 0.8
x3 1 1 1 1
Consider the fuzzy set
B = Proj. R on Y = 0.9/y1 + 0.8/y2 + 0.7/y3 + 0.8/y4
Then,
ce (B) =
y1 y2 y3 y4
x1 0.9 0.8 0.7 0.8
x2 0.9 0.8 0.7 0.8
x3 0.9 0.8 0.7 0.8
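Projection (row-wise max) and cylindrical extension (repeating the fuzzy set across columns) can be sketched as below. The matrix R is illustrative (the text's matrix appears only in a figure), but it is chosen so its projection matches 1/x1 + 0.8/x2 + 1/x3 from the example.

```python
# Projection of a fuzzy relation onto X, and the cylindrical extension
# of the resulting fuzzy set back to X x Y.
R = [[1.0, 0.4, 1.0],
     [0.8, 0.8, 0.5],
     [0.2, 1.0, 0.9]]

proj_on_X = [max(row) for row in R]          # row-wise maximum
num_cols = len(R[0])
ce = [[mu] * num_cols for mu in proj_on_X]   # each column repeats mu

print(proj_on_X)  # [1.0, 0.8, 1.0]
print(ce)
```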
Let R, S and T be fuzzy relations defined on the universe X × Y. Then, the properties of fuzzy relations are stated
below :
1. Commutativity
R∪S = S∪R
R∩S = S∩R
2. Associativity
R ∪ (S ∪ T) = (R ∪ S) ∪ T
R ∩ (S ∩ T) = (R ∩ S) ∩ T
3. Distributivity
ge
R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T)
R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T)
io led
4. Idempotency
R∪R = R
ic ow
R∩R = R
n
5. Identity
R ∪ ∅R = R ; R ∩ ∅R = ∅R
R ∪ ER = ER ; R ∩ ER = R
where ∅R and ER are the null relation (null matrix) and the complete relation (matrix of all 1s) respectively.
Te
6. Involution
The complement of the complement of R is R itself.
7. De-Morgan’s law
complement of (R ∩ S) = R̄ ∪ S̄
complement of (R ∪ S) = R̄ ∩ S̄
8. Law of excluded middle and law of contradiction are not satisfied.
i.e. R ∪ R̄ ≠ ER
and R ∩ R̄ ≠ ∅R
OR μR1∘R2(x, z) = max (y ∈ Y) {μR1(x, y) · μR2(y, z)}
The following are the properties of fuzzy composition, assuming R, S and T are binary relations defined on X × Y, Y × Z
and Z × W respectively.
1. Associativity : R ∘ (S ∘ T) = (R ∘ S) ∘ T
2. Monotonicity : S ⊆ T ⟹ R ∘ S ⊆ R ∘ T
3. Distributivity : R ∘ (S ∪ T) = (R ∘ S) ∪ (R ∘ T)
4. Inverse : (R ∘ S)⁻¹ = S⁻¹ ∘ R⁻¹
Q. Explain the different Fuzzy membership function. (Dec. 12, Dec. 14, 5/8 Marks)
Q. Explain standard fuzzy membership functions. (May 12, Dec. 13, 8 Marks)
One way to represent a fuzzy set is by stating its Membership Function (MF). MFs can be represented using any
mathematical equation as per requirement or using one of the standard MFs available.
There are several different standard MFs available.
A decreasing MF is specified by two parameters (a, b) as follows :
L(x ; a, b) = 1,               x ≤ a
            = (b − x)/(b − a), a ≤ x ≤ b
            = 0,               x ≥ b
The triangular MF is specified by three parameters (a, b, c) as follows :
∧(x ; a, b, c) = 0,               x ≤ a
               = (x − a)/(b − a), a ≤ x ≤ b
               = (c − x)/(c − b), b ≤ x ≤ c
               = 0,               x ≥ c
The trapezoidal MF is specified by four parameters (a, b, c, d) as follows :
trapezoid(x ; a, b, c, d) = 0,               x ≤ a
                          = (x − a)/(b − a), a ≤ x ≤ b
                          = 1,               b ≤ x ≤ c
                          = (d − x)/(d − c), c ≤ x ≤ d
                          = 0,               x ≥ d
The Gaussian MF is specified by two parameters (c, σ) :
gaussian(x ; c, σ) = e^(−½ ((x − c)/σ)²)
The generalized bell MF is specified by three parameters (a, b, c) :
bell(x ; a, b, c) = 1 / (1 + |(x − c)/a|^(2b))
Fig. 4.5.7 illustrates the effect of changing these parameters on the shape of the curve.
The bell MF is a direct generalization of the Cauchy distribution used in probability theory, so it is also referred to as
the Cauchy MF. The bell MF has more parameters than the Gaussian MF, so it has more degrees of freedom to adjust
the steepness at the crossover points.
Although the Gaussian and bell MFs achieve smoothness, they are unable to specify asymmetric MFs.
Asymmetric MFs
Asymmetric and closed MFs can be obtained using either the absolute difference or the product of two sigmoidal
functions.
Sigmoidal MFs
A sigmoidal MF is defined by,
sig(x ; a, c) = 1 / (1 + exp [− a (x − c)])
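The standard MFs above translate directly from their parametric forms; a minimal sketch follows, with the test points chosen for illustration.

```python
# The standard membership functions, implemented from their definitions.
import math

def triangular(x, a, b, c):
    if x <= a or x >= c: return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoid(x, a, b, c, d):
    if x <= a or x >= d: return 0.0
    if x < b:  return (x - a) / (b - a)
    if x <= c: return 1.0
    return (d - x) / (d - c)

def gaussian(x, c, sigma):
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1 / (1 + abs((x - c) / a) ** (2 * b))

def sigmoid(x, a, c):
    return 1 / (1 + math.exp(-a * (x - c)))

print(triangular(25, 20, 25, 30))  # 1.0 at the peak b
print(trapezoid(5, 1, 3, 7, 9))    # 1.0 on the flat top [b, c]
```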
Fuzzy logic handles the concept of partial truth, where truth values range between completely true and completely
false, that is, between 0 and 1. In other words, fuzzy logic can be considered a multi-valued logic that replaces Boolean
truth values with degrees of truth.
The basic elements of fuzzy logic are fuzzy sets, linguistic variables and fuzzy rules.
Usually in mathematics, variables take numerical values whereas fuzzy logic allows the non-numeric linguistic
variables to be used to form the expression of rules and facts.
The linguistic variables are words, specifically adjectives like “small,” “little,” “medium,” “high,” and so on. A fuzzy set
is a collection of couples of elements.
A linguistic variable is a variable whose values are words or sentences in a natural or artificial language.
Consider the variable X = “age”.
A linguistic variable (“age”) can assume different linguistic values such as “young”, “Middle aged”, “Mature” and “old”
in this case.
Then, ‘age’ can be considered as a linguistic variable whose values can be “young”, “Middle aged” “Mature” and “old”
and these values can be characterized by MFs μyoung(x), μmiddle aged(x), μmature(x) and μold(x).
The universe of discourse is totally covered by these MFs (MFs for young, middle aged and old) and transition from
one MF to another is smooth and gradual.
Fig 4.6.1 : Linguistic variable “age” as membership functions
Fuzzy inference is the process of obtaining a new knowledge from an existing knowledge.
The basic rule of inference in traditional two-valued logic is modus ponens, according to which, we can infer the truth
of a proposition B from the truth of A and the implication A B.
Ex. : If A is identified with - “the tomato is red” and B with “the tomato is ripe” then if it is true that
“the tomato is red” it is also true that “the tomato is ripe”.
i.e. Premise 1 (fact) X is A
Premise 2 (rule) if X is A then Y is B
Consequence (conclusion) : Y is B
However, in most of the human reasoning, modus ponens is employed in an approximate manner.
For e.g. : if we have the same implication rule, “if the tomato is red, then it is ripe” and we know that,
“the tomato is more or less red” then we may infer that
“the tomato is more or less ripe”
i.e. Premise 1 (fact) : X is A′
Premise 2 (rule) : if X is A then Y is B
Conclusion : Y is B′
where A′ is close to A and
B′ is close to B.
AI&SC (MU-Sem. 7-Comp) 4-26 Fuzzy Logic
When A, B, A′ and B′ are fuzzy sets of appropriate universes, the inference procedure is called "approximate
reasoning" or fuzzy reasoning; it is also called Generalized Modus Ponens (GMP).
Definition : Approximate reasoning / fuzzy reasoning
Let A, A′ and B be fuzzy sets of X, X and Y respectively. Assume that the fuzzy implication A → B is expressed as a fuzzy
relation R on X × Y. Then the fuzzy set B′ induced by "x is A′" (fact) and the fuzzy rule "if x is A then y is B" is defined by,
μB′(y) = max (x) min [μA′(x), μR(x, y)] = ∨x [μA′(x) ∧ μR(x, y)]
or B′ = A′ ∘ (A → B)
Rule : if x is A, then y is B
Fig. 4.6.2 : Graphic interpretation of GMP using Mamdani’s fuzzy implication and max-min composition
The inferred consequent is,
μB′(y) = w ∧ μB(y)
Thus, we first find the degree of match w as the maximum of μA′(x) ∧ μA(x); the MF of B is then clipped at w.
b. Single rule with multiple antecedents
Premise 1 (fact) : x is A′ and y is B′
Premise 2 (rule) : if x is A and y is B then z is C
Consequence : z is C′
The resulting MF is,
μC′(z) = (w1 ∧ w2) ∧ μC(z)
where w1 ∧ w2 is the firing strength.
Here w1 and w2 are the maxima of the MFs of A ∩ A′ and B ∩ B′ respectively.
Thus, w1 denotes the degree of compatibility between A and A′; similarly for w2.
Since the antecedent parts of the fuzzy rule are constructed using the "and" connective, w1 ∧ w2 is called the firing
strength or degree of fulfilment of the fuzzy rule.
The firing strength represents the degree to which the antecedent part of the rule is satisfied.
The MF of the resulting C′ is equal to the MF of C clipped by the firing strength w (where w = w1 ∧ w2).
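A minimal sketch of this step, with illustrative triangular MFs (the parameters and inputs below are assumptions, not from the text): compute the match of each antecedent, take the min as the firing strength, then clip the consequent MF.

```python
# Firing strength and clipped consequent for the rule
# "if x is A and y is B then z is C", given crisp inputs x0, y0.
def tri(x, a, b, c):
    # Triangular MF with feet a, c and peak b.
    if x <= a or x >= c: return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

x0, y0 = 4.0, 7.0
w1 = tri(x0, 0, 5, 10)    # degree of match on antecedent "x is A"
w2 = tri(y0, 5, 10, 15)   # degree of match on antecedent "y is B"
w = min(w1, w2)           # "and" connective -> firing strength w1 ^ w2

# Clipped consequent: mu_C'(z) = min(w, mu_C(z)) over a discretized Z.
Z = [z / 10 for z in range(0, 101)]
C_clipped = [min(w, tri(z, 0, 5, 10)) for z in Z]

print(w1, w2, w)  # 0.8 0.4 0.4
```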
c. Multiple rules with multiple antecedents
The GMP problem for multiple rules with multiple antecedents can be written as,
Premise 1 (fact) : x is A′ and y is B′
Premise 2 (rule 1) : if x is A1 and y is B1 then z is C1
Premise 3 (rule 2) : if x is A2 and y is B2 then z is C2
Consequence : z is C′
Fig. 4.6.4 : Fuzzy reasoning for multiple rules with multiple antecedents
AI&SC (MU-Sem. 7-Comp) 4-28 Fuzzy Logic
Here C′1 and C′2 are the inferred fuzzy sets for rule 1 and rule 2 respectively.
When a given fuzzy rule assumes the form “if x is A or y is B” then firing strength is given as the maximum of degree of
match on the antecedent part for a given condition.
Ex.
If x is A1 or y is B1 then z is C1.
Fig. 4.6.5
In the above example, because the two antecedents are connected using "or", we take the maximum of w1 and w2 as
the firing strength. Since w2 ≥ w1, we take w2 as the firing strength and then apply the min implication operator on
the output MF C1.
4.7 Fuzzy Inference Systems
MU - May 12, May 13, Dec. 15
Q. Explain the three types of Fuzzy Inference Systems in detail. (May 12, 10 Marks)
Q. Compare Mamdani and Sugeno fuzzy models. (May 13, 10 Marks)
Fuzzy Inference System (FIS) is the key unit of a fuzzy logic system. Fuzzy inference (reasoning) is the actual process of
mapping a given input to an output using fuzzy logic.
Fig 4.7.1(b) : Fuzzy Inference using If-Then rules
2. Inference Unit
The basic function of the inference unit is to compute the overall value of the control output variable based on the
individual contribution of each rule in the rule base.
The output of the fuzzification module representing the crisp input is matched to each rule-antecedent.
The degree of match of each rule is established. Based on this degree of match, the value of the control output
variable in the rule-consequent is modified. The result is, we get the “clipped” fuzzy set representing the control
output variable.
The set of all clipped control output values of the matched rules represent the overall fuzzy value of control output.
3. Rule Base
The rule base contains the fuzzy IF-THEN rules.
4. Data Base
The database defines the membership functions of the fuzzy sets used in the fuzzy rules.
5. Defuzzification Unit
It performs defuzzification which converts the overall control output into a single crisp value.
The rule base and the database are jointly referred to as the knowledge base .
Working
The input to the FIS may be a Fuzzy or crisp value.
1. Fuzzification Unit converts the crisp input into fuzzy input by using any of the fuzzification methods.
2. Next, the rule base is formed. The database and rule base are collectively called the knowledge base.
3. Finally, defuzzification process is carried out to produce crisp output.
Methods of FIS
The most important two types of fuzzy inference method are :
1) Mamdani FIS
2) Sugeno FIS
Mamdani fuzzy inference is the most commonly seen inference method. It was introduced by Mamdani in 1975.
Another well-known inference method is the so-called Sugeno or Takagi–Sugeno–Kang method of fuzzy inference,
introduced by Sugeno (1985). It is also called the TS method.
The main difference between the two methods lies in the consequent of fuzzy rules.
1. Mamdani FIS
Mamdani FIS was proposed by Ebrahim Mamdani in the year 1975 to control a steam engine and boiler combination.
To compute the output of this FIS given the inputs, six steps have to be followed :
1. Determining a set of fuzzy rules.
2. Fuzzifying the inputs using the input membership functions.
3. Combining the fuzzified inputs according to the fuzzy rules to establish a rule strength (Fuzzy Operations).
4. Finding the consequence of the rule by combining the rule strength and the output membership function
(implication).
5. Combining the consequences to get an output distribution (aggregation).
6. Defuzzifying the output distribution (this step is only if a crisp output (class) is needed).
In Mamdani FIS, the fuzzy rules are formed using IF-THEN statements and AND/OR connectives.
The consequent of the rule can be obtained in two steps.
o By computing the strength of each rule
o By clipping the output membership function at the rule strength.
The outputs of all the fuzzy rules are then combined to obtain the aggregated fuzzy output. Finally, defuzzification is
applied on to the aggregated fuzzy output to obtain a crisp output value.
Consider two inputs, two rule Mamdani fuzzy inference system.
Assume two inputs are crisp value x and y.
Assume the following two rules :
Rule 1 : if x is A1 and y is B1 then z is C1
Rule 2 : if x is A2 and y is B2 then z is C2
Fig. 4.7.2 (a) shows a Mamdani fuzzy inference system using max–min composition. It illustrates the procedure of
deriving the overall output z when presented with two crisp inputs x and y. In this Mamdani inference system, min is
used as the T-norm and max as the T-conorm operator.
The T-norm operator is used for inferencing the antecedent part of the rule, and the T-conorm operator is used to
aggregate the outputs resulting from each rule.
The Mamdani model also supports max–product composition to derive the overall output z. Here the algebraic
product is used as the T-norm operator and max is used as the T-conorm operator.
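The two-rule max–min procedure can be sketched over a discretized output universe. All MF parameters and inputs below are illustrative assumptions, not values from the figure; the structure (clip each consequent by its firing strength, aggregate with max) follows the description above.

```python
# A minimal two-rule Mamdani sketch: min as T-norm (implication),
# max as T-conorm (aggregation).
def tri(x, a, b, c):
    if x <= a or x >= c: return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

x0, y0 = 3.0, 6.0
# Rule 1 : if x is A1 and y is B1 then z is C1
w1 = min(tri(x0, 0, 2, 6), tri(y0, 2, 6, 10))
# Rule 2 : if x is A2 and y is B2 then z is C2
w2 = min(tri(x0, 2, 6, 10), tri(y0, 4, 8, 12))

Z = [z / 4 for z in range(0, 41)]            # output universe 0..10
C1 = [tri(z, 0, 3, 6) for z in Z]
C2 = [tri(z, 4, 7, 10) for z in Z]
# Clip each consequent by its firing strength, then aggregate with max.
agg = [max(min(w1, c1), min(w2, c2)) for c1, c2 in zip(C1, C2)]

print(w1, w2)  # 0.75 0.25
```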
Fig. 4.7.2 (a) : Mamdani fuzzy inference systems using max - min decomposition
Fig. 4.7.2 (b): Mamdani fuzzy inference systems using max - product decomposition
2. Takagi-Sugeno-Kang (TSK) FIS
Takagi - Sugeno FIS was proposed by Takagi, Sugeno and Kang in the year 1985.
A typical fuzzy rule in a zero-order TSK model has the form,
if x is A and y is B then z = c
where c is a constant.
In this case the output of each fuzzy rule is a constant, and the overall output is obtained via the weighted average
method. The output level zi of each rule is weighted by the firing strength wi of the rule :
z = Σ wi zi / Σ wi
Fig. 4.7.3 : Reasoning in Sugeno FIS
Aggregation and Defuzzification Procedure : The two models also differ in the consequents of their fuzzy rules, and
consequently their aggregation and defuzzification procedures differ.
Mathematical Rules : More mathematical rules exist for the Sugeno rule than the Mamdani rule.
Adjustable Parameters : The Sugeno controller has more adjustable parameters than the Mamdani controller.
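The zero-order Sugeno output is just a weighted average of the rule constants, which makes it trivially computable. The firing strengths and output levels below are illustrative.

```python
# Zero-order Sugeno (TSK) sketch: each rule outputs a constant zi,
# and the overall output is the firing-strength-weighted average.
rules = [
    (0.75, 2.0),   # (firing strength w1, output level z1)
    (0.25, 8.0),   # (w2, z2)
]
z = sum(w * zi for w, zi in rules) / sum(w for w, _ in rules)
print(z)   # (0.75*2 + 0.25*8) / 1.0 = 3.5
```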
Fig. 4.7.4
1. Intuition
As the name suggests, this method is based upon the common intelligence of humans. Humans develop
membership functions based on their own understanding capability.
As shown in Fig. 4.7.5, each triangular curve is a membership function corresponding to various fuzzy (linguistic)
variables such as cold, cool, warm, etc.
2. Inference
In inference method we use knowledge to perform deductive reasoning. To deduce or infer a conclusion, we use the
facts and knowledge on that particular problem. Let us consider the example of Geometric shapes for the
identification of a triangle.
The membership values for five types of triangles, with inner angles A ≥ B ≥ C, can be defined as,
Approximate right triangle : μR(A, B, C) = 1 − (1/90) |A − 90°|
Approximate isosceles triangle : μI(A, B, C) = 1 − (1/60) min {(A − B), (B − C)}
Approximate equilateral triangle : μE(A, B, C) = 1 − (1/180) (A − C)
Approximate isosceles right triangle : μIR(A, B, C) = μI∩R(A, B, C) = min {μI(A, B, C), μR(A, B, C)}
Other triangle : T = complement of (I ∪ R ∪ E), i.e. μT(A, B, C) = min {1 − μI, 1 − μR, 1 − μE}
AI&SC (MU-Sem. 7-Comp) 4-35 Fuzzy Logic
3. Rank ordering
In the rank ordering method, preferences assigned by a single individual, a committee, a poll or other opinion
methods are used to assign membership values to fuzzy variables.
Here the preferences are determined by pairwise comparisons, which are used to determine the ordering of the
membership values.
Example :
Let’s suppose 1000 people respond to a questionnaire and their pair wise preferences among the colors red, orange,
Angular fuzzy sets differ from normal fuzzy sets only in their coordinate description.
Angular fuzzy sets are defined on a universe of angles; hence they repeat with a period of 2π.
Angular fuzzy sets are used in the quantitative description of the linguistic variables, which are known as “truth
values”.
Example :
Let us consider pH values of water samples taken from a contaminated pond. We know that a pH value of 7 means a
neutral solution.
Levels of pH between 14 and 7 are labelled Absolute Basic (AB), Very Basic (VB), Basic (B), Fairly Basic (FB) and
Neutral (N), drawn from θ = π/2 to θ = 0.
Levels of pH between 7 and 0 are labelled Neutral (N), Fairly Acidic (FA), Acidic (A), Very Acidic (VA) and Absolutely
Acidic (AA), drawn from θ = 0 to θ = −π/2.
Linguistic values vary with θ, and their membership values are given by the equation,
μ(θ) = t · tan θ
where t is the horizontal projection of the radial vector.
4.7.3 Defuzzification
MU - Dec. 12, Dec. 14
Q. Explain any four defuzzification methods with suitable example. (Dec. 12, 10 Marks)
Q. Explain different methods of defuzzification. (Dec. 14, 10 Marks)
Fig. 4.7.7 : Defuzzification methods
1. Max-membership principle / Height method
This method is limited to peak output functions. It uses the individual clipped or scaled central outputs.
2. Centroid method (centre of area / centre of gravity)
For a continuous universe,
x* = ∫ μc(x) · x dx / ∫ μc(x) dx
For a discrete universe of n points,
x* = Σ (i = 1 to n) μc(xi) · xi / Σ (i = 1 to n) μc(xi)
3. Centre of sums (COS) method
The idea is to consider the contribution of the area of each output membership curve individually. In contrast, the
centre of area/gravity method considers the union of all output fuzzy sets. In the COS method, overlapping areas, if
they exist, are counted more than once.
ic ow
n
bl kn
at
Pu ch
4. Weighted average method

The weighted average method weights each membership function in the output by its respective maximum membership value. The two functions shown in Fig. 4.7.11 would result in the following general form of defuzzification :

x* = (a × 0.5 + b × 0.9) / (0.5 + 0.9)
5. Centre of largest area method

This method is used when the combined output fuzzy set is non-convex, i.e. it consists of at least two convex fuzzy subsets. The method determines the convex fuzzy subset with the largest area and defines the crisp output value x* to be the centre of area of that largest subset :

x* = ∫ μcm(x) · x dx / ∫ μcm(x) dx

where cm is the convex fuzzy subset that has the largest area.
6. First of maxima and 7. Last of maxima

These methods use the overall output (i.e. the union of all individual output MFs).
First of maxima is determined by taking the smallest value of the domain with the maximum membership degree.
Last of maxima is determined by taking the greatest value of the domain with the maximum membership degree.
8. Bisector method
This method uses the vertical line that divides the region into two equal areas, as shown in Fig. 4.7.15. This line is called the bisector.
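As a quick numerical illustration of these principles, the sketch below (not from the text; the triangular output MF is an assumed example) applies the max-membership, centroid and bisector ideas to a sampled membership function:

```python
# Sketch: three defuzzification methods on a sampled output MF.
# The triangular MF with vertices (2, 0), (4, 1), (6, 0) is an assumed example.
xs = [i / 10 for i in range(0, 81)]          # sampled universe 0..8

def mu(x):
    if 2 <= x <= 4:
        return (x - 2) / 2                   # rising edge
    if 4 < x <= 6:
        return (6 - x) / 2                   # falling edge
    return 0.0

def max_membership(xs, mu):
    # smallest x attaining the maximum membership (first of maxima)
    peak = max(mu(x) for x in xs)
    return min(x for x in xs if mu(x) == peak)

def centroid(xs, mu):
    # discrete centre of gravity: sum(mu * x) / sum(mu)
    return sum(mu(x) * x for x in xs) / sum(mu(x) for x in xs)

def bisector(xs, mu):
    # x where the accumulated area first reaches half of the total area
    total = sum(mu(x) for x in xs)
    acc = 0.0
    for x in xs:
        acc += mu(x)
        if acc >= total / 2:
            return x
```

For this symmetric triangle all three methods agree near x = 4; they diverge once the aggregated output is skewed or multi-peaked.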
1. Fuzzification module :
(a) Normalization
This block performs a scale transformation which maps the physical values of the input variables into a normalized universe of discourse. This block is optional; if a non-normalized domain is used, it is not required.
(b) Fuzzification
This block performs fuzzification, which converts a crisp input into a fuzzy set. Here we need to decide on a proper fuzzification strategy.
2. Inference engine :

The basic function of the inference engine is to compute the overall value of the control output variable based on the individual contribution of each rule in the rule base.
The output of the fuzzification module representing the crisp input is matched to each rule-antecedent.
The degree of match of each rule is established. Based on this degree of match, the value of the control output variable in the rule consequent is modified. As a result, we get a "clipped" fuzzy set representing the control output variable.
The set of all clipped control output values of the matched rules represent the overall fuzzy value of control output.
3. Defuzzification module :
(a) Defuzzification
It performs defuzzification which converts the overall control output into a single crisp value.
(b) Denormalization

This block maps the crisp value of the control output into the physical domain. This block is optional; it is used only if normalization was performed during the fuzzification phase.
4. Knowledge base :

The knowledge base basically consists of a database and a rule base.
The database provides the necessary information for proper functioning of the fuzzification module, the rule base
and the defuzzification module.
The information in the database includes :
o Fuzzy MFs for the input and output control variables
o The physical domains of the actual problems and their normalized values along with the scaling factors.
4.8.1 Steps in Designing FLC
1. Identification of variables : Here, the input, output and state variables must be identified of the plant which is under
consideration.
2. Fuzzy subset configuration : The universe of information is divided into a number of fuzzy subsets and each subset is assigned a linguistic label. Always make sure that these fuzzy subsets cover all the elements of the universe.
3. Obtaining membership function : Now obtain the membership function for each fuzzy subset that we get in the above
step.
4. Fuzzy rule base configuration : Now formulate the fuzzy rule base by assigning relationship between fuzzy input and
output.
5. Fuzzification : The fuzzification process is initiated in this step.
6. Combining fuzzy outputs : By applying fuzzy approximate reasoning, locate the fuzzy output and merge them.
7. Defuzzification : Finally, initiate the defuzzification process to form a crisp output.

There is no systematic approach to fuzzy system design.
They are understandable only when simple.
They are suitable for problems that do not need high accuracy.
Ex. 4.9.1 : Model the following as fuzzy set using suitable membership function. “Numbers close to 6”.
MU - Dec. 12, Dec. 13, Dec. 14, 6 Marks
Soln. :
X = Integers
x   μA(x)
2 0.05
3 0.1
4 0.2
5 0.5
6 1
7 0.5
8 0.2
9 0.1
10 0.05
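The tabulated grades follow the bell-shaped function μA(x) = 1/(1 + (x − 6)²), with the end values rounded in the table; this choice of function is the usual model for "close to" and is shown here as an assumption:

```python
# "Numbers close to 6" modelled as mu(x) = 1 / (1 + (x - 6)^2);
# the grade is 1 at x = 6 and falls off symmetrically on both sides.
def mu_close_to_6(x):
    return 1.0 / (1.0 + (x - 6) ** 2)

grades = {x: mu_close_to_6(x) for x in range(2, 11)}
```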
Ex. 4.9.2 : Model the following fuzzy set using the suitable fuzzy membership function “Number close to 10”.
Soln. :
X = Integers
Fig. P. 4.9.2 shows the plot of degree of membership for each element.
Table P. 4.9.2 : x and corresponding A (x)
X A (x)
5 0.03
6 0.05
7 0.1
8 0.2
9 0.5
10 1
11 0.5
12 0.2
13 0.1
14 0.05
15 0.03
Ex. 4.9.3 : Model the following fuzzy set using suitable membership function. “Integer number considerably larger
than 6”.
Soln. :
Here the universe of discourse is the set of all integer numbers.
X = Integers
Then the fuzzy set for "Number considerably larger than 6" can be defined as

μA(x) = [1 + (x – 6)^–2]^–1
Fig. P. 4.9.3 shows the plot of x versus μA(x).
So the membership function for "Number considerably larger than 6" is defined as

μA(x) = 0,                      x ≤ 6
μA(x) = [1 + (x – 6)^–2]^–1,    x > 6
x   μA(x)
6 0
7 0.5
8 0.8
9 0.90
10 0.94
11 0.96
12 0.97
13 0.98
Fig. P. 4.9.3 : Plot of x A (x)
Ex. 4.9.4 : Determine all α-level sets and strong α-level sets for the following fuzzy set
A = {(1, 0.2), (2, 0.5), (3, 0.8), (4, 1), (5, 0.7), (6, 0.3)}   MU - Dec. 13, Dec. 15, 5/6 Marks
Soln. :
Following are the α-level sets.
A0.2 = {1, 2, 3, 4, 5, 6}
A0.3 = {2, 3, 4, 5, 6}
A0.5 = {2, 3, 4, 5}
A0.7 = {3, 4, 5}
A0.8 = {3, 4}
A1 = {4}
Following are the strong α-level sets.
A0.2 = {2, 3, 4, 5, 6}
A0.3 = {2, 3, 4, 5}
A0.5 = {3, 4, 5}
A0.7 = {3, 4}
A0.8 = {4}
A1 = ∅
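The level sets above can be computed mechanically; a small sketch over the same fuzzy set:

```python
# Alpha-level sets of A = {(1,0.2), (2,0.5), (3,0.8), (4,1), (5,0.7), (6,0.3)}.
A = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0, 5: 0.7, 6: 0.3}

def level_set(A, alpha):
    # ordinary (weak) alpha-level set: membership >= alpha
    return {x for x, m in A.items() if m >= alpha}

def strong_level_set(A, alpha):
    # strong alpha-level set: membership strictly greater than alpha
    return {x for x, m in A.items() if m > alpha}
```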
Ex. 4.9.5 : Find all α-level sets and strong α-level sets for the following fuzzy set
A = {(3, 0.1), (4, 0.2), (5, 0.3), (6, 0.4), (7, 0.6), (8, 0.8), (10, 1), (12, 0.8), (14, 0.6)}
Soln. :
α-level sets :
A0.1 = {3, 4, 5, 6, 7, 8, 10, 12, 14}
A0.2 = {4, 5, 6, 7, 8, 10, 12, 14}
A0.8 = {10}
A1 = ∅
Ex. 4.9.6 : A realtor wants to classify the houses he offers to his clients. One indicator of comfort of these houses is the
number of bedrooms in them. Let the available types of houses be represented by the following set.
U = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
The houses in this set are specified by the number of bedrooms in a house. Describe "comfortable house for a 4-person family" using a fuzzy set.
Soln. :
The fuzzy set for “comfortable type of house for a 4-person family” may be described as,
A = {(1, 0.2), (2, 0.5), (3, 0.8), (4, 1), (5, 0.7), (6, 0.3)}
Ex. 4.9.7 : Assume A = "x considerably larger than 10" and B = "x approximately 11", characterized by
A = {(x, μA(x)) | x ∈ X}. Draw the plot for both sets and show A ∪ B and A ∩ B in a plot.
Soln. :
Fuzzy set A can be defined as,

μA(x) = 0,                       x ≤ 10
μA(x) = [1 + (x – 10)^–2]^–1,    x > 10
Set B can be defined as,

μB(x) = [1 + (x – 11)^2]^–1
Then,

μA∩B(x) = min [ (1 + (x – 10)^–2)^–1, (1 + (x – 11)^2)^–1 ],   x > 10

That is, the intersection operation on fuzzy sets A and B represents a new fuzzy set "x considerably larger than 10 and approximately 11".
and

μA∪B(x) = max [ (1 + (x – 10)^–2)^–1, (1 + (x – 11)^2)^–1 ],   x ∈ X
Fig. P. 4.9.7 : Plot of A B
Fig. P. 4.9.7(a) : Plot of A B
Ex. 4.9.8 : Model the following as fuzzy set using trapezoidal membership function “Number close to 10”.
io led
Soln. :
A general trapezoidal membership function is

Trapezoid (x ; a, b, c, d) = 0,                  x ≤ a
                             (x – a)/(b – a),    a ≤ x ≤ b
                             1,                  b ≤ x ≤ c
                             (d – x)/(d – c),    c ≤ x ≤ d
                             0,                  x ≥ d

For "Number close to 10", take a = 5, b = 8, c = 12, d = 15 :

Trapezoid (x ; 5, 8, 12, 15) = 0,             x ≤ 5
                               (x – 5)/3,     5 ≤ x ≤ 8
                               1,             8 ≤ x ≤ 12
                               (15 – x)/3,    12 ≤ x ≤ 15
                               0,             x ≥ 15
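The piecewise definition above translates directly into code; a minimal sketch:

```python
# General trapezoidal MF trap(x; a, b, c, d): rises on [a, b], plateaus on
# [b, c], falls on [c, d], and is zero outside [a, d].
def trapezoid(x, a, b, c, d):
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def close_to_10(x):
    # the specific instance used in this example
    return trapezoid(x, 5, 8, 12, 15)
```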
Ex. 4.9.9 : Let A = {a1, a2} , B = {b1, b2, b3}, C = {c1, c2}. Let R be a relation from A to B defined by matrix.
b1 b2 b3
a1 0.4 0.5 0
a2 0.2 0.8 0.2
and let S be a relation from B to C defined by the matrix

     c1   c2
b1   0.2  0.7
b2   0.3  0.8
b3   1    0

Find : (1) Max-min composition of R and S. (2) Max-product composition of R and S.
Soln. :
(1) Max-min composition

T (a1, c1) = max (min (0.4, 0.2), min (0.5, 0.3), min (0, 1)) = max (0.2, 0.3, 0) = 0.3
T (a1, c2) = max (min (0.4, 0.7), min (0.5, 0.8), min (0, 0)) = max (0.4, 0.5, 0) = 0.5
T (a2, c1) = max (min (0.2, 0.2), min (0.8, 0.3), min (0.2, 1)) = max (0.2, 0.3, 0.2) = 0.3
T (a2, c2) = max (min (0.2, 0.7), min (0.8, 0.8), min (0.2, 0)) = max (0.2, 0.8, 0) = 0.8

         c1   c2
T = a1   0.3  0.5
    a2   0.3  0.8
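Both compositions are instances of the same sup-star pattern, differing only in the pairwise operator; a sketch over the matrices of this example:

```python
# Max-min and max-product composition of R (A x B) and S (B x C) from Ex. 4.9.9.
R = [[0.4, 0.5, 0.0],
     [0.2, 0.8, 0.2]]
S = [[0.2, 0.7],
     [0.3, 0.8],
     [1.0, 0.0]]

def compose(R, S, star):
    # T[i][k] = max over j of star(R[i][j], S[j][k])
    return [[max(star(R[i][j], S[j][k]) for j in range(len(S)))
             for k in range(len(S[0]))]
            for i in range(len(R))]

max_min = compose(R, S, min)                     # [[0.3, 0.5], [0.3, 0.8]]
max_product = compose(R, S, lambda a, b: a * b)
```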
Ex. 4.9.10 : High speed rail monitoring devices sometimes make use of sensitive sensors to measure the deflection of the
earth when a rail car passes. These deflections are measured with respect to some distance from the rail car
and, hence are actually very small angles measured in micro-radians. Let a universe of deflection be
A = [1, 2, 3, 4] where A is the angle in micro-radians, and let a universe of distance be D = [1, 2, 5, 7] where D
is distance in feet, suppose a relation between these two parameters has been determined as follows :
R = D1 D2 D3 D4
A1 1 0.3 0.1 0
A2 0.2 1 0.3 0.1
A3 0 0.7 1 0.2
A4 0 0.1 0.4 1
Now let a universe of rail car weights be W = [1, 2], where W is the weight in units of 100,000 pounds.
Suppose the fuzzy relation of W to A is given by,
S = W1 W2
A1 1 0.4
A2 0.5 1
A3 0.3 0.1
A4 0 0
Using these two relations, find the relation Rᵀ o S = T.
          A1   A2   A3   A4                 W1   W2
Rᵀ = D1   1    0.2  0    0       and S = A1 1    0.4
     D2   0.3  1    0.7  0.1             A2 0.5  1
     D3   0.1  0.3  1    0.4             A3 0.3  0.1
     D4   0    0.1  0.2  1               A4 0    0
W1 W2
D1 1 0.4
T= D2 0.5 1
D3 0.3 0.3
D4 0.2 0.1
T (D3, W1) = max (0.1, 0.3, 0.3, 0) = 0.3
T (D3, W2) = max (0.1, 0.3, 0.1, 0) = 0.3
T (D4, W1) = max (0, 0.1, 0.2, 0) = 0.2
T (D4, W2) = max (0, 0.1, 0.1, 0) = 0.1
Using max-product composition,

     W1   W2
D1 1 0.4
T= D2 0.5 1
D3 0.3 0.3
D4 0.06 0.1
Ex. 4.9.11 : Model the following fuzzy set using trapezoidal membership function, “Middle age”. MU - May 13, 5 Marks
Soln. :
μmiddle(x) = (70 – x)/10,   60 ≤ x ≤ 70
μmiddle(x) = 0,             x > 70
Ex. 4.9.12 : Represent the set of old people as a fuzzy set using appropriate membership function.
Soln. :
Let X = [0, 120] be the set of all possible ages.

μold(x) = 0,              0 ≤ x ≤ 60
μold(x) = (x – 60)/20,    60 ≤ x ≤ 80
μold(x) = 1,              x ≥ 80
Ex. 4.9.13 : Develop a graphical representation of membership functions to describe the linguistic variables "cold", "warm" and "hot". The temperature ranges from 0 °C to 100 °C. Also show plots for "cold and warm" and "warm or hot" temperature.
Soln. :
Fig. P. 4.9.13 (a) : MF for “cold and warm”
Ex. 4.9.14 : Consider the fuzzy sets

A = {0.1/1 + 0.2/2 + 0.3/3}
B = {0.6/1 + 0.5/2 + 0.4/3 + 0.5/4}

Perform the following operations on A and B :
(1) Union
(2) Intersection
(3) Set difference
Soln. :
1. Union

A ∪ B = {0.6/1 + 0.5/2 + 0.4/3 + 0.5/4}

2. Intersection

A ∩ B = {0.1/1 + 0.2/2 + 0.3/3}

3. Set difference

A | B = A ∩ B̄

B̄ = {0.4/1 + 0.5/2 + 0.6/3 + 0.5/4}

A ∩ B̄ = {0.1/1 + 0.2/2 + 0.3/3}

B | A = B ∩ Ā

Ā = {0.9/1 + 0.8/2 + 0.7/3 + 1/4}

B ∩ Ā = {0.6/1 + 0.5/2 + 0.4/3 + 0.5/4}
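The pointwise max/min/complement operations used in this solution can be verified in a few lines (a sketch; the grade of element 4 in A is taken as 0):

```python
# Standard fuzzy-set operations on A and B over the universe {1, 2, 3, 4}.
U = [1, 2, 3, 4]
A = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.0}
B = {1: 0.6, 2: 0.5, 3: 0.4, 4: 0.5}

union        = {x: max(A[x], B[x]) for x in U}
intersection = {x: min(A[x], B[x]) for x in U}
comp_A       = {x: round(1 - A[x], 2) for x in U}       # complement of A
comp_B       = {x: round(1 - B[x], 2) for x in U}       # complement of B
a_minus_b    = {x: min(A[x], comp_B[x]) for x in U}     # A | B = A and not-B
b_minus_a    = {x: min(B[x], comp_A[x]) for x in U}     # B | A = B and not-A
```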
We can also verify De Morgan's laws on these sets :

Ā = {0.9/1 + 0.8/2 + 0.7/3 + 1/4}
B̄ = {0.4/1 + 0.5/2 + 0.6/3 + 0.5/4}

Ā ∩ B̄ = {0.4/1 + 0.5/2 + 0.6/3 + 0.5/4} = complement of (A ∪ B)

Ā ∪ B̄ = {0.9/1 + 0.8/2 + 0.7/3 + 1/4} = complement of (A ∩ B)
Ex. 4.9.15 : For the given membership functions shown in Fig. P. 4.9.15, determine the defuzzified output value by any two methods.   MU - May 13, 10 Marks
Fig. P. 4.9.15
Soln. :
1. Weighted average method :
In the weighted average method we first find the centre of each individual fuzzy set.
Centre of A1 = 2.5 Centre of A2 = 4.0
Next we find the membership value at the centre
Membership value of centre of A1 = 0.7
Membership value of centre of A2 = 1
x* = (0.7 × 2.5 + 1 × 4.0) / (0.7 + 1) = (1.75 + 4)/1.7 = 3.38

2. Centre of sums method :

Area of A1 = (1 + 3) × 0.7 / 2 = 1.4
Area of A2 = (2 + 4) × 1 / 2 = 6/2 = 3
Ex. 4.9.16 : Consider three fuzzy sets C1 , C2 and C3 given below. Find defuzzified value using :
(1) mean of max (2) centroid
Fig. P. 4.9.16 (a) : Fuzzy set C2
Fig. P. 4.9.16(b) : Fuzzy set C3
AI&SC (MU-Sem. 7-Comp) 4-57 Fuzzy Logic
Soln. :
Fig. P. 4.9.16(c) : Aggregated fuzzy set of C1 , C2 and C3
1. Mean of max method :

Since C3 is the maximizing MF, we take the mean (average) of all the elements having the maximum membership value in C3 :

x* = (6 + 7)/2 = 13/2 = 6.5
2. Centroid method :

z* = ∫ μ(z) · z dz / ∫ μ(z) dz

The aggregated membership function is integrated piecewise over [0, 1], [1, 3.6], [3.6, 4], [4, 5.5], [5.5, 6], [6, 7] and [7, 8], with integrands 0.3z, 0.3, (z – 3)/2, 0.5, (z – 5), 1 and (8 – z) respectively. Evaluating,

z* = 4.9
3. Centre of sums method :

Find the centre of each output fuzzy set, then the membership value at each centre :

Membership value of centre of C1 = 0.3
Membership value of centre of C2 = 0.5

Evaluating the centre-of-sums formula gives x* = 5.146
Ex. 4.9.17 : Let A = {0.1/1 + 0.3/2 + 0.8/3 + 1/4 + 1/5 + 0.8/6}.
Find the core and support of fuzzy set A.
Soln. :
Core of A = {4, 5} (elements with membership value equal to 1)
Support of A = {1, 2, 3, 4, 5, 6} (elements with membership value > 0)
Ex. 4.9.18 : Let A = {0.2/1 + 0.3/2 + 0.4/3 + 0.5/4} and B = {0.1/1 + 0.2/2 + 0.2/3 + 1/4}.
Find : 1) Algebraic sum 2) Algebraic product 3) Bounded sum 4) Bounded difference
Soln. :
1) Algebraic sum : μA+B(x) = μA(x) + μB(x) – μA(x) · μB(x)

A + B = {0.28/1 + 0.44/2 + 0.52/3 + 1/4}

2) Algebraic product : μA·B(x) = μA(x) · μB(x)

A · B = {0.02/1 + 0.06/2 + 0.08/3 + 0.5/4}
3) Bounded sum : μA⊕B(x) = min (1, μA(x) + μB(x))

A ⊕ B = {0.3/1 + 0.5/2 + 0.6/3 + 1/4}

4) Bounded difference : μA⊖B(x) = max (0, μA(x) + μB(x) – 1)

A ⊖ B = {0.5/4}
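These pointwise definitions (algebraic sum μA + μB − μAμB, algebraic product μAμB, bounded sum min(1, μA + μB), bounded difference max(0, μA + μB − 1)) can be checked with a short sketch:

```python
# Algebraic and bounded operations on A and B over the universe {1, 2, 3, 4}.
U = [1, 2, 3, 4]
A = {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.5}
B = {1: 0.1, 2: 0.2, 3: 0.2, 4: 1.0}

alg_sum  = {x: round(A[x] + B[x] - A[x] * B[x], 2) for x in U}
alg_prod = {x: round(A[x] * B[x], 2) for x in U}
bnd_sum  = {x: round(min(1.0, A[x] + B[x]), 2) for x in U}
bnd_diff = {x: round(max(0.0, A[x] + B[x] - 1.0), 2) for x in U}
```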
Ex. 4.9.19 : Let A = {0.3/x1 + 0.7/x2 + 1/x3} and B = {0.4/y1 + 0.9/y2}. Find the Cartesian product A × B.

Soln. :

μA×B(x, y) = min (μA(x), μB(y))

A × B = min (0.3, 0.4)/(x1, y1) + min (0.3, 0.9)/(x1, y2) + min (0.7, 0.4)/(x2, y1)
      + min (0.7, 0.9)/(x2, y2) + min (1, 0.4)/(x3, y1) + min (1, 0.9)/(x3, y2)

      = 0.3/(x1, y1) + 0.3/(x1, y2) + 0.4/(x2, y1) + 0.7/(x2, y2) + 0.4/(x3, y1) + 0.9/(x3, y2)
Ex. 4.9.20 : Temperature ranges are 130 °F to 140 °F and the pressure limit is 400 psi to 1000 psi. Find the following membership functions :
= 0/131 + 0.36/132 + 0.64/133 + 0.84/134 + 0.96/135 + 1/136
2. Temperature not very high

Very high = (high)²

Temp very high = 0/134 + 0.04/135 + 0.16/136 + 0.36/137 + 0.64/138 + 1/139
Temp not very high = 1 – (Temp very high)
= 1/134 + 0.96/135 + 0.84/136 + 0.64/137 + 0.36/138 + 0/139
3. Pressure slightly high

Slightly high = (high)^1/2

High pressure = 0.1/400 + 0.2/600 + 0.4/700 + 0.6/800 + 0.8/900 + 1/1000

Pressure slightly high = (high pressure)^1/2
= 0.31/400 + 0.44/600 + 0.63/700 + 0.77/800 + 0.89/900 + 1/1000
4. Pressure very very high

Very very high = (high)⁴

Pressure very very high = (high pressure)⁴
= 0.0001/400 + 0.0016/600 + 0.025/700 + 0.12/800 + 0.40/900 + 1/1000
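Hedges of this kind are just pointwise powers of the base membership function; a sketch (values are rounded here, so a couple of entries differ in the last digit from the truncated figures above):

```python
# Linguistic hedges as powers of a base MF: "very" squares the grades,
# "very very" raises them to the 4th power, "slightly" takes a square root.
high_pressure = {400: 0.1, 600: 0.2, 700: 0.4, 800: 0.6, 900: 0.8, 1000: 1.0}

slightly_high  = {p: round(m ** 0.5, 2) for p, m in high_pressure.items()}
very_very_high = {p: round(m ** 4, 4) for p, m in high_pressure.items()}
```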
Ex. 4.9.21 : Two fuzzy relations are given by
         y1   y2                 z1   z2   z3
R = x1   0.6  0.3      S = y1    1    0.5  0.3
    x2   0.2  0.9          y2    0.8  0.4  0.7
Obtain fuzzy relation T as a max-min composition and max-product composition between the fuzzy relations.
MU-May 16, 10 Marks
Soln. :
T(x2, z1) = max (min (0.2, 1) , min (0.9, 0.8) )
= max (0.2, 0.8)
= 0.8
T(x2, z2) = max (min (0.2, 0.5), min (0.9, 0.4))
= max (0.2, 0.4)
= 0.4
T(x2, z3) = max (min (0.2, 0.3), min (0.9, 0.7))
= max (0.2, 0.7)
= 0.7

         z1   z2   z3
T = x1   0.6  0.5  0.3
    x2   0.8  0.4  0.7

T = R o S
Ex. 4.9.22 : Given two fuzzy relations R1 and R2 defined on X × Y and Y × Z respectively, where
X = {1, 2, 3}, Y is a four-element set and Z = {a, b}. Find :
1. Max-min composition.
2. Max - product composition.
     0.1  0.2  0.3  0.5          0.1  0.9
R1 = 0.4  0.3  0.7  0.9     R2 = 0.2  0.3
     0.6  0.1  0.8  0.2          0.5  0.6
                                 0.7  0.3
Soln. :
R1 = 1   0.1  0.2  0.3  0.5
     2   0.4  0.3  0.7  0.9
     3   0.6  0.1  0.8  0.2
So,
        a    b
R2 =    0.1  0.9
        0.2  0.3
        0.5  0.6
        0.7  0.3
So the composition of R1 and R2 will be defined on X × Z, where X = {1, 2, 3} and Z = {a, b}.
             a    b
R1 o R2 = 1  0.5  0.3
          2  0.7  0.6
          3  0.5  0.6
We compute each element of R1o R2 as follows :
R1 o R2 (1, a) = max (min (0.1, 0.1), min (0.2, 0.2), min (0.3, 0.5), min (0.5, 0.7))
= max (0.1, 0.2, 0.3, 0.5)
= 0.5
R1 o R2 (3, a) = max (min (0.6, 0.1), min (0.1, 0.2), min (0.8, 0.5), min (0.2, 0.7))
= max (0.1, 0.1, 0.5, 0.2)
= 0.5
R1 o R2 (3, b) = max (min (0.6, 0.9), min (0.1, 0.3), min (0.8, 0.6), min (0.2, 0.3))
= max (0.6, 0.1, 0.6, 0.2)
= 0.6
2. Max-product composition :

             a     b
R1 o R2 = 1  0.35  0.18
          2  0.63  0.42
          3  0.40  0.54
R1 o R2 (1, a) = max (0.1 × 0.1, 0.2 × 0.2, 0.3 × 0.5, 0.5 × 0.7)
= max (0.01, 0.04, 0.15, 0.35)
= 0.35
R1 o R2 (1, b) = max (0.1 × 0.9, 0.2 × 0.3, 0.3 × 0.6, 0.5 × 0.3)
= max (0.09, 0.06, 0.18, 0.15)
= 0.18
R1 o R2 (2, a) = max (0.4 × 0.1, 0.3 × 0.2, 0.7 × 0.5, 0.9 × 0.7)
= max (0.04, 0.06, 0.35, 0.63)
= 0.63
R1 o R2 (2, b) = max (0.4 × 0.9, 0.3 × 0.3, 0.7 × 0.6, 0.9 × 0.3)
= max (0.36, 0.09, 0.42, 0.27)
= 0.42
R1 o R2 (3, a) = max (0.6 × 0.1, 0.1 × 0.2, 0.8 × 0.5, 0.2 × 0.7)
= max (0.06, 0.02, 0.40, 0.14)
= 0.40
R1 o R2 (3, b) = max (0.6 × 0.9, 0.1 × 0.3, 0.8 × 0.6, 0.2 × 0.3)
= max (0.54, 0.03, 0.48, 0.06) = 0.54
Ex. 4.9.23 : Let R be the relation that specifies the relationship between the ‘color of a fruit’ and ‘grade of maturity’.
Relation S specifies the relationship between ‘grade of maturity’ and ‘taste of a fruit’, where color, grade and
taste of a fruit are characterized by crisp sets x, y, z respectively as follows.
Consider following relations R and S and find the relationship between ‘color and taste’ of a fruit using
1. Max - min composition 2. Max - product composition
Soln. :

1. Max-min composition

T (green, sour) = max (min (1, 1), min (0.5, 0.7), min (0, 0))
= max (1, 0.5, 0)
= 1
T (green, tasteless) = max (min (1, 0.2), min (0.5, 1), min (0, 0.7))
= max (0.2, 0.5, 0)
= 0.5
T (green, sweet) = max (min (1, 0), min (0.5, 0.3), min (0, 1))
= max (0, 0.3, 0)
= 0.3
T (yellow, sour) = max (min (0.3, 1), min (1, 0.7), min (0.4, 0.7))
= max (0.3, 0.7, 0.4)
= 0.7
T (yellow, tasteless) = max (min (0.3, 0.2), min (1, 1), min (0.4, 0.7))
= max (0.2, 1, 0.4)
= 1
T (yellow, sweet) = max (min (0.3, 0), min (1, 0.3), min (0.4, 1))
= max (0, 0.3, 0.4)
= 0.4
T (red, sour) = max (min (0, 1), min (0.2, 0.7), min (1, 0))
= max (0, 0.2, 0)
= 0.2
T (red, tasteless) = max (min (0, 0.2), min (0.2, 1), min (1, 0.7))
= max (0, 0.2, 0.7)
= 0.7
T (red, sweet) = max (min (0, 0), min (0.2, 0.3), min (1, 1))
= max (0, 0.2, 1)
= 1
2. Max-product composition

T (green, sour) = max (1 × 1, 0.5 × 0.7, 0 × 0)
= max (1, 0.35, 0)
= 1
T (green, tasteless) = max (1 × 0.2, 0.5 × 1, 0 × 0.7)
= max (0.2, 0.5, 0)
= 0.5
T (green, sweet) = max (1 × 0, 0.5 × 0.3, 0 × 1)
= max (0, 0.15, 0)
= 0.15
T (yellow, sour) = max (0.3 × 1, 1 × 0.7, 0.4 × 0.7)
= max (0.3, 0.7, 0.28)
= 0.7
T (yellow, tasteless) = max (0.3 × 0.2, 1 × 1, 0.4 × 0.7)
= max (0.06, 1, 0.28)
= 1
T (yellow, sweet) = max (0.3 × 0, 1 × 0.3, 0.4 × 1)
= max (0, 0.3, 0.4)
= 0.4
T (red, sour) = max (0 × 1, 0.2 × 0.7, 1 × 0)
= max (0, 0.14, 0)
= 0.14
T (red, tasteless) = max (0 × 0.2, 0.2 × 1, 1 × 0.7)
= max (0, 0.2, 0.7)
= 0.7
T (red, sweet) = max (0 × 0, 0.2 × 0.3, 1 × 1)
= max (0, 0.06, 1)
= 1
Note : All the problems based on controller design have been solved using the Mamdani fuzzy inference model and the mean of max defuzzification method.
Ex. 4.10.1 : Design a fuzzy controller to regulate the temperature of a domestic shower. Assume that:
(a) The temperature is adjusted by single mixer tap.
The design should clearly mention the descriptors used for fuzzy sets and control variables, set of rules to
generate control action and defuzzification. The design should be supported by figures where ever possible.
Soln. :
Step 1 : Identify input and output variables and decide descriptors for the same.
Here the input is the position of the mixer tap. Assume that the position of the mixer tap is measured in degrees (0° to 180°), representing the opening of the tap : 0° indicates the tap is closed and 180° indicates it is fully opened.
Output is the temperature of water according to the position of the mixer tap, measured in °C. We take five descriptors for each of the input and output variables.
Descriptors for input variable (position of mixer tap) are given below.
EL - Extreme Left
L - Left
C - Centre
R - Right
ER - Extreme Right
{ EL, L, C, R, ER }
Descriptors for output variable (Temperature of water) are given below :
VCT - Very Cold Temperature
CT - Cold Temperature
WT - Warm Temperature
HT - Hot Temperature
VHT - Very Hot Temperature
{ VCT, CT, WT, HT, VHT }
Step 2 : Define membership functions for input and output variables.
We use triangular MFs because of its simplicity.
1. Membership functions for input variable – position of mixer tap.
Fig. P. 4.10.1 : Membership function for position of mixer tap
μEL(x) = (45 – x)/45,    0 ≤ x ≤ 45
μL(x) = x/45,            0 ≤ x ≤ 45
       = (90 – x)/45,    45 ≤ x ≤ 90
μC(x) = (x – 45)/45,     45 ≤ x ≤ 90
       = (135 – x)/45,   90 ≤ x ≤ 135
μR(x) = (x – 90)/45,     90 ≤ x ≤ 135
       = (180 – x)/45,   135 ≤ x ≤ 180
μER(x) = (x – 135)/45,   135 ≤ x ≤ 180
2. Membership functions for output variable - temperature of water.
μVCT(y) = (10 – y)/10,   0 ≤ y ≤ 10
μCT(y) = y/10,           0 ≤ y ≤ 10
        = (40 – y)/30,   10 ≤ y ≤ 40
μWT(y) = (y – 10)/30,    10 ≤ y ≤ 40
        = (80 – y)/40,   40 ≤ y ≤ 80
μHT(y) = (y – 40)/40,    40 ≤ y ≤ 80
Input (Mixer tap position)    Output (Temperature of water)
EL                            VCT
L                             CT
C                             WT
R                             HT
ER                            VHT
We can read the rule base shown in Table P. 4.10.1 in terms of If-then rules.
Rule 1 : If mixer tap position is EL (Extreme Left) then temperature of water is VCT (Very Cold Temperature).
α = max (1/3, 2/3) = 2/3
Thus, Rule 3 has the maximum strength.
According to Rule 3, if the mixer tap position is C (centre) then the water temperature is Warm. So we use the output MF of warm water temperature for defuzzification. We have the following two equations for warm water temperature.

μWT(y) = (y – 10)/30 and μWT(y) = (80 – y)/40

Since the strength of rule 3 is 2/3, substitute μWT(y) = 2/3 in the above two equations.

(y – 10)/30 = 2/3 gives y = 30
(80 – y)/40 = 2/3 gives y = 53.3

y* = (30 + 53.3)/2 ≈ 41.7 °C
Ex. 4.10.2 : Design a controller to determine the wash time of a domestic washing machine. Assume that the inputs are dirt and grease on clothes. Use three descriptors for input variables and five descriptors for output variables. Derive a set of rules for controller action and defuzzification. The design should be supported by figures wherever possible.
Show that if the clothes are soiled to a larger degree the wash time will be more, and vice versa.
MU - May 12, Dec. 12, Dec. 13, Dec. 14, 20 Marks
Soln. :
Step 1 : Identify input and output variables and decide descriptors for the same.
Here the inputs are 'dirt' and 'grease'. Assume that the amount of dirt and grease is measured in percentage (%).
Output is ‘wash time’ measured in minutes.
We use three descriptors for each of the input variables.
Descriptors for dirt are as follows :
SD - Small Dirt
MD - Medium Dirt
LD - Large Dirt
{ SD, MD, LD }
Descriptors for grease are { NG, MG, LG }
NG - No Grease
MG - Medium Grease
LG - Large Grease
We use five descriptors for output variable.
So, descriptors for wash time are {VS, S, M, L, VL}
VS - Very Short
S - Short
M - Medium
L - Large
VL - Very Large
Step 2 : Define membership functions for each of the input and output variables.
μSD(x) = (50 – x)/50,    0 ≤ x ≤ 50
μMD(x) = x/50,           0 ≤ x ≤ 50
        = (100 – x)/50,  50 ≤ x ≤ 100
μLD(x) = (x – 50)/50,    50 ≤ x ≤ 100

μNG(y) = (50 – y)/50,    0 ≤ y ≤ 50
μMG(y) = y/50,           0 ≤ y ≤ 50
        = (100 – y)/50,  50 ≤ y ≤ 100
μLG(y) = (y – 50)/50,    50 ≤ y ≤ 100
μVS(z) = (10 – z)/10,    0 ≤ z ≤ 10
μS(z) = z/10,            0 ≤ z ≤ 10
       = (25 – z)/15,    10 < z ≤ 25
μM(z) = (z – 10)/15,     10 ≤ z ≤ 25
       = (40 – z)/15,    25 < z ≤ 40
μL(z) = (z – 25)/15,     25 ≤ z ≤ 40
       = (60 – z)/20,    40 < z ≤ 60
μVL(z) = (z – 40)/20,    40 ≤ z ≤ 60
Step 3 : Form a rule base

Dirt \ Grease   NG   MG   LG
SD              VS   M    L
MD              S    M    L
LD              M    L    VL

The above matrix represents nine rules in all. For example, the first rule can be "If dirt is small and there is no grease then wash time is very short"; similarly, all nine rules can be defined using if-then form.
Step 4 : Rule Evaluation
Assume that dirt = 60 % and grease = 70%
dirt = 60 % maps to the following two MFs of “dirt” variable
μMD(x) = (100 – x)/50 and μLD(x) = (x – 50)/50
Similarly grease = 70 % maps to the following two MFs of “grease” variable.
μMG(y) = (100 – y)/50 and μLG(y) = (y – 50)/50
Evaluate MD (x) and LD (x) for x = 60, we get
μMD(60) = (100 – 60)/50 = 4/5   … (1)
μLD(60) = (60 – 50)/50 = 1/5   … (2)
Similarly evaluate MG (y) and LG (y) for y = 70, we get
μMG(70) = (100 – 70)/50 = 3/5   … (3)
μLG(70) = (70 – 50)/50 = 2/5   … (4)
The above four equations lead to the following four rules that we are supposed to evaluate.
(1) Dirt is medium and grease is medium.
(2) Dirt is medium and grease is large.
(3) Dirt is large and grease is medium.
(4) Dirt is large and grease is large.
Since the antecedent part of each of the above rules is connected by the AND operator, we use the min operator to evaluate the strength of each rule.
Strength of rule 1: S1 = min (MD (60), MG (70) ) = min (4/5, 3/5) = 3/5
Strength of rule 2: S2 = min (MD (60), LG (70) ) = min (4/5, 2/5) = 2/5
Strength of rule 3: S3 = min (LD (60), MG (70) ) = min (1/5, 3/5) = 1/5
Strength of rule 4: S4 = min (LD (60), LG (70) ) = min (1/5, 2/5) = 1/5
Step 5 : Defuzzification
Since, we use “Mean of max” defuzzification technique, we first find the rule with the maximum strength.
α = max (S1, S2, S3, S4) = max (3/5, 2/5, 1/5, 1/5) = 3/5
This corresponds to rule 1.
This rule 1 – “dirt is medium and grease is medium” has maximum strength 3/5.
The above rule corresponds to the output MF M (z). This mapping is shown in Fig. P. 4.10.2(c).
To find out the final defuzzified value, we now take average (mean) of M (z).
μM(z) = (z – 10)/15 and μM(z) = (40 – z)/15

3/5 = (z – 10)/15 gives z = 19 ;  3/5 = (40 – z)/15 gives z = 31

z* = (19 + 31)/2 = 25 min
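The whole calculation of this example can be condensed into a short script (a sketch of the Mamdani + mean-of-max procedure used in the text, with the same MFs and crisp inputs):

```python
# Washing-machine controller of Ex. 4.10.2: dirt = 60 %, grease = 70 %.
def mu_md(x): return x / 50 if x <= 50 else (100 - x) / 50   # medium dirt
def mu_ld(x): return 0.0 if x < 50 else (x - 50) / 50        # large dirt
def mu_mg(y): return y / 50 if y <= 50 else (100 - y) / 50   # medium grease
def mu_lg(y): return 0.0 if y < 50 else (y - 50) / 50        # large grease

dirt, grease = 60, 70

# AND antecedents -> min; the four rules matched by these inputs:
strengths = [
    min(mu_md(dirt), mu_mg(grease)),   # dirt medium AND grease medium -> M
    min(mu_md(dirt), mu_lg(grease)),   # dirt medium AND grease large  -> L
    min(mu_ld(dirt), mu_mg(grease)),   # dirt large  AND grease medium -> L
    min(mu_ld(dirt), mu_lg(grease)),   # dirt large  AND grease large  -> VL
]
alpha = max(strengths)                 # strongest rule fires at 3/5

# mean of max on the Medium wash-time MF: (z - 10)/15 rising, (40 - z)/15 falling
z1 = 10 + 15 * alpha
z2 = 40 - 15 * alpha
wash_time = (z1 + z2) / 2              # 25 minutes
```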
Ex. 4.10.3 : Design a fuzzy controller to control the feed amount of purifier for the water purification plant. Raw water is
purified by injecting chemicals. Assume input as water temperature and grade of water. Output as amount of
purifier. Use three descriptors for input and output variables. Design rules to control action and
defuzzification. Design should be supported by figures whenever necessary. Clearly indicate that when
temperature is low, grade is low then chemical used is in large amount. MU - May 13, 20 Marks
Soln. :
Step 1 : Identify input and output variables and decide descriptors for the same.
Here input variables are water temperature and grade of water.
M - Medium
L - Large
Step 2 : Fuzzification
Define membership functions for each of the input and output variables.
μC(x) = (50 – x)/50,     0 ≤ x ≤ 50
μM(x) = x/50,            0 ≤ x ≤ 50
       = (100 – x)/50,   50 < x ≤ 100
μH(x) = (x – 50)/50,     50 ≤ x ≤ 100

μL(y) = (50 – y)/50,     0 ≤ y ≤ 50
μM(y) = y/50,            0 ≤ y ≤ 50
       = (100 – y)/50,   50 < y ≤ 100
μH(y) = (y – 50)/50,     50 ≤ y ≤ 100
Fig. P. 4.10.3(b) : Membership functions for purifier
μS(z) = (5 – z)/5,     0 ≤ z ≤ 5
μM(z) = z/5,           0 ≤ z ≤ 5
       = (10 – z)/5,   5 < z ≤ 10
μL(z) = (z – 5)/5,     5 < z ≤ 10
Step 3 : Form a rule base
Temp \ Grade   L   M   H
C              L   M   S
M              L   M   M
H              M   S   S
First rule can be, “If temperature is cold and grade is low then amount of purifier required is large.”
Similarly all nine rules can be defined using if-then rules.
Step 4 : Rule Evaluation
Assume water temperature = 5 and grade = 30
Evaluate μC(x) and μM(x) for x = 5 ; we get

μC(5) = (50 – 5)/50 = 9/10 = 0.9    … (1)
μM(5) = 5/50 = 1/10 = 0.1           … (2)

Evaluate μL(y) and μM(y) for y = 30 :

μL(30) = (50 – 30)/50 = 2/5 = 0.4   … (3)
μM(30) = 30/50 = 3/5 = 0.6          … (4)
The above four equations represent the following four rules that we need to evaluate.

1. If temperature is cold and grade is low.
2. If temperature is cold and grade is medium.
3. If temperature is medium and grade is low.
4. If temperature is medium and grade is medium.
Since the antecedent part of each rule is connected by the AND operator, we use the min operator to evaluate the strength of each rule.
Strength of rule 1 : S1 = min (μC(5), μL(30)) = min (0.9, 0.4) = 0.4
Strength of rule 2 : S2 = min (μC(5), μM(30)) = min (0.9, 0.6) = 0.6
Strength of rule 3 : S3 = min (μM(5), μL(30)) = min (0.1, 0.4) = 0.1
Strength of rule 4 : S4 = min (μM(5), μM(30)) = min (0.1, 0.6) = 0.1
Step 5 : Defuzzification
Since we use the "mean of max" defuzzification technique, we first find the rule with maximum strength :

α = max (S1, S2, S3, S4) = max (0.4, 0.6, 0.1, 0.1) = 0.6

This corresponds to rule 2, whose consequent is a medium amount of purifier. To find the final defuzzified value, we now take the average (i.e. mean) of μM(z).
μM(z) = (10 – z)/5 and μM(z) = z/5

0.6 = (10 – z)/5 gives z = 7 ;  0.6 = z/5 gives z = 3

z* = (7 + 3)/2 = 5 gms
Ex. 4.10.4 : Design a fuzzy controller for a train approaching or leaving a station. The inputs are distance from a station and speed of the train. The output is brake power used. Use,
(i) Triangular membership functions
(ii) Four descriptors for each variable
(iii) Five to six rules
(iv) An appropriate defuzzification method.
Soln. :
Step 1 : Identify input and output variables and decide descriptors for the same.
Here inputs are
Distance of a train from the station, measured in meters and
Speed of train measured in km/hr.
Output variable is brake power measured in %.
As mentioned, we take four descriptors for each of the input and output variables.
For distance {VSD, SD, LD, VLD}
VSD : Very Short Distance
SD : Short Distance
LD : Large Distance
VLD : Very Large Distance
For speed {VLS, LS, HS, VHS}
VLS : Very Low Speed
LS : Low Speed
HS : High Speed
VHS : Very High Speed
μVSD(x) = (100 – x)/100,   0 ≤ x ≤ 100
μSD(x) = x/100,            0 ≤ x ≤ 100
        = (400 – x)/300,   100 < x ≤ 400
μLD(x) = (x – 100)/300,    100 ≤ x ≤ 400
        = (500 – x)/100,   400 < x ≤ 500
μVLD(x) = (x – 400)/100,   400 ≤ x ≤ 500
μVLS(y) = (10 – y)/10,   0 ≤ y ≤ 10
μLS(y) = y/10,           0 ≤ y ≤ 10
        = (50 – y)/40,   10 < y ≤ 50
μHS(y) = (y – 10)/40,    10 ≤ y ≤ 50
        = (60 – y)/10,   50 < y ≤ 60
μVHS(y) = (y – 50)/10,   50 ≤ y ≤ 60
μVLP(z) = (20 – z)/20,   0 ≤ z ≤ 20
μLP(z) = z/20,           0 ≤ z ≤ 20
        = (80 – z)/60,   20 < z ≤ 80
μHP(z) = (z – 20)/60,    20 ≤ z ≤ 80
        = (100 – z)/20,  80 < z ≤ 100
μVHP(z) = (z – 80)/20,   80 ≤ z ≤ 100
Dist \ Speed   VLS   LS    HS    VHS
VSD            HP    HP    VHP   VHP
SD             LP    LP    HP    VHP
LD             VLP   VLP   LP    HP
VLD            VLP   VLP   LP    LP
Strength of rule 4 : S4 = min (μLD(110), μVHS(52))
Step 5 : Defuzzification
We use “mean of max” defuzzification technique.
We first find the rule with maximum strength
α = max (S1, S2, S3, S4) = max (0.8, 0.2, 0.33, 0.2) = 0.8
This corresponds to rule 1.
Thus rule 1 – “If dist is short and speed is high” has maximum strength 0.8.
The above rule corresponds to the output MF HP(z). This mapping is shown in Fig. P. 4.10.4(c).
To compute the final defuzzified value, we take average of HP(z).
μHP(z) = (z – 20)/60 and μHP(z) = (100 – z)/20

0.8 = (z – 20)/60 gives z = 68 ;  0.8 = (100 – z)/20 gives z = 84

z* = (68 + 84)/2 = 76 %
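The defuzzification step above is the same mean-of-max pattern as before; sketched in code with the high-power MF of this example:

```python
# Mean-of-max for the train-brake example: the winning rule (strength 0.8)
# maps to "high power", with mu rising as (z - 20)/60 and falling as (100 - z)/20.
alpha = 0.8                       # strength of the strongest rule
z1 = 20 + 60 * alpha              # rising edge crosses alpha at z = 68
z2 = 100 - 20 * alpha             # falling edge crosses alpha at z = 84
brake_power = (z1 + z2) / 2       # mean of max, in percent
```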
Ex. 4.10.5 : Design a fuzzy controller for maintaining the temperature of water in the tank at a fixed level. Input variables
are the cold water flow into the tank and steam flow into the tank. For cooling, cold water flow is regulated and
for raising the temperature, steam flow is regulated. Define the fuzzification scheme for input variables. Devise
a set of rules for control action and defuzzification. Formulate the control problem in terms of fuzzy inference
rules incorporating the degree of relevance for each rule. Design a scheme which shall regulate the water and
steam flows properly.
Soln. :
Step 1 : Identify input and output variables and decide descriptors for each variable.
Here inputs are,
1. Amount of valve opening for cold water.
2. Amount of valve opening for steam.
Valve opening is measured in degrees from 0° to 180°. We take five descriptors for each of the input variables.
Descriptors for valve opening for cold water flow are
{ELCV, LCV, CCV, RCV, ERCV}
ELCV : Extreme Left Cold Valve
LCV : Left Cold Valve
CCV : Centre Cold Valve
RCV : Right Cold Valve
ERCV : Extreme Right Cold Valve
Descriptors for valve opening of steam flow are {ELSV, LSV, CSV, RSV, ERSV}
ELSV : Extreme Left Steam Valve
LSV : Left Steam Valve
CSV : Centre Steam Valve
RSV : Right Steam Valve
ERSV : Extreme Right Steam Valve
We use seven descriptors for temperature of water {VVCT, VCT, CT, WT, HT, VHT, VVHT}
VVCT : Very Very Cold Temperature
VCT : Very Cold Temperature
CT : Cold Temperature
WT : Warm Temperature
HT : Hot Temperature
VHT : Very Hot Temperature
VVHT : Very Very Hot Temperature
μELCV(x) = (45 – x)/45,    0 ≤ x ≤ 45
μLCV(x) = x/45,            0 ≤ x ≤ 45
         = (90 – x)/45,    45 < x ≤ 90
μCCV(x) = (x – 45)/45,     45 ≤ x ≤ 90
         = (135 – x)/45,   90 < x ≤ 135
μRCV(x) = (x – 90)/45,     90 ≤ x ≤ 135
         = (180 – x)/45,   135 < x ≤ 180
μERCV(x) = (x – 135)/45,   135 ≤ x ≤ 180
Fig. P. 4.10.5(a) : Membership functions for valve opening for steam flow
μELSV(y) = (45 – y)/45,    0 ≤ y ≤ 45
μLSV(y) = y/45,            0 ≤ y ≤ 45
         = (90 – y)/45,    45 ≤ y ≤ 90
μCSV(y) = (y – 45)/45,     45 ≤ y ≤ 90
         = (135 – y)/45,   90 ≤ y ≤ 135
μRSV(y) = (y – 90)/45,     90 ≤ y ≤ 135
         = (180 – y)/45,   135 ≤ y ≤ 180
μERSV(y) = (y – 135)/45,   135 ≤ y ≤ 180
μVVCT(z) = (10 – z)/10,   0 ≤ z ≤ 10
μVCT(z) = z/10,           0 ≤ z ≤ 10
         = (30 – z)/20,   10 ≤ z ≤ 30
μCT(z) = (z – 10)/20,     10 ≤ z ≤ 30
        = (50 – z)/20,    30 ≤ z ≤ 50
μWT(z) = (z – 30)/20,     30 ≤ z ≤ 50
        = (70 – z)/20,    50 ≤ z ≤ 70
μHT(z) = (z – 50)/20,     50 ≤ z ≤ 70
        = (90 – z)/20,    70 ≤ z ≤ 90
μVHT(z) = (z – 70)/20,    70 ≤ z ≤ 90
         = (100 – z)/10,  90 ≤ z ≤ 100
μVVHT(z) = (z – 90)/10,   90 ≤ z ≤ 100
= 0.11   … (2)

Evaluate μLSV(y) and μCSV(y) for y = 50 ; we get

μLSV(50) = (90 – 50)/45 = 0.88   … (3)
μCSV(50) = (50 – 45)/45 = 0.11   … (4)
The above four equations lead to the following four rules that we need to evaluate.
1. Cold water valve is in centre and steam flow valve is in left.
Step 5 : Defuzzification
We first find the rule with maximum strength :

α = max (0.88, 0.11, 0.11, 0.11) = 0.88

which corresponds to rule 1.

Rule 1 corresponds to the output MF μCT(z). To compute the final defuzzified value, we take the average of μCT(z).
μCT(z) = (z – 10)/20 ;  0.88 = (z – 10)/20 gives z = 27.7
μCT(z) = (50 – z)/20 ;  0.88 = (50 – z)/20 gives z = 32.3

z* = (27.7 + 32.3)/2 = 30 °C
Review Questions
Q. 1 Explain support and core of a fuzzy set with examples.
Q. 2 Model the following as fuzzy set using trapezoidal membership function : “Numbers close to 10”.
Q. 3 Using the Mamdani fuzzy model, design a fuzzy logic controller to determine the wash time of a domestic washing machine. Assume that the inputs are dirt and grease on clothes. Use three descriptors for each input variable and five descriptors for the output variable. Derive a set of rules for control action and defuzzification. The design should be supported by figures wherever possible.
b1 b2 b3
a1 0.4 0.5 0
b1 0.2 0.7
b2 0.3 0.8
b3 1 0
Q. 5 Define Support, Core, Normality, Crossover points and α-cut for a fuzzy set.
Q. 6 High speed rail monitoring devices sometimes make use of sensitive sensors to measure the deflection of the earth when a rail car passes. These deflections are measured with respect to some distance from the rail car and hence are actually very small angles measured in micro-radians. Let a universe of deflection be A = [1, 2, 3, 4], where A is the angle in micro-radians, and let a universe of distance be D = [1, 2, 5, 7], where D is distance in feet. Suppose a relation between these two parameters has been determined as follows :
R = D1 D2 D3 D4
A1 1 0.3 0.1 0
A2 0.2 1 0.3 0.1
A3 0 0.7 1 0.2
A4 0 0.1 0.4 1
Now let a universe of rail car weights be W = [1, 2], where W is the weight in units of 100,000 pounds. Suppose the
fuzzy relation of W to A is given by,
S = W1 W2
A1 1 0.4
A2 0.5 1
A3 0.3 0.1
A4 0 0
Using these two relations, find the relation Rᵀ o S = T.
Q. 7 Design a fuzzy logic controller for a train approaching or leaving a station. The inputs are the distance from the station and the speed of the train. The output is the amount of brake power used. Use four descriptors for each variable.
Q. 12 Determine all α-level sets and strong α-level sets for the following fuzzy set.
A = {(1, 0.2), (2, 0.5), (3, 0.8), (4, 1), (5, 0.7), (6, 0.3)}.
Q. 13 Design a fuzzy controller to determine the wash time of a domestic washing machine. Assume that the inputs are dirt and grease on clothes. Use three descriptors for each input variable. Derive a set of rules for control action and defuzzification. The design should be supported by figures wherever possible. Clearly indicate that if the clothes are soiled to a larger degree, the wash time required will be more.
Q. 15 Explain cylindrical extension and projection operations on fuzzy relation with example.
Q. 16 Model the following as fuzzy set using trapezoidal membership function “Middle age”.
Q. 17 For the given membership function as shown in Fig. Q. 17, determine the defuzzified output value by any 2 methods.
Fig. Q. 17
Q. 19 Design a fuzzy logic controller for a water purification plant. Assume the grade of water and the temperature of water as the inputs and the required amount of purifier as the output. Use three descriptors for the input and output variables. Derive a set of rules for the control action and defuzzification. The design should be supported by figures. Clearly indicate that if the water temperature is low and the grade of water is low, then the amount of purifier required is large.
        y1    y2
R = x1  0.6   0.3
    x2  0.2   0.9

        z1    z2    z3
S = y1  1     0.5   0.3
Obtain fuzzy relation T as a max-min composition and max-product composition between the fuzzy relations.
Q. 22 Describe in detail the formation of inference rules in a Mamdani Fuzzy inference system.
5 Artificial Neural Network
Unit V
Syllabus
5.1 Introduction : Fundamental concept : Basic Models of Artificial Neural Networks : Important Terminologies of
ANNs : McCulloch-Pitts Neuron
5.2 Neural Network Architecture: Perceptron, Single layer Feed Forward ANN, Multilayer Feed Forward ANN,
Activation functions, Supervised Learning : Delta learning rule, Back Propagation algorithm.
5.3 Un-Supervised Learning algorithm : Self Organizing Maps
Artificial Neural Networks (ANNs) are simplified models of the biological nervous system. They basically mimic the
working of a human brain.
An ANN, in general, is a highly interconnected network of a large number of processing elements called neurons.
An ANN can be considered as a highly parallel network. Distributed processing is a typical feature of a neural network.
Neural networks work on the principle of learning by examples. They are presented with known examples of a problem called 'training set', to acquire knowledge about the problem. After training, the network can be effectively employed in solving instances of the problem previously unknown to the network.
As shown in Fig. 5.1.1, there are two input neurons and one output neuron.
Each neuron has an internal state of its own called ‘activation of a neuron’.
The activation signal of one neuron is transmitted to another neuron.
This activation of a neuron can be considered as a function of the inputs a neuron receives.
Consider a set of neurons; say X1 and X2, transmitting signals to another neuron, Y.
As shown in Fig 5.1.1 there are two input neurons X1 and X2. They transmit signals to output neuron Y.
Input neurons X1 and X2 are connected to the output neuron Y over weighted interconnection links (W1 and W2).
For the above simple neural net architecture, the net input is calculated as :
Yin = W1 x1 + W2 x2
The function that we apply over the net input is called activation function.
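The net-input computation and a simple activation can be sketched as below; the function name and the step activation are illustrative choices, not from the text.

```python
def neuron_output(x1, x2, w1, w2, threshold=0.0):
    """Net input Yin = W1*x1 + W2*x2, followed by a step activation."""
    y_in = w1 * x1 + w2 * x2
    return 1 if y_in > threshold else 0

print(neuron_output(1, 1, 0.6, 0.6))   # net input 1.2 exceeds 0, so the neuron outputs 1
```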
Information flow in nervous system
The huge neural network in the brain is made up of extremely complex interconnections and has a very complicated structure.
Fig. 5.1.2 shows information flow in the human nervous system.
As shown in Fig. 5.1.2, sensory receptors provide input to the network. Receptors deliver stimuli both from within the
body and from sensory organs, when the stimuli originate in the external world.
The stimuli are in the form of electrical impulses that convey information into the network of neurons.
As shown in Fig. 5.1.2, effectors are controlled on account of information processing in the Central Nervous System
(CNS). These effectors then produce human responses in the form of various actions.
When necessary, commands are generated and transmitted to the motor organs.
Motor organs are monitored in the CNS by both internal and external feedback.
The overall nervous system structure has the characteristics of a closed loop system.
Biological Neuron
A typical cell has 3 major regions: the cell body, called the soma, the axon and the dendrites. A schematic of a typical
biological neuron is shown in Fig.5.1.3.
Dendrites form a dendrite tree, which is a very fine bush of thin fibres around the neuron’s body.
Incoming impulses can be excitatory if they cause the firing, or inhibitory if they hinder the firing of the response.
The condition for firing is that the excitation should exceed the inhibition by the amount called the threshold of the
neuron, typically a value of about 40 mV.
The incoming impulse to a neuron can only be generated by neighbouring neurons and by the neuron itself.
Hence, impulses that are closely spaced in time and arrive synchronously are more likely to cause the neuron to fire.
The characteristic feature of the biological neuron is that the signals generated do not differ significantly in
magnitude; the signal in the nerve fibre is either absent or has the maximum value.
The information is transmitted between the nerve cells by means of binary signals.
After an axon fibre finishes generating a pulse, it goes into a state of total non-excitability for a certain amount of
time, called the refractory period. The nerve does not conduct any signals during the refractory period, even if the
excitation intensity is very high.
The time scale for modelling biological neurons is of the order of milliseconds. However, the refractory period is not uniform over the cells.
Speed : The brain is slow compared to an ANN; its speed is in milliseconds, and it may fatigue if too much information and stress is presented. An ANN is fast (in nanoseconds) and does not experience fatigue.

Size : The number of neurons in the brain is approximately 10^11 and the total number of connections is about 10^15; thus the complexity of the human brain is very high. The size of an ANN depends on the application (usually hundreds or thousands of neurons) and is chosen by the developer of the application.

Complexity : The brain can perform massive parallel computation simultaneously. An ANN can also perform massive parallel computation simultaneously, but much faster than the human brain.

Storage capacity : The brain stores information in the synapses. An ANN stores information in continuous memory locations.

Tolerance : Biological neural networks, due to their topology, are fault-tolerant; information is stored redundantly, so minor failures will not result in memory loss. Artificial neural networks are not modelled for fault tolerance or self-regeneration.

Control mechanism : In the brain, there is no specific control mechanism external to the computing task. In an ANN, there is a control unit for controlling computing activities.

Power consumption : The brain consumes about 20% of all the human body's energy; an adult brain operates on about 20 watts. A single Nvidia GPU alone runs on 250 watts and requires a power supply; machines are far less efficient.

Learning : In the human brain, fibres grow and reach out to connect to other neurons; neuroplasticity allows new connections to be created, and synapses may strengthen or weaken based on their importance. Artificial neural networks, on the other hand, have a predefined model, where no further neurons or connections can be added or removed.
The dendrites in the Biological Neural Network are analogous to the weighted inputs based on their synaptic
interconnection in the Artificial Neural Network.
The cell body is comparable to the artificial neuron unit in the ANN, which also comprises summation and threshold units.
Axon carries output that is analogous to the output unit in case of Artificial Neural Network. So, ANN is modeled using
the working of basic biological neurons.
1. Adaptive learning : An ANN possesses the ability to learn from the existing environment and also adapt to a new environment.
2. Self-organization : An ANN can create its own organization or representation of the information it receives during
learning time.
3. Real-time operation : Since ANN computations may be carried out in parallel, they can be used for real time
applications. Special hardware devices are being designed and manufactured to take advantage of this capability of
ANNs and to reduce the response time.
4. Fault tolerance : Neural networks are fault tolerant in the sense that even if some portion of the neural net is removed (for example, some connections are removed), there will be only a small degradation in the neural network's performance.
5. Generalization : After learning from the available inputs and their relationships, an ANN has the capability to infer unseen relationships on unseen data, thus making the model generalize.
6. Non-linearity : ANNs have the ability to learn and model non-linear and complex relationships. In most real-life problems, many of the relationships between inputs and outputs are non-linear as well as complex.
7. Parallel distributed processing : ANNs have massive parallelism which makes them very efficient.
1. The best-known disadvantage of Neural Networks is their "black box" nature. This means that you don't know how and why your NN came up with a certain output. For example, when you put an image of a cat into a neural network and it predicts it to be a car, it is very hard to understand what caused it to come up with this prediction.
2. Neural Networks usually require much more data than traditional Machine Learning algorithms, as in at least
thousands if not millions of labeled samples.
3. Computationally Expensive : Usually, Neural Networks are also more computationally expensive than traditional
algorithms. State of the art deep learning algorithms can take several weeks to train completely from scratch. Most
traditional Machine Learning Algorithms take much less time to train.
Neural networks have been successfully applied to a broad spectrum of data-intensive applications. A few of them are listed below.
1. Forecasting
Neural networks can be used very effectively in forecasting exchange rates, predicting stock values, inflation and cash forecasting, forecasting weather conditions, etc. Researchers have shown that the forecasting accuracy of NN systems tends to excel over that of the linear regression model.
2. Image compression
Digital images require a large amount of memory for storage. As a result, the transmission of image from one
computer to another can be very expensive in terms of time and bandwidth required.
With the explosion of Internet, more sites are using images. Image compression is a technique that removes
some of the redundant information present in the image without affecting its perceptibility, thus, reducing the
storage size required to store the image.
NN can be effectively used to compress the image. Several NN techniques such as Kohonen’s self-organizing
maps, Back propagation algorithm, Cellular neural network etc. can be used for image compression.
3. Process control

Neural networks have been applied successfully in industrial process control of dynamic systems.
Neural networks (especially multi-layer perceptrons) have proved to be the best choice for modelling non-linear systems and implementing general-purpose non-linear controllers, due to their universal approximation capabilities. For example, control and management of agricultural machinery.
4. Optical Character Recognition
Well known application using image recognition is the Optical Character Recognition (OCR) tools that are
available with the standard scanning software for the home computer.
Scansoft has had great success in combining NN with a rule-based system for correctly recognizing both characters and words, to get a high level of accuracy.

5. Customer Relationship Management
Customer Relationship Management requires key information to be derived from raw data collected for each
individual customer. This can be achieved by building models using historical data information.
Many companies are now using neural technology to help in their day to day business processes. They are doing
this to achieve better performance, greater insight, faster development and increased productivity.
By using Neural Networks for data mining in the databases, patterns can be identified for the different types of
customers, thus giving valuable customer information to the company.
Also, NN could be useful for important tasks related to CRM, such as forecasting call centre loading, demand and
sales levels, monitoring and analysing the market, validating, completing and enhancing databases, clustering and
profiling client base etc.
One example is the airline reservation system AMT, which could predict sales of tickets in relation to destination,
time of year and ticket price.
6. Medical science
Medicine is the field that has always taken benefits from the latest and advanced technologies.
Artificial Neural Networks (ANNs) are currently the next promising area of interest in medical science.
It is believed that neural networks (also Deep networks) will have extensive application to biomedical problems in
the next few years.
ANNs have already been successfully applied in medical applications such as diagnostic systems, biochemical analysis, disease detection, image analysis and drug development.
5.2.1 Connections
The arrangement of neurons in layers and connection between them defines the neural network architecture. There
are four basic types of neuron connection architectures. They are:
1. Single Layer feed forward network
2. Multi-layer feed forward network
3. Single layer Recurrent network
4. Multi-layer Recurrent network
[We will discuss above points in Section 5.5]
5.2.2 Learning
MU - Dec. 11, May 12, Dec. 13, Dec. 14
Q. What is learning in neural networks ? Differentiate between supervised and unsupervised learning.
(Dec. 13, Dec. 14, 10 Marks)
There are two types of learning used to train the neural network: batch learning and incremental learning.
Batch learning takes place when the network weights are adjusted in a single training step.
In this mode, the complete set of input training data is needed to determine weights. Feedback information produced
by the network is not involved in developing the network. This learning technique is called recording.
Incremental learning is most commonly used and can be broadly classified into three basic types.
In this type of learning, each input pattern has a corresponding output pattern associated with it, which is the target
or the desired pattern.
Here, a comparison is made between the actual output of the network and the desired output to find out the ‘error’.
The computed error can be used to adjust network parameters (like connection weights, threshold etc.). As the
network parameters are modified, the performance of the network improves.
The training could continue until the network is able to produce the expected or desired response.
As shown in Fig. 5.2.1, the distance function [d, o] takes two values as input, the actual network output o and the desired output d for an input X, and then computes the error measure.
Since we have assumed adjustable weights, the weights can be adjusted to improve network performance that is to
reduce the error.
This is analogous to classroom learning, with the teacher's questions answered by students and corrected, if needed, by the teacher.
Q. What is unsupervised learning ? Compare different learning rules. (May 13, 6 Marks)
Q. Distinguish between supervised and un-supervised learning. (Dec. 15, 5 Marks)
In this learning method, the desired output is not presented to the network. It is as if there is no teacher to present the desired or expected output. Hence, the system learns on its own by discovering and adapting to structural features in the input data.
Since no external instructions regarding potential clusters are available, a suitable weight self-adaptation mechanism needs to be embedded in the trained network. One possible network adaptation rule that can be applied is : "A pattern added to a cluster has to be closer to the cluster centre than to the other clusters' centres".
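The adaptation rule just quoted amounts to assigning each pattern to its nearest cluster centre. A minimal sketch (the helper name and the data are illustrative, not from the text):

```python
import math

def assign_to_cluster(pattern, centres):
    """Assign a pattern to the cluster whose centre is closest (Euclidean distance)."""
    distances = [math.dist(pattern, c) for c in centres]
    return distances.index(min(distances))

centres = [(0.0, 0.0), (5.0, 5.0)]
print(assign_to_cluster((4.0, 6.0), centres))   # closer to (5, 5), so cluster index 1
```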
Thus in unsupervised learning, the network must itself discover possibly existing patterns, regularities or separating
properties.
This type of learning is analogous to learning the subject from a videotape lecture covering the material but not
including any other teacher’s involvement. Here the teacher delivers the lecture but is not available for clarification of
unclear questions or to check answers.
Table 5.2.1 : Difference between supervised and unsupervised learning
2. A supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples. Unsupervised learning uses procedures that attempt to find natural partitions of the patterns given to the neural network.
3. Supervised learning is achieved by means of classification or regression. Unsupervised learning is achieved by means of clustering.
4. Supervised learning is known as learning with a teacher's presence, as the desired output is known beforehand. Unsupervised learning is known as learning without a teacher's presence, as the desired output is unknown.
5. Supervised learning is limited to learning smaller and simpler models. With unsupervised learning, it is possible to learn larger and more complex models.
Basically, this learning attempts to learn the input-output mapping through trial and error. Here the system knows
whether the output is correct or not, but does not know the correct output.
Table 5.2.1 shows the difference between supervised and unsupervised learning.
The output response of a neuron is calculated using an activation function (also called a transfer function).
The activation function is applied to the sum of weighted inputs (the net input to the neuron) to obtain the neuron's response.
Neurons placed in the same layer use the same activation function.
There are basically two types of activation functions: linear and non-linear.
Non-linear activation functions are used in multi-layer nets.
1. Unipolar Binary

It is unipolar in nature, generating the two values 1 or 0. It is computed as,

f(net) = 1 , net > 0
       = 0 , net ≤ 0
2. Bipolar Binary

It is bipolar in nature, meaning it generates the two values +1 or –1.
It is used only at the output layer.
It is computed as,

f(net) = sgn (net) = + 1 , net > 0
                   = – 1 , net < 0
The two functions bipolar binary and unipolar binary are called hard limiting activation functions.
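Both hard-limiting functions can be sketched directly (the helper names are illustrative):

```python
def unipolar_binary(net):
    """Hard limiter producing 1 or 0."""
    return 1 if net > 0 else 0

def bipolar_binary(net):
    """Hard limiter sgn(net) producing +1 or -1."""
    return 1 if net > 0 else -1

print(unipolar_binary(0.4), unipolar_binary(-0.4))   # 1 0
print(bipolar_binary(0.4), bipolar_binary(-0.4))     # 1 -1
```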
It is computed as,

f(net) = 2 / (1 + e^(–λ net)) – 1 ,  where net = Σ (i = 1 to n) wi xi

This function is related to the hyperbolic tangent function. λ is called the steepness parameter.
It is usually used in the hidden layers of a neural network, as its values lie between –1 and 1.
Its output is zero-centred because its range is between –1 and 1. Hence, in practice, it is always preferred over the sigmoid function.
- We can easily observe that as λ → ∞, the bipolar continuous (sigmoidal) function becomes the sgn (net) function.
- The two activation functions, bipolar continuous and unipolar continuous, have sigmoidal characteristics and are hence called soft limiting activation functions.
Note : Both the sigmoid and tanh functions suffer from the vanishing gradient problem.
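A sketch of the bipolar continuous function, showing numerically that a large steepness λ pushes it toward sgn(net) (the parameter names are mine):

```python
import math

def bipolar_continuous(net, lam=1.0):
    """Soft limiter f(net) = 2 / (1 + e^(-lam * net)) - 1; equals tanh(lam * net / 2)."""
    return 2.0 / (1.0 + math.exp(-lam * net)) - 1.0

print(round(bipolar_continuous(1.0), 3))           # 0.462, the same as tanh(0.5)
print(round(bipolar_continuous(1.0, lam=50), 3))   # 1.0, approaching sgn(net)
```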
6. Softmax Function

The softmax function is also a type of sigmoid function.
The sigmoid function can handle only two classes; what if there are multiple classes?
Softmax is commonly used for multi-class classification problems.
It is non-linear in nature.
The softmax function squeezes the output for each class between 0 and 1 and divides each output by the sum of all the outputs. This gives the probability of the input belonging to a particular class.
The softmax function is ideally used in the output layer of a classifier, where we actually try to obtain the probabilities that define the class of each input.
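The description above can be sketched as a self-contained softmax (with the usual max-shift for numerical stability, an implementation detail not mentioned in the text):

```python
import math

def softmax(scores):
    """Exponentiate each score and divide by the sum, giving class probabilities."""
    shifted = [s - max(scores) for s in scores]    # subtract the max for stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])   # [0.659, 0.242, 0.099]; the values sum to 1
```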
Linear : O = g I , where the gain g = tan θ is the slope of the linear characteristic.
Note :
Sigmoidal functions (unipolar continuous and bipolar continuous) are used in multilayer nets such as the back-propagation network. With hard-limiting activation functions, the network is unable to determine which of the input weights should be increased and which ones should not.
On the other hand, soft-limiting activation functions (sigmoidal functions) give us some information on the inputs, so that the network will be able to determine when to strengthen or weaken the relevant weights.
        | W1^t |     | w11  w12  …  w1m |
W   =   | W2^t |  =  | w21  w22  …  w2m |
        |  ⋮   |     |  ⋮               |
        | Wn^t |     | wn1  wn2  …  wnm |

where Wi = [wi1, wi2, …, wim]^t for i = 1, 2, 3, …, n is the weight vector for the i-th processing element.
wij is the individual element of the weight vector that represents the weight on the communication link from the i-th neuron to the j-th neuron.
5.3.2 Bias

A bias is a weight on a connection from an additional input unit whose activation is always 1.
Increasing the bias increases the net input to the unit.
For the above example, net is computed as,

net = b + w1 x1 + w2 x2

Similar to the initialization of weights, the bias should also be initialized to some specific value.
For example, the bias value allows us to shift the activation function to the left or right, which may be critical for successful learning.
Usually, when a bias is not used, the separating line (or plane) passes through the origin. This may not be appropriate for solving a particular problem. Adding an adjustable bias helps us to shift the separating line (or plane) either to the left or to the right.

o = sigmoid (1 · x + b) = sigmoid (x + b)

where b is the bias.
Using different values of bias, we can shift the entire curve to the left or to the right, as shown in Fig. 5.3.4.
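The shift can be seen numerically; the sigmoid here is the standard logistic function (an illustrative sketch):

```python
import math

def sigmoid(net):
    """Standard logistic (unipolar continuous) activation."""
    return 1.0 / (1.0 + math.exp(-net))

# The same input x = 0 evaluated with different biases: the curve slides horizontally.
for b in (-2.0, 0.0, 2.0):
    print(b, round(sigmoid(0.0 + b), 3))   # 0.119, 0.5 and 0.881 respectively
```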
5.3.3 Threshold
Threshold is a set value based upon which the final output of the network is calculated.
io led
Usually a net input to the neuron is calculated and then compared with the threshold value.
If the net value is greater than the threshold, then the neuron fires, otherwise, it does not fire.
5.3.4 Learning Rate

The learning rate controls the amount of weight adjustment at each step of learning.
The effectiveness and convergence of the learning algorithm depend significantly on the value of the learning constant.
However, the optimum value of the learning rate depends on the problem being solved, and therefore there is no single learning constant that can be used for different training cases.
For example, to solve a problem with broad minima, a large value of the learning constant will result in more rapid convergence. However, for problems with steep and narrow minima, a small value of the learning constant must be chosen to avoid overshooting the solution. But choosing a small value of the learning constant also increases the total number of steps in training.
Q. Explain Mc-Culloch Pitts Neuron Model with help of an example. (Dec. 11, May 12, 5/6 Marks)
Q. Explain with an example McCulloch-Pitts neuron model. (Dec. 12, 6 Marks)
The first formal definition of a synthetic neuron model based on highly simplified considerations of the biological
model was formulated by McCulloch and Pitts in 1943.
The model is shown in the Fig. 5.4.1.
The inputs xi for i = 1, 2, 3, …, n are either 0 or 1, depending on the absence or presence of the input impulse at instant k.
Fig. 5.4.1 : McCulloch Pitts Neuron model
The weights wi can be positive (excitatory) or negative (inhibitory).
There is a fixed threshold for each neuron; if the net input to the neuron is greater than or equal to the threshold, then the neuron fires. Otherwise, the neuron does not fire, i.e., it goes into an inhibition state.
Accordingly, the firing rule for the neuron is defined as follows,

o^(k+1) = 1 , if Σ (i = 1 to n) wi xi^k ≥ T
        = 0 , if Σ (i = 1 to n) wi xi^k < T
Here, T is the neuron's threshold value, which the weighted sum of input signals must reach for the neuron to fire.
Although this neuron model is very simplistic, it has substantial computing potential. It can perform basic logical operations such as AND, OR, NOT, etc., provided its weights and threshold are appropriately selected.
The McCulloch Pitts neuron does not have any particular training algorithm. An analysis has to be performed to
determine the values of weights and the threshold.
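The firing rule can be sketched as a small function (illustrative; the weights and threshold must still be found by analysis, as noted above):

```python
def mcp_neuron(inputs, weights, T):
    """McCulloch-Pitts neuron: output 1 if the weighted input sum reaches threshold T."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= T else 0

print(mcp_neuron([1, 1], [1, 1], T=2))   # 1: both inputs present, AND-like firing
```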
Ex. 5.4.1 : Design two input AND logic using McCulloch Pitts Neuron model.
Soln.: Consider the truth table for the AND logic function shown in Table P. 5.4.1.
Here we need to find the appropriate values of w1, w2 and threshold T such that, they satisfy the truth table of AND
logic.
Here, the firing rule is,

w1 x1 + w2 x2 ≥ T …(1)

and the inhibition rule is,

w1 x1 + w2 x2 < T …(2)

From the truth table of AND logic, we get the following four inequalities:

0 < T
w2 < T
w1 < T
w1 + w2 ≥ T
x1 x2 Output y
0 0 0
0 1 0
1 0 0
1 1 1
To obtain the above four inequalities, use Equation (1) when the neuron must fire and use Equation (2) when the neuron must be inhibitory.
For x1 = 0, x2 = 0, y = 0 (i.e. inhibitory).
So substituting x1 = 0, x2 = 0 in Equation (2), we get,
0 < T
For x1 = 0, x2 = 1, y = 0 (i.e. inhibitory)
For w1 = 1, w2 = 1 and T = 2, all of the above inequalities are satisfied. Hence, the solution is,

w1 = 1
w2 = 1
T = 2
Fig. P.5.4.1(a) : Final weights and threshold for AND logic
Ex. 5.4.2 : Design two inputs OR logic using McCulloch Pitts Neuron model.
Soln.:
Consider the truth table for OR logic function shown in the Table P.5.4.2.
x1 x2 Output y
0 0 0
0 1 1
1 0 1
1 1 1
Fig. P.5.4.2 : MCP Neuron Model for Logical OR
We need to find the appropriate values of w1, w2 and T such that they satisfy the truth table of OR logic.
Here, the firing rule is,

w1 x1 + w2 x2 ≥ T …(1)

and the inhibitory rule is,

w1 x1 + w2 x2 < T …(2)
From the truth table of OR logic, we get the following four inequalities:

0 < T
w2 ≥ T
w1 ≥ T
w1 + w2 ≥ T
To derive the above four inequalities, use Equation (1) when the neuron must fire and use Equation (2) when the neuron must be inhibitory.
For x1 = 0, x2 = 0, y = 0 (i.e. inhibitory), so substituting x1 = 0, x2 = 0 in Equation (2), we get,
0 < T
For x1 = 0, x2 = 1, y = 1 (i.e. firing), so substituting x1 = 0, x2 = 1 in Equation (1), we get,
w2 ≥ T
For x1 = 1, x2 = 0, y = 1 (i.e. firing), so substituting x1 = 1, x2 = 0 in Equation (1), we get,
w1 ≥ T
For x1 = 1, x2 = 1, y = 1 (i.e. firing), so substituting x1 = 1, x2 = 1 in Equation (1), we get,
w1 + w2 ≥ T
For w1 = 1, w2 = 1 and T = 1, all of the above inequalities are satisfied. Hence, the solution is,

w1 = 1
w2 = 1
T = 1
Ex. 5.4.3 : Implement the logic function shown in Table P.5.4.3 using McCulloch-Pitts neuron.
Soln. :
Table P.5.4.3 : Truth table
x1 x2 Output y
0 0 0
0 1 0
1 0 1
1 1 0
We need to find the appropriate values of w1, w2 and T such that they satisfy the truth table. From the truth table, we get the following four inequalities:

0 < T
w2 < T
w1 ≥ T
w1 + w2 < T
So for w1 = 1, w2 = – 1 and T = 1, all of the above inequalities are satisfied. So, the solution is,
w1 = 1
w2 = – 1
T = 1
Fig. P.5.4.3(a) : Final weights and threshold for given logic
Ex. 5.4.4 : Design NOT logic using McCulloch Pitts neuron model.
Soln.:
Consider the truth table of NOT logic function.
x y
0 1
1 0
Here, we need to find the appropriate values of w and T such that they satisfy the truth table of NOT logic.
The firing rule is,

wx ≥ T …(1)

and the inhibitory rule is,

wx < T …(2)
From the truth table of NOT logic, we get the following two inequalities:

0 ≥ T
w < T
So, for T = 0 and w = – 1, the above two inequalities are satisfied. Hence, the solution is,
w = –1
T = 0
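The three solutions derived above (AND: w1 = w2 = 1, T = 2; OR: w1 = w2 = 1, T = 1; NOT: w = –1, T = 0) can be checked against their truth tables in a few lines (a sketch; the helper name is mine):

```python
def mcp(inputs, weights, T):
    """McCulloch-Pitts neuron: fire (1) when the weighted sum reaches threshold T."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= T else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        assert mcp([x1, x2], [1, 1], T=2) == (x1 and x2)   # AND
        assert mcp([x1, x2], [1, 1], T=1) == (x1 or x2)    # OR
for x in (0, 1):
    assert mcp([x], [-1], T=0) == 1 - x                    # NOT
print("all three truth tables satisfied")
```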
Fig. 5.5.1 shows the architecture of a single layer feed forward network.
A single layer feed forward network consists of neurons arranged in two layers.
The first layer is called the input layer and the second layer is called the output layer.
The input signals are fed to the neurons of the input layer and neurons of the output layer produce output signals.
Every input neuron is connected to every output neuron via synaptic links or weights.
The output vector is,

O = [o1 o2 o3 … om]^t

and the input vector is,

X = [x1 x2 x3 … xn]^t
Weight wij connects the i-th output neuron with the j-th input neuron.
Then the activation value of the i-th neuron is given as,

net_i = Σ (j = 1 to n) wij xj ,  for i = 1, 2, 3, …, m
The transformation performed by each of the m neurons in the network is a non-linear mapping, expressed as,

Oi = f (Wi^t X) ,  for i = 1, 2, 3, …, m
where the weight vector Wi contains the weights leading toward the i-th output node and is defined as,

Wi = [ wi1 wi2 wi3 … win ]^t
In spite of having two layers, the network is still called a 'single layer' network because, of the two layers, only the output layer performs computation. The input layer simply transmits the input signal to the output layer.
Thus, feed forward networks are characterised by the lack of feedback. That is, the output of any given neuron is not
fed back to itself directly or indirectly or through other neurons. Thus, present output does not influence future
output.
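The single-layer computation Oi = f(Wi^t X) described above can be sketched without any libraries (the weights and the tanh activation here are illustrative):

```python
import math

def forward(W, x, f=math.tanh):
    """Single layer feed forward pass: o_i = f(Wi . x) for each output neuron i."""
    return [f(sum(w * xj for w, xj in zip(row, x))) for row in W]

W = [[0.5, -0.2, 0.1],    # weights leading toward output neuron 1
     [0.3,  0.8, -0.5]]   # weights leading toward output neuron 2
x = [1.0, 0.0, 1.0]
print(forward(W, x))      # two activation values, one per output neuron
```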
5.5.2 Multilayer Feed Forward Networks
In a multilayer feed forward network, there are multiple layers. Thus, besides having an input layer and an output layer, this network has one or more intermediary layers called hidden layers.
xi : input neurons, i = 1, 2, 3, …, n
zk : output neurons, k = 1, 2, 3, …, p
Similarly, the hidden layer neurons are connected to the output layer neurons and the corresponding weights are
referred to as “Hidden-output layer weights”.
A multi-layer feed forward network with m input neurons, n1 neurons in the first hidden layer, n2 neurons in the
second hidden layer and k output neurons is written as m – n1 – n2 – k network.
The feedback networks differ from the feed forward network in the sense that there exists at least one feedback loop.
When the output of the output neurons is fed back as an input to the same or preceding layer nodes, then this type of
network is called feedback network.
Fig. 5.5.4 shows one such type of feedback network.
1. The present output, say o(t), controls the output at the following instant, o(t + Δ).
2. Δ indicates the time elapsed between t and t + Δ. Here the time delay Δ is in an analogy to the refractory period of a
basic biological neuron model.
3. Thus, the mapping of o(t) into o(t+ Δ) can be written as,
o(t + Δ) = Γ [W o(t)]
Fig. 5.5.6 : Multilayer recurrent network
Also, in these networks, a processing element output can be directed back to the processing element itself and to
other processing elements in the same layer.
As discussed earlier, supervised learning works on labeled data, meaning that for every input the corresponding output (or class) is known.
As shown in Fig. 5.6.1, the network consists of 3 units : the Sensory unit S, the Association unit A and the Response unit R.
The S unit receives input images and generates a 0 or 1 electrical signal as output. It contains 400 photo-detectors.
These input signals are compared with a threshold. If an input signal exceeds the threshold, the photo-detector outputs 1, otherwise 0.
The photo-detectors are randomly connected to the Association unit A. The association unit A comprises feature demons or predicates.
The 3rd unit R contains pattern recognizers (also called perceptrons) which receive the results of the predicates.
The weights of S and A units are fixed while those of R are adjustable.
The output of R is 1 if the weighted sum of its inputs is greater than 0, otherwise it is 0.
5.6.1(B) The Perceptron Model
The perceptron model is based on the perceptron learning rule in which the learning signal is the difference between
the desired output and the neuron’s actual output.
Here learning is obtained in supervised environment and the learning signal is equal to:
r = di – oi
where oi = sgn (Wi^t X)
Here the weights are adjusted using the following weight update formula :
∆Wi = c [di – sgn (Wi^t X)] X
or, component-wise,
∆wij = c [di – sgn (Wi^t X)] xj ;  for j = 1, 2, ..., n
The weight adjustment is inherently zero when di = oi (i.e., di = sgn (Wi^t X)). Thus, when the desired output is the same as the actual output, there is no change in the weights.
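The update rule can be sketched directly in Python. The numbers below are the ones used in the first correction step of Ex. 5.8.2 later in this chapter (c = 0.1, desired output –1):

```python
def sgn(net):
    return 1 if net >= 0 else -1

def perceptron_update(W, X, d, c):
    """One discrete perceptron correction: W <- W + c*(d - sgn(W.X))*X."""
    o = sgn(sum(w * x for w, x in zip(W, X)))
    return [w + c * (d - o) * x for w, x in zip(W, X)]

W_new = perceptron_update([1.0, -1.0, 0.0, 0.5], [1.0, -2.0, 0.0, -1.0], d=-1, c=0.1)
print([round(w, 3) for w in W_new])   # [0.8, -0.6, 0.0, 0.7]
```

When d equals the sign of the net input, the factor (d – o) is zero and the weights are left unchanged, exactly as stated above.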
Given are P training pairs {X1, d1, X2, d2, X3, d3, ..., XP, dP}, where Xi is (n × 1) and di is (1 × 1), for i = 1, 2, ..., P.
Here augmented input vectors are used :
Yi = [Xi ; 1]  for i = 1, 2, ..., P
i.e., there is an additional input for the bias and it is fixed at 1.
In the following, k denotes the training steps and p denotes the step counter within a cycle.
Step 1 : c > 0 is chosen
Step 3 : The training cycle begins here. Input is presented and output is computed :
Y ← Yp , d ← dp
o ← sgn (W^t Y)
Step 4 : Weights are updated :
W ← W + (1/2) c (d – o) Y
c. One (or more) layers between input layer and output layer. These layers are called hidden layers.
For perceptrons in the input layer, we use linear transfer function and for the perceptrons in the hidden and output
layers, we use sigmoid (continuous) function.
Here, since the single layer perceptron’s model is modified by adding a hidden layer and changing the transfer
(activation) function from linear function to a nonlinear function, we need to alter the learning rule as well. So now
the multilayer perceptron network should be able to recognize more complex things.
The input-output mapping of the multilayer perceptron is shown in Fig. 5.6.3 and is represented by O = f2 [ W f1 [ V X ] ], where f1 and f2 represent non-linear mappings.
x1 x2 Output (y)
1 1 –1
1 –1 1
–1 1 1
–1 –1 –1
The task of the network is to assign a binary input vector to output class –1 if the vector has similar inputs (both inputs +1 or both inputs –1), and to output class 1 otherwise.
Let us try to solve this problem using a single-layer perceptron model. Consider the following single-layer perceptron
model shown in Fig. 5.6.4.
b – w1 + w2 > 0
b – w1 – w2 ≤ 0
which must be satisfied. However, the set of inequalities is clearly self-contradictory when considered as a whole.
This is because the EX-OR problem is not linearly separable. This can easily be observed from the plot given in Fig. 5.6.5.
Fig. 5.6.5 : EX-OR function plot
In other words, we cannot use a single layer perceptron to construct a straight line to partition the 2D input space into
two regions (class 1 and class 2).
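The contrast can be made concrete with a brute-force sketch: a coarse search over candidate weights and bias (grid values chosen arbitrarily for illustration) finds a separating line for the AND function but none for EX-OR:

```python
from itertools import product

def separable(patterns, grid):
    """Search a grid of (w1, w2, b) candidates for weights classifying all patterns."""
    sgn = lambda net: 1 if net >= 0 else -1
    for w1, w2, b in product(grid, repeat=3):
        if all(sgn(b + w1 * x1 + w2 * x2) == d for (x1, x2), d in patterns.items()):
            return True
    return False

AND = {(1, 1): 1, (1, -1): -1, (-1, 1): -1, (-1, -1): -1}
XOR = {(1, 1): -1, (1, -1): 1, (-1, 1): 1, (-1, -1): -1}
grid = [-1, -0.5, 0, 0.5, 1]
print(separable(AND, grid))   # True  -- e.g. w1 = w2 = 1, b = -1 works
print(separable(XOR, grid))   # False -- no line separates EX-OR
```

The grid search is only an illustration (it inspects finitely many lines), but the geometric argument above shows that no line at all can separate the EX-OR classes.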
Q. Explain linearly separable and non-linearly separable patterns with examples. (Dec. 10, 5 Marks)
Q. Explain with examples linearly separable and non-linearly separable pattern classification.
(May 12, Dec. 13, Dec.15, 10 Marks)
To achieve this, a step function can be used as an activation function. That is, the output is +1 if the net input to the neuron is positive, and –1 if the net input to the neuron is negative. Since the net input to the neuron is computed as
net = b + Σ(i=1 to n) xi wi
it is clear that the boundary between the region where net > 0 and the region where net < 0 is determined by the relation
b + Σ(i=1 to n) xi wi = 0
This boundary is also called decision boundary.
Depending on the number of input units in the network, this equation represents a line, a plane or a hyperplane.
If there exist weights (and a bias) such that all of the training input vectors for which correct response is + 1 lie on
one side of the decision boundary and all of the training input vectors for which the correct response is – 1 lie on
the other side of the boundary then we say that the problem is “linearly separable”.
The problem shown in Fig. 5.6.7 (a) is a linearly separable problem and can be solved using the network shown in
Fig. 5.6.7 (b).
Fig. 5.6.7 (b) : Network to solve linearly separable problem in Fig. 5.6.7 (a)
The region where the output is positive is separated from the region where it is negative by the line,
x2 = (– w1 / w2) x1 – (b / w2)
Here, since net is computed as
net = b + Σ(i=1 to n) wi xi
for two inputs we have
net = b + w1 x1 + w2 x2
So, the decision boundary is
b + w1 x1 + w2 x2 = 0
which can be rewritten as
x2 = (– w1 / w2) x1 – (b / w2)
In the above example, there are many different lines that will serve to separate the input vectors into two classes.
There could also be many choices of w1, w2 and b that give exactly the same line.
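The boundary line itself is easy to compute from (w1, w2, b); the small sketch below (values illustrative, w2 assumed non-zero) checks that a point on the line gives net = 0:

```python
def boundary(w1, w2, b):
    """Rewrite b + w1*x1 + w2*x2 = 0 as x2 = slope*x1 + intercept (requires w2 != 0)."""
    return -w1 / w2, -b / w2

slope, intercept = boundary(w1=1.0, w2=1.0, b=-1.0)
x1 = 0.3
x2 = slope * x1 + intercept        # a point on the decision boundary
net = -1.0 + 1.0 * x1 + 1.0 * x2   # net is (numerically) zero on the line
print(slope, intercept)   # -1.0 1.0
```

Scaling w1, w2 and b by any common positive factor leaves slope and intercept unchanged, which is why many weight choices give exactly the same line.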
To understand the concept of linear separability further, consider simple logic gates AND and OR.
The AND gate can be represented by the truth-table shown in Table 5.6.2.
Table 5.6.2 : AND Truth Table
x1 x2 y(output)
1 1 1
1 –1 –1
–1 1 –1
–1 –1 –1
Then, the desired response for the AND function can be represented as shown in Fig. 5.6.8(a).
The possible decision boundary for AND function could be as shown in Fig. 5.6.8 (b).
In other words, for the EX-OR function no straight line can partition the 2D input space into two regions (class 1 and class 2). This can easily be observed from the plot given in Fig. 5.6.10.
Fig. 5.6.11 shows examples of linearly separable and non-linearly separable patterns.
The solution to this problem is to use non-linear (continuous) activation function such as sigmoidal or radial basis
function.
Hence, Widrow and Hoff introduced the Delta learning rule, which uses a non-linear differentiable activation function for training multilayer perceptron networks.
The learning rule can be readily derived from the condition of least squared error between oi and di.
Calculating the gradient vector, with respect to Wi, of the squared error defined as
E = (1/2) (di – oi)²        ...(5.6.1)
which is equivalent to
E = (1/2) [di – f (Wi^t X)]²        ...(5.6.2)
we obtain the error gradient vector by taking the partial derivative of E with respect to each of the weights :
∇E = – (di – oi) f ′(Wi^t X) X        ...(5.6.3)
∂E/∂wij = – (di – oi) f ′(Wi^t X) xj ;  j = 1, 2, ..., n        ...(5.6.4)
Since the minimization of the error requires the weight changes to be in the negative gradient direction, we take
∆Wi = – c ∇E        ...(5.6.5)
Thus,
∆Wi = c (di – oi) f ′(neti) X
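A sketch of a single delta-rule step with the bipolar continuous activation (the pattern and constants below are the ones used in the worked delta-rule example, Ex. 5.8.6, later in this chapter):

```python
import math

def delta_update(W, X, d, c):
    """One delta-rule step: W <- W + c*(d - o)*f'(net)*X, with the bipolar
    continuous activation o = 2/(1 + exp(-net)) - 1 and f'(net) = (1 - o**2)/2."""
    net = sum(w * x for w, x in zip(W, X))
    o = 2.0 / (1.0 + math.exp(-net)) - 1.0
    fprime = 0.5 * (1.0 - o * o)
    return [w + c * (d - o) * fprime * x for w, x in zip(W, X)]

W2 = delta_update([1.0, -1.0, 0.0, 0.5], [1.0, -2.0, 0.0, -1.0], d=-1, c=0.1)
print([round(w, 3) for w in W2])   # [0.974, -0.948, 0.0, 0.526]
```

Unlike the discrete perceptron rule, the correction here is scaled by f'(net), so weights near saturation (|o| close to 1) change very little.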
5.6.2(B) Proofs
MU – Dec. 12, Dec. 13, Dec. 15
Q. Prove that :
(i) For the unipolar continuous activation function, f ′(net) = o (1 – o)
(ii) For the bipolar continuous activation function, f ′(net) = (1 – o²)/2
where o is the output. (Dec. 12, Dec. 13, 10 Marks)
Q. Prove that the first order derivative of a unipolar continuous activation function is f ′(net) = O (1 – O). (Dec. 15, 5 Marks)
Depending on whether the activation function is bipolar continuous or unipolar continuous, and further assuming that λ = 1, we can express f ′(net) in terms of the output f (net), i.e. o :
f ′(net) = (1/2) (1 – o²)        ...(5.6.8)
Proofs
(I)  f ′(net) = (1/2) (1 – o²)   [Bipolar continuous]
Proof :
o = f (net) = 2 / (1 + exp(– λ net)) – 1
Assume λ = 1 :
o = f (net) = 2 / (1 + exp(– net)) – 1
L.H.S. = f ′(net)
= d/d(net) [ 2 / (1 + exp(– net)) – 1 ]
= 2 · d/d(net) [ 1 / (1 + exp(– net)) ]
= 2 · [ – d/d(net) (1 + exp(– net)) ] / (1 + exp(– net))²
= – 2 (– exp(– net)) / (1 + exp(– net))²
L.H.S. = 2 exp(– net) / (1 + exp(– net))²        ...(5.6.10)
R.H.S. = (1/2) (1 – o²)
= (1/2) [ 1 – ( 2 / (1 + exp(– net)) – 1 )² ]
= (1/2) [ 1 – 4 / (1 + exp(– net))² + 4 / (1 + exp(– net)) – 1 ]
= – 2 / (1 + exp(– net))² + 2 / (1 + exp(– net))
= 2 [ – 1 + (1 + exp(– net)) ] / (1 + exp(– net))²
= 2 exp(– net) / (1 + exp(– net))²        ...(5.6.11)
L.H.S. = R.H.S.
Hence proved, f ′(net) = (1/2) (1 – o²).
(II)  f ′(net) = o (1 – o)   [Unipolar continuous]
Proof :
o = f (net) = 1 / (1 + exp(– λ net))
Assume λ = 1 :
o = f (net) = 1 / (1 + exp(– net))
L.H.S. = f ′(net)
= d/d(net) [ 1 / (1 + exp(– net)) ]
= [ – d/d(net) (1 + exp(– net)) ] / (1 + exp(– net))²
= – (– exp(– net)) / (1 + exp(– net))²
L.H.S. = exp(– net) / (1 + exp(– net))²        ...(5.6.12)
R.H.S. = o (1 – o)
= [ 1 / (1 + exp(– net)) ] [ 1 – 1 / (1 + exp(– net)) ]
= (1 + exp(– net) – 1) / (1 + exp(– net))²
= exp(– net) / (1 + exp(– net))²        ...(5.6.13)
Hence L.H.S. = R.H.S., which proves f ′(net) = o (1 – o).
5.6.2(C) Algorithm - Delta Learning    MU – Dec. 15
Q. Explain Single Continuous Perceptron Training Algorithm (SCPTA). (Dec. 15, 5 Marks)
Given are P training pairs {Xi, di}, i = 1, 2, 3, ..., P.
Augmented input vectors are used : Yi = [Xi ; 1] is (n + 1) × 1, for i = 1, 2, ..., P.
In the following, k denotes the number of training steps and p denotes the number of steps within a training cycle.
Step 1 : c > 0, λ = 1, Emax > 0 are chosen.
W is (n + 1) × 1. Counters and error are initialized :
p ← 1 , k ← 1 , E ← 0
k ← k + 1
If p < P, then p ← p + 1 and go to Step 3.
Otherwise, go to Step 7.
Step 7 : The training cycle is completed. For E < Emax, terminate the training session and output the weights, k and E.
If E ≥ Emax then,
E ← 0, p ← 1, and a new training cycle is initiated.
Q. Explain Error back propagation algorithm with the help of flowchart. (Dec. 10, 12 Marks)
Q. Explain Error back propagation training algorithm with the help of a flowchart. (Dec. 11, May 12, Dec. 12, Dec. 13, Dec. 14, 10 Marks)
Q. Explain Error Back Propagation Training Algorithm (EBPTA). (May 13, 10 Marks)
The back propagation algorithm is one of the most important developments in neural networks. This learning algorithm is applied to multilayer feed-forward networks, which consist of layers of processing elements.
For the multilayer perceptron, calculating the weights of the hidden layer in an efficient way that results in a very small or zero output error is a challenging task.
We can easily measure the error between the actual and desired output at the output layer. But at the hidden layer there is no direct information about the error available. Therefore, adjusting the weights of the hidden layer perceptrons is difficult.
The back propagation algorithm solves this problem by providing the equation for updating the weights at hidden
layer along with weight update formula at output layers such that the overall output error is minimized.
The back-propagation algorithm contains two phases : a feed forward recall phase and an error back-propagation (weight update) phase.
Consider the layered feed forward neural network with two continuous perceptron layers shown in Fig. 5.6.13.
In the above network, I input neurons are connected to each of the J hidden layer neurons.
Weights connecting the input layer to the hidden layer are represented by matrix V. Similarly, outputs of J hidden
neurons are connected to each of the K output neurons, with weight matrix W.
EBPTA uses the supervised learning mode; therefore the training patterns Z should be arranged in pairs with the desired response d provided by the teacher.
O = Γ [ W Γ [ V Z ] ], where Γ [ V Z ] is the internal mapping and relates to the hidden layer mapping Z → Y.
The training begins with the feed forward recall phase (Step 2). After a single pattern vector Z is submitted at the
input, the layers response Y and O are computed.
Then the error signal computation phase (Step 4) follows. The error signal vector must be determined in the output
layer first and then it is propagated back towards the network input nodes.
Next, the K × J weights are subsequently adjusted within the matrix W (Step 5).
The cumulative cycle error of input to output mapping is computed (in Step 3)
The final error value for the entire training cycle is calculated after each completed pass through the training
ge
set {Z1, Z2, Z3, ....Zp}
The learning stops when a final error value less than Emax is obtained (Step 8).
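A minimal sketch of one EBPTA step for a two-layer network (bipolar continuous activations; the weights, pattern and learning constant below are invented for illustration):

```python
import math

def f(net):
    """Bipolar continuous activation."""
    return 2.0 / (1.0 + math.exp(-net)) - 1.0

def backprop_step(V, W, z, d, c):
    """One error back propagation step: forward recall, error signals, weight update.
    V holds hidden layer weights (J x I); W holds output layer weights (K x J)."""
    y = [f(sum(v * zi for v, zi in zip(row, z))) for row in V]   # hidden layer response
    o = [f(sum(w * yj for w, yj in zip(row, y))) for row in W]   # output layer response
    # Output layer error signals: delta_ok = (1/2)(d_k - o_k)(1 - o_k^2).
    d_o = [0.5 * (dk - ok) * (1 - ok * ok) for dk, ok in zip(d, o)]
    # Hidden layer error signals, propagated back through the matrix W.
    d_y = [0.5 * (1 - yj * yj) * sum(d_o[k] * W[k][j] for k in range(len(W)))
           for j, yj in enumerate(y)]
    W_new = [[w + c * d_o[k] * y[j] for j, w in enumerate(row)] for k, row in enumerate(W)]
    V_new = [[v + c * d_y[j] * z[i] for i, v in enumerate(row)] for j, row in enumerate(V)]
    return V_new, W_new, o

V = [[0.2, -0.1], [0.4, 0.3]]   # 2 hidden neurons, 2 inputs
W = [[0.5, -0.5]]               # 1 output neuron
V2, W2, o = backprop_step(V, W, z=[1.0, -1.0], d=[1.0], c=0.5)
```

One such step moves both weight matrices down the error gradient, so the squared output error for this pattern decreases.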
io led
5.6.3(B) Algorithm (Error Back Propagation Training)
Given are P training pairs {(Z1, d1), (Z2, d2), ..., (ZP, dP)}, where the input vectors have been augmented.
There are in total J – 1 neurons in the hidden layer having output Y. Note that the J-th component of Y is also of value (– 1).
Step 1 : W is (K × J), V is (J × I). q ← 1, p ← 1, E ← 0.
Step 2 : Training step starts here
Step 8 : The training cycle is completed.
For E < Emax, terminate the training and output the weights W, V, q and E.
If E > Emax, then E ← 0, p ← 1, and initiate a new training cycle by going to Step 2.
Q. What is self-organizing map? Draw and explain architecture of Kohonen’s Self Organization Feature Map KSOFM.
(Dec. 15, 10 Marks)
Self-Organizing Maps:
A Self-Organizing Map (SOM) uses unsupervised learning to build a two-dimensional map of a problem space.
It uses competitive learning as opposed to error-correction learning (which is used in backpropagation with gradient
descent)
A self-organizing map can generate a visual representation of data on a hexagonal or rectangular grid.
SOM also represents clustering concepts by grouping similar data together.
Applications include meteorology, oceanography, project prioritization, and oil and gas exploration.
A self-organizing map is also known as a Self-Organizing Feature Map (SOFM) or a Kohonen map.
Kohonen’s self organizing networks, also called Kohonen’s feature maps or topology preserving maps, are used for data clustering.
Networks of this type impose a neighbourhood constraint on the output unit such that a certain topological property
in the input data is reflected in the output unit’s weight.
Fig. 5.7.1 presents relatively simple Kohonen’s self organizing network with two inputs and 49 outputs.
ge
Fig. 5.7.1 : Kohonen’s self organizing network with two inputs and 49 outputs
Where,
Λc(i) = exp ( – ||Pi – Pc||² / (2σ²) )
Here, σ reflects the scope of the neighbourhood, and Pi and Pc are the positions of the output units i and c respectively.
Algorithm (Kohonen’s self organization map)
Step 1 : Initialize the weights at some random values and set the topological neighborhood parameters. As clustering progresses, the radius of the neighborhood decreases. Initialize the learning rate α; it should be a slowly decreasing function of time.
D(j) = Σi (xi – wij)²   (the squared Euclidean distance of the input from unit j’s weight vector)
Step 5 : Find the winning unit index J, so that D(J) is minimum.
Note : Instead of Euclidean distance, dot product method can also be used to finalize the winner, in which
case the winner will be the one with the largest dot product.
Step 6 : For all units j within a specific neighborhood of J and for all i, calculate new weights.
wij(new) = wij(old) + α [xi – wij(old)]
Step 7 : Update the learning rate using the formula α(t + 1) = 0.5 α(t).
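A single training step of the map can be sketched as follows; for brevity the neighborhood is shrunk to the winner alone (radius 0), and the unit weights and learning rate are illustrative:

```python
def som_step(weights, x, lr):
    """One Kohonen step: find the winner by Euclidean distance, pull it toward x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    J = min(range(len(weights)), key=lambda j: dist2(weights[j]))
    weights[J] = [w + lr * (xi - w) for w, xi in zip(weights[J], x)]
    return J

units = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
winner = som_step(units, x=[0.9, 0.8], lr=0.5)
print(winner)   # unit 1 is closest to x and moves halfway toward it
```

With the dot-product variant mentioned in the note, the winner would instead be the unit maximizing w · x.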
Ex. 5.8.1 : A neuron with 4 inputs has the weight vector w = [1 2 3 4]^t. The activation function is linear, that is, the activation function is given by f(net) = 2 · net. If the input vector is X = [5 6 7 8]^t, then find the output of the neuron.
Soln. :
net = w^t X = [1 2 3 4] [5 6 7 8]^t
= 5 + 12 + 21 + 32 = 70
Output, o = f(net) = 2 · net = 2 · 70 = 140
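The computation can be checked with a two-line sketch:

```python
def linear_neuron(w, x, gain=2.0):
    """Neuron with linear activation f(net) = gain * net."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return net, gain * net

net, out = linear_neuron([1, 2, 3, 4], [5, 6, 7, 8])
print(net, out)   # 70 140.0
```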
Ex. 5.8.2 : Use the perceptron learning rule to train the network. The set of input training vectors is as follows :
X1 = [1 –2 0 –1]^t, X2 = [0 1.5 –0.5 –1]^t, X3 = [–1 1 0.5 –1]^t
The initial weight vector is W1 = [1 –1 0 0.5]^t and the learning constant is c = 0.1. The teacher’s desired responses for X1, X2 and X3 are d1 = –1, d2 = –1 and d3 = 1 respectively. Calculate the weights after one complete cycle.
Soln. :
For the perceptron learning rule,
neti = Wi^t X
oi = sgn (neti)
∆Wi = c (d – oi) X
Step 1 : Take first training pair. Set X = X1, d = d1.
X = [1 –2 0 –1]^t , d = –1
Compute net1 = W1^t X = [1 –1 0 0.5] [1 –2 0 –1]^t = 2.5
o1 = sgn (net1) = +1
∆W1 = c (d – o1) X = 0.1 (–1 – 1) [1 –2 0 –1]^t = [–0.2 0.4 0 0.2]^t
W2 = W1 + ∆W1 = [1 –1 0 0.5]^t + [–0.2 0.4 0 0.2]^t = [0.8 –0.6 0 0.7]^t
Step 2 : Take second training pair,
Set X = X2, d = d2
X = [0 1.5 –0.5 –1]^t , d = –1
Compute net2 = W2^t X = [0.8 –0.6 0 0.7] [0 1.5 –0.5 –1]^t = –1.6
o2 = sgn (net2) = sgn (–1.6) = –1
Here d = –1 and o2 = –1, so (d – o2) = 0.
Hence ∆W2 = 0; there is no change in the weights.
W3 = W2 = [0.8 –0.6 0 0.7]^t
Step 3 : Take third training pair. Set X = X3, d = d3.
X = [–1 1 0.5 –1]^t , d = 1
Compute net3 = W3^t X = [0.8 –0.6 0 0.7] [–1 1 0.5 –1]^t = –2.1
o3 = sgn (net3) = –1
Here d = 1 and o3 = –1.
∆W3 = c (d – o3) X = (0.1) (1 – (–1)) [–1 1 0.5 –1]^t = [–0.2 0.2 0.1 –0.2]^t
W4 = W3 + ∆W3 = [0.8 –0.6 0 0.7]^t + [–0.2 0.2 0.1 –0.2]^t = [0.6 –0.4 0.1 0.5]^t
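The full cycle can be verified with a short script implementing the rule over the three pairs:

```python
def sgn(net):
    return 1 if net >= 0 else -1

def train_cycle(W, pairs, c):
    """One pass of the discrete perceptron rule over all (X, d) training pairs."""
    for X, d in pairs:
        o = sgn(sum(w * x for w, x in zip(W, X)))
        W = [w + c * (d - o) * x for w, x in zip(W, X)]
    return W

pairs = [([1, -2, 0, -1], -1),
         ([0, 1.5, -0.5, -1], -1),
         ([-1, 1, 0.5, -1], 1)]
W4 = train_cycle([1, -1, 0, 0.5], pairs, c=0.1)
print([round(w, 3) for w in W4])   # [0.6, -0.4, 0.1, 0.5]
```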
Ex. 5.8.3 : Determine the weights after four steps of training for the perceptron learning rule of a single neuron network starting with initial weights
W = [0 0]^t, inputs X1 = [2 2]^t, X2 = [1 –2]^t, X3 = [–2 2]^t, X4 = [–1 1]^t,
and d1 = 0, d2 = 1, d3 = 0, d4 = 1, c = 1.
Soln. :
Here, we use the unipolar binary activation function because the desired values given are 0’s and 1’s. For perceptron learning,
neti = Wi^t X
oi = f (neti)
∆Wi = c (d – oi) X
where oi = 1 if neti ≥ 0, and oi = 0 if neti < 0.
Step 1 : Take first training pair. Set X = X1, d = d1.
Here W1 is the initial weight, W1 = [0 0]^t.
net1 = [0 0] [2 2]^t = 0
o1 = f(0) = 1
∆W1 = c (d – o1) X = (1) (0 – 1) [2 2]^t = [–2 –2]^t
W2 = W1 + ∆W1 = [0 0]^t + [–2 –2]^t = [–2 –2]^t
Step 2 : Take second training pair. Set X = X2 = [1 –2]^t and d = d2 = 1.
net2 = W2^t X = [–2 –2] [1 –2]^t = –2 + 4 = 2
o2 = f(2) = 1
∆W2 = c (d – o2) X = (1) (1 – 1) X = 0
W3 = W2 = [–2 –2]^t
Step 3 : Take third training pair. Set X = X3 = [–2 2]^t and d = d3 = 0.
net3 = W3^t X = [–2 –2] [–2 2]^t = 4 – 4 = 0
o3 = f(0) = 1
∆W3 = c (d – o3) X = (1) (0 – 1) [–2 2]^t = [2 –2]^t
W4 = W3 + ∆W3 = [–2 –2]^t + [2 –2]^t = [0 –4]^t
Step 4 : Take fourth training pair. Set X = X4 = [–1 1]^t and d = d4 = 1.
net4 = W4^t X = [0 –4] [–1 1]^t = 0 – 4 = –4
o4 = f(–4) = 0
∆W4 = c (d – o4) X = (1) (1 – 0) [–1 1]^t = [–1 1]^t
W5 = W4 + ∆W4 = [0 –4]^t + [–1 1]^t = [–1 –3]^t
Ex. 5.8.4 : A neuron with 3 inputs has the weight vector w = [0.1 0.3 – 0.2]. The activation function is binary sigmoidal
activation function. If input vector is [0.8 0.6 0.4] then find the output of neuron. MU - May 13, 5 Marks
Soln. :
Fig. P. 5.8.4
Here f (net) = 1 / (1 + e^(–λ net))   [binary sigmoid means the unipolar continuous function]
Assume λ = 1.
Compute net = w^t X = [0.1 0.3 –0.2] [0.8 0.6 0.4]^t = 0.08 + 0.18 – 0.08 = 0.18
Compute o = f (net) = 1 / (1 + e^(–0.18)) = 1 / (1 + 0.8353) = 0.5449
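The arithmetic can be checked with a short sketch:

```python
import math

def binary_sigmoid_neuron(w, x):
    """Neuron with the unipolar continuous (binary sigmoid) activation, lambda = 1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-net))

o = binary_sigmoid_neuron([0.1, 0.3, -0.2], [0.8, 0.6, 0.4])
print(round(o, 4))   # 0.5449
```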
Ex. 5.8.5 : Implement OR function using perceptron networks for bipolar inputs and targets. MU - Dec. 11, 10 Marks
Soln. :
The Truth table for a 2 input OR function is (assuming bipolar inputs target).
x1 x2 d
–1 –1 –1
–1 +1 +1
+1 –1 +1
+1 +1 +1
Where x1, x2 are the 2 inputs and ‘d’ is the desired output (target)
Perceptron N/W for OR
Fig. P. 5.8.5
Step 1 :
Compute net1 = w1^t x1 = [0.5 –0.5 0.5] [–1 –1 +1]^t = –0.5 + 0.5 + 0.5 = 0.5
O1 = sgn (net1) = sgn (0.5) = +1
Now, d1 = –1, so
∆w1 = c (d1 – O1) x1 = –2c x1 = –2 (1) [–1 –1 +1]^t = [2 2 –2]^t
w2 = w1 + ∆w1 = [0.5 –0.5 0.5]^t + [2 2 –2]^t = [2.5 1.5 –1.5]^t
Step 2 :
net2 = w2^t x2 = [2.5 1.5 –1.5] [–1 +1 +1]^t = –2.5
O2 = sgn (net2) = –1 , d2 = +1
∆w2 = 2c x2 = 2 (1) [–1 1 1]^t = [–2 2 2]^t
w3 = w2 + ∆w2 = [2.5 1.5 –1.5]^t + [–2 2 2]^t = [0.5 3.5 0.5]^t
Step 3 :
net3 = w3^t x3 = [0.5 3.5 0.5] [1 –1 1]^t = 0.5 – 3.5 + 0.5 = –2.5
O3 = sgn (net3) = –1 , d3 = +1
∆w3 = 2c x3 = 2 (1) [1 –1 1]^t = [2 –2 2]^t
w4 = w3 + ∆w3 = [0.5 3.5 0.5]^t + [2 –2 2]^t = [2.5 1.5 2.5]^t
Step 4 :
net4 = w4^t x4 = [2.5 1.5 2.5] [+1 +1 +1]^t = 2.5 + 1.5 + 2.5 = 6.5
O4 = sgn (net4) = sgn (6.5) = +1 , d4 = +1
No change in weights, i.e. ∆w4 = 0, so w5 = w4 = [2.5 1.5 2.5]^t
Step 5 :
net5 = w5^t x1 = [2.5 1.5 2.5] [–1 –1 +1]^t = –1.5
O5 = sgn (net5) = –1 , d1 = –1
No change in weights : ∆w5 = 0, so w6 = w5 = [2.5 1.5 2.5]^t
Now, after Step 5 we have w6 = w5 = w4, i.e. the weight vector is not changing over consecutive steps. Hence we stop training; training is complete and the final weight vector is
W = [2.5 1.5 2.5]^t
Ex. 5.8.6 : The set of input training vectors is
X1 = [1 –2 0 –1]^t, X2 = [0 1.5 –0.5 –1]^t, X3 = [–1 1 0.5 –1]^t
and d1 = –1, d2 = –1 and d3 = +1 are the desired responses for X1, X2 and X3 respectively. The initial weight vector is
W1 = [1 –1 0 0.5]^t
The learning constant is c = 0.1 and λ = 1. Apply the delta learning rule to calculate the final weights. Use the bipolar continuous activation function.
Soln. :
For the delta learning rule,
neti = Wi^t X        ...(1)
oi = f (neti) = 2 / (1 + exp(–neti)) – 1        ...(2)
f ′(neti) = (1/2) (1 – oi²)        ...(3)
Step 1 : Take 1st training pair, X = X1 and d = d1.
X = [1 –2 0 –1]^t , d = –1
Compute net1 = W1^t X = [1 –1 0 0.5] [1 –2 0 –1]^t = 1 + 2 – 0.5 = 2.5
o1 = f (net1) = 2 / (1 + exp(–2.5)) – 1 = 2 / 1.0821 – 1 = 0.848
f ′(net1) = (1/2) (1 – o1²) = (1/2) (1 – 0.848²) = 0.140
∆W1 = c (d – o1) f ′(net1) X = (0.1) (–1 – 0.848) (0.14) [1 –2 0 –1]^t
= (–0.026) [1 –2 0 –1]^t = [–0.026 0.052 0 0.026]^t
W2 = W1 + ∆W1 = [1 –1 0 0.5]^t + [–0.026 0.052 0 0.026]^t = [0.974 –0.948 0 0.526]^t
Step 2 : Take 2nd training pair, X = X2 and d = d2.
X = [0 1.5 –0.5 –1]^t , d = –1
Compute net2 = W2^t X = [0.974 –0.948 0 0.526] [0 1.5 –0.5 –1]^t = –1.422 – 0.526 = –1.948
o2 = f (net2) = –0.75
f ′(net2) = (1/2) (1 – o2²) = (1/2) (1 – 0.5625) = 0.2187
∆W2 = c (d – o2) f ′(net2) X = (0.1) (–1 – (–0.75)) (0.2187) [0 1.5 –0.5 –1]^t
= (–0.00546) [0 1.5 –0.5 –1]^t = [0 –0.00819 0.00273 0.00546]^t
W3 = W2 + ∆W2 = [0.974 –0.948 0 0.526]^t + [0 –0.00819 0.00273 0.00546]^t = [0.974 –0.956 0.0027 0.5315]^t
Step 3 : Take 3rd training pair, X = X3 and d = d3.
X = [–1 1 0.5 –1]^t , d = 1
Compute net3 = W3^t X = [0.974 –0.956 0.0027 0.5315] [–1 1 0.5 –1]^t = –2.46
o3 = f (net3) = –0.843
f ′(net3) = (1/2) (1 – o3²) = (1/2) (1 – (–0.843)²) = 0.145
∆W3 = c (d – o3) f ′(net3) X = (0.1) (1 – (–0.843)) (0.145) [–1 1 0.5 –1]^t
= (0.0267) [–1 1 0.5 –1]^t = [–0.0267 0.0267 0.0133 –0.0267]^t
W4 = W3 + ∆W3 = [0.974 –0.956 0.0027 0.5315]^t + [–0.0267 0.0267 0.0133 –0.0267]^t = [0.947 –0.929 0.016 0.5048]^t
Ex. 5.8.7 : Use the delta learning rule to train the network with the training pairs
X1 = [2 0 –1]^t, d1 = –1 and X2 = [1 –2 –1]^t, d2 = +1.
The initial weights are W1 = [1 0 1]^t and c = 0.25.
Soln. :
For the delta learning rule,
neti = Wi^t X        ...(1)
oi = f (neti) = 2 / (1 + exp(–neti)) – 1        ...(2)
f ′(neti) = (1/2) (1 – oi²)        ...(3)
∆Wi = c (d – oi) (1/2) (1 – oi²) X        ...(4)
Step 1 : Take 1st training pair, X = X1 and d = d1.
X = [2 0 –1]^t , d = –1
Compute net1 = W1^t X = [1 0 1] [2 0 –1]^t = 2 – 1 = 1
o1 = 2 / (1 + exp(–1)) – 1 = 0.463
f ′(net1) = (1/2) (1 – (0.463)²) = 0.393
∆W1 = (0.25) (–1 – 0.463) (0.393) [2 0 –1]^t = (–0.1437) [2 0 –1]^t = [–0.287 0 0.1437]^t
W2 = W1 + ∆W1 = [1 0 1]^t + [–0.287 0 0.1437]^t = [0.713 0 1.1437]^t
Step 2 : Take 2nd training pair, X = X2 and d = d2.
X = [1 –2 –1]^t , d = 1
Compute net2 = W2^t X = [0.713 0 1.1437] [1 –2 –1]^t = 0.713 – 1.1437 = –0.4307
o2 = –0.2119
f ′(net2) = (1/2) (1 – (–0.2119)²) = (1/2) (1 – 0.0449) = 0.477
∆W2 = (0.25) (1 – (–0.2119)) (0.477) X2 = (0.1445) [1 –2 –1]^t = [0.145 –0.289 –0.145]^t
W3 = W2 + ∆W2 = [0.713 0 1.1437]^t + [0.145 –0.289 –0.145]^t = [0.858 –0.289 0.998]^t
Ex. 5.8.8 : A single neuron network using f (net) = sgn (net) has been trained using the pairs (Xi, di) given below :
X1 = [1 –2 3 –1]^t , d1 = –1
X2 = [0 –1 2 –1]^t , d2 = 1
X3 = [–2 0 –3 –1]^t , d3 = –1
The final weight vector obtained is W4 = [3 2 6 1]^t. Knowing that a correction has been performed at each step for c = 1, determine the following weights :
(a) W3, W2, W1 by backtracking the training.
(b) W5, W6, W7 obtained for steps 4, 5, 6 of training by reusing the sequence (X1, d1), (X2, d2), (X3, d3).
Soln. :
(a) Backtracking
We know that a correction has been performed at every step; this means the actual output and the desired output were never the same :
d1 = –1, o1 = +1 ; d2 = +1, o2 = –1 ; d3 = –1, o3 = +1
Step 3 : W4 = W3 + ∆W3 , where ∆W3 = c (d3 – o3) X3
∆W3 = (1) (–1 – 1) [–2 0 –3 –1]^t = [4 0 6 2]^t
W3 = W4 – ∆W3 = [3 2 6 1]^t – [4 0 6 2]^t = [–1 2 0 –1]^t
ic ow
Step 2 : W 3 = W 2 + W 2
n
W 2 = c (d2 – o2) X2
bl kn
0 0
–1 –2
= (1) (1 – (– 1))
2 4
at
Pu ch
–1 –2
Te
Now W 2 = W3 – W2
–1 0 –1
2 –2 4
=
0 4 –4
– =
–1 –2 1
Step 1 : W2 = W1 + ∆W1 , where ∆W1 = c (d1 – o1) X1
∆W1 = (1) (–1 – 1) [1 –2 3 –1]^t = [–2 4 –6 2]^t
W1 = W2 – ∆W1 = [–1 4 –4 1]^t – [–2 4 –6 2]^t = [1 0 2 –1]^t
Thus,
W1 = [1 0 2 –1]^t , W2 = [–1 4 –4 1]^t , W3 = [–1 2 0 –1]^t
(b) Find W5, W6 and W7.
Step 4 : Take first training pair, X = X1 = [1 –2 3 –1]^t and d = –1.
net4 = W4^t X = [3 2 6 1] [1 –2 3 –1]^t = 3 – 4 + 18 – 1 = 16
o4 = sgn (net4) = +1
∆W4 = c (d – o4) X = (1) (–1 – 1) [1 –2 3 –1]^t = [–2 4 –6 2]^t
W5 = W4 + ∆W4 = [3 2 6 1]^t + [–2 4 –6 2]^t = [1 6 0 3]^t
Step 5 : Take second training pair, X = X2 = [0 –1 2 –1]^t and d = 1.
net5 = W5^t X = [1 6 0 3] [0 –1 2 –1]^t = 0 – 6 + 0 – 3 = –9
o5 = sgn (–9) = –1
∆W5 = c (d – o5) X = (1) (1 – (–1)) [0 –1 2 –1]^t = [0 –2 4 –2]^t
W6 = W5 + ∆W5 = [1 6 0 3]^t + [0 –2 4 –2]^t = [1 4 4 1]^t
Step 6 : Take third training pair, X = X3 = [–2 0 –3 –1]^t and d = –1.
net6 = W6^t X = [1 4 4 1] [–2 0 –3 –1]^t = –2 + 0 – 12 – 1 = –15
o6 = sgn (–15) = –1
∆W6 = c (d – o6) X = (1) (–1 – (–1)) X = 0
W7 = W6 = [1 4 4 1]^t
Hence W5 = [1 6 0 3]^t and W6 = W7 = [1 4 4 1]^t.
Ex. 5.8.9 : Apply the back propagation algorithm to find the final weights for the following net. Inputs : x = [0.0, 1.0]; weights between hidden and output layers : w = [0.4, 0.2]; bias on the output node O : w0 = [– 0.4]; weights between input and hidden layer : v = [2, 1 ; 1, 2]; bias on the hidden unit nodes : v0 = [0.1, 0.3]; desired output : d = 1.0. MU - Dec. 15, 10 Marks
Fig. P. 5.8.9
Soln. :
(I) Forward Pass :
(i) Computing hidden layer activations (z)
netz1 = v11 x1 + v12 x2 + v01 = 0 + 1 + 0.1 = 1.1
z1 = f (netz1)
Using the bipolar continuous activation function, we have
f (net) = 2 / (1 + e^(–net)) – 1   [Assume λ = 1]
z1 = f (netz1) = 2 / (1 + e^(–1.1)) – 1 = 0.5005
netz2 = v21 x1 + v22 x2 + v02 = 0 + 2 + 0.3 = 2.3
z2 = f (netz2) = 2 / (1 + e^(–2.3)) – 1 = 0.8178
(ii) Computing output layer activation (y)
nety = w1 z1 + w2 z2 + w0 = (0.4)(0.5005) + (0.2)(0.8178) – 0.4 = –0.03624
y = f (nety) = 2 / (1 + e^(0.03624)) – 1 = –0.0181
(II) Backward Pass :
Computing the error signal vectors for both layers.
(i) Computing the error signal term δok for the output layer :
δok = (1/2) (dk – ok) (1 – ok²) , for k = 1, 2, ..., K
Since there is only one output neuron y,
δy = (1/2) (d – y) (1 – y²) = (1/2) (1 – (–0.0181)) (1 – (–0.0181)²)
δy = 0.5089
(iii) Adjusting the output layer weights :
w1(new) = w1(old) + δy z1
w0(new) = w0(old) + δy = –0.4 + 0.5089 = 0.1089
(iv) Adjusting the hidden layer weights (v11, v12, v21, v22) :
v11(new) = v11 + δz1 x1 = 2 + 0.0763 (0) = 2
Cycle error : E = (1/2) (d – y)² = (1/2) (1 – (–0.0181))² = 0.5183
Review Questions
Q. 6 Explain Error back propagation training algorithm with the help of a flowchart.
Q. 7 A single neuron network using f (net) = sgn (net) has been trained using the pairs (Xi, di) given below :
X1 = [1 –2 3 –1]^t , d1 = –1
X2 = [0 –1 2 –1]^t , d2 = 1
X3 = [–2 0 –3 –1]^t , d3 = –1
Knowing that correction has been performed in each step for c = 1, determine the following weights :
(a) W 3, W 2, W 1 by backtracking the training.
(b) W 5, W 6, W 7 obtained for steps 4, 5, 6 of training by reusing the sequence
(X1, d1), (X2, d2), (X3, d3),
Q. 9 Determine the weights after one iteration for hebbian learning of a single neuron network starting with initial weights
w = [1, – 1], inputs as X1 = [1, – 2], X2 = [2, 3], X3=[1, – 1] and c = 1.
Q. 11 Explain with neat diagram supervised and unsupervised learning.
Q. 12 A neuron with 3 inputs has the weight vector w = [0.1 0.3 – 0.2]. The activation function is binary sigmoidal activation
function. If input vector is [0.8 0.6 0.4] then find the output of neuron.
Q. 15 Determine the weights after four steps of training for the perceptron learning rule of a single neuron network starting with
initial weights :
W = [0 0]^t, inputs as X1 = [2 2]^t,
X2 = [1 –2]^t, X3 = [–2 2]^t, X4 = [–1 1]^t,
d1 = 0, d2 = 1, d3 = 0, d4 = 1 and c = 1.
Q. 17 What is learning in neural networks ? Differentiate between supervised and unsupervised learning.
6 Expert System
Unit VI
Syllabus
6.2 Expert system : Introduction, Characteristics, Architecture, Stages in the development of expert system,
6.1 Hybrid Approach
6.1.1 Introduction to Hybrid Systems
Hybrid systems are those for which more than one soft computing technique is integrated to solve a real-world
problem.
Neural networks, fuzzy logic and genetic algorithms are soft computing techniques which have been inspired by biological processes.
Each of these technologies has its own advantages and limitations. For example, while neural networks are good at recognizing patterns, they are not good at explaining how they reach their decisions.
Fuzzy logic systems, which can reason with imprecise information, are good at explaining their decisions, but they cannot automatically acquire the rules they use to make those decisions.
These limitations have been a central driving force behind the creation of intelligent hybrid systems where two or
more techniques are combined in a manner that overcomes the limitations of individual techniques.
In sequential hybrid systems, all the technologies are used in a pipe-line fashion. The output of one technology
becomes the input to another technology (Fig. 6.1.1).
This form of hybridization is one of the weakest, because it does not integrate the different technologies into a single unit.
AI&SC (MU-Sem. 7-Comp) 6-2 Expert System
An example is a GA pre-processor which obtains the optimal parameters such as initial weights, threshold, learning
rate etc. and hands over these parameters to a neural network.
In this type, one technology treats another technology as a “subroutine” and calls it to process or generate whatever information it needs. Fig. 6.1.2 illustrates the auxiliary hybrid system.
An example is a neuro - genetic system in which an NN employs a GA to optimize its structural parameters, i.e.
parameters which define Neural Network’s architecture.
In embedded hybrid systems, the technologies are integrated in such a manner that they appear to be intertwined.
The fusion is so complete that it would appear that no technology can be used without the other for solving the
problem. Fig. 6.1.3 depicts the schema for an embedded hybrid system.
Example of this system is a NN-FL (Neural Network Fuzzy Logic) hybrid system that has NN which receives information,
processes it and generates fuzzy outputs as well.
Fuzzy neural system is a system with seamless integration of fuzzy logic and neural networks.
While fuzzy logic provides an inference mechanism under cognitive uncertainty, neural networks offer advantages such as learning, adaptation, fault tolerance, parallelism and generalization.
To enable a system to deal with cognitive uncertainties like humans, we need to incorporate the concept of fuzzy logic
into the neural networks.
The computational process for fuzzy neural systems is as follows.
The first step is to develop a "fuzzy neuron" based on the understanding of biological neuronal morphologies.
This leads to the following three steps in a fuzzy neural computational process.
(a) Model 1 :
In this model the output of fuzzy system is fed as an input to the neural networks.
The input to the system is linguistic statements. The fuzzy interface block converts these linguistic statements into
an input vector which is then fed to a multi-layer neural network. The neural network can be adapted (trained) to
yield desired command outputs or decisions.
(b) Model 2 :
In the second model, a multi-layered neural network drives the fuzzy inference mechanism; that is, the output of the neural network is fed as an input to the fuzzy system.
Here, neural networks are used to tune membership functions of fuzzy systems that are employed as decision-
making systems for controlling equipment. Neural network learning techniques can automate this process and
substantially reduce development time and cost while improving performance.
Q. Explain the architecture of ANFIS with the help of a diagram. (Dec. 14, 10 Marks)
Q. Draw the five layer architecture of ANFIS and explain each layer in brief. (Dec. 15, 5 Marks)
A well-known and practically used neuro-fuzzy system is ANFIS (Adaptive Neuro-Fuzzy Inference System), which implements a Sugeno-type fuzzy inference system in an adaptive network framework.
ANFIS Architecture:
Fig. 6.1.6 shows the first-order Sugeno model and its equivalent ANFIS architecture is shown in Fig. 6.1.7.
f1 = p1 x + q1 y + r1
f2 = p2 x + q2 y + r2
f = (w1 f1 + w2 f2) / (w1 + w2) = w̄1 f1 + w̄2 f2
Assume that the fuzzy inference system under consideration has two inputs x and y and one output.
For a first-order Sugeno fuzzy model, a common rule set with two fuzzy rules is,
Rule 1: If x is A1 and y is B1 then f1 = p1 x + q1 y + r1
Rule 2: If x is A2 and y is B2 then f2 = p2 x + q2 y + r2
Neurons in this layer represent fuzzy sets used in the antecedents of fuzzy rules. A neuron in the fuzzification layer
receives a crisp input and determines the degree to which this input belongs to the neuron’s fuzzy set.
Every node in this layer is an adaptive node whose node function can be represented as,
O1,i = μAi(x), for i = 1, 2, or
O1,i = μBi-2(y), for i = 3, 4
where x (or y) is the input to node i and Ai (or Bi-2) is a linguistic label associated with this node.
Any appropriate membership function can be used such as generalized bell function
μA(x) = 1 / (1 + |(x - ci) / ai|^(2bi))
where, {ai, bi, ci} is the parameter set. Parameters in this layer are called premise parameters.
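The generalized bell function above can be sketched in a few lines; this is a minimal illustration of the formula, with the parameter names {a, b, c} taken directly from it:

```python
def gbell_mf(x, a, b, c):
    """Generalized bell membership function: 1 / (1 + |(x - c)/a|^(2b)).

    a controls the width, b the slope of the shoulders, c the centre.
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))
```

At the centre x = c the membership degree is exactly 1, and at x = c ± a it falls to 0.5 for any b, which is how the three premise parameters shape the fuzzy set.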
The number of neurons (or nodes) in this layer will be equal to the number of rules.
Nodes in this layer represent fuzzy sets used in the consequent of fuzzy rules.
Every node in this layer is an adaptive node with a node function represented as,
O4,i = w̄i fi = w̄i (pi x + qi y + ri), i = 1, 2
where w̄i is the normalized firing strength from layer 3 and {pi, qi, ri} is the parameter set of this node. Parameters in this layer are called consequent parameters.
Every node in this layer represents a single output of the neuro-fuzzy system. It takes the output fuzzy sets clipped by
the respective integrated firing strengths and combines them into a single fuzzy set.
Output of this node can be represented as,
O5,1 = Σi w̄i fi = (Σi wi fi) / (Σi wi), i = 1, 2
The structure of this adaptive network is not unique. An alternative structure is shown in Fig. 6.1.8.
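The five-layer computation described above can be sketched as a single forward pass for the two-rule case. This is an illustrative sketch only (the real ANFIS also adapts its parameters by hybrid learning, which is not shown); the bell-function parameters and rule parameters are passed in as plain tuples:

```python
def gbell(x, a, b, c):
    """Generalized bell membership function (layer 1 node function)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-rule, first-order Sugeno ANFIS.

    premise:    (a, b, c) parameter triples for fuzzy sets A1, B1, A2, B2
    consequent: (p, q, r) triples for rule 1 and rule 2
    """
    A1, B1, A2, B2 = premise
    # Layers 1-2: fuzzify the inputs and compute firing strengths (product T-norm)
    w = [gbell(x, *A1) * gbell(y, *B1),
         gbell(x, *A2) * gbell(y, *B2)]
    # Layer 3: normalize the firing strengths
    total = w[0] + w[1]
    wn = [wi / total for wi in w]
    # Layer 4: rule consequents f_i = p_i x + q_i y + r_i
    f = [p * x + q * y + r for (p, q, r) in consequent]
    # Layer 5: overall output as the weighted sum
    return wn[0] * f[0] + wn[1] * f[1]
```

Because the normalized strengths always sum to 1, identical consequents yield exactly that consequent value regardless of the input, a quick sanity check on the layer arithmetic.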
Artificial intelligence aims to implement intelligence in machines by developing computer programs that exhibit intelligent behavior. To solve complex problems, such systems need efficient access to substantial domain knowledge, a good reasoning mechanism, and an effective and efficient way of representing knowledge and inferences, in order to apply the knowledge to the problems they are supposed to solve. Expert systems also need to explain to users how they have reached their decisions.
Expert systems are generally characterized by their way of knowledge representation, production rule formation, searching mechanism and so on. Usually, to build an expert system, an expert system shell is used: an existing knowledge-independent framework into which domain knowledge can be added to produce a working expert system. This avoids programming each new system from scratch.
Often, the two terms, Expert Systems (ES) and Intelligent Knowledge Based Systems (IKBS), are used synonymously.
Expert systems are programs whose knowledge base contains the knowledge used by human experts, rather than knowledge gathered from textbooks or non-experts. Expert systems are among the most widespread applications of artificial intelligence.
Expert systems simulate the human reasoning process in the given problem domain, whereas computer applications try to
simulate the domain itself.
Expert systems use various methods to represent the domain knowledge gathered from human experts. They perform reasoning over representations of human knowledge, in addition to numerical calculations or data retrieval. To do this, expert systems have two corresponding distinct modules, referred to as the inference engine and the knowledge base. Computer applications, in contrast, may perform just calculations on available data, without inference knowledge.
3. Use approximations
Expert systems tend to solve problems using heuristic, approximate, or probabilistic methods, much as humans generally do; such methods do not always guarantee that the result is correct or optimal. Computer applications, in contrast, follow strict algorithms to produce solutions.
4. Provide explanations
Expert systems usually need to provide explanations and justifications of their solutions or recommendations, in order to make the user understand the reasoning process that produced a particular solution. This type of behavior is hardly observed in computer applications.
Data interpretation : There are different types of data to be interpreted by expert system, which have various formats
and features. Example: sonar data, geophysical measurements.
Diagnosis of malfunctions : While collecting data from machines or from experts, there can be shortfalls in accuracy or mistakes in readings. Example: equipment faults or human diseases.
Structural analysis : If a system is built for a domain involving complex objects such as chemical compounds or computer systems, the configuration of these complex objects must be analyzed by the expert system.
Planning : Expert systems are required to plan sequences of actions in order to perform some task. Example: actions
that might be performed by robots.
Prediction : Expert systems need to predict the future based on past knowledge and currently available information. Example: weather forecasts, exchange rates, share prices, etc.
High performance : Expert systems are generally preferred because of their high performance: they can process huge amounts of data and produce results that take several details into account, within an acceptable and very short response time.
Highly responsive : Expert systems are required to be highly responsive and user friendly. The user can ask any query, and the system should produce an appropriate reply. Even if the query is not answerable from the existing knowledge base, the expert system should give some informative reply about the question.
Reliable : Expert systems are highly reliable, as they process huge amounts of data. Hence the results produced by the system are always close to exact.
There are no fundamental limits on what problem domains an expert system can be built to deal with. Expert systems can be developed for almost every domain that requires a human expert. However, the domain should ideally be one in which an expert requires a few hours to accomplish the task. This section describes some expert systems which have been developed and used successfully.
(i) Dendral : This system is considered to be the first expert system in existence. Dendral identifies the molecular structure of unknown organic compounds from mass spectrometry data.
(ii) Mycin : This is a milestone in expert system development which, made significant contributions to the medical field;
but was not used in actual practice. It provides assistance to physicians in the diagnosis and treatment of meningitis
and bacterial infections. It was developed by Stanford University.
(iii) Altrex : It helps diagnose engine troubles in certain models of Toyota cars. It is used in a central service department and can be called up by those actually servicing the cars, if assistance is required. It was developed by the company's research lab.
(iv) Prospector : This expert system successfully located deposits of several minerals, including copper and uranium. It was developed by SRI International.
(v) Predicate : This expert system provides estimates of the time required to construct high-rise buildings. It was developed by Digital Equipment Corporation for use by Land Lease, Australia.
Every expert system is developed for a specific task domain, an area where human intelligence is required. Task refers to some goal-oriented problem-solving activity, and domain refers to the area within which the task is performed.
If we consider the inference engine as the brain of an expert system, then the knowledge base is its heart. The stronger the heart, the faster and more efficiently the brain can function. Hence the success of any expert system depends largely on the quality of the knowledge base it works on.
1. Knowledge base
There are two types of knowledge expert systems can have about the task domain.
(i) Factual knowledge : It is knowledge which is widely accepted as standard knowledge. It is available in textbooks, research journals and on the internet. It is generally accepted and verified by domain experts or researchers of that particular field.
(ii) Heuristic knowledge : It is experiential and judgmental, and may not be approved or acknowledged publicly. This type of knowledge is rarely discussed and is largely individualistic; it has no standards for evaluating its correctness. It is the knowledge of good practice, good judgment and plausible reasoning in the domain, knowledge based on the "art of good guessing." It is very subjective to the practitioner's knowledge and experience in the respective problem domain.
The knowledge base an expert uses is built from his learning from various sources over a period of time. Hence the knowledge store of an expert grows with the number of years of experience in the given field, which allows him to interpret the information in his databases to advantage in diagnosis, analysis and design.
2. Inference engine
The inference engine has a problem solving module which organizes and controls the steps required to solve the
problem. A common but powerful inference engine involves chaining of IF…THEN rules to form a line of reasoning.
There are two types of chaining practices to solve a problem.
1. Forward Chaining : This reasoning strategy starts from a set of conditions and moves toward some conclusion; it is also called the data-driven approach.
2. Backward Chaining : Backward chaining is a goal-driven approach. In this type of reasoning, the conclusion is known and the path to the conclusion needs to be found. For example, if a goal state is given but the path to that state from the start state is not known, backward reasoning is used.
The inference engine is nothing but these methods implemented as program modules. It manipulates and uses knowledge in the knowledge base to generate a line of reasoning.
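The data-driven (forward-chaining) strategy above can be sketched in a few lines. This is an illustrative toy, not a production inference engine: each rule is simply a pair (list of condition facts, conclusion fact), and rules fire repeatedly until no new fact can be derived.

```python
def forward_chain(rules, facts):
    """Data-driven inference: fire every rule whose IF-part is satisfied,
    adding its THEN-part to the fact set, until a fixed point is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and set(conditions) <= facts:
                facts.add(conclusion)   # the rule fires
                changed = True
    return facts
```

With rules such as ([A, B] -> C) and ([C] -> D) and initial facts {A, B}, the engine derives C on the first pass and D on the second, which is exactly the chaining of IF...THEN rules described above.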
The knowledge engineer needs to learn how the domain experts reason with their knowledge by interviewing them. He then translates this knowledge into programs, with which he designs the inference engine. There might be some uncertain knowledge involved in the knowledge base; the knowledge engineer needs to decide how to integrate this with the available knowledge base. Lastly, he needs to decide what kind of explanations the user will require, and design the inference levels accordingly.
To develop an expert system following are the general steps followed. It is an iterative process. Steps in developing an
expert system include:
1. Identify problem domain
The problem must be suitable for an expert system to solve it.
Find the experts in task domain for the ES project.
Establish cost-effectiveness of the system.
2. Design the system
Identify the ES Technology
Know and establish the degree of integration with the other systems and databases.
Realize how the concepts can represent the domain knowledge best.
The knowledge engineer uses sample cases to test the prototype for any deficiencies in performance.
End users test the prototypes of the ES.
6. Maintain the ES
Keep the knowledge base up-to-date by regular review and update.
Cater for new interfaces with other information systems, as those systems evolve.
Knowledge affects the development, efficiency, speed, and maintenance of the system. Knowledge representation is a
way to transform human knowledge to machine understandable format. It is a very challenging task in expert systems,
as the knowledge is very vast, unformatted and most of the times it is uncertain. Knowledge representation formalizes
and organizes the knowledge required to build an expert system.
The knowledge engineer must identify one or more forms in which the required knowledge should be represented in the system. He must also ensure that the computer can use the knowledge efficiently with the selected reasoning methods. Just as the quality of the knowledge matters, so does the representation used, as it determines how well a programmer can write the code for the system.
A number of knowledge-representation techniques have been devised to date to represent knowledge efficiently, but the choice ultimately depends on the application and the design of the system. A few knowledge representation techniques are mentioned below.
o Production rules
o Decision trees
o Semantic nets
o Factor tables
o Attribute-value pairs
o Frames
o Scripts
o Logic
o Conceptual graphs
Of these, the most commonly used methods for representing domain knowledge are Production Rules, Semantic Nets and Frames. Let us study these methods in detail.
a. Production Rules
One widely used representation of knowledge is the set of production rules, or simply rules. A rule has a condition and an action associated with it. The condition part is identified by the keyword "IF" and lists a set of conditions in some logical combination. Actions are specified in the "THEN" part.
When the IF part of a rule is satisfied, the actions in the THEN part can be taken. The piece of knowledge represented by a production rule is used to produce the line of reasoning. Expert systems whose knowledge is represented in rule form are called rule-based systems. We have studied rule-based agents, named simple reflex agents, in Chapter 1.
As human thinking has evolved on a situation-conclusion or condition-action basis, this model is predominantly used for representing knowledge in ES.
IF-THEN rules are the simplest and most efficient way to represent an expert's knowledge. They take the following form:
IF a1, a2, a3, ..., an THEN b1, b2, b3, ..., bn
Design advisor : [Steele et al., 1989] is a system that critiques chip designs. Its rules look like :
If : the sequential level count of ELEMENT is greater than 2,
UNLESS the signal of ELEMENT is resetable
then : critique for poor resetability
DEFEAT : poor resetability of ELEMENT
due to : sequential level count of ELEMENT greater than 2
by : ELEMENT is directly resettable
b. Semantic Nets
A semantic net or semantic network is a knowledge representation technique used for propositional information,
so it is also called a propositional net. In semantic networks the knowledge is represented as objects and
relationships between objects. They are two-dimensional representations of knowledge that convey meaning. Relationships provide the basic structure for organizing knowledge, and graphical notation is used to draw the networks. Mathematically, a semantic net can be defined as a labelled directed graph. As nodes are associated with other nodes, semantic nets are also referred to as associative nets.
Semantic nets consist of nodes, links and link labels. Objects are denoted by nodes of the graph while the links
indicate relations among the objects. Nodes can appear as circles or ellipses or rectangles to represent objects
such as physical objects, concepts or situations. Links are drawn as arrows to express the relationships between
objects, and link labels specify the nature of the relationships.
The two nodes connected to each other via a link are related to each other. The relationships can be of two types: the "IS-A" relationship and the "HAS" relationship. An IS-A relationship indicates that one object is a kind or instance of the other related object, while a "HAS" relationship indicates that one object "consists of" the other related object. These are essentially the superclass-subclass relationships. It is assumed that all members of a subclass inherit all the properties of their superclasses. That is how semantic networks allow efficient representation of inheritance reasoning.
For example, Fig. 6.5.1 shows an instance of a semantic net. In the figure, all the objects are within rectangles and connected using labeled arcs. The links are labeled according to the relationships, which makes the network more readable and conveys more information about the related objects. For example, the "Member Of" link between Jill and Female Persons indicates that Jill belongs to the category of Female Persons. It also indicates inheritance among the related objects: Jill inherits the property of having two legs because she belongs to the category of Female Persons, which in turn belongs to the category of Persons, which has a boxed Legs link with value 2.
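The inheritance reasoning just described can be sketched as a tiny lookup over labeled links. This is an illustrative toy modelled on Fig. 6.5.1, not a general semantic-net engine; the link names ("Member Of", "Subset Of", "Legs") follow the figure's example.

```python
# Labeled, directed links of a small semantic net: (node, label) -> target.
links = {
    ("Jill", "Member Of"): "Female Persons",
    ("Female Persons", "Subset Of"): "Persons",
    ("Persons", "Legs"): 2,
}

def lookup(node, prop):
    """Return a property of a node, following Member Of / Subset Of
    links upward so that subclasses inherit superclass properties."""
    if (node, prop) in links:
        return links[(node, prop)]
    for rel in ("Member Of", "Subset Of"):
        parent = links.get((node, rel))
        if parent is not None:
            value = lookup(parent, prop)
            if value is not None:
                return value
    return None
```

Asking for Jill's Legs walks Jill -> Female Persons -> Persons and returns 2, exactly the inheritance chain described for Fig. 6.5.1.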
Semantic nets can also represent multiple inheritance, through which an object can belong to more than one category and a category can be a subset of more than one other category. They also allow a common form of inference known as inverse links, which make it much easier for inference algorithms to answer reverse queries. For example, in Fig. 6.5.1 there is a Has Sister link, which is the inverse of the Sister Of link. Given a query such as "Who is the sister of Jack?", the inference system discovers that Has Sister is the inverse of Sister Of, follows the Has Sister link from Jack to Jill, and answers the query.
4. Semantic nets are easy to implement using PROLOG.
Frames provide a convenient structure for representing objects that are typical of stereotypical situations, for example visual scenes or the structure of complex physical objects. In this technique, knowledge is decomposed into modular structures.
Frame is a type of schema used in many Artificial Intelligence applications including vision and natural language
processing. Frames are also useful for representing commonsense knowledge.
Frames can represent concepts, situations, attributes of concepts, relationships between concepts, and also procedures to explain their relationships. They allow nodes to have structure and hence are regarded as 3-D representations of knowledge.
A frame is also known as a unit, schema, or list. Typically, a frame consists of a list of properties of an object and associated values for those properties, similar to fields and values; these are also called slots and slot fillers. The contents of a slot can be a string, a number, a function, a procedure, etc.
A frame is a group of slots and fillers that defines a stereotypical object. Rather than a single frame, frame systems usually have a collection of frames connected to each other; one of the attribute values can itself be another frame. For example, Fig. 6.5.2 shows a frame for a book object.
Slots Fillers
Publisher Techmax
Title Intelligent Systems
Author Purva Raut
Edition First
Year 15
Pages 275
This is one of the simplest frames, but frames can have more complex structures in the real world. A powerful knowledge system can be built with filler slots and the inheritance provided by frames. Fig. 6.5.3 shows an example of a generic frame.
Slot Fillers
Name Laptop
under_warranty (yes, no)
From Fig. 6.5.3 we can conclude that fillers are of various types and may also include procedural attachments.
3. If-needed : It’s a procedural attachment. It will be executed when a filler value is needed.
4. Default : The default value is taken if no other value exists. It represents common-sense knowledge used when no specific value is supplied.
5. If-added : It's a procedural attachment. It is executed when a new value is added to a slot. For example, on arrival of a new type of laptop, an ADD_LAPTOP procedure should be executed to add that information.
6. If-removed : It's a procedural attachment. It is executed when a value is removed from the slot.
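The slot-and-filler machinery above can be sketched as a small class. This is a minimal illustration (only slot values, defaults, if-needed procedures, and parent inheritance are shown; the if-added and if-removed demons are omitted for brevity), and all names are made up for the example:

```python
class Frame:
    """Minimal frame: named slots with fillers, default values,
    if-needed procedural attachments, and inheritance from a parent frame."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.slots = {}       # explicit slot fillers
        self.defaults = {}    # values used when no filler exists
        self.if_needed = {}   # procedures run when a value is needed

    def get(self, slot):
        if slot in self.slots:                  # explicit filler wins
            return self.slots[slot]
        if slot in self.if_needed:              # if-needed demon
            return self.if_needed[slot](self)
        if slot in self.defaults:               # default filler
            return self.defaults[slot]
        if self.parent is not None:             # inherit from parent frame
            return self.parent.get(slot)
        return None
```

A generic "Laptop" frame with a default under_warranty filler can serve as the parent of a specific laptop instance, which then inherits the default until its own slot is filled.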
Finally, the domain knowledge in an expert system will reside in a Knowledge Representation Language (KRL),
such as an expert system shell. Let’s understand what it is.
Fig. 6.6.1 depicts generic components of expert system shell. It includes Knowledge acquisition system,
Knowledgebase, Inference engine, Explanation subsystem, and user interface. Knowledgebase and inference
mechanism are the core components of shell.
Let us understand in short what each component is, and what it is used for.
1. Knowledge acquisition system : Knowledge acquisition is the first and fundamental step in building an expert system. It helps collect the expert's knowledge required to solve the problems and build the knowledge base. It is also the biggest bottleneck in building an ES.
2. Knowledge base : This component is the heart of the expert system. It stores all the factual and heuristic knowledge about the application domain, using various representation techniques; the programmers are required to program the system accordingly.
3. Inference mechanism : The inference engine is the brain of the expert system. This component is mainly responsible for generating inferences from the knowledge in the knowledge base, producing a line of reasoning in turn.
4. Explanation subsystem : This part of the shell is responsible for explaining or justifying the final or intermediate results of a user query. It is also responsible for justifying the need for additional knowledge.
5. User interface : It is the means of communication with the user. Earlier it was not considered a part of expert systems, as no significance was attached to the user interface; later it was recognized as an important component of the system, as it determines the utility of the expert system.
Building expert systems by using shells has significant advantages. It is always advisable to use shell to develop expert
system as it avoids building the system from scratch. To build an expert system using system shell, one needs to enter
all the necessary knowledge about a task domain into a shell. The expert can himself create the knowledgebase by
undergoing some training on how to use the shell. The inference engine that applies the knowledge to the given task
is built into the shell.
There are many commercial shells available today: "EMYCIN", derived from MYCIN, and "Intermodeller", used to develop educational expert systems, to name a few. Shells come in a variety of sizes, from shells on PCs to shells on large mainframe computers, and range in complexity from simple, forward-chained, rule-based systems to highly sophisticated ones. Accordingly, shell prices range from hundreds to tens of thousands of dollars. Application-wise, they range from general-purpose shells to customized ones, such as shells for financial planning or real-time process control.
6.7 Explanations
Expert systems are developed with the aim of efficient and maximum utilization of technology in place of human expertise. To achieve this aim, along with accuracy of operation, the user interface must also be good.
User must be able to interact with system easily. To facilitate user interactions, system must possess following two
properties:
1. An expert system must be able to explain its reasoning. In many cases, users are interested not only in the answers to their queries but also in how the system generated those answers; this ensures the accuracy of the reasoning process that produced them. Such reasoning is typically required in medical applications, where a doctor needs to know why a particular medicine is advised for a particular patient, as he bears the ultimate responsibility for the medicines he prescribes. Hence, the system must store enough meta-knowledge about the reasoning process and be able to explain it to the user in an understandable way.
2. The system should be able to update its old knowledge by acquiring new knowledge. As the knowledge base is where the system's power resides, an expert system should maintain complete, accurate and up-to-date knowledge about the domain. This is easier said than done! As the system is programmed based on the available knowledge base, it is very difficult to adapt to changes in it. The system must have some mechanism through which its programs learn expert behavior from raw data. Another, comparatively simple, way is to keep interacting with human experts and update the system.
TEIRESIAS was the first expert system with both these properties implemented in it. MYCIN used TEIRESIAS as its user interface.
As the TEIRESIAS-MYCIN system answers the user's questions, the user might be satisfied or might want to know the reasoning behind the answers. The user can find this out by asking a "HOW" question. The system interprets it as "How do you know that?" and answers it by using backward chaining from the answer back to the given facts or rules. TEIRESIAS-MYCIN does a fairly good job of satisfying the user's query and providing proper reasoning for it.
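The goal-driven reasoning behind such HOW explanations can be sketched as follows. This is a highly simplified, TEIRESIAS-style illustration, not the actual MYCIN mechanism: each rule is a (conditions, conclusion) pair, the rule set is assumed acyclic, and the recorded trace of fired rules is what lets the system answer "How do you know that?".

```python
def backward_chain(goal, rules, facts, trace=None):
    """Goal-driven inference: prove `goal` from `facts` via `rules`,
    recording each rule used so its chain can be replayed as a HOW answer."""
    if trace is None:
        trace = []
    if goal in facts:                       # goal is a known fact
        return True, trace
    for conditions, conclusion in rules:
        if conclusion == goal:
            # Try to prove every condition of this rule as a subgoal.
            if all(backward_chain(c, rules, facts, trace)[0]
                   for c in conditions):
                trace.append((conditions, conclusion))
                return True, trace
    return False, trace
```

After a successful proof, walking the trace from first to last fired rule reproduces the line of reasoning from the given facts to the conclusion, which is exactly what a HOW explanation presents to the user.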
Types of Explanations
There are four types of explanations an expert system is generally asked for. They are:
1. A report of the rule trace on the progress of a consultation.
2. An explanation of how the system has reached a particular conclusion.
3. An explanation of why the system is asking a question.
4. An explanation of why the system did not give any conclusion.
The knowledge acquisition component allows experts to enter their knowledge or expertise into the expert system, where it can be refined as and when required. Nowadays, automated systems that allow the expert to interact directly with the system are becoming increasingly common, thereby reducing the workload of the knowledge engineer.
The knowledge acquisition process has three principal steps, depicted diagrammatically in Fig. 6.8.1. They are as follows:
1. Knowledge elicitation : The knowledge engineer needs to interact with the domain expert and obtain all the knowledge. He also needs to format it systematically so that it can be used while developing the expert system shell.
2. Intermediate knowledge representation : The knowledge obtained from the domain expert needs to be stored in some intermediate representation, so that it can be worked upon to produce the final refined version.
3. Knowledge base representation : The intermediate representation of the knowledge needs to be compiled and transformed into an executable format, e.g. production rules that the inference engine can process. This version of the knowledge is ready to be uploaded to the system shell as it is.
In the process of expert system development, a number of iterations through these three stages are required in order to equip the system with good-quality knowledge. The iterative nature of the knowledge acquisition process is represented in Fig. 6.8.1.
Knowledge elicitation is the first step of knowledge acquisition. This process itself has several stages, generally performed by the knowledge engineer.
These steps need to be carried out before meeting the domain expert to collect the quality knowledge. They are as
follows.
1. Gather as much data as possible about the problem domain from books, manuals, the internet, etc., in order to become familiar with the specialist terminology and jargon of the problem domain.
2. Identify the types of reasoning and problem solving tasks that the system will be required to perform.
3. Find domain expert or team of experts who are willing to work on the project. Sometimes experts are frightened
of being replaced by a computer system!
4. Interview the domain experts multiple times during the course of building the system. Find out how they solve the problems that the system is expected to solve. Have them check and refine the intermediate knowledge representation.
Knowledge elicitation is a time-consuming process. Automated knowledge elicitation and machine learning techniques exist and are increasingly used as modern alternatives.
1. An expert system is knowledge base + inference engine, whereas a traditional system (TS) is algorithms + data structures.
2. Expert systems can predict future events based on current data input patterns using their inference process. TS cannot do prediction tasks as efficiently, as they do not have a strong inference engine.
3. ES have a very strong inference system to deduce knowledge from given facts. TS do not have a strong inference system to deduce knowledge.
4. ES have an explanation subsystem which can explain and justify the results at any intermediate stage. TS do not have any mechanism to justify the results; manual debugging is required.
5. ES can do tasks like planning, scheduling, prediction and diagnosis, which require dealing with current data input and knowledge from the past. TS cannot do such expert tasks without human intervention.
6. Expert systems are able to match human expertise in a particular domain, provided with a complete knowledge base and a powerful inference engine. TS can only provide data based on available data; they cannot provide the user with knowledge about the domain, so human expertise is required to analyse the data further. Hence TS cannot eliminate human experts; they can only assist them.
6. Process monitoring and control
In this category, systems performing analysis of real-time data are designed. These systems obtain data from physical devices and produce results specifying anomalies, predicting trends, etc. Expert systems exist to monitor the manufacturing processes in the steel-making and oil-refining industries.
7. Design and manufacturing
These expert systems assist in the design of physical devices and processes, ranging from high-level conceptual design of abstract entities all the way to factory-floor configuration of manufacturing processes.
Review Questions
Q. 7 What are the various methods to represent domain knowledge in an expert system?