CLASS IX AI PROJECT CYCLE
Unit - 1
AI REFLECTION, PROJECT CYCLE AND ETHICS (PART B)
What is AI Project Cycle?
The AI Project Cycle is a step-by-step process for solving a problem using proven scientific
methods and drawing inferences from the results.
Components of AI Project Cycle
There are six components of AI project cycle:
1. Problem Scoping - Understanding the problem
2. Data Acquisition - Collecting accurate and reliable data
3. Data Exploration - Arranging the data uniformly
4. Modelling - Creating Models from the data
5. Evaluation - Evaluating the project
6. Deployment - Deploying the project into the real world
[Figure: AI Project Cycle. Stages and their sub-topics:
Problem Scoping (4Ws: Who, What, Where, Why);
Data Acquisition (Data Types: Structured, Unstructured; Datasets: Training, Testing; System Maps);
Data Exploration (Graphs: Bar Graph, Pie Chart, Histogram, Line Chart, Scatterplot);
Modelling (Rule Based: Decision Tree; Learning Based: Machine Learning, Deep Learning);
Evaluation; Deployment]
1. Problem Scoping
Problem Scoping refers to understanding a problem, finding out the various factors which
affect it, and defining the goal or aim of the project.
It refers to the identification of a problem and the vision to solve it.
To identify a problem we go through a schema: first select a theme, then break the theme
down into issues/topics and select one of them, and finally break that issue into several
problems and select one problem from among them.
For example – Let’s select a theme from the diagram. Here, we made the following
selection -
Theme – Agriculture
Issue – Sowing and Harvesting Patterns
Problem – How might we help farmers to determine the best times for sowing and
harvesting their crops?
The 4Ws Canvas of Problem Scoping
The 4Ws Canvas is a very helpful tool in problem scoping. The 4Ws of Problem Scoping
are Who, What, Where and Why. These Ws help in identifying and understanding the
problem in a better and more efficient manner. They are:
Who – The “Who” part identifies the stakeholders, i.e., those who are affected directly or
indirectly by the problem and who would benefit from the solution.
What – The “What” part helps us in understanding and identifying the nature of the problem.
Under this block, we also gather evidence such as media reports, newspaper articles and
announcements to prove that the problem we have selected actually exists.
Where – The “Where” part focuses on the context/situation of the problem. Here, we also
record the exact location of the problem.
Why – The “Why” part explains why we need to solve the problem and what the benefits to
the stakeholders will be after solving it.
Now, as we have identified the problem, we need to find the 4W’s of the problem using the
4W’s Canvas.
Who - Que: Who are the Stakeholders?
Ans: Farmers, Fertilizer Producers, Labourers, Tractor Companies
Que: What do we know about them?
Ans: They are the people worst affected by the problem, and they lose
money and time.
What - Que: What is the problem?
Ans: Determining the best time for seeding or crop harvesting.
Que: How do you know that it is a problem? (Is there any evidence?)
Ans: Seeding at an improper place and time leads to wastage of money
and time (information gathered from the farmers).
Where - Que: In what context/situation do the stakeholders experience the
problem?
Ans: When deciding the mature age of the crop and determining its harvest time.
Que: Where is the problem located?
Ans: In fields of Anantpur, Andhra Pradesh.
Why - Que: Why will our solution be of value to the stakeholders?
Ans: Our solution will help to determine the exact time for sowing and
harvesting of crops, thus leading to the maximum profit.
Que: How will it benefit the society?
Ans: It will lead to maximum yield.
The Problem Statement Template
Once we have all the 4Ws, we need to prepare a summary of them. This summary is
known as the problem statement template. It brings all the key points together in a
single place.
This is the template:
Our | [Stakeholders] | Who
Has/Have a problem that | [Issue, Problem, Need] | What
When/While | [Context/Situation] | Where
An ideal solution would help | [Benefit of solution for them] | Why
Let’s fill the template now:
Our | Farmers, Fertilizer Producers, Labourers | Who
Has/Have a problem that | To determine what will be the best time for seeding or crop harvesting | What
When/While | Decide the mature age for the crop and determine harvest time | Where
An ideal solution would help | Grow the crop on time and supply against market demand on time | Why
Now, let’s form a single statement from this template. Our statement would be: Our
farmers, fertilizer producers and labourers have a problem determining the best time for
seeding or crop harvesting while deciding the mature age of the crop and its harvest time.
An ideal solution would help them to grow the crop on time and supply it against market
demand on time.
2. Data Acquisition
The method of collecting correct and
dependable data to work with is known as
data acquisition.
Data Acquisition consists of two words:
Data: Data refers to the raw facts,
figures, or statistics collected for
reference or analysis.
Acquisition: Acquisition refers to acquiring data for the project.
Data is a representation of facts or instructions about an entity that can be processed or
conveyed by a human or a machine. It can be in the form of text, video, photos, audio,
and so on, and it can be gathered from a variety of places such as websites, journals,
and newspapers.
Data Classification
Basically, Data is classified into 2 categories: Structured Data and Unstructured Data
SNo. | Basis | Structured Data | Unstructured Data
1 | Definition | The data which can be stored in tabular format and easily managed by relational database is referred to as structured data. | The data which cannot be stored in pre-defined format, has images, audios, videos etc, and cannot be managed by relational database is known as unstructured data.
2 | Nature | Quantitative in nature, i.e., it contains measurable numerical values like numbers and dates. | Qualitative in nature, as it cannot be processed using conventional tools.
3 | Manage | Managed using relational database | Managed using non-relational database or NO-SQL
4 | Search | Easy search for specific information | Searching for a specific data is not easy
5 | Analysis | Easy to analyze data | Data analysis is not easy
6 | Storage | Doesn’t require much storage space | Requires a lot of storage space
7 | Example | Excel files and Google docs spreadsheet | Log files, audio files, email, text files, social media posts, videos, sensor data and image files
Dataset
A dataset is a collection of data in tabular format. It contains numbers or values that are
related to a specific subject. For example, the test scores of the students in a class form a dataset.
Need of Splitting dataset into Training dataset and Test dataset
Splitting the dataset into training and test datasets is one
of the important parts of data pre-processing.
If we train our model on one dataset and then test it on a completely
different dataset (one collected separately, with different characteristics),
the model will not have learned the correlations between the features
that appear in the test data.
Therefore, training and testing the model on two unrelated datasets will decrease
the measured performance of the model.
Hence it is important to split a single dataset into two parts, i.e., a training set and a test
set, so that the model is tested on data it has not seen but that comes from the same source.
So, here we have two types of datasets:
Training Dataset:
i. The type of training data that we provide to the model is highly responsible for the
model's accuracy and prediction ability.
ii. It means that the better the quality of the training data, the better will be the
performance of the model.
iii. It usually makes up 60% or more of the total data for a project.
Test Dataset:
i. Once we train the model with the training dataset, it's time to test the model with the
test dataset.
ii. Test data is a well-organized dataset that contains data for each type of scenario for
a given problem that the model would be facing when used in the real world.
iii. This dataset evaluates the performance of the model and ensures that the model
can generalize well with the new or unseen dataset.
iv. Usually, the test dataset is approximately 20-25% of the total original data for a
project.
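The split described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name split_dataset, the 25% test fraction, the fixed seed and the sample rows are all assumptions made for this example, not values prescribed by these notes.

```python
import random

def split_dataset(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows and split it into (training, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical dataset: (hours_studied, test_score) pairs for 20 students.
dataset = [(hours, 40 + 3 * hours) for hours in range(1, 21)]

training_set, test_set = split_dataset(dataset)
print(len(training_set), "training rows,", len(test_set), "test rows")
```

Shuffling before splitting keeps both parts representative of the same data, and no row ends up in both sets, so the test rows stay unseen during training.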
Data Features
Data features describe the type of data we want to collect. A feature is a measurable
property of the object we’re trying to analyze. Features are the columns in a
dataset.
Acquiring Data from Reliable Sources
Sometimes we use the internet and try to acquire data for our project from random
websites. Such data might not be authentic, as its accuracy cannot be verified. Among the
most reliable and authentic sources of information are the open-sourced websites hosted
by the government, like: [Link], [Link]
There are six ways to collect the data:
Surveys:
i. A survey is a method of gathering specific information from a predetermined sample
of respondents in order to gain knowledge.
ii. For example, a census survey is conducted at regular intervals (every ten years in
India) to analyze the population.
iii. Surveys are conducted in particular areas to acquire data from particular people.
Cameras:
i. A camera captures visual information, and that information, called an
image, is used as a source of data.
ii. This data is unstructured data and can be used for computer vision projects.
Web Scraping:
i. It is a technique for automatically collecting data from websites and converting it into a structured form.
ii. For example, it is used for news monitoring, market research and price tracking.
Observation:
i. Some information can be gathered through attentive observation and
monitoring. It is a time-consuming data source.
ii. For example, scientists keep insects under observation for years, and the
observations are then used as a data source.
Sensors:
i. A sensor is a device which detects or measures a physical property and records,
indicates or otherwise responds to it.
ii. Example- temperature sensors, humidity sensors, pressure sensors, infrared
sensors etc.
API:
i. API stands for Application Programming Interface.
ii. It is a software interface that enables two applications to communicate with one
another.
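As a small sketch of the web scraping method listed above, the Python example below pulls prices out of a piece of HTML using only the standard library. The page content, the class name "price" and the PriceParser class are hypothetical, made up for this illustration; a real scraper would first download the page from a website.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a website.
PAGE = """
<ul>
  <li class="item">Wheat seed <span class="price">120</span></li>
  <li class="item">Rice seed <span class="price">95</span></li>
</ul>
"""

class PriceParser(HTMLParser):
    """Collects the numbers found inside <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(int(data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

parser = PriceParser()
parser.feed(PAGE)
print(parser.prices)
```

The result is a plain list of numbers, i.e., loosely formatted web content turned into structured data that could be used for price tracking.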
System Maps
System Maps help us to find relationships between the different elements of the problem
which we have scoped, and to strategize a solution for achieving the goal of our
project.
It shows the components and boundary of a system and the components of the
environment at a point in time.
We use system maps to understand complex issues with multiple factors that affect
each other.
In a system, every element is interconnected.
Features of a system map are:
i. Elements are represented by circles.
ii. Relationships are represented by arrows. There are two types of arrows:
a. Longer arrow: It represents a longer time for a change to happen.
b. Shorter arrow: It represents a shorter time for a change to happen.
iii. Loops are used to represent a specific chain of causes and effects.
iv. The nature of a relationship is represented by ‘+’ and ‘-’ signs:
a. ‘+’ sign: It represents a directly proportional relationship between
elements, i.e., if one element increases the other element also increases, and
if one element decreases the other element also decreases.
b. ‘-’ sign: It represents an inversely proportional relationship between
elements, i.e., if one element increases the other element decreases, and if
one element decreases the other element increases.
Let us consider the example of the water cycle to understand a system map. It explains how
water completes its cycle, transforming from one form to another:
i. The elements which define the Water Cycle system are: clouds, trees, snow,
underground soil, rivers, oceans, land and animals.
ii. All the elements of the Water cycle are put in circles.
iii. The map here shows cause & effect relationship of elements with each other
with the help of arrows.
iv. The arrowhead depicts the direction of the effect and the sign (+ or -) shows
their relationship.
v. If the arrow goes from X to Y with a + sign, it means that the two are directly related
to each other. That is, if X increases, Y also increases, and if X decreases, Y also decreases.
vi. On the other hand, if the arrow goes from X to Y with a – sign, it means that the two
elements are inversely related to each other, which means that if X increases, Y
decreases, and if X decreases, Y increases.
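The + and - rules above can be sketched in Python by storing each arrow with its sign and following a chain of arrows. The elements X, Y and Z and their signs are made up for illustration; they are not taken from the water cycle map, and the function names are assumptions of this sketch.

```python
# Each arrow is (from_element, to_element, sign).
# These elements and signs are hypothetical, chosen only to show the rules.
arrows = [
    ("X", "Y", "+"),
    ("Y", "Z", "-"),
]

def effect(change, sign):
    """Return how the target element changes when the source changes."""
    if sign == "+":
        return change  # direct relationship: same direction
    # inverse relationship: opposite direction
    return "decreases" if change == "increases" else "increases"

def follow_chain(start_change, chain):
    """Apply each arrow in turn and report the change at every step."""
    change = start_change
    for source, target, sign in chain:
        change = effect(change, sign)
        print(f"{source} -> {target} ({sign}): {target} {change}")
    return change

final_change = follow_chain("increases", arrows)
```

Here an increase in X raises Y (a + arrow), and the rise in Y lowers Z (a - arrow), so one - sign anywhere in a chain flips the overall direction.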
Q1. Which one of the following is the second stage of AI project cycle?
a. Data Exploration b. Data Acquisition c. Modelling d. Problem Scoping
Q2. Which of the following comes under Problem Scoping?
a. System Mapping b. 4Ws Canvas c. Data Features d. Web scraping
Q3. Which of the following is not valid for Data Acquisition?
a. Web scraping b. Surveys c. Sensors d. Announcements
Q4. If an arrow goes from X to Y with a – (minus) sign, it means that
a. If X increases, Y decreases b. The direction of relation is opposite
c. If X increases, Y increases d. It is a bi-directional relationship
Q5. Which of the following is not a part of the 4Ws Problem Canvas?
a. Who? b. Why? c. What? d. Which?
3. Data Exploration
Data Exploration is the process of arranging the gathered data uniformly for a better
understanding.
Data can be arranged in the form of a table, plotted as a chart, or stored in a database.
In statistics, data exploration is often referred to as “exploratory data analysis” or EDA.
EDA is largely a graphical approach.
It is a technique used to summarize and visualize data using statistical methods or graphs.
Need of Data Exploration
A better understanding of data
Provide real-time analysis
Help to make decisions
Reduces complexity of data
Provides the relationships and patterns contained within data
Provides an effective way of communication among users
Data Exploration or Data Visualisation Tool
One of the data visualization tools is the graph. We can use many types of graphs here, like: bar
chart, line chart, histogram, bubble chart, dot chart, pie chart etc.
a. Bar Chart:
i. Description: The bar chart uses either horizontal or vertical bars (column chart) to
show discrete, numerical comparisons across categories.
ii. Use: Allows us to compare values across categories, or to track the development of one or two variables over time.
b. Pie Chart:
i. Description: Pie chart helps to show proportions and percentages between
categories, by dividing a circle into proportional segments.
ii. Use: Used to get an idea of the proportional distribution of data.
c. Line Graph:
i. Description: A line graph is a line or multiple lines showing how single or multiple
variables develop over time.
ii. Use: Allows us to track the development of several variables at the same time.
d. Histogram:
i. Description: A histogram visualizes the distribution of data over a continuous interval or certain
time period.
ii. Use: Used to show the distribution or range of data.
e. Scatterplot:
i. Description: It consists of multiple data points plotted across two axes. Each
variable would have multiple observations.
ii. Use: Allows us to see whether there is a pattern to be found between two
variables.
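To make the idea of a histogram concrete, here is a minimal Python sketch that groups values into intervals and prints a simple text chart. The scores and the interval width of 20 are hypothetical values chosen for this example.

```python
from collections import Counter

# Hypothetical test scores for a class of 15 students.
scores = [35, 42, 47, 51, 55, 58, 61, 63, 66, 68, 72, 75, 81, 88, 94]
BIN_WIDTH = 20  # assumed interval width

# Count how many scores fall into each interval of width BIN_WIDTH.
bins = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in scores)

# Print one row of '#' marks per interval, lowest interval first.
for start in sorted(bins):
    label = f"{start}-{start + BIN_WIDTH - 1}"
    print(f"{label:>6} | {'#' * bins[start]}")
```

Each row covers a continuous interval of scores, and the length of the row shows how many values fall in it, which is the distribution a histogram is meant to reveal.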
4. Modelling
Artificial Intelligence vs Machine Learning vs Deep Learning
AI | ML | DL
AI stands for Artificial Intelligence, and is basically the study/process which enables machines to mimic human behaviour through particular algorithms. | ML stands for Machine Learning, and is the study that uses statistical methods to enable machines to improve with experience. | Deep Learning, or DL, enables software to train itself to perform tasks with vast amounts of data.
AI is the broader family consisting of ML and DL as its components. | ML is the subset of AI. | DL is the subset of ML.
The efficiency of AI is basically the efficiency provided by ML and DL. | Less efficient than DL, as it cannot work as well with higher dimensions or larger amounts of data. | More powerful than ML, as it can easily work with larger sets of data.
Examples of AI applications include: Google’s AI-Powered Predictions, Ridesharing Apps Like Uber, Commercial Flights, etc. | Examples of ML applications include: Virtual Personal Assistants (Siri, Alexa, Google, etc.), Email Spam and Malware Filtering. | Examples of DL applications include: Sentiment based news aggregation, Image analysis and caption generation, etc.
Definition:
AI Modelling refers to developing algorithms, also called models, which can be trained to give
intelligent outputs, i.e., writing code to make a machine artificially intelligent. There are two
approaches to AI modelling: the Learning Based Approach and the Rule Based Approach.
Rule Based Approach:
It refers to the AI modelling where the relationships or patterns in data are
identified by the developer.
The machine follows the rules and instructions given by the developer,
and performs its task accordingly.
For example, suppose you have a dataset comprising 100 images of
apples and 100 images of bananas.
To train your machine, you feed this data into the machine and label each
image as either apple or banana.
Now if you test the machine with the image of an apple, it will compare the image
with the trained data and, according to the labels of the trained images, it will
identify the test image as an apple.
The rules given to the machine in this example are the labels given to the
machine for each image in the training dataset.
An example of the rule based approach is the decision tree.
Decision Tree:
It is a tree-structured classifier, where internal nodes represent the features of
a dataset, branches represent the decision rules and each leaf node
represents the outcome.
In a decision tree, there are two types of nodes: the Decision Node and the
Leaf Node. Decision nodes are used to make a decision and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
The decisions or tests are performed on the basis of the features of the given
dataset.
It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root
node, which expands into further branches and constructs a tree-like structure.
A decision tree simply asks a question and, based on the answer (Yes/No), it
further splits the tree into subtrees.
Decision Tree Terminologies:
Root Node: The root node is where the decision tree starts.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be
segregated further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: A node that is split into sub-nodes is the parent of those
sub-nodes, and the sub-nodes are its child nodes. The root node is the topmost parent.
Example of Decision Tree:
Suppose there is a candidate who has a job offer and wants to decide
whether to accept the offer or not. To solve this problem, the candidate uses
a decision tree.
The decision tree starts with the root node (Salary).
The root node splits further into the next decision node (Distance from the
office) and one leaf node, based on the corresponding labels.
The next decision node further splits into one decision node (Cab facility)
and one leaf node.
Finally, that decision node splits into two leaf nodes (Accepted offer and
Declined offer). This gives the final outcome.
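The job-offer tree above can be written as a chain of questions in Python. This is only a sketch: the notes do not give the actual conditions, so the salary and distance thresholds, the choice of which branch leads to which leaf, and the function name decide_offer are all assumptions made for illustration.

```python
MIN_SALARY = 50000      # assumed threshold, not given in the notes
MAX_DISTANCE_KM = 10    # assumed threshold, not given in the notes

def decide_offer(salary, distance_km, has_cab_facility):
    """Walk the tree: Salary -> Distance from the office -> Cab facility."""
    # Root node: Salary
    if salary < MIN_SALARY:
        return "Declined offer"   # leaf node
    # Decision node: Distance from the office
    if distance_km > MAX_DISTANCE_KM:
        return "Declined offer"   # leaf node
    # Decision node: Cab facility
    if has_cab_facility:
        return "Accepted offer"   # leaf node
    return "Declined offer"       # leaf node

print(decide_offer(60000, 5, True))
print(decide_offer(60000, 5, False))
print(decide_offer(40000, 5, True))
```

Each if statement plays the role of a decision node, and each return is a leaf node, so reading the function from top to bottom is the same as walking the tree from the root.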
Advantages of the Decision Tree
It is simple to understand, as it follows the same process which a human follows
while making any decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
Disadvantages of the Decision Tree
A decision tree may contain many layers, which makes it complex.
With more class labels, the computational complexity of the decision tree may
increase.
Learning Based Approach:
It refers to the AI modelling where the relationships or patterns in data are not
defined by the developer.
In this approach, random data is fed to the machine and it is left to the
machine to figure out patterns and trends in it.
Generally this approach is followed when the data is unlabelled and too
random for a human to make sense of it.
Thus, the machine looks at the data, tries to extract similar features from it
and clusters similar data points together.
In the end, as output, the machine tells us about the trends which it observed
in the training data.
For example, suppose you have a dataset of 1000 images of random stray
dogs of your area.
You do not have any clue as to what trend is being followed in this dataset,
as you don’t know their breed, colour or any other feature.
Thus, you would put this data into a learning based AI machine, and the
machine would come up with the various patterns it has observed in the
features of these 1000 images.
It might cluster the data on the basis of colour, size, fur style, etc.
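The clustering idea above can be sketched with a very small grouping routine in Python. The technique used here is a basic k-means loop on a single feature, chosen for this illustration; the notes do not name a specific algorithm, and the dog heights and the function name cluster_1d are hypothetical.

```python
def cluster_1d(values, k=2, rounds=10):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    ordered = sorted(values)
    # Start with centres spread evenly across the sorted values.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        # Put every value into the cluster with the nearest centre.
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            clusters[nearest].append(value)
        # Move each centre to the average of its cluster.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Hypothetical shoulder heights (cm) of stray dogs; no labels are given.
heights_cm = [28, 30, 33, 35, 58, 60, 63, 65]
small, large = cluster_1d(heights_cm)
print("Cluster 1:", small)
print("Cluster 2:", large)
```

Nobody tells the program which dogs are small or large; the two groups emerge from the data itself, which is the essence of the learning based approach.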
5. Evaluation:
After a model has been created and trained, it must be thoroughly tested in order to
determine its efficiency and performance. This is known as evaluation.
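One common way to measure performance, though not one named in these notes, is accuracy: the fraction of test examples the model gets right. The sketch below assumes a model has already made predictions on a test dataset; the labels and the function name accuracy are hypothetical.

```python
def accuracy(predictions, actual_labels):
    """Fraction of test examples the model labelled correctly."""
    correct = sum(p == a for p, a in zip(predictions, actual_labels))
    return correct / len(actual_labels)

# Hypothetical results on a test dataset of 8 fruit images.
actual = ["apple", "apple", "banana", "banana",
          "apple", "banana", "apple", "banana"]
predicted = ["apple", "banana", "banana", "banana",
             "apple", "banana", "apple", "apple"]

score = accuracy(predicted, actual)
print(f"Accuracy: {score:.0%}")
```

Because the test dataset was kept aside during training, this score estimates how the model would behave on new, unseen data.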
6. Deployment: Deploying the project into the real world.