ADVANCED DATABASES
AND DATA MINING
CSCI-527
PROJECT REPORT PRESENTATION
ANALYSIS OF
AUTOMOBILE DATASET
TEAM MEMBERS:
Anusha Vadlamudi Narasimha Rao
50134597
Deepthi Chidura 50129270
Namitha Yellokonda 50126906
Shravya Beerakayala 50124534
Abstract
The goal of a data mining process is to
extract information from a dataset and
transform it into a format that can be used
for further analysis in the field concerned.
We have examined the Auto dataset, in
which the performance of various cars is
analyzed based on attributes such as
mpg, cylinders, displacement, horsepower,
weight, acceleration, year, and origin.
The analysis uses the Apriori data mining
algorithm.
INTRODUCTION
This Auto dataset contains the car model
name, mpg (miles per gallon), cylinders,
displacement, horsepower, weight,
acceleration, year, and origin.
Using the Apriori algorithm, support and
confidence allow us to make important
decisions about how these attributes work
as combined factors, and to find ways to
increase performance.
The minimum support and confidence values
are set according to the application; the
itemsets that satisfy these criteria are
kept, in order to finally determine which
attributes together satisfy the support and
confidence thresholds.
DATA OF AUTO:
ATTRIBUTE      DESCRIPTION
MPG            fuel consumption in miles per gallon
CYLINDERS      number of engine cylinders
DISPLACEMENT   engine displacement (cubic inches)
HORSEPOWER     engine horsepower
WEIGHT         vehicle weight (lbs)
ACCELERATION   time to accelerate from 0 to 60 mph (seconds)
YEAR           model year
ORIGIN         region of origin (1 = USA, 2 = Europe, 3 = Japan)
NAME           car model name
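For Apriori, each car record is treated as a transaction of items. One way such a transaction could be built from a cleaned CSV row is sketched below; this is not part of the report's code, and the attribute=value item encoding and the sample row are hypothetical:

HEADER = ["mpg", "cylinders", "displacement", "horsepower",
          "weight", "acceleration", "year", "origin", "name"]

# one hypothetical cleaned row from the Auto data
row = ["14", "8", "350", "165", "4209", "12", "72", "1", "chevrolet impala"]

# pair each attribute name with its value to form the items of one transaction
transaction = set(attr + "=" + value for attr, value in zip(HEADER, row))
print(transaction)   # e.g. {'mpg=14', 'cylinders=8', ..., 'origin=1', ...}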
APRIORI ALGORITHM
The Apriori algorithm is an influential
algorithm for mining frequent itemsets
for Boolean association rules that have
support and confidence greater than the
minimum support (min-sup) and minimum
confidence (min-conf), respectively.
The problem of discovering all association
rules can be broken down into two parts
as follows:
Find all sets of items that have support
greater than the minimum support. These
are called large itemsets.
Use the large itemsets to generate the
desired rules.
Two factors affect the significance of
association rules:
Support: the rule X → Y has support s in
the transaction set D if s% of the
transactions in D contain X ∪ Y.
Confidence: the rule X → Y holds in the
transaction set D with confidence c if c%
of the transactions in D that contain X
also contain Y.
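As a quick illustration of these two measures, the following sketch computes support and confidence for a toy rule over five made-up transactions (all items here are hypothetical):

# five made-up transactions in the style of the Auto data
D = [
    {"cylinders=8", "origin=1", "mpg=14"},
    {"cylinders=8", "origin=1"},
    {"cylinders=4", "origin=3"},
    {"cylinders=8", "origin=1", "mpg=13"},
    {"cylinders=4", "origin=2"},
]
X = {"cylinders=8"}
Y = {"origin=1"}

# support of X -> Y: fraction of transactions containing X ∪ Y
support = sum(1 for t in D if (X | Y) <= t) / len(D)

# confidence of X -> Y: of the transactions containing X, the fraction also containing Y
confidence = sum(1 for t in D if (X | Y) <= t) / sum(1 for t in D if X <= t)

print(support, confidence)   # 0.6 1.0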
PSEUDO CODE
L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do
begin
    Ck = apriori-gen(Lk-1);      // new candidates
    forall transactions t ∈ D do
    begin
        Ct = subset(Ck, t);      // candidates contained in t
        forall candidates c ∈ Ct do
            c.count++;
    end
    Lk = {c ∈ Ck | c.count ≥ minsup}
end
Answer = ∪k Lk;
DATA CLEANING
Unclean data refers to data that contains
erroneous information. The term may also be
used for data that is in memory and not yet
loaded into a database. In this dataset,
some fields have missing values.
UNCLEAN DATA
CODE FOR DATA CLEANING
# read the raw Auto data (input file name assumed); missing horsepower values appear as "?"
autoData <- read.csv(file = "~/Documents/data/Auto.csv", header = TRUE)
horsepwr <- as.character(autoData$horsepower)
horsepwr <- ifelse(horsepwr == "?", 0, horsepwr)   # flag the missing entries
After data cleaning, these fields are
removed.
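An equivalent cleaning step could also be done directly in Python. The sketch below is an assumption, not the report's code: the input path and the "horsepower" header name are illustrative. It drops the records whose horsepower field is the placeholder "?":

import csv

def clean_auto_csv(in_path="Auto.csv", out_path="Auto_clean_data.csv"):
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)
        writer.writerow(header)
        hp_col = header.index("horsepower")   # assumes a "horsepower" column header
        for row in reader:
            if row[hp_col] != "?":            # keep only records with a real horsepower value
                writer.writerow(row)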
PYTHON CODE FOR APRIORI ALGORITHM
import csv
def apriori_generation_algo(data, min_support=0.3, verbose=False):
    can_keys = create_candidate_keys(data)
    # materialize transactions as sets so they can be scanned more than once
    D_map = [set(transaction) for transaction in data]
    F1, supporting_data = back_prune(D_map, can_keys, min_support, verbose=False)
    F = [F1]
    key = 2
    while len(F[key - 2]) > 0:
        candidate_keys = apriori_generation(F[key - 2], key)
        F_key, support_K = back_prune(D_map, candidate_keys, min_support)
        supporting_data.update(support_K)
        F.append(F_key)
        key += 1
    if verbose:
        for kset in F:
            for item in kset:
                print("{"
                      + "".join(str(i) + ", " for i in item).rstrip(', ')
                      + "}"
                      + ": supp = " + str(round(supporting_data[item], 3)))
    return F, supporting_data
def create_candidate_keys(data, verbose=False):
    can_keys = []
    for transac in data:
        for item in transac:
            if [item] not in can_keys:
                can_keys.append([item])
    can_keys.sort()
    # frozensets are hashable, so they can be used as dictionary keys later on
    return [frozenset(key) for key in can_keys]
def back_prune(data, candidates, min_support, verbose=False):
    sscount = {}
    # count how many transactions contain each candidate itemset
    for tid in data:
        for candidate in candidates:
            if candidate.issubset(tid):
                sscount.setdefault(candidate, 0)
                sscount[candidate] += 1
    num_items = float(len(data))
    ret_list = []
    supporting_data = {}
    for key in sscount:
        support = sscount[key] / num_items
        if support >= min_support:
            ret_list.insert(0, key)
        supporting_data[key] = support
    if verbose:
        for kset in ret_list:
            for item in kset:
                print("{" + str(item) + "}")
        print("")
        for key in sscount:
            print("{"
                  + "".join([str(i) + ", " for i in key]).rstrip(', ')
                  + "}"
                  + ": supp = " + str(supporting_data[key]))
    return ret_list, supporting_data
def apriori_generation(frequency_sets, key):
    returnList = []
    lenLk = len(frequency_sets)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            a = sorted(frequency_sets[i])
            b = sorted(frequency_sets[j])
            F1 = a[:key - 2]
            F2 = b[:key - 2]
            if F1 == F2:  # join two (k-1)-itemsets whose first k-2 items agree
                returnList.append(frequency_sets[i] | frequency_sets[j])
    return returnList
def rules_from_conseq(frequency_set, H, supporting_data, rules, min_confidence=0.9,
                      verbose=False):
    m = len(H[0])
    if m == 1:
        Hmp1 = cal_conf(frequency_set, H, supporting_data, rules, min_confidence, verbose)
    if len(frequency_set) > (m + 1):
        Hmp1 = apriori_generation(H, m + 1)
        Hmp1 = cal_conf(frequency_set, Hmp1, supporting_data, rules, min_confidence,
                        verbose)
        if len(Hmp1) > 1:
            rules_from_conseq(frequency_set, Hmp1, supporting_data, rules, min_confidence,
                              verbose)
def cal_conf(frequency_set, H, supporting_data, rules, min_confidence=0.9, verbose=False):
    pruned_H = []
    for consequence in H:
        # conf(A --> c) = supp(A ∪ c) / supp(A), with A = frequency_set - consequence
        confidence = supporting_data[frequency_set] / supporting_data[frequency_set - consequence]
        if confidence >= min_confidence:
            rules.append((frequency_set - consequence, consequence, confidence))
            pruned_H.append(consequence)
            if verbose:
                print("{"
                      + "".join([str(i) + ", " for i in frequency_set - consequence]).rstrip(', ')
                      + "}"
                      + " --> "
                      + "{"
                      + "".join([str(i) + ", " for i in consequence]).rstrip(', ')
                      + "}"
                      + ": conf = " + str(round(confidence, 3))
                      + ", supp = " + str(round(supporting_data[frequency_set], 3)))
    return pruned_H
def gen_rules(F, supporting_data, min_confidence=0.9, verbose=True):
    rules = []
    # mine rules from every frequent itemset of size >= 2
    for i in range(1, len(F)):
        for frequency_set in F[i]:
            H1 = [frozenset([item]) for item in frequency_set]
            if i > 1:
                rules_from_conseq(frequency_set, H1, supporting_data, rules, min_confidence, verbose)
            else:
                cal_conf(frequency_set, H1, supporting_data, rules, min_confidence, verbose)
    return rules
def import_data():
    with open('C:/Users/Anusha/Desktop/Auto_clean_data.csv', "r") as fin:
        data = [row for row in csv.reader(fin.read().splitlines())]
    return data
data = import_data()
D_map = [set(row) for row in data]
can_keys = create_candidate_keys(data, verbose=True)
F1, supporting_data = back_prune(D_map, can_keys, 0.3, verbose=True)
F, supporting_data = apriori_generation_algo(data, min_support=0.05,
                                             verbose=True)
H = gen_rules(F, supporting_data, min_confidence=0.9, verbose=True)
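Each rule collected by cal_conf is an (antecedent, consequent, confidence) triple, so the returned list can be printed directly; a minimal sketch:

# print each mined rule with its confidence and support
for antecedent, consequent, confidence in H:
    print(set(antecedent), "-->", set(consequent),
          "conf =", round(confidence, 3),
          "supp =", round(supporting_data[antecedent | consequent], 3))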
OBSERVATION
If mpg equals 14, cylinders equals 8, and origin
equals 1, then confidence = 1.0 and support =
0.063.
If mpg equals 13, cylinders equals 8, and origin
equals 1, then confidence = 0.929 and support =
0.066.
If cylinders equals 8, year equals 73, and origin
equals 1, then confidence = 1.0 and support =
0.051.
If horsepower equals 150, cylinders equals 8, and origin
equals 1, then confidence = 1.0 and support =
0.056.
CONCLUSION
In our project, we studied the Apriori
algorithm and generated rules by considering
minimum support and confidence. The dataset
was cleaned using R programming, and the
algorithm was implemented in Python. The
Python code was run in the Java environment
and the results were obtained.