
Normalization

Asif Sohail
University of the Punjab
Punjab University College of Information Technology (PUCIT)

© 2009 Punjab University College of Information Technology (PUCIT) September 8, 2009 Slide 1
Introduction

• It is a process of converting a complex, large and unstable
relation into a set of simple, small and stable relations.
• It is a process of efficiently organizing data in a database.
• Normalization results in a well-structured relation: a relation
that contains minimal redundancy and allows insert, update and
delete without errors or inconsistencies.
• Errors or inconsistencies caused by redundant data are also
called anomalies.
• There are three types of anomalies:
a) Insert Anomaly.
b) Delete Anomaly.
c) Update Anomaly.
Introduction

a) Insert Anomaly.
• It occurs when extra data beyond the desired data
must be added to the database.
b) Update Anomaly.
• It occurs when it is necessary to change multiple
rows to modify only a single fact.
c) Delete Anomaly.
• It occurs when deleting a row causes some
unwanted deletions.
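
The three anomalies can be illustrated with a small sketch. The flat relation below (one row per student and course) and the course titles are hypothetical and are used only to make the definitions concrete.

```python
# Hypothetical flat relation: one row per (student, course) enrolment.
rows = [
    {"reg_no": 1, "name": "Arif",  "c_code": 310, "c_title": "Databases"},
    {"reg_no": 2, "name": "Talha", "c_code": 310, "c_title": "Databases"},
    {"reg_no": 2, "name": "Talha", "c_code": 311, "c_title": "Networks"},
]

# Update anomaly: renaming course 310 is a single fact,
# but every row that mentions the course has to change.
touched = [r for r in rows if r["c_code"] == 310]
for r in touched:
    r["c_title"] = "Database Systems"

# Delete anomaly: removing student 2 also removes the only record of course 311.
remaining = [r for r in rows if r["reg_no"] != 2]
courses_left = {r["c_code"] for r in remaining}

# Insert anomaly: a course nobody has taken yet needs placeholder student data.
new_course_row = {"reg_no": None, "name": None, "c_code": 312, "c_title": "Compilers"}

print(len(touched), courses_left)
```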

Functional Dependency
• Normalization is based on the analysis of Functional Dependency (FD).

• An FD is a relationship between two attributes A and B of a relation R,
such that attribute B is said to be functionally dependent on attribute
A if each value of A uniquely determines the value of B (written A -> B).

• The attribute on the left-hand side of the arrow in a functional
dependency is called the Determinant, and the attributes on the right
are called the Dependents.

• An attribute may be functionally dependent on a combination of two or
more attributes.

Continued…

EMPLOYEE1

EMP_ID Name Dept_Name Salary

EMPLOYEE2

EMP_ID Course_Title Name Dept_Name Salary Date_Completed

The date a course is completed is completely determined by
EMP_ID and Course_Title together:
EMP_ID, Course_Title -> Date_Completed
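
A minimal sketch of how an FD can be tested against sample data. The helper name `fd_holds` and the EMPLOYEE2 rows are hypothetical; only the column names come from the slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical EMPLOYEE2 rows (only the columns needed here).
employee2 = [
    {"EMP_ID": 100, "Course_Title": "SPSS",    "Name": "Amna",  "Date_Completed": "2008-06-19"},
    {"EMP_ID": 100, "Course_Title": "Surveys", "Name": "Amna",  "Date_Completed": "2008-10-07"},
    {"EMP_ID": 140, "Course_Title": "SPSS",    "Name": "Bilal", "Date_Completed": "2008-12-08"},
]

print(fd_holds(employee2, ["EMP_ID"], ["Name"]))
print(fd_holds(employee2, ["EMP_ID"], ["Date_Completed"]))
print(fd_holds(employee2, ["EMP_ID", "Course_Title"], ["Date_Completed"]))
```

Sample data can only refute an FD. Whether an FD really holds is a statement about the meaning of the data, not about one particular set of rows.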
Functional Dependency
• Functional Dependencies can be thought of as integrity
constraints that encode data semantics.
• Functional Dependencies are helpful in identifying the keys for a
given relation and to replace a relation with a collection of
smaller relations.
• An attribute or set of attributes is a key, if it can functionally
determine the other attributes of the relation.
• Example: Consider a relation R (A, B, C, D, E) with the following
FDs:
• A -> D
• D -> B
• B -> C
• E -> B
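
The key of R can be worked out by computing attribute closures. The sketch below is one common way to do this; the function names `closure` and `candidate_keys` are illustrative, and the brute-force search is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys

schema = "ABCDE"
fds = [("A", "D"), ("D", "B"), ("B", "C"), ("E", "B")]

print(sorted(closure("A", fds)))     # ['A', 'B', 'C', 'D']
print(candidate_keys(schema, fds))   # [('A', 'E')]
```

Neither A nor E appears on the right-hand side of any FD, so both must be part of every key, and together they determine all of R.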
Inference Rules
• Rules of inference for functional dependencies, called inference
axioms or Armstrong's axioms after their developer, can be used
to find all the FDs logically implied by a set of FDs.
• The inference rules are listed below. Rules (a) to (c) are the axioms
proper; rules (d) to (f) can be derived from them.
a) Reflexivity.
b) Augmentation.
c) Transitivity.
d) Additivity or Union.
e) Projectivity or Decomposition.
f) Pseudo-transitivity.

Inference Rules
a) Reflexivity.
• {Y1,...,Yn} ⊆ {X1,...,Xm} implies {X1,...,Xm} → {Y1,...,Yn}
• OR If B is a subset of A, then A → B.
• For example: stName, stAdr → stName
b) Augmentation.
• If we have A → B then AC → BC. For example:
• If stId → stName then
• stId, stAdr → stName, stAdr
c) Transitivity.
• If A → B and B → C, then A → C
• If stId → prName and prName → credits then
• stId → credits
Inference Rules
d) Additivity or Union:
• If A → B and A → C, then A → BC
• If empId → eName and empId → qual then we can write it as
empId → eName, qual
e) Projectivity or Decomposition:
• If A → BC then A → B and A → C
• If empId → eName, qual then we can write it as
• empId → eName and empId → qual
f) Pseudo-transitivity:
• If A → B and BC → D, then AC → D
• If stId → stName and stName, fName → stAdr then we can
write it as stId, fName → stAdr
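
Rules (d) to (f) follow from reflexivity, augmentation and transitivity. The short derivations below are a worked sketch using the same symbols as the rules above.

```latex
\begin{align*}
\textbf{Union:}\quad
  & A \to B \;\Rightarrow\; A \to AB        && \text{(augment with } A\text{)} \\
  & A \to C \;\Rightarrow\; AB \to BC       && \text{(augment with } B\text{)} \\
  & A \to AB,\; AB \to BC \;\Rightarrow\; A \to BC && \text{(transitivity)} \\[4pt]
\textbf{Decomposition:}\quad
  & BC \to B                                && \text{(reflexivity)} \\
  & A \to BC,\; BC \to B \;\Rightarrow\; A \to B   && \text{(transitivity)} \\[4pt]
\textbf{Pseudo-transitivity:}\quad
  & A \to B \;\Rightarrow\; AC \to BC       && \text{(augment with } C\text{)} \\
  & AC \to BC,\; BC \to D \;\Rightarrow\; AC \to D && \text{(transitivity)}
\end{align*}
```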
Normal Forms
• The normalization process is built around the concept of Normal
Forms.
• A Normal Form is a state of a relation that can be determined
by applying simple rules regarding its functional dependencies.
– First Normal Form
– Second Normal Form
– Third Normal Form
– Boyce Codd Normal Form
– Fourth Normal Form
– Fifth Normal Form

Table with multivalued attributes
→ Remove multivalued attributes → First Normal Form
→ Remove partial dependencies → Second Normal Form
→ Remove transitive dependencies → Third Normal Form
→ Remove remaining anomalies → Boyce-Codd Normal Form
→ Remove multivalued dependencies → Fourth Normal Form
→ Remove remaining anomalies → Fifth Normal Form

First Normal Form (1NF)
• A relation is said to be in 1NF if it contains no Repeating Group (RG).
• An RG is a collection of multi-valued attributes; an RG also exists when
more than one field in a single table stores the same kind of
information.
• To eliminate an RG, the value at the intersection of each row and column
must be atomic (having a single value).
• If you developed a logical design by transforming an ER diagram into
relations, there should not be any multivalued attributes remaining.
• Consider the following relation:
Student (RegNo, Name, Program, C-Code, C-Title, C-Grade)
• This relation has a repeating group consisting of C-Code, C-Title, C-Grade
and therefore it has the insert, delete and update anomalies.
• Multiple values create problems in performing operations like select or
join.

First Normal Form (1NF)
• The relation Student can be converted into 1NF using either
of the following methods:
a) Change the PK of the relation to the composite key
(RegNo, C-Code) and fill the blanks by duplicating the
non-repeating data. This approach is commonly referred to as
flattening the table.
b) Split the relation into two relations by placing the repeating data,
along with a copy of the original key attribute(s), in a separate
relation. The new relation will always have a concatenated (composite) key.
Student (RegNo, Name, Program)
Course (RegNo, C-Code, C-Title, C-Grade)
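
Both methods can be sketched on a small unnormalized data set. The record layout (a `Courses` list inside each student) and the course titles are assumptions made for illustration.

```python
# Hypothetical unnormalized records: each student carries a repeating group of courses.
students = [
    {"RegNo": 1, "Name": "Arif", "Program": "BSSE",
     "Courses": [(310, "Databases", "B"), (311, "Networks", "A")]},
    {"RegNo": 2, "Name": "Talha", "Program": "BSIT",
     "Courses": [(310, "Databases", "A")]},
]

# Method (a): flatten, repeating the non-repeating data on every row.
# The key of the flattened relation is (RegNo, C-Code).
flat = [
    (s["RegNo"], s["Name"], s["Program"], code, title, grade)
    for s in students
    for code, title, grade in s["Courses"]
]

# Method (b): split into Student and Course, copying RegNo into Course.
student = [(s["RegNo"], s["Name"], s["Program"]) for s in students]
course = [
    (s["RegNo"], code, title, grade)
    for s in students
    for code, title, grade in s["Courses"]
]

print(len(flat), len(student), len(course))
```

In both results every cell holds a single value. Method (b) avoids repeating Name and Program on every enrolment row.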

Example 2: STD(stId, stName, stAdr, prName, bkId)


Second Normal Form (2NF)
• A relation is in 2NF if:
– It is in 1NF
– Every nonkey attribute is fully functionally dependent on
the primary key
• A situation of Partial Functional Dependency arises when PK
of a relation is composite and a non key attribute is
functionally dependent on part (but not all) of the PK.
• Referring to the Course relation:
Course (RegNo, C-Code, C-Title, C-Grade)
• The functional dependencies are:
C-Code -> C-Title (Partial FD)
RegNo, C-Code -> C-Grade (Full FD)

Second Normal Form (2NF)
• Since C-Title depends on only part of the PK (C-Code), not every
nonkey attribute is fully functionally dependent on the PK, and
therefore the relation is not in 2NF.
• The anomalies associated with the Course relation are:
a) Insert Anomaly:
• A course cannot be inserted until a student (RegNo) takes it.
b) Delete Anomaly:
• Deleting a student may also delete the only record of a course.
c) Update Anomaly:
• Changing a course title requires updating every row for that course.

Second Normal Form (2NF)
• The process for transforming a 1NF table to 2NF is:
– Identify any determinants other than the composite key, and
the columns they determine.
– Create and name a new table for each determinant and the
unique columns it determines.
– Move the determined columns from the original table to the
new table. The determinant becomes the primary key of the
new table.
– Delete the columns you just moved from the original table,
except for the determinant, which will serve as a foreign key.
– The original table may be renamed to maintain semantic
meaning.

Second Normal Form (2NF)
• The relation Course can be converted into 2NF by
decomposing it into the following relations:
Course (C-Code,C-Title)
Result (RegNo, C-Code, C-Grade)

• A relation in 1NF will be in 2NF if:
– The PK consists of only one attribute OR
– No nonkey attributes exist in the relation OR
– Every nonkey attribute is functionally dependent on
the full set of primary key attributes
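
The 2NF test for the Course relation can be sketched as a check for partial dependencies. The helper `partial_dependencies` is hypothetical, and it inspects only the FDs it is given, not every FD they imply.

```python
def partial_dependencies(key, fds):
    """Return FDs whose determinant is a proper subset of the key
    and whose dependents include at least one nonkey attribute."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        nonkey_rhs = set(rhs) - key
        if set(lhs) < key and nonkey_rhs:  # "<" is the proper-subset test
            found.append((tuple(lhs), tuple(sorted(nonkey_rhs))))
    return found

key = ("RegNo", "C-Code")
course_fds = [
    (("C-Code",), ("C-Title",)),           # partial FD
    (("RegNo", "C-Code"), ("C-Grade",)),   # full FD
]
result_fds = [(("RegNo", "C-Code"), ("C-Grade",))]

print(partial_dependencies(key, course_fds))  # [(('C-Code',), ('C-Title',))]
print(partial_dependencies(key, result_fds))  # []
```

The second call corresponds to the Result relation after decomposition, which has no partial dependency left.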

Third Normal Form (3NF)
• A relation is said to be in 3NF, if it is in 2NF and there is no
Transitive Dependency.
• A Transitive Dependency is a functional dependency between
two or more non key attributes of a relation.
• Consider the following relation:
Emp (EmpNo, EName, Job, Sal, Proj-No,Proj-Details)
• In the above relation, there is the following transitive
dependency:
Proj-No -> Proj-Details
• Due to this, project information cannot be maintained
independently of an employee record, and hence there are
anomalies in the relation.

Third Normal Form (3NF)
• You can remove a transitive dependency from a relation in the
following way:
• Create a new relation for the transitively dependent
attributes together with their determinant, and keep the determinant
(the PK of the new relation) in the old relation to serve as an FK.
Emp (EmpNo, EName, Job, Sal, Proj-No)
Project (Proj-No, Proj-Details)

Another example:
Ex2: STD (stId, stName, stAdr, prName, prCrdts)
– stId -> stName, stAdr, prName, prCrdts
– prName -> prCrdts

Third Normal Form (3NF)
• The process of transforming a table into 3NF is:
– Identify any determinants, other than the primary key, and the
columns they determine.
– Create and name a new table for each determinant and the
unique columns it determines.
– Move the determined columns from the original table to the
new table. The determinant becomes the primary key of the
new table.
– Delete the columns you just moved from the original table,
except for the determinant, which will serve as a foreign key.
– The original table may be renamed to maintain semantic
meaning.
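
The steps above can be sketched on the STD example. The helper names `transitive_dependencies` and `split_on`, and the name `program` for the new relation, are assumptions made for illustration; the slides do not give a decomposition for STD.

```python
def transitive_dependencies(key, fds):
    """Return FDs whose determinant and dependents are all nonkey attributes."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if key.isdisjoint(lhs) and key.isdisjoint(rhs)]

def split_on(schema, lhs, rhs):
    """Move rhs into a new table keyed by lhs; lhs stays behind as a foreign key."""
    remaining = [a for a in schema if a not in rhs]
    new_table = list(lhs) + [a for a in rhs if a not in lhs]
    return remaining, new_table

std = ["stId", "stName", "stAdr", "prName", "prCrdts"]
fds = [
    (["stId"], ["stName", "stAdr", "prName", "prCrdts"]),
    (["prName"], ["prCrdts"]),
]

violations = transitive_dependencies(["stId"], fds)
lhs, rhs = violations[0]
std_3nf, program = split_on(std, lhs, rhs)

print(violations)   # [(['prName'], ['prCrdts'])]
print(std_3nf)      # ['stId', 'stName', 'stAdr', 'prName']
print(program)      # ['prName', 'prCrdts']
```

Applying `split_on` to Emp with Proj-No -> Proj-Details gives the Emp and Project relations shown earlier.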

Normalization - Exercise
• Order Relation:
OrderNo, OrderDate, PaymentMode, CustNo, CustName, CustAddress,
ProductNo, ProdName, Price, ProdQty, QtyInHand, ProdTotal, OrderTotal

• Student Relation:
RegNo, Name, Address, Program, C-Code, C-Title, C-Grade, T-Code,
T-Name

• Patient Relation:
PatNo, PatName, PatAge, VisitNo, VisitDate, DNo, DName, DSpeciality,
Diagnosis

• Project Relation:
PNo, PName, PBudget, EmpNo, EName, Job, ChgHour, Hours

Boyce-Codd Normal Form (BCNF)
• A relation is said to be in BCNF iff every determinant of a
nontrivial FD is a candidate key (more precisely, contains a key).
• If all candidate keys of a table are atomic, 3NF and BCNF
are equivalent. BCNF differs from 3NF only when
there is more than one candidate key and the keys are
composite and overlapping.
• A 3NF relation can fail to be in BCNF only if
– the candidate keys in the relation are composite keys (that
is, they are not single attributes),
– there is more than one candidate key in the relation, and
– the keys are not disjoint, that is, some attributes in the keys
are common.
Boyce-Codd Normal Form (BCNF)
• Consider a relation R(A, B, C, D) such that
A,B -> C, D and C->B.
• With respect to the key (A, B), the relation R has no partial
dependency, nor does it contain a transitive dependency. Thus the
relation R is in 3NF.
• According to the pseudo-transitivity principle, we can derive the
dependency A, C → B, D, so (A, C) is also a candidate key.
• Since C → B and C alone is not a candidate key, R is not in BCNF:
viewed from the key (A, C), B depends on only part of the key.
Accordingly, we decompose the relation as:
• R1 ( A, C, D) and
• R2 (C, B)

Boyce-Codd Normal Form (BCNF)
• Consider the following relation:
PROJECT (RegNo, PTool, Supervisor)
Constraints:
1. For each Project Tool (PTool), a student has only one supervisor.
2. A project may use more than one tool.
3. Each supervisor can supervise only one tool.
• In the above relation, no single attribute is a PK.
• Possible candidate keys are (RegNo, PTool) and (RegNo, Supervisor).
• The candidate keys overlap as they share RegNo.
• The functional dependencies are:
RegNo, PTool -> Supervisor
Supervisor -> PTool
• The anomalies result from the fact that there is a determinant (Supervisor)
that is not a candidate key in the relation.

Boyce-Codd Normal Form (BCNF)
• The anomalies associated with the PROJECT relation are:
a) Insert Anomaly:
• A Supervisor and PTool pairing cannot be recorded until a student
takes a project.
b) Delete Anomaly:
• Deleting a student may also delete the record of which PTool a
supervisor supervises.
c) Update Anomaly:
• Changing a supervisor's PTool requires updating every row for that
supervisor.
• The relation Project can be converted into BCNF by
decomposing it into the following relations:
PROJECT1 (RegNo, Supervisor)
PROJECT2 (Supervisor, PTool)
Boyce-Codd Normal Form (BCNF)
• Example 2:
FACULTY (FACNAME, DEPT, OFFICE, RANK, DATEHIRED)
Assumptions / Constraints:
• FACNAME within a DEPT is unique.
• Each faculty member has only one OFFICE.
• Faculty members from the same DEPT may share offices.
• OFFICE values are unique across all departments.
Functional Dependencies:
• OFFICE -> DEPT
• FACNAME, DEPT -> OFFICE, RANK, DATEHIRED
• FACNAME, OFFICE -> DEPT, RANK, DATEHIRED
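
A BCNF check can be sketched by testing whether each determinant's closure covers the whole relation. The function names below are illustrative. The result for FACULTY is derived from the FDs listed above and is not stated on the slide.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return nontrivial FDs whose determinant does not determine the whole schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)]

faculty = ["FACNAME", "DEPT", "OFFICE", "RANK", "DATEHIRED"]
faculty_fds = [
    (["OFFICE"], ["DEPT"]),
    (["FACNAME", "DEPT"], ["OFFICE", "RANK", "DATEHIRED"]),
    (["FACNAME", "OFFICE"], ["DEPT", "RANK", "DATEHIRED"]),
]

project = ["RegNo", "PTool", "Supervisor"]
project_fds = [
    (["RegNo", "PTool"], ["Supervisor"]),
    (["Supervisor"], ["PTool"]),
]

print(bcnf_violations(faculty, faculty_fds))   # [(['OFFICE'], ['DEPT'])]
print(bcnf_violations(project, project_fds))   # [(['Supervisor'], ['PTool'])]
```

In both relations the violating determinant (OFFICE, Supervisor) is part of an overlapping composite candidate key but is not a key on its own.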

Fourth Normal Form (4NF)
• A relation is said to be in 4NF, if it is in BCNF and there
is no multivalued dependency.
• A Multivalued Dependency is a type of dependency that
exists in a relation R having at least 3 attributes, R(A, B,
C), such that for each value of A there is a well-defined
set of values of B and a well-defined set of values of C,
but the set of values of B and the set of values of C are
independent of each other.

Fourth Normal Form (4NF)
• Consider a relation
COURSE (C_Code, C_Instructor, Book_Titles)
• The constraints on the above relation are:
• Each course may have several instructors.
• Each course uses several book titles.
• Set of Book titles used for a course is independent of the
set of instructors.
• Q: What will be the PK of the relation?
• The Anomalies associated with the course relation are:
a) Insert Anomaly:
• Adding a new book title to a course requires adding one row
for each instructor of that course.
Fourth Normal Form (4NF)
b) Delete Anomaly.
• Deleting an instructor will result in unwanted deletions.
c) Update Anomaly.
• Updating a book title requires updating several rows.
• The relation Course can be converted into 4NF by
decomposing it into the following relations:
• COURSE (C_Code, C_Instructor)
• BOOKS (C_Code, Book_Titles)
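
The effect of the multivalued dependency can be sketched with a few rows. The course code, instructor names and book titles are hypothetical.

```python
from itertools import product

# Hypothetical data: instructors and books are independent facts about a course.
instructors = {"CS310": ["Instructor X", "Instructor Y"]}
books = {"CS310": ["Book A", "Book B", "Book C"]}

# Single relation COURSE(C_Code, C_Instructor, Book_Titles):
# every instructor must be paired with every book of the course.
course = {
    (c, i, b)
    for c in instructors
    for i, b in product(instructors[c], books[c])
}

# 4NF decomposition into two projections.
course_instr = {(c, i) for c, i, _ in course}   # COURSE(C_Code, C_Instructor)
course_books = {(c, b) for c, _, b in course}   # BOOKS(C_Code, Book_Titles)

# Natural join on C_Code reproduces the original relation.
rejoined = {
    (c, i, b)
    for c, i in course_instr
    for c2, b in course_books
    if c == c2
}

print(len(course), len(course_instr) + len(course_books), rejoined == course)
```

With 2 instructors and 3 books, the single relation needs 6 rows while the two decomposed relations need 5 in total, and the gap widens as either list grows.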

• Example 2:
• STUDENT (RegNo, C_Code, Hobbies)

Fifth Normal Form (5NF)
• In every Normal Form, we decompose a given relation. The decomposition
can be lossless or lossy decomposition.
• A decomposition {R1, R2, …., Rn} of a relation R is called a lossless
decomposition for R, if the natural join of R1, R2, …., Rn produces exactly
the relation R, otherwise the decomposition will be lossy and will result in
loss of information.
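• A relation is said to be in 5NF if it is in 4NF and there is no remaining
nontrivial lossless decomposition (one not already implied by its candidate keys).
• Alternatively, a relation is said to be in 5NF if it is in 4NF and every
nontrivial Join Dependency in it is implied by its candidate keys.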
• A Join Dependency means that a relation can be decomposed into two or
more relations such that the resulting relations can be recombined to
form original data. Thus if R(A, B, C) is decomposed into R1(A,B) and
R2(B,C) – a join dependency exists if we can get back to R by taking
natural join of R1 and R2.
• Join Dependency ensures that the decomposition is lossless.
Fifth Normal Form (5NF)
• Consider the relation STUDENT:
RegNo  Name   Program  C_Code  C_Grade
1      Arif   BSSE     310     B
1      Arif   BSSE     311     A
1      Arif   BSSE     312     C
2      Talha  BSIT     310     A
2      Talha  BSIT     311     C
2      Talha  BSIT     312     B
.      .      .        .       .

• We decompose the relation into the following relations in order to
convert it into 1NF.
STUDENT (RegNo, Name, Program)
RESULT(RegNo, C_Code, C_Grade)

Fifth Normal Form (5NF)
• The tuples of the two relations are given below:
STUDENT
RegNo  Name   Program
1      Arif   BSSE
2      Talha  BSIT
.      .      .

RESULT
RegNo  C_Code  C_Grade
1      310     B
1      311     A
1      312     C
2      310     A
2      311     C
2      312     B
.      .       .

• Natural Join of the above two relations will give us the tuples of the
original relation STUDENT. Thus the decomposition is lossless.
• Now we decompose RESULT relation as:
RESULT1(RegNo, C_Code)
RESULT2(C_Code, C_Grade)
Fifth Normal Form (5NF)
• The tuples of the two relations and the tuples obtained by their natural
join are given below:
RESULT1
RegNo  C_Code
1      310
1      311
1      312
2      310
2      311
2      312
.      .

RESULT2
C_Code  C_Grade
310     B
311     A
312     C
310     A
311     C
312     B
.       .

Natural join of RESULT1 and RESULT2 (extra tuples marked with *)
RegNo  C_Code  C_Grade
1      310     B
1      310     A  *
1      311     A
1      311     C  *
1      312     C
1      312     B  *
2      310     B  *
2      310     A
2      311     A  *
2      311     C
2      312     C  *
2      312     B
.      .       .

• The natural join of the above two relations gives some extra
tuples (marked with *) which were not in the original relation.

Fifth Normal Form (5NF)
• Even though we get more tuples, the
decomposition is lossy, because without the original relation we
have no way to identify which tuples are extra (spurious). Thus we
actually lose information.
• We can guarantee that the decomposition is lossless by making
sure that, for each pair of relations that will be joined, the set of
common attributes is a key of at least one of the two relations.
• 5NF is not as easily verified as the lower normal forms.
• In fact, no systematic method exists for obtaining 5NF or for
ensuring that a set of relations is indeed in 5NF.
• It is tempting to suggest that such relations are pathological cases
and are likely to be rare in practice.
• 5NF is also called Project-Join Normal Form.
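
The lossy decomposition of RESULT discussed above can be reproduced with a short sketch. The tuples are the ones listed on the slides; the helper `is_key` is hypothetical and checks uniqueness in the sample data only.

```python
# RESULT tuples (RegNo, C_Code, C_Grade) as listed on the slides.
result = {
    (1, 310, "B"), (1, 311, "A"), (1, 312, "C"),
    (2, 310, "A"), (2, 311, "C"), (2, 312, "B"),
}

# Project onto the two smaller schemas.
result1 = {(reg, code) for reg, code, _ in result}       # RESULT1(RegNo, C_Code)
result2 = {(code, grade) for _, code, grade in result}   # RESULT2(C_Code, C_Grade)

# Natural join on the common attribute C_Code.
joined = {
    (reg, code, grade)
    for reg, code in result1
    for code2, grade in result2
    if code == code2
}
spurious = joined - result

def is_key(rows, position):
    """True if the value at `position` is unique across the given rows."""
    values = [row[position] for row in rows]
    return len(values) == len(set(values))

print(len(joined), len(spurious))                 # 12 6
print(is_key(result1, 1), is_key(result2, 0))     # False False
```

C_Code is the only common attribute and it is not a key of either RESULT1 or RESULT2, which is why the join produces spurious tuples.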
