chapter 4

Chapter 4 discusses normalization in database systems, focusing on organizing data to reduce redundancy and eliminate anomalies such as insertion, update, and deletion anomalies. It outlines various forms of normalization, including First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF), along with examples and exercises to illustrate the concepts. The chapter emphasizes the importance of functional dependencies and the steps required to achieve a well-structured database design.
Database Systems

Concepts and Design


CSC201S2/G2
Chapter 4: Normalisation
Introduction
• Normalization is the process of organizing data in a database.

• Includes creating tables and establishing relationships between those tables

• Its goal is to eliminate redundancy and inconsistent dependencies.


Unnormalized Form (UNF)
• A table that contains one or more repeating groups.

• A repeating group is an attribute or group of attributes within a table
that occurs with multiple values for a single occurrence of the nominated
key attributes of that table.

• A UNF model suffers from problems such as data redundancy, so it lacks
the efficiency of a normalised database.
Example: Repeating Groups
Redundant Information
• Data redundancy occurs when the same piece of data is stored in two or
more separate places.
• The aim of relational database design is to group attributes into relations
so as to minimize data redundancy and thereby reduce the file storage space
required by the implemented base relations.
Example: Data Redundancy

• In the StaffBranch relation there is redundant data: the branch address is
repeated for every member of staff located at that branch.
Update Anomalies
• Relations that have redundant data may have problems called
update anomalies.

• The types of update anomalies are:

• Insertion

• Deletion

• Modification
Example: Update Anomalies
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Modification Anomaly

• Changing the name of project P1 from “Billing” to “Customer-Accounting”
requires the update to be made for all 100 employees working on project P1.
Example: Insert Anomalies
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

• Cannot insert a project unless an employee is assigned to it.

• Conversely, cannot insert an employee unless he/she is assigned to a project.
Example: Delete Anomaly
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

When a project is deleted, all the employees who work on that project are
deleted with it. Conversely, if an employee is the sole employee on a
project, deleting that employee also deletes the corresponding project.
Example: Update Anomalies
Each of the following operations on the StaffBranch relation leads to an anomaly:
• Inserting a new member of staff into the StaffBranch relation.
• Deleting the tuple that represents the last member of staff located at
branch B007.
• Changing the address of branch B003.
staffNo | sName | position | salary | branchNo | bAddress
SL21 | John White | Manager | 30000 | B005 | 22 Deer Rd, London
SG37 | Ann Beech | Assistant | 12000 | B003 | 163 Main St, Glasgow
SG14 | David Ford | Supervisor | 18000 | B003 | 163 Main St, Glasgow
SA9 | Mary Howe | Assistant | 9000 | B007 | 16 Argyll St, Aberdeen
SG5 | Susan Brand | Manager | 24000 | B003 | 163 Main St, Glasgow
SL41 | Julie Lee | Assistant | 9000 | B005 | 22 Deer Rd, London
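
The modification and deletion anomalies on StaffBranch can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides: the rows are copied from the table above, while the helper name branch_addresses and the replacement address "99 New St, Glasgow" are hypothetical and used only for illustration.

```python
# Rows from the StaffBranch table:
# (staffNo, sName, position, salary, branchNo, bAddress)
staff_branch = [
    ("SL21", "John White", "Manager", 30000, "B005", "22 Deer Rd, London"),
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
    ("SG5", "Susan Brand", "Manager", 24000, "B003", "163 Main St, Glasgow"),
    ("SL41", "Julie Lee", "Assistant", 9000, "B005", "22 Deer Rd, London"),
]


def branch_addresses(rows):
    """Map each branchNo to the set of distinct addresses stored for it."""
    seen = {}
    for _, _, _, _, branch_no, address in rows:
        seen.setdefault(branch_no, set()).add(address)
    return seen


# Modification anomaly: the address of B003 is changed in only one of its
# three rows (the new address is a made-up value), leaving the data inconsistent.
partially_updated = [
    row[:5] + ("99 New St, Glasgow",) if row[0] == "SG37" else row
    for row in staff_branch
]
inconsistent = {
    branch
    for branch, addresses in branch_addresses(partially_updated).items()
    if len(addresses) > 1
}

# Deletion anomaly: removing SA9, the last member of staff at B007,
# also removes the only record of the B007 address.
after_delete = [row for row in staff_branch if row[0] != "SA9"]
lost_branches = set(branch_addresses(staff_branch)) - set(branch_addresses(after_delete))

print(inconsistent)    # branches that now have more than one address
print(lost_branches)   # branches whose details disappeared with the staff row
```

Storing branch details once, in a relation of their own, would make both problems impossible, which is the motivation for the normal forms that follow.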


Types of Dependencies
A dependency in a DBMS is a relationship between two or more attributes. The main types are:

• Functional Dependency
• Fully-Functional Dependency
• Partial Dependency
• Transitive Dependency
Functional Dependencies
If the value of one attribute (or set of attributes) in a table uniquely
determines the value of another attribute in the same table, the
relationship is called a functional dependency.
If A and B are attributes of a relation R, B is functionally dependent on
A (A → B) if each value of A in R is associated with exactly one
value of B in R.
Example: Functional Dependencies

STUD_NO -> STUD_NAME and STUD_NO -> STUD_PHONE hold
*A STUD_NO uniquely identifies a STUD_NAME and a STUD_PHONE

STUD_NAME -> STUD_STATE does not hold
*Two students can have the same name (like RAM in the table below) but different states
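
Whether a functional dependency holds in a given set of rows can be tested directly from the definition. The sketch below is illustrative only: fd_holds is a hypothetical helper, and the three student rows (including the phone numbers and states) are made-up sample data, not the table from the slides. A check like this can show that sample data violates a dependency, but it cannot prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts).

    The dependency holds when every combination of lhs values is
    associated with exactly one combination of rhs values.
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows: two different students share the name RAM.
students = [
    {"STUD_NO": 1, "STUD_NAME": "RAM", "STUD_PHONE": "555-0101", "STUD_STATE": "Haryana"},
    {"STUD_NO": 2, "STUD_NAME": "RAM", "STUD_PHONE": "555-0102", "STUD_STATE": "Punjab"},
    {"STUD_NO": 3, "STUD_NAME": "SUJIT", "STUD_PHONE": "555-0103", "STUD_STATE": "Rajasthan"},
]

no_to_name = fd_holds(students, ["STUD_NO"], ["STUD_NAME"])
no_to_phone = fd_holds(students, ["STUD_NO"], ["STUD_PHONE"])
name_to_state = fd_holds(students, ["STUD_NAME"], ["STUD_STATE"])

print(no_to_name, no_to_phone, name_to_state)
```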
Full-functional Dependencies
A full functional dependency means that, if A and B are attributes of a
relation, B is fully functionally dependent on A if B is functionally
dependent on A but not on any proper subset of A.

In key terms: a non-key attribute depends on the entire primary key, rather
than on just a portion of it.
Example

{supplier_id, item_id} -> price

*Neither supplier_id nor item_id alone can uniquely determine the price
*supplier_id and item_id together can do so
*price is fully functionally dependent on {supplier_id, item_id}
Partial Dependencies
A functional dependency P → Q is a partial dependency if some attribute
can be removed from P and the dependency still holds.

In key terms: a non-key attribute is functionally dependent on only part of
the composite primary key, not on the entire key.
Example
{name} -> course
{roll_no} -> course
{name, roll_no} -> course
*Each of the attributes name and roll_no alone is able to uniquely
identify a course
*The dependency {name, roll_no} -> course is therefore partial
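
The difference between a full and a partial dependency can be checked mechanically: a dependency is full only if no proper subset of its left-hand side also determines the right-hand side. The following Python sketch assumes two small, made-up data sets (the prices and enrolments rows are hypothetical), and the helper names fd_holds and is_full_fd are illustrative rather than taken from any library.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    # range(len(lhs)) covers subset sizes 0 .. len(lhs) - 1, i.e. proper subsets only.
    return not any(
        fd_holds(rows, subset, rhs)
        for size in range(len(lhs))
        for subset in combinations(lhs, size)
    )


# Hypothetical price list: the price depends on both supplier and item.
prices = [
    {"supplier_id": "S1", "item_id": "I1", "price": 10},
    {"supplier_id": "S1", "item_id": "I2", "price": 20},
    {"supplier_id": "S2", "item_id": "I1", "price": 12},
    {"supplier_id": "S2", "item_id": "I2", "price": 20},
]

# Hypothetical enrolments: name alone already determines the course here.
enrolments = [
    {"name": "Asha", "roll_no": 1, "course": "DBMS"},
    {"name": "Ravi", "roll_no": 2, "course": "OS"},
    {"name": "Meena", "roll_no": 3, "course": "DBMS"},
]

price_is_full = is_full_fd(prices, ("supplier_id", "item_id"), ("price",))
course_is_full = is_full_fd(enrolments, ("name", "roll_no"), ("course",))

print(price_is_full)   # full dependency on the composite determinant
print(course_is_full)  # partial dependency: a single attribute is enough
```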
Transitive Dependencies
When a functional dependency arises through an indirect relationship, it is
called a transitive dependency.

If attribute B depends on attribute A (A -> B) and C depends on B (B -> C),
a dependency between A and C (A -> C) is established indirectly.

In key terms: a non-key attribute depends on another non-key attribute,
which in turn depends on the primary key.
Example

{roll_no} -> city
{city} -> zip_code
{roll_no} -> zip_code
*roll_no = 1 has city = pune, and city = pune has zip_code = 411044,
so wherever roll_no is 1, zip_code will be 411044
*The dependency {roll_no} -> zip_code is transitive
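
Transitive dependencies can be derived by computing the closure of a set of attributes, that is, everything those attributes determine directly or indirectly. The sketch below uses the standard attribute-closure procedure; the function name closure and the representation of dependencies as pairs of frozensets are choices made for this illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already known, the right follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Only the two direct dependencies from the example are given.
fds = [
    (frozenset({"roll_no"}), frozenset({"city"})),
    (frozenset({"city"}), frozenset({"zip_code"})),
]

roll_no_closure = closure({"roll_no"}, fds)
city_closure = closure({"city"}, fds)

print(sorted(roll_no_closure))  # zip_code appears although no direct FD was given
print(sorted(city_closure))
```

Because zip_code ends up in the closure of roll_no only by way of city, the dependency {roll_no} -> zip_code is transitive.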
Exercise
Which of the following functional dependencies hold in the relation?
1. AB →C && C→ B
2. BC → A && B→C
3. BC →A && A→ C
4. AC→ B && B→C

Which of the following functional dependencies hold in the relation
R(v, w, x, y, z)?
1. v → wx
2. yz → x
3. x→ yz
Exercise
List all functional dependencies satisfied by the relation R (A, B, C).

Which of the following functional dependencies hold in the relation?
1. α → β && βγ → α
2. γ → β && α → β
3. β → γ && αβ → γ
4. α → γ && βγ → α
Normalisation
• Database design technique that reduces data redundancy and
eliminates undesirable characteristics like Insertion, Update and
Deletion Anomalies.
• There are three main reasons to normalize a database:
• minimize duplicate data
• minimize or avoid update anomalies
• simplify queries
Normalisation
A process of organizing the data in database to avoid data redundancy,
insertion anomaly, update anomaly & deletion anomaly.
Steps to normalize the database:

• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
First Normal Form (1NF)
• All data values are atomic.
• 1NF is a relation in which the intersection of each row and column
contains one and only one value.

Approaches for removing repeating groups:

• First approach: enter appropriate data in the empty columns of rows
containing the repeating data.
• Second approach: place the repeating data, along with a copy of the
original key attribute(s), in a separate relation, and identify a primary
key for the new relation.
Example
Normalise into 1NF using the first approach

ClientNo | propertyNo | cName | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | PG4 | John Kay | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
CR76 | PG16 | John Kay | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | PG4 | Aline Stewart | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
CR56 | PG36 | Aline Stewart | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
CR56 | PG16 | Aline Stewart | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw

1NF ClientRental relation with the first approach


ClientNo | cName
CR76 | John Kay
CR56 | Aline Stewart

ClientNo | propertyNo | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | PG4 | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
CR76 | PG16 | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | PG4 | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
CR56 | PG36 | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
CR56 | PG16 | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw

1NF ClientRental relation with the second approach


Example
Unnormalised

Module | Dept | Lecturer | Texts
M1 | D1 | L1 | T1, T2
M2 | D1 | L1 | T1, T3
M3 | D1 | L2 | T4
M4 | D2 | L3 | T1, T5
M5 | D2 | L4 | T6

1NF

Module | Dept | Lecturer | Texts
M1 | D1 | L1 | T1
M1 | D1 | L1 | T2
M2 | D1 | L1 | T1
M2 | D1 | L1 | T3
M3 | D1 | L2 | T4
M4 | D2 | L3 | T1
M4 | D2 | L3 | T5
M5 | D2 | L4 | T6
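
The first approach (one row per repeated value) is easy to express in code. This is a minimal Python sketch using the Module data from the example; the function name to_1nf and the choice to store the Texts column as a comma-separated string are assumptions made for illustration.

```python
# Unnormalised Module rows: the Texts column holds a repeating group.
unnormalised = [
    ("M1", "D1", "L1", "T1, T2"),
    ("M2", "D1", "L1", "T1, T3"),
    ("M3", "D1", "L2", "T4"),
    ("M4", "D2", "L3", "T1, T5"),
    ("M5", "D2", "L4", "T6"),
]


def to_1nf(rows):
    """Emit one row per text so that every Texts value is atomic."""
    return [
        (module, dept, lecturer, text.strip())
        for module, dept, lecturer, texts in rows
        for text in texts.split(",")
    ]


first_nf = to_1nf(unnormalised)
for row in first_nf:
    print(row)
```

Each module with two texts now occupies two rows, so Module, Dept and Lecturer are repeated; that redundancy is what the later normal forms remove.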
Exercise
Convert the following relation in 1NF

Product Id | Colour | Price
1 | Black, red | Rs. 210
2 | Green | Rs. 150
3 | Red | Rs. 110
4 | Green, blue | Rs. 260
5 | Black | Rs. 100


Problems in 1NF

Module | Dept | Lecturer | Texts
M1 | D1 | L1 | T1
M1 | D1 | L1 | T2
M2 | D1 | L1 | T1
M2 | D1 | L1 | T3
M3 | D1 | L2 | T4
M4 | D2 | L3 | T1
M4 | D2 | L3 | T5
M5 | D2 | L4 | T6

• INSERT anomalies: can't add a module with no texts.
• UPDATE anomalies: to change the lecturer for M1, we have to change two rows.
• DELETE anomalies: if we remove M3, we remove L2 as well.
Second Normal Form (2NF)
• A relation is in second normal form (2NF) if it is in 1NF and no non-key
attribute is partially dependent on a candidate key
OR
• Second normal form (2NF) is a relation that is in first normal form and
every non-primary-key attribute is fully functionally dependent on the
primary key.
• The normalization of 1NF relations to 2NF involves the removal of partial
dependencies.
• If a partial dependency exists, remove the functionally dependent attributes
from the relation by placing them in a new relation along with a copy of
their determinant.
Second Normal Form (2NF)

ClientNo | propertyNo | cName | pAddress | rentStart | rentFinish | rent | ownerNo | oName
CR76 | PG4 | John Kay | 6 Lawrence St, Glasgow | 1-Jul-00 | 31-Aug-01 | 350 | CO40 | Tina Murphy
CR76 | PG16 | John Kay | 5 Novar Dr, Glasgow | 1-Sep-02 | 1-Sep-02 | 450 | CO93 | Tony Shaw
CR56 | PG4 | Aline Stewart | 6 Lawrence St, Glasgow | 1-Sep-99 | 10-Jun-00 | 350 | CO40 | Tina Murphy
CR56 | PG36 | Aline Stewart | 2 Manor Rd, Glasgow | 10-Oct-00 | 1-Dec-01 | 370 | CO93 | Tony Shaw
CR56 | PG16 | Aline Stewart | 5 Novar Dr, Glasgow | 1-Nov-02 | 1-Aug-03 | 450 | CO93 | Tony Shaw
Second Normal Form (2NF)

• Client(clientNo, cName)

• Property(propertyNo, pAddress, rent, ownerNo, oName)

• Client-Property(clientNo, propertyNo, rentStart, rentFinish)
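
The 2NF decomposition above amounts to projecting the 1NF ClientRental relation onto three attribute lists and dropping duplicate tuples. The following Python sketch shows this under stated assumptions: the rows are taken from the 1NF ClientRental relation shown earlier, and project is a hypothetical helper written for this illustration.

```python
COLUMNS = ("clientNo", "propertyNo", "cName", "pAddress",
           "rentStart", "rentFinish", "rent", "ownerNo", "oName")

ROWS = [
    ("CR76", "PG4", "John Kay", "6 Lawrence St, Glasgow", "1-Jul-00", "31-Aug-01", 350, "CO40", "Tina Murphy"),
    ("CR76", "PG16", "John Kay", "5 Novar Dr, Glasgow", "1-Sep-02", "1-Sep-02", 450, "CO93", "Tony Shaw"),
    ("CR56", "PG4", "Aline Stewart", "6 Lawrence St, Glasgow", "1-Sep-99", "10-Jun-00", 350, "CO40", "Tina Murphy"),
    ("CR56", "PG36", "Aline Stewart", "2 Manor Rd, Glasgow", "10-Oct-00", "1-Dec-01", 370, "CO93", "Tony Shaw"),
    ("CR56", "PG16", "Aline Stewart", "5 Novar Dr, Glasgow", "1-Nov-02", "1-Aug-03", 450, "CO93", "Tony Shaw"),
]
client_rental = [dict(zip(COLUMNS, row)) for row in ROWS]


def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples, preserving order."""
    return list(dict.fromkeys(tuple(row[a] for a in attrs) for row in rows))


# cName depends only on clientNo; property and owner details only on propertyNo.
client = project(client_rental, ("clientNo", "cName"))
prop = project(client_rental, ("propertyNo", "pAddress", "rent", "ownerNo", "oName"))
client_property = project(client_rental, ("clientNo", "propertyNo", "rentStart", "rentFinish"))

print(len(client), len(prop), len(client_property))
```

Each client name and each property address is now stored once, instead of once per rental.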


Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
• Table must be in 2NF
• No non-prime attribute may be transitively functionally dependent on any
super key; any such dependency should be removed.
OR
A relation is in third normal form (3NF) if it is in 2NF and no non-key
attribute is transitively dependent on a candidate key

The normalization of 2NF relations to 3NF involves the removal of transitive
dependencies by placing the attribute(s) in a new relation along with a copy of
the determinant.
Third Normal Form (3NF)

ClientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName

• propertyNo → ownerNo
• ownerNo → oName
• So oName transitively depends on propertyNo
Third Normal Form (3NF)

• Client (clientNo, cName)

• Owner (ownerNo, oName)

• Property(propertyNo, pAddress, rent, ownerNo)

• Client-Property (clientNo, propertyNo, rentStart, rentFinish)
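
Moving oName into its own Owner relation loses no information, because the original Property rows can be rebuilt by joining on ownerNo. The sketch below illustrates this in Python; the variable names are hypothetical, and the three Property rows are the distinct properties from the ClientRental example.

```python
# Property relation in 2NF: (propertyNo, pAddress, rent, ownerNo, oName)
property_2nf = [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]

# 3NF split: oName moves to Owner together with a copy of its determinant ownerNo.
property_3nf = [
    (property_no, address, rent, owner_no)
    for property_no, address, rent, owner_no, _ in property_2nf
]
owner = sorted({(owner_no, o_name) for *_, owner_no, o_name in property_2nf})

# Joining Property and Owner on ownerNo reconstructs the 2NF relation.
owner_name = dict(owner)
rejoined = [
    (property_no, address, rent, owner_no, owner_name[owner_no])
    for property_no, address, rent, owner_no in property_3nf
]

print(owner)
print(rejoined == property_2nf)
```

Tony Shaw's name is now stored once in Owner rather than once per property he owns.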


Exercise
Consider the Students table:
1. Identify any repeating groups and functional dependencies
2. Show all the intermediate steps to derive the third normal form for STUDENT.

Alpha | Name | Email | Courses | Marks
100111 | John Doe | doe@[Link] | NN204, SI204, IT221 | 2,3,3
092244 | Matt Smith | smith@[Link] | SM223, EE301 | 4,4
113221 | Melinda Black | black@[Link] | SI204 | 3
090112 | Tom Johnson | Johnson@[Link] | NN204, SI204, IT221 | 4,2,3
Exercise
Alpha | Name | Email | Courses | Marks
100111 | John Doe | doe@[Link] | NN204 | 2
100111 | John Doe | doe@[Link] | SI204 | 3
100111 | John Doe | doe@[Link] | IT221 | 3
092244 | Matt Smith | smith@[Link] | SM223 | 4
092244 | Matt Smith | smith@[Link] | EE301 | 4
113221 | Melinda Black | black@[Link] | SI204 | 3
090112 | Tom Johnson | Johnson@[Link] | NN204 | 4
090112 | Tom Johnson | Johnson@[Link] | SI204 | 2
090112 | Tom Johnson | Johnson@[Link] | IT221 | 3
In 2NF

• Student (Alpha, Name, Email)

• StudentCourse (Alpha,Courses,Marks)

In 3NF

• Student (Alpha, Name, Email)

• StudentCourse (Alpha,Courses, Marks)


Exercise
Consider the Patients table:
1. Identify any repeating groups and functional dependencies
2. Show all the intermediate steps to derive the third normal form for PATIENT.

Patient no | Patient name | Doctor no | Doctor name | Appointment date | Consultant name | Consultant address | Sample
01027 | Grist | 919 | Robinson | 3/9/2014 | Farnes | Acadia Rd | blood
 | | | | 20/12/2014 | Farnes | Acadia Rd | none
 | | | | 10/10/2014 | Edwards | Beech Ave | urine
08023 | Daniels | 818 | Seymour | 3/9/2014 | Farnes | Acadia Rd | none
 | | | | 3/9/2014 | Russ | Fir St | sputum
191146 | Falken | 717 | Ibbotson | 4/10/2014 | Russ | Fir St | blood
001239 | Burgess | 818 | Seymour | 5/6/2014 | Russ | Fir St | sputum
007249 | Lynch | 717 | Ibbotson | 9/11/2014 | Edwards | Beech Ave | none


Exercise
PATIENT table in 1NF

Patient no | Patient name | Doctor no | Doctor name | Appointment date | Consultant name | Consultant address | Sample
01027 | Grist | 919 | Robinson | 3/9/2014 | Farnes | Acadia Rd | blood
01027 | Grist | 919 | Robinson | 20/12/2014 | Farnes | Acadia Rd | none
01027 | Grist | 919 | Robinson | 10/10/2014 | Edwards | Beech Ave | urine
08023 | Daniels | 818 | Seymour | 3/9/2014 | Farnes | Acadia Rd | none
08023 | Daniels | 818 | Seymour | 3/9/2014 | Russ | Fir St | sputum
191146 | Falken | 717 | Ibbotson | 4/10/2014 | Russ | Fir St | blood
001239 | Burgess | 818 | Seymour | 5/6/2014 | Russ | Fir St | sputum
007249 | Lynch | 717 | Ibbotson | 9/11/2014 | Edwards | Beech Ave | none


Exercise
Consider the Pets table:
1. Identify any repeating groups and functional dependencies
2. Show all the intermediate steps to derive the third normal form for PETS.

Pet Id | Pet Name | Pet Type | Pet Age | Owner | Visit Date | Procedure id | Procedure name
246 | Rover | Dog | 12 | Sam Cook | JAN 13/2002 | 01 | Rabies Vaccination
 | | | | | MAR 27/2002 | 10 | Examine And Treat Wound
 | | | | | APR 02/2002 | 05 | Heart Worm Test
298 | Spot | Dog | 2 | Terry Kim | JAN 21/2002 | 08 | Tetanus Vaccination
 | | | | | MAR 10/2002 | 05 | Heart Worm Test
341 | Morris | Cat | 4 | Sam Cook | JAN 23/2001 | 01 | Rabies Vaccination
 | | | | | JAN 13/2002 | 01 | Rabies Vaccination
519 | Tweedy | Bird | 2 | Terry Kim | APR 30/2002 | 20 | Annual Check Up
 | | | | | APR 30/2002 | 12 | Eye Wash
Exercise
Pet Id | Pet Name | Pet Type | Pet Age | Owner | Visit Date | Procedure id | Procedure name
246 | Rover | Dog | 12 | Sam Cook | JAN 13/2002 | 01 | Rabies Vaccination
246 | Rover | Dog | 12 | Sam Cook | MAR 27/2002 | 10 | Examine And Treat Wound
246 | Rover | Dog | 12 | Sam Cook | APR 02/2002 | 05 | Heart Worm Test
298 | Spot | Dog | 2 | Terry Kim | JAN 21/2002 | 08 | Tetanus Vaccination
298 | Spot | Dog | 2 | Terry Kim | MAR 10/2002 | 05 | Heart Worm Test
341 | Morris | Cat | 4 | Sam Cook | JAN 23/2001 | 01 | Rabies Vaccination
341 | Morris | Cat | 4 | Sam Cook | JAN 13/2002 | 01 | Rabies Vaccination
519 | Tweedy | Bird | 2 | Terry Kim | APR 30/2002 | 20 | Annual Check Up
519 | Tweedy | Bird | 2 | Terry Kim | APR 30/2002 | 12 | Eye Wash
Exercise
PET table in 3NF
Pet (Pet id, Pet name, Pet type, Pet age, Owner)
PetOwner (Pet id, Visit date, Procedure id)
Procedure (Procedure id, Procedure name)
Boyce-Codd Normal Form (BCNF)
An advanced version of 3NF, sometimes referred to as 3.5NF.

BCNF is stricter than 3NF.

A table complies with BCNF if it is in 3NF and, for every non-trivial
functional dependency X -> Y, X is a super key of the table.
Example

Primary key: {Student, Course}

Functional dependencies:
{Student, Course} -> Teacher
Teacher -> Course

Teacher is not a super key but determines Course, so the table is not in BCNF.
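
The BCNF condition can be tested by computing, for each functional dependency, the closure of its left-hand side and checking whether it covers every attribute of the table. The following Python sketch applies that test to the Student/Course/Teacher example; the helper names closure and bcnf_violations are hypothetical and written only for this illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]


attrs = {"Student", "Course", "Teacher"}
fds = [
    (frozenset({"Student", "Course"}), frozenset({"Teacher"})),
    (frozenset({"Teacher"}), frozenset({"Course"})),
]

violations = bcnf_violations(attrs, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Only Teacher -> Course is reported: {Student, Course} determines every attribute, whereas Teacher determines Course but not Student.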
Example
After decomposing it into Boyce-Codd normal form
Exercise
Decompose the following table into Boyce-Codd normal form

Stu_ID | Stu_Branch | Stu_Course | Branch_Number | Stu_Course_No
101 | Computer Science & Engineering | DBMS | B_001 | 201
101 | Computer Science & Engineering | Computer Networks | B_001 | 202
102 | Electronics & Communication Engineering | VLSI Technology | B_003 | 401
102 | Electronics & Communication Engineering | Mobile Communication | B_003 | 402
Exercise
The table below shows an extract from a tour operator's data on travel agent
bookings. Derive the 3NF of the data, showing all the intermediate steps
Batch no | Agent no | Agent name | holiday code | cost | quantity booked | Airport code | airport name
1 | 76 | Bairns travel | B563 | 363 | 10 | 1 | Luton
 | | | B248 | 248 | 20 | 12 | Edinburgh
 | | | B428 | 322 | 18 | 11 | Glasgow
2 | 142 | Active Holidays | B563 | 363 | 15 | 1 | Luton
 | | | C930 | 568 | 2 | 14 | Newcastle
 | | | A270 | 972 | 1 | 14 | Newcastle
 | | | B728 | 248 | 5 | 12 | Edinburgh
3 | 76 | Bairns travel | C930 | 568 | 11 | 1 | Luton
 | | | A430 | 279 | 15 | 11 | Glasgow
