
Database Normalization Explained

Chapter 4 discusses logical database design, focusing on normalization to reduce data redundancy and prevent update anomalies such as insertion, deletion, and modification anomalies. It outlines the steps of normalization through various normal forms (1NF, 2NF, 3NF, etc.) and emphasizes the importance of functional dependencies in creating a well-structured database. The chapter also highlights the trade-off between normalization and system performance, suggesting de-normalization when necessary for efficiency.

Uploaded by

teshalewellu2022
Copyright
© All Rights Reserved

Chapter 4: Logical Database Design

4.1 Functional Dependency and Normalization


A relational database is a collection of data organized in a
particular manner: as a set of relations (tables).
Database normalization is a series of steps followed to obtain a
database design that allows for consistent storage and efficient
access of data in a relational database. These steps reduce data
redundancy and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical
associations between data items and designing a database that
represents those associations without suffering from the update
anomalies, which are:
1. Insertion Anomalies
2. Deletion Anomalies
3. Modification Anomalies
Normalization may reduce system performance, since data must be
cross-referenced from many tables. Thus de-normalization is
sometimes used to improve performance, at the cost of reduced
consistency guarantees. A normalization step is considered
“good” if it is a lossless decomposition, i.e., the original table
can be rebuilt exactly by joining the decomposed tables.
Taken together, the normalization rules remove the update
anomalies that could otherwise arise during data manipulation
after implementation.
The problems that can occur in an insufficiently normalized
table are called update anomalies. They include:
1. Insertion Anomalies
Occur when new data cannot be added to a database without the
presence of unrelated data. For example, in a database where
customer data is stored with order information, it might be
impossible to add a new customer without an associated order.
2. Deletion Anomalies
Happen when the deletion of a record causes unintended loss of
other valuable data. For example, if deleting an order also
removes customer information that should remain intact.
3. Modification Anomalies
Arise when changes to data require multiple updates in several
rows due to redundant data, leading to potential inconsistencies.
For instance, updating a customer's address might need changes
in multiple rows if their data is duplicated across the database.
These anomalies emphasize the importance of database
normalization, particularly achieving at least 3rd Normal Form
(3NF) to minimize redundancy and maintain data integrity.
NB. The purpose of normalization is to reduce the chances for
anomalies to occur in a database.
EmpID  FName   LName    SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe   Mekuria  2        SQL     Database     AAU     Sidist_Kilo  5
16     Lemma   Alemu    5        C++     Programming  Unity   Gerji        6
28     Chane   Kebede   2        SQL     Database     AAU     Sidist_Kilo  10
25     Abera   Taye     6        VB6     Programming  Helico  Piazza       8
65     Almaz   Belay    2        SQL     Database     Helico  Piazza       9
24     Dereje  Tamiru   8        Oracle  Database     Unity   Gerji        5
51     Selam   Belay    4        Prolog  Programming  Jimma   Jimma City   8
94     Alem    Kebede   3        Cisco   Networking   AAU     Sidist_Kilo  7
18     Girma   Dereje   1        IP      Programming  Jimma   Jimma City   4
13     Yared   Gizaw    7        Java    Programming  AAU     Sidist_Kilo  6

Table 1: Example of problems related with Anomalies


Deletion Anomalies:
If the employee with ID 16 is deleted, then every piece of
information about the skill C++ and its skill type is deleted
from the database. We will then have no information about C++
and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal?
We cannot decide whether Pascal is allowed as a value for
Skill, and we have no clue about the skill type under which
Pascal should be categorized.
Modification Anomalies:
What if the address for Helico is changed from Piazza to
Mexico? We need to look for every occurrence of Helico and
change the value of SchoolAdd from Piazza to Mexico,
which is prone to error.
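The three anomalies can be reproduced on a small copy of Table 1. The following is a minimal sketch, assuming the table is held as a list of Python dictionaries with only a few of its rows and columns; the variable names are illustrative and not part of the original notes.

```python
# A few rows of Table 1, one dict per row (only the columns needed here).
rows = [
    {"EmpID": 12, "Skill": "SQL", "SkillType": "Database", "School": "AAU", "SchoolAdd": "Sidist_Kilo"},
    {"EmpID": 16, "Skill": "C++", "SkillType": "Programming", "School": "Unity", "SchoolAdd": "Gerji"},
    {"EmpID": 25, "Skill": "VB6", "SkillType": "Programming", "School": "Helico", "SchoolAdd": "Piazza"},
    {"EmpID": 65, "Skill": "SQL", "SkillType": "Database", "School": "Helico", "SchoolAdd": "Piazza"},
]

# Deletion anomaly: removing employee 16 also removes the only record of C++.
after_delete = [r for r in rows if r["EmpID"] != 16]
cpp_still_known = any(r["Skill"] == "C++" for r in after_delete)

# Insertion anomaly: nothing in the table says which SkillType Pascal has.
pascal_type = {r["SkillType"] for r in rows if r["Skill"] == "Pascal"}

# Modification anomaly: the Helico address is stored once per Helico row,
# so changing it means touching every one of them.
helico_rows = [r for r in rows if r["School"] == "Helico"]

# Simulate an update that misses a row: change only the first Helico row.
partly_updated = [dict(r) for r in rows]
next(r for r in partly_updated if r["School"] == "Helico")["SchoolAdd"] = "Mexico"
helico_addresses = {r["SchoolAdd"] for r in partly_updated if r["School"] == "Helico"}

print(cpp_still_known)           # False
print(pascal_type)               # set()
print(len(helico_rows))          # 2
print(sorted(helico_addresses))  # ['Mexico', 'Piazza']
```

The last line shows the inconsistency the notes warn about: after a partial update, Helico has two different addresses in the same table.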
Functional Dependency (FD): Before moving to the definition
and application of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
The logical associations between data items that point the
database designer in the direction of a good database design are
called determinant or dependent relationships.
Two data items A and B are said to be in a determinant or
dependent relationship if certain values of data item B always
appear with certain values of data item A. If data item A is
the determinant and B the dependent data item, then
the direction of the association is from A to B and not vice
versa. The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a
certain value, then we say that "B is functionally dependent on A."
We also often express this idea by saying that "A
functionally determines B," or that "B is a function of A," or
that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly
by the statement, "If A, then B." It is important to note that the
value of B must be unique for a given value of A, i.e., any
given value of A must imply one and only one value of B,
in order for the relationship to qualify for the name "function."
The notation is A→B, which is read as "B is functionally
dependent on A." In general, a functional dependency is a
relationship among attributes. In relational databases, a
determinant can govern one or several other
attributes. FDs are derived from the real-world constraints on
the attributes; they are properties of the database intension
(the schema), not of its extension (the current data).
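The "one and only one value of B for each value of A" condition can be tested against sample data. The sketch below is an illustration only: the helper name fd_holds is an assumption, and the rows are a subset of Table 1. Because FDs belong to the schema rather than to the data, such a check can only refute a proposed FD; passing it on one set of rows does not prove the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Subset of Table 1 (hypothetical in-memory representation).
rows = [
    {"EmpID": 12, "SkillID": 2, "Skill": "SQL", "School": "AAU", "SchoolAdd": "Sidist_Kilo"},
    {"EmpID": 16, "SkillID": 5, "Skill": "C++", "School": "Unity", "SchoolAdd": "Gerji"},
    {"EmpID": 28, "SkillID": 2, "Skill": "SQL", "School": "AAU", "SchoolAdd": "Sidist_Kilo"},
    {"EmpID": 65, "SkillID": 2, "Skill": "SQL", "School": "Helico", "SchoolAdd": "Piazza"},
]

print(fd_holds(rows, ["SkillID"], ["Skill"]))     # True
print(fd_holds(rows, ["School"], ["SchoolAdd"]))  # True
print(fd_holds(rows, ["SkillID"], ["School"]))    # False
```

The third check fails because SkillID 2 appears with both AAU and Helico, so SkillID cannot be a determinant of School.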
Partial Dependency
If an attribute which is not a member of the primary key is
dependent on some part of the primary key (when we have a composite
primary key), then that attribute is partially functionally dependent
on the primary key.
Let {A,B} be the primary key and C a non-key attribute. If
{A,B}→C and also B→C (or A→C),
then C is partially functionally dependent on {A,B}.
Full Functional Dependency
If an attribute which is not a member of the primary key is
dependent on the whole key and not on just some part of it (when
we have a composite primary key), then that attribute is fully
functionally dependent on the primary key.
Let {A,B} be the primary key and C a non-key attribute. If
{A,B}→C holds, and neither A→C nor B→C holds, then C is fully
functionally dependent on {A,B}.
Transitive Dependency
In mathematics and logic, a transitive relationship is a
relationship of the following form: "If A implies B, and if also B
implies C, then A implies C."
Example:
If Mr. X is a Human, and if every Human is an Animal, then Mr.
X must be an Animal.
A generalized way of describing transitive dependency is: if A
functionally governs B, AND if B functionally governs C, THEN
A functionally governs C,
provided that neither C nor B determines A, i.e., B ↛ A and
C ↛ A. In the normal notation:
{(A→B) AND (B→C)} ==> A→C, provided that B ↛ A and
C ↛ A
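Partial, full and transitive dependencies can all be checked mechanically once the FDs are written down. The sketch below assumes each FD is given as a pair of Python sets (determinant, dependents) and uses the standard attribute-closure computation; the function names closure, is_partial and is_transitive are illustrative choices, not notation from these notes.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs, given fds as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_partial(key, attr, fds):
    """True if attr is determined by a proper, non-empty subset of key."""
    return any(
        attr in closure(subset, fds)
        for size in range(1, len(key))
        for subset in combinations(sorted(key), size)
    )

def is_transitive(a, c, fds, attributes):
    """True if a -> b -> c for some b where neither b nor c determines a."""
    if a <= closure({c}, fds):
        return False
    return any(
        b in closure(a, fds)
        and c in closure({b}, fds)
        and not a <= closure({b}, fds)
        for b in attributes - a - {c}
    )

# Partial vs. full dependency on the composite key {A, B}.
partial_fds = [({"A", "B"}, {"C"}), ({"B"}, {"C"})]
full_fds = [({"A", "B"}, {"C"})]
print(is_partial({"A", "B"}, "C", partial_fds))  # True
print(is_partial({"A", "B"}, "C", full_fds))     # False

# Transitive dependency: A -> B and B -> C give A -> C.
chain_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, chain_fds)))                      # ['A', 'B', 'C']
print(is_transitive({"A"}, "C", chain_fds, {"A", "B", "C"}))  # True
```

In the second call, C is fully functionally dependent on {A,B} because no single part of the key determines it.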
Normalization
Normalization is a process in database design used to organize
data to reduce redundancy and improve data integrity. It involves
dividing a database into tables and defining relationships between
them to ensure data consistency and efficient querying. The
concept is a cornerstone of the Fundamentals of Database Systems.
Key Goals of Normalization:
• Eliminate Redundancy: Reduce duplicate data to save storage
and avoid inconsistencies.
• Ensure Data Integrity: Maintain data consistency through
well-structured tables.
• Improve Query Performance: Optimize the design for faster
retrieval and updates.
• Simplify Maintenance: Make it easier to maintain and scale
the database over time.
Normal Forms: Normalization is achieved by organizing tables
into different normal forms (NF), each with specific rules.
Steps of Normalization:
Normalization towards a logical design consists of the
following steps:
Unnormalized Form (UNF):
Identify all data elements.
1. First Normal Form (1NF):
Ensure all attributes contain only atomic (indivisible) values.
Remove repeating groups by creating separate rows for each
unique piece of data.
2. Second Normal Form (2NF):
Achieve 1NF.
Eliminate partial dependency, where a non-prime attribute
depends only on part of a composite primary key.
3. Third Normal Form (3NF):
Achieve 2NF.
Remove transitive dependencies, where non-prime attributes
depend on other non-prime attributes.
4. Boyce-Codd Normal Form (BCNF):
A stricter version of 3NF where every determinant is a candidate
key.
5. Fourth Normal Form (4NF):
Achieve BCNF.
Remove multivalued dependencies by ensuring no table contains
two or more independent multivalued facts about an entity.
6. Fifth Normal Form (5NF):
Achieve 4NF.
Break down tables further to ensure no redundancy due to join
dependencies.
7. Domain-Key Normal Form (DKNF):
Ensure every constraint in the database is a logical consequence
of the domain and key constraints.
Table 4: Unnormalized

EmpID  FirstName  LastName  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   SQL     Database     AAU     Sidist_Kilo  5
                            VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     C++     Programming  Unity   Gerji        6
                            IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     SQL     Database     Helico  Piazza       9
                            Prolog  Programming  Jimma   Jimma City   8
                            Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    Cisco   Networking   AAU     Sidist_Kilo  7
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued
attributes into different rows and identify a unique identifier
for the relation, so that it can be said to be a relation in a
relational database. Flatten the table.
Definition: a table (relation) is in 1NF if:
• There are no duplicated rows in the table (there is a unique
identifier).
• Each cell is single-valued (i.e., there are no repeating
groups).
• Entries in a column (attribute, field) are of the same kind.
First Normal Form

EmpID  FirstName  LastName  SkillID  Skill   SkillType    School  SchoolAdd    SkillLevel
12     Abebe      Mekuria   1        SQL     Database     AAU     Sidist_Kilo  5
12     Abebe      Mekuria   3        VB6     Programming  Helico  Piazza       8
16     Lemma      Alemu     2        C++     Programming  Unity   Gerji        6
16     Lemma      Alemu     7        IP      Programming  Jimma   Jimma City   4
28     Chane      Kebede    1        SQL     Database     AAU     Sidist_Kilo  10
65     Almaz      Belay     1        SQL     Database     Helico  Piazza       9
65     Almaz      Belay     5        Prolog  Programming  Jimma   Jimma City   8
65     Almaz      Belay     8        Java    Programming  AAU     Sidist_Kilo  6
24     Dereje     Tamiru    4        Oracle  Database     Unity   Gerji        5
94     Alem       Kebede    6        Cisco   Networking   AAU     Sidist_Kilo  7
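The flattening step can be sketched in a few lines. The example below assumes the unnormalized data is held as one record per employee with a nested list of skills (a hypothetical in-memory layout, using two employees from the table above) and emits one row per employee-skill pair. The pair (EmpID, SkillID) is then checked as the unique identifier.

```python
# Unnormalized records: each employee carries a repeating group of skills.
# Each skill entry is (SkillID, Skill, SkillType, School, SchoolAdd, SkillLevel).
unnormalized = [
    {"EmpID": 12, "FirstName": "Abebe", "LastName": "Mekuria", "Skills": [
        (1, "SQL", "Database", "AAU", "Sidist_Kilo", 5),
        (3, "VB6", "Programming", "Helico", "Piazza", 8),
    ]},
    {"EmpID": 28, "FirstName": "Chane", "LastName": "Kebede", "Skills": [
        (1, "SQL", "Database", "AAU", "Sidist_Kilo", 10),
    ]},
]

SKILL_COLUMNS = ("SkillID", "Skill", "SkillType", "School", "SchoolAdd", "SkillLevel")

def to_1nf(records):
    """Emit one flat row per (employee, skill) pair so every cell is atomic."""
    flat = []
    for rec in records:
        employee = {k: v for k, v in rec.items() if k != "Skills"}
        for skill in rec["Skills"]:
            flat.append({**employee, **dict(zip(SKILL_COLUMNS, skill))})
    return flat

flat = to_1nf(unnormalized)
keys = [(row["EmpID"], row["SkillID"]) for row in flat]
print(len(flat))                    # 3
print(len(set(keys)) == len(keys))  # True
```

Note that flattening repeats the employee's name on every row; that redundancy is what the later normal forms remove.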
Second Normal Form (2NF)
No partial dependency of a non-key attribute on part of the primary key. This will result in
a set of relations at the level of Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is
automatically also in 2NF.
Definition: a table (relation) is in 2NF if:
• It is in 1NF, and
• All non-key attributes are dependent on the entire primary key, i.e., there is no partial
dependency.
Example for 2NF:

EMP_PROJ

EmpID  EmpName  ProjNo  ProjName  ProjLoc  ProjFund  ProjMangID  Incentive

EMP_PROJ rearranged

EmpID  ProjNo  EmpName  ProjName  ProjLoc  ProjFund  ProjMangID  Incentive


Business rule: Whenever an employee participates in a project,
he/she will be entitled to an incentive.
This schema is in 1NF since we don't have any repeating
groups or attributes with multi-valued properties. To convert it
to 2NF we need to remove all partial dependencies of non-key
attributes on part of the primary key.
{EmpID, ProjNo}→ EmpName, ProjName, ProjLoc,
ProjFund, ProjMangID, Incentive
But in addition to this we have the following dependencies:
FD1: {EmpID}→EmpName
FD2: {ProjNo}→ProjName, ProjLoc, ProjFund,
ProjMangID
FD3: {EmpID, ProjNo}→ Incentive
As we can see, some non-key attributes are partially
dependent on some part of the primary key. This can be
seen by analyzing the first two functional dependencies
(FD1 and FD2). Thus, each functional dependency, with
its dependent attributes, should be moved to a new relation
where the determinant will be the primary key.
Employee
EmpID  EmpName

Project
ProjNo  ProjName  ProjLoc  ProjFund  ProjMangID

Emp_Proj
EmpID  ProjNo  Incentive
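The decomposition along FD1, FD2 and FD3 can be sketched as three projections, followed by a check that joining them back gives exactly the original rows (a lossless decomposition). The notes give only the schema of EMP_PROJ, so the sample rows below are hypothetical, and the helper names projection, natural_join and as_set are assumptions made for this illustration.

```python
def projection(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(left, right):
    """Join two non-empty relations on the attributes they share."""
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]

def as_set(rows):
    """Order-independent view of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

# Hypothetical sample data for EMP_PROJ (the source gives only the schema).
emp_proj = [
    {"EmpID": 1, "EmpName": "Abebe", "ProjNo": 10, "ProjName": "Payroll",
     "ProjLoc": "Piazza", "ProjFund": 5000, "ProjMangID": 7, "Incentive": 300},
    {"EmpID": 1, "EmpName": "Abebe", "ProjNo": 20, "ProjName": "Inventory",
     "ProjLoc": "Gerji", "ProjFund": 8000, "ProjMangID": 9, "Incentive": 450},
    {"EmpID": 2, "EmpName": "Lemma", "ProjNo": 10, "ProjName": "Payroll",
     "ProjLoc": "Piazza", "ProjFund": 5000, "ProjMangID": 7, "Incentive": 200},
]

employee = projection(emp_proj, ["EmpID", "EmpName"])                # FD1
project = projection(emp_proj, ["ProjNo", "ProjName", "ProjLoc",
                                "ProjFund", "ProjMangID"])           # FD2
assignment = projection(emp_proj, ["EmpID", "ProjNo", "Incentive"])  # FD3

rejoined = natural_join(natural_join(assignment, employee), project)
print(len(employee), len(project), len(assignment))  # 2 2 3
print(as_set(rejoined) == as_set(emp_proj))          # True
```

Each employee name and each project description is now stored once, while the rejoined result still matches the original table.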


Third Normal Form (3NF)
Eliminate columns dependent on another non-primary-key attribute: if
attributes do not contribute to a description of the key, move them
to a separate table. This level avoids update and delete anomalies.
Definition: a table (relation) is in 3NF if:
• It is in 2NF, and
• There are no transitive dependencies between the primary key and
non-primary-key attributes.
Example for 3NF:
Assumption: Students of the same batch (same year) live in one building
or dormitory.
StudID  Stud_F_Name  Stud_L_Name  Dept    Year  Dormitory
125/97  Abebe        Mekuria      InfoSc  1     401
654/95  Lemma        Alemu        Geog    3     403
842/95  Chane        Kebede       CompSc  3     403
165/97  Alem         Kebede       InfoSc  1     401
985/95  Almaz        Belay        Geog    3     403
This schema is in 2NF since the primary key is a single attribute
and there are no repeating groups (multi-valued attributes).
Let's take StudID, Year and Dormitory and see the dependencies.
StudID→Year AND Year→Dormitory
Year cannot determine StudID, and Dormitory cannot determine
StudID. Then, transitively,
StudID→Dormitory
To convert it to 3NF we need to remove all transitive dependencies
of non-key attributes on another non-key attribute.
The non-primary-key attributes that depend on each other will be
moved to another table and linked with the main table using a
candidate key - foreign key relationship.
Table: Student
StudID  Stud_F_Name  Stud_L_Name  Dept    Year
125/97  Abebe        Mekuria      InfoSc  1
654/95  Lemma        Alemu        Geog    3
842/95  Chane        Kebede       CompSc  3
165/97  Alem         Kebede       InfoSc  1
985/95  Almaz        Belay        Geog    3

Table: Dorm
Year  Dorm
1     401
3     403
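The 3NF result can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column names F_Name, L_Name and Dormitory and the new dormitory number 405 are illustrative, and an in-memory database stands in for a real one. It shows that the dormitory is recovered by joining on Year, and that moving a whole batch now changes a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dorm (
    Year      INTEGER PRIMARY KEY,
    Dormitory INTEGER NOT NULL
);
CREATE TABLE Student (
    StudID  TEXT PRIMARY KEY,
    F_Name  TEXT,
    L_Name  TEXT,
    Dept    TEXT,
    Year    INTEGER REFERENCES Dorm(Year)
);
""")
conn.executemany("INSERT INTO Dorm VALUES (?, ?)", [(1, 401), (3, 403)])
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", [
    ("125/97", "Abebe", "Mekuria", "InfoSc", 1),
    ("654/95", "Lemma", "Alemu", "Geog", 3),
    ("842/95", "Chane", "Kebede", "CompSc", 3),
    ("165/97", "Alem", "Kebede", "InfoSc", 1),
    ("985/95", "Almaz", "Belay", "Geog", 3),
])

QUERY = """
SELECT s.StudID, d.Dormitory
FROM Student AS s JOIN Dorm AS d ON s.Year = d.Year
ORDER BY s.StudID
"""
before = conn.execute(QUERY).fetchall()

# Moving all third-year students (to a hypothetical dorm 405) touches one row.
changed = conn.execute("UPDATE Dorm SET Dormitory = 405 WHERE Year = 3").rowcount
after = conn.execute(QUERY).fetchall()

print(before)
print(changed)  # 1
print(after)
```

In the original single table the same change would have required updating three student rows, which is the modification anomaly that 3NF removes.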
Generally, even though there are four additional levels of
normalization, a table is said to be normalized if it reaches 3NF. A
database with all tables in 3NF is said to be a normalized database.
A mnemonic for remembering the rationale for normalization up to 3NF
could be the following:
1. No Repeating or Redundancy: no repeating fields in the table.
2. The Fields Depend Upon the Key: the fields should depend solely
on the key.
3. The Whole Key: no partial key dependency.
4. And Nothing But the Key: no inter-data dependency.
5. So Help Me Codd: since Codd came up with these rules.

NB. Second Normal Form (2NF) eliminates partial dependencies, while
Third Normal Form (3NF) eliminates transitive dependencies.
