NORMALIZATION
3.1 FUNCTIONAL DEPENDENCY
A functional dependency is a property of the
information represented by a relation. It defines the most
commonly encountered type of relatedness between the
data items of a database.
It is a constraint between two attributes or two sets of
attributes. A functional dependency is a property of the
semantics, or meaning, of the attributes in a relation. The
semantics indicate how the attributes relate to one another
and specify the functional dependencies between them.
Main Use:
A functional dependency is used to describe a relation
schema R further by specifying constraints on its attributes.
3.1.1 FUNCTIONAL DEPENDENCY DIAGRAM
In a functional dependency diagram, attributes are
represented by rectangles and a dependency is shown by a
heavy arrow.
The simplest functional dependency has the form
FD: Y → X. The left-hand side of a functional dependency is
called the determinant and the right-hand side is called the
dependent. The determinant and the dependent are both sets
of attributes.
Fig: Functional Dependency Diagram
For Example:
Consider a functional dependency of the relation
R1: BUDGET
FD: {PROJECT} → {PROJECT_BUDGET}
It means that in the BUDGET relation, PROJECT_BUDGET
is functionally dependent on PROJECT, because each project
has exactly one budget value.
Thus, once a project name is known, the functional
dependency yields a unique value of PROJECT_BUDGET.
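The idea that one determinant value fixes one dependent value can be checked on sample data. The following is a minimal Python sketch, not part of the original notes. The function name fd_holds and the sample rows are assumptions made for illustration. A check on sample rows can only show that an FD is violated. It cannot prove that the FD holds for every possible state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # Each determinant value may map to only one dependent value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical sample rows for the BUDGET relation (not taken from the text).
budget = [
    {"PROJECT": "P1", "PROJECT_BUDGET": 5000},
    {"PROJECT": "P2", "PROJECT_BUDGET": 8000},
    {"PROJECT": "P3", "PROJECT_BUDGET": 5000},
]

print(fd_holds(budget, ["PROJECT"], ["PROJECT_BUDGET"]))  # True
print(fd_holds(budget, ["PROJECT_BUDGET"], ["PROJECT"]))  # False
```

The second call fails because the budget value 5000 belongs to two different projects, so the budget does not determine the project.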
3.1.2 FULL FUNCTIONAL DEPENDENCY(FFD)
Full functional dependency is used to indicate the
minimum set of attributes in the determinant of a functional
dependency.
The set of attributes X is fully functionally
dependent on the set of attributes Y if the following
conditions are satisfied:
X is functionally dependent on Y, and
X is not functionally dependent on any proper subset of Y.
Relation: EmployeeProject
EmpID   ProjectID   Days
E088    001         320
E065    002         190
Relation: ProjectCost
ProjectID   ProjectCost
002         5000
002         8000
The above relations state that:
Days is the number of days an employee has spent on a project.
FD: {EmpID, ProjectID, ProjectCost} → {Days}
However, Days is not fully functionally dependent on this
determinant, because the subset {EmpID, ProjectID} alone
determines the {Days} spent on the project by the employee.
Removing the unnecessary attribute gives the full functional
dependency:
FD: {EmpID, ProjectID} → {Days}
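The two conditions for a full functional dependency can be sketched in Python as follows. This is an illustration only. The function names and the sample rows are assumptions. Two extra rows and the ProjectCost values are made up so that neither EmpID nor ProjectID alone determines Days in the sample.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in the given rows (a list of dicts)."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1: proper subsets only
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False
    return True


# The first two rows follow the EmployeeProject table; the last two rows and
# all ProjectCost values are hypothetical additions for the illustration.
rows = [
    {"EmpID": "E088", "ProjectID": "001", "ProjectCost": 5000, "Days": 320},
    {"EmpID": "E065", "ProjectID": "002", "ProjectCost": 8000, "Days": 190},
    {"EmpID": "E088", "ProjectID": "002", "ProjectCost": 8000, "Days": 45},
    {"EmpID": "E065", "ProjectID": "001", "ProjectCost": 5000, "Days": 60},
]

print(is_full_fd(rows, ["EmpID", "ProjectID", "ProjectCost"], ["Days"]))  # False
print(is_full_fd(rows, ["EmpID", "ProjectID"], ["Days"]))                 # True
```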
3.1.3 ARMSTRONG’S AXIOMS IN FUNCTIONAL DEPENDENCY
A good relational design relies on non-redundant sets of
functional dependencies and on complete sets, or closures, of
functional dependencies.
Questions of redundancy and closure arise because new
FDs can be derived from existing FDs.
For Example:
If X → Y and
Y → Z, then it is also true that
X → Z
This derivation holds because, if a given value of X
determines a unique value of Y, and this value of Y in turn
determines a unique value of Z, then the value of X also
determines this value of Z. Conversely, a set of FDs may
contain some redundant functional dependencies, that is, FDs
that can be derived from the others.
Armstrong’s axioms:
Consider a table T, and let X, Y and Z be sets of
attributes contained in the heading of T. The following set of
inference rules is called Armstrong’s axioms.
They are used to derive one functional dependency from other
functional dependencies. Rules 1 to 3 are the axioms proper;
the remaining rules can be derived from them.
Rule 1: Reflexivity: If Y ⊆ X, then X → Y.
Rule 2: Augmentation: If X → Y, then XZ → YZ.
Rule 3: Transitivity: If X → Y and Y → Z, then X → Z.
Rule 4: Self-determination: X → X.
Rule 5: Pseudo-transitivity: If X → Y and YW → Z,
then XW → Z.
Rule 6: Union or additive: If X → Z and X → Y, then
X → YZ.
Rule 7: Decomposition or projective: If X → YZ, then
X → Y and X → Z.
Rule 8: Composition: If X → Y and Z → W, then
XZ → YW.
Rule 9: Self-accumulation: If X → YZ and Z → W,
then X → YZW.
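A common way to apply these rules mechanically is to compute the closure of a set of attributes, that is, everything the set determines under a given list of FDs. The Python sketch below is an illustration and not part of the original notes. The function names are assumptions, and each attribute is written as a single letter so that a string such as "YW" stands for the set {Y, W}.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a string of one-letter
    attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, add the dependent.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Return True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


fds = [("X", "Y"), ("Y", "Z")]
print(implies(fds, "X", "Z"))  # True, as given by transitivity
print(implies(fds, "Z", "X"))  # False

# Pseudo-transitivity: X -> Y and YW -> Z give XW -> Z.
print(implies([("X", "Y"), ("YW", "Z")], "XW", "Z"))  # True
```

An FD in a set is redundant when implies still returns True for it after it has been removed from the list.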
3.2 NORMALIZATION
INTRODUCTION
Relational database tables derived from ER models, or
from some other design method, can suffer from serious
problems in terms of performance, integrity and
maintainability. A large database defined as a single table
results in a large amount of redundant data.
Storing a large number of redundant values can result in
lengthy search operations for just a small number of target
rows. It can also result in long and expensive updates.
For Example:
Relation: STUDENT_INFO
STU_NAME   COURSE_ID   SUBJECT                           PHONE_NO
Abi        CS-101      Mobile Computing                  9841758596
John       BCA-201     Data Communication & Networking   8855446633
Rajesh     CS-102      Operating System                  9944557722
John       BCA-201     Software Engineering              8855446633
Ragu       BCA-205     Mobile Computing                  9003344778
Abi        CS-101      Data Mining                       9841758596
The above table STUDENT_INFO is not a good design. For
example, the rows for STU_NAME "Abi" and "John" repeat the
COURSE_ID and PHONE_NO information. This data redundancy
wastes storage space and can lead to loss of data integrity
in the database.
A good database design keeps only the minimum redundancy
necessary to represent the semantics of the database, which
minimizes the storage needed to store it.
Normalization is the process of decomposing a set of relations
with anomalies to produce smaller, well-structured
relations that contain minimum or no redundancy.
Definition: Normalization
Database normalization is the process of removing
redundant data from your tables to improve storage
efficiency, data integrity, and scalability.
In the relational model, there are criteria for classifying
how free of redundancy a design is. These classifications
are called normal forms (NF), and there are algorithms for
converting a given database design from one to another.
Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked
each time a query needs data from more than one of them.
Need of Normalization
Eliminates redundant data
Reduces chances of data errors
Reduces disk space
Improves data integrity, scalability and data consistency.
Anomalies in DBMS
Three types of anomalies occur when the database is not
normalized: the insertion, update and deletion anomalies.
Example: Suppose a manufacturing company stores the
employee details in a table named employee that has four
attributes: emp_id for storing employee's id, emp_name for
storing employee's name, emp_address for storing
employee's address and emp_dept for storing the department
details in which the employee works.
At some point of time the table looks like this:
emp_id   emp_name   emp_address   emp_dept
201      Ram        Delhi         D001
201      Ram        Delhi         D002
123      Sunil      Mumbai        D890
177      Iniyha     Chennai       D900
177      Iniyha     Chennai       D004
The above table is not normalized. We will now see the
problems that arise when a table is not normalized.
Update anomaly: In the above table we have two rows for
employee Ram, as he belongs to two departments of the
company. If we want to update the address of Ram, we have
to update it in both rows or the data will become
inconsistent.
If the correct address gets updated in one row but not in
the other, then as per the database Ram would have two
different addresses, which is not correct.
Insert anomaly: Suppose a new employee joins the
company, is under training and is currently not assigned to
any department. We would not be able to insert the data
into the table if the emp_dept field does not allow nulls.
Delete anomaly: Suppose that at some point the company
closes the department D890. Deleting the rows that have
emp_dept as D890 would also delete the information of
employee Sunil, since Sunil is assigned only to this
department.
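The update and delete anomalies can be reproduced on the sample rows. The Python sketch below is an illustration only. The helper name conflicting_addresses and the new address "Pune" are assumptions and do not come from the text.

```python
from collections import defaultdict

# Rows of the employee table: (emp_id, emp_name, emp_address, emp_dept).
employee = [
    (201, "Ram", "Delhi", "D001"),
    (201, "Ram", "Delhi", "D002"),
    (123, "Sunil", "Mumbai", "D890"),
    (177, "Iniyha", "Chennai", "D900"),
    (177, "Iniyha", "Chennai", "D004"),
]


def conflicting_addresses(rows):
    """Return {emp_id: [addresses]} for employees stored with several addresses."""
    addresses = defaultdict(set)
    for emp_id, _name, address, _dept in rows:
        addresses[emp_id].add(address)
    return {e: sorted(a) for e, a in addresses.items() if len(a) > 1}


# Update anomaly: only one of Ram's two rows receives the new address.
updated = list(employee)
updated[0] = (201, "Ram", "Pune", "D001")  # "Pune" is a made-up value
print(conflicting_addresses(updated))  # {201: ['Delhi', 'Pune']}

# Delete anomaly: removing department D890 also removes every trace of Sunil.
remaining = [row for row in employee if row[3] != "D890"]
print("Sunil" in {name for _id, name, _addr, _dept in remaining})  # False
```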
To overcome these anomalies we need to normalize the data.
3.2.1 TYPES OF NORMALIZATION
Fig: Types of Normalization
The most commonly used normal forms are:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce and Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
3.2.1 FIRST NORMAL FORM(1NF)
A relation R is in first normal form (1NF) if and only if
it does not contain any composite or multi-valued attributes
or their combinations.
Table Name: Employee
Example: Suppose a company wants to store the names and
contact details of its employees. It creates a table that looks
like this:
emp_id   emp_name   emp_address   emp_mobile
101      Ragu       Delhi         8912312390
102      John       Mumbai        8812121212
                                  9900012222
103      Babu       Chennai       7778881212
104      Ashok      Bangalore     9990000123
                                  8123450987
Two employees (John & Ashok) have two mobile numbers
each, so the company stored both numbers in the same field,
as you can see in the table above.
This table is not in 1NF because the rule says "each
attribute of a table must have atomic (single) values", and the
emp_mobile values for employees John & Ashok violate that
rule.
To make the table comply with 1NF, we should store the data
like this:
emp_id   emp_name   emp_address   emp_mobile
101      Ragu       Delhi         8912312390
102      John       Mumbai        8812121212
102      John       Mumbai        9900012222
103      Babu       Chennai       7778881212
104      Ashok      Bangalore     9990000123
104      Ashok      Bangalore     8123450987
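The conversion above amounts to emitting one row per mobile number. A minimal Python sketch follows, in which the multi-valued field is modelled as a list. The function name to_1nf is an assumption made for illustration.

```python
# Rows before 1NF: the last field holds a list of mobile numbers.
employees = [
    (101, "Ragu", "Delhi", ["8912312390"]),
    (102, "John", "Mumbai", ["8812121212", "9900012222"]),
    (103, "Babu", "Chennai", ["7778881212"]),
    (104, "Ashok", "Bangalore", ["9990000123", "8123450987"]),
]


def to_1nf(rows):
    """Return one row per (employee, mobile) pair so every field is atomic."""
    return [
        (emp_id, name, address, mobile)
        for emp_id, name, address, mobiles in rows
        for mobile in mobiles
    ]


for row in to_1nf(employees):
    print(row)
```

The result has six rows, matching the 1NF table above.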
3.2.2 SECOND NORMAL FORM(2NF)
For a table to be in the Second Normal Form:
It should be in the First Normal Form.
And, it should not have any partial dependency.
Table Name: School
Example: Suppose a school wants to store the data of
teachers and the subjects they teach. Since a teacher can
teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks
like this:
teacher_id   Subject            teacher_age
111          Maths              38
111          Physics            38
222          Computer science   38
333          Physics            40
333          Chemistry          40
Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic
values. However, it is not in 2NF because the non-prime
attribute teacher_age is dependent on teacher_id alone,
which is a proper subset of the candidate key. This violates
the rule for 2NF, which says "no non-prime attribute is
dependent on a proper subset of any candidate key of the
table".
To make the table comply with 2NF, we can break it into two
tables like this:
teacher_details table:
teacher_id   teacher_age
111          38
222          38
333          40
teacher_subject table:
teacher_id   Subject
111          Maths
111          Physics
222          Computer science
333          Physics
333          Chemistry
Now the tables comply with Second normal form(2NF).
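The split into teacher_details and teacher_subject is a pair of projections of the original table, and joining them on teacher_id gives the original rows back. The Python sketch below illustrates this with the sample data. The variable names are assumptions.

```python
# Rows of the original table: (teacher_id, subject, teacher_age).
school = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Computer science", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Projection on (teacher_id, teacher_age): the age is stored once per teacher.
teacher_details = sorted({(t, age) for t, _subject, age in school})
# Projection on (teacher_id, subject).
teacher_subject = sorted({(t, subject) for t, subject, _age in school})

# Joining the two projections on teacher_id restores the original rows.
ages = dict(teacher_details)
rejoined = sorted((t, subject, ages[t]) for t, subject in teacher_subject)

print(teacher_details)             # [(111, 38), (222, 38), (333, 40)]
print(rejoined == sorted(school))  # True
```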
3.2.3 THIRD NORMAL FORM(3NF)
A table is said to be in the Third Normal Form when:
It is in the Second Normal Form.
And, it does not have any transitive dependency.
Example:
Suppose a company wants to store the complete address of
each employee. They create a table named employee_details
that looks like this:
employee_details table:
emp_id   emp_name   emp_zip   emp_state   emp_city   emp_district
1001     John       382005    UP          Agra       DayalBagh
1002     Anu        322008    TN          Chennai    M-City
1006     Geetha     382007    TN          Chennai    Urrapakkam
1101     Priya      392008    UK          Pauri      Bhagwan
1201     Sai        322999    MP          Gwalior    Ratan
Super Keys: {emp_id}, {emp_id, emp_name}, {emp_id,
emp_name, emp_zip} and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are
non-prime, as they are not part of any candidate key.
Here, emp_state, emp_city & emp_district are dependent on
emp_zip, and emp_zip is dependent on emp_id. That makes
the non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the candidate key
(emp_id). This violates the rule of 3NF.
To make this table comply with 3NF, we have to break it
into two tables to remove the transitive dependency:
employee table:
emp_id   emp_name   emp_zip
1001     John       382005
1002     Anu        322008
1006     Geetha     382007
1101     Priya      392008
1201     Sai        322999
employee_zip table:
emp_zip emp_state emp_city emp_district
382005 UP Agra DayalBagh
322008 TN Chennai M-City
382007 TN Chennai Urrapakkam
392008 UK Pauri Bhagwan
322999 MP Gwalior Ratan
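The 3NF rule can be checked from the functional dependencies alone: every FD must either have a super key as its determinant or have only prime attributes on its right-hand side. The Python sketch below is an illustration under stated assumptions. The function names are hypothetical, and emp_id → emp_name is assumed because emp_id is the candidate key.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attrs, fds, candidate_keys):
    """List (determinant, attribute) pairs that break the 3NF rule."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # the determinant is a super key
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                found.append((sorted(lhs), attr))
    return found


details_attrs = {"emp_id", "emp_name", "emp_zip",
                 "emp_state", "emp_city", "emp_district"}
details_fds = [
    ({"emp_id"}, {"emp_name", "emp_zip"}),  # assumed: emp_id is the key
    ({"emp_zip"}, {"emp_state", "emp_city", "emp_district"}),
]
print(violations_3nf(details_attrs, details_fds, [{"emp_id"}]))

# After the split, each table has only a key as determinant.
employee_fds = [({"emp_id"}, {"emp_name", "emp_zip"})]
zip_fds = [({"emp_zip"}, {"emp_state", "emp_city", "emp_district"})]
print(violations_3nf({"emp_id", "emp_name", "emp_zip"}, employee_fds, [{"emp_id"}]))
print(violations_3nf({"emp_zip", "emp_state", "emp_city", "emp_district"},
                     zip_fds, [{"emp_zip"}]))
```

The first call reports emp_city, emp_district and emp_state as depending on emp_zip, which is not a super key. The last two calls return empty lists.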
3.2.4 BOYCE AND CODD NORMAL FORM (BCNF)
Boyce and Codd Normal Form is a stricter version of the
Third Normal Form. It deals with a certain type of anomaly
that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in
BCNF. For a table to be in BCNF, the following conditions
must be satisfied:
It must be in the Third Normal Form.
And, for each non-trivial functional dependency (X→Y), X
should be a super key.
Example: Suppose there is a company in which employees
work in more than one department. They store the data like
this:
emp_id   emp_nationality   emp_dept                       dept_type   dept_no_of_emp
1001     Indian            Production and planning        D001        300
1001     Indian            Stores                         D001        350
1002     American          Design and technical support   D134        200
1002     American          Purchasing department          D134        800
Functional dependencies in the table above:
emp_id → emp_nationality
emp_dept → {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF, as neither emp_id nor
emp_dept alone is a key.
To make the table comply with BCNF, we can break it
into three tables like this:
emp_nationality table:
emp_id   emp_nationality
1001     Indian
1002     American
emp_dept table:
emp_dept                       dept_type   dept_no_of_emp
Production and planning        D001        300
Stores                         D001        350
Design and technical support   D134        200
Purchasing department          D134        800
emp_dept_mapping table:
emp_id   emp_dept
1001     Production and planning
1001     Stores
1002     Design and technical support
1002     Purchasing department
Functional dependencies:
emp_id → emp_nationality
emp_dept → {dept_type, dept_no_of_emp}
Candidate keys:
For the first table: emp_id
For the second table: emp_dept
For the third table: {emp_id, emp_dept}
The design is now in BCNF, as in both functional
dependencies the left-hand side is a key.
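The BCNF condition, that every non-trivial FD has a super key on its left-hand side, can be tested with an attribute closure. The Python sketch below is an illustration only, and the function names are assumptions.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(attrs, fds):
    """List the non-trivial FDs whose determinant is not a super key."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attrs
    ]


fd_nationality = ({"emp_id"}, {"emp_nationality"})
fd_dept = ({"emp_dept"}, {"dept_type", "dept_no_of_emp"})

# The original single table: both FDs violate BCNF.
original = {"emp_id", "emp_nationality", "emp_dept",
            "dept_type", "dept_no_of_emp"}
print(violations_bcnf(original, [fd_nationality, fd_dept]))

# The three tables after decomposition: no violations remain.
print(violations_bcnf({"emp_id", "emp_nationality"}, [fd_nationality]))            # []
print(violations_bcnf({"emp_dept", "dept_type", "dept_no_of_emp"}, [fd_dept]))     # []
print(violations_bcnf({"emp_id", "emp_dept"}, []))                                 # []
```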
3.2.5 COMPARISON BETWEEN 3NF AND BCNF
3NF                                       BCNF

Proposed by Edgar F. Codd.                Proposed jointly by Raymond F. Boyce
                                          and Edgar F. Codd.

It builds on 2NF: a table is in 2NF if    It builds on 3NF: a table is in BCNF
and only if it is in 1NF and there is     if and only if it is already in 3NF
no partial dependency.                    and every determinant is a candidate
                                          key.

It concentrates on the primary key.       It concentrates on all candidate keys.

Redundancy is comparatively higher.       Redundancy is comparatively lower.

A decomposition that preserves all        A decomposition may not preserve all
functional dependencies can always be     functional dependencies.
achieved.

A normal form used in normalizing a       A normal form used in database
database design to reduce the             normalization, which is a slightly
duplication of data and to ensure that    stronger version of 3NF.
the entity is in 2NF and all the
attributes in a table are determined
only by the candidate keys of that
relation and not by any non-prime
attributes.

The table should be in 2NF, and there     The prime attributes of the table
should not be any transitive              should not depend on the non-prime
dependencies, to satisfy 3NF.             attributes of the table, to satisfy
                                          BCNF.

Lossless-join decomposition can be        Lossless-join decomposition can also
achieved.                                 be achieved, but sometimes not
                                          together with preservation of all
                                          functional dependencies.
3.2.6 FOURTH NORMAL FORM (4NF)
A table in 4NF cannot have non-trivial multi-valued
dependencies, except those whose determinant is a super key.
For a table to satisfy the Fourth Normal Form, it should
satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any multi-valued
dependency.
What is Multi-valued Dependency?
A table is said to have a multi-valued dependency if the
following conditions are true:
1. For a dependency A → B, if multiple values of B exist
for a single value of A, then the table may have a
multi-valued dependency.
2. Also, a table should have at least 3 columns for it to
have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued
dependency between A and B, then B and C should be
independent of each other.
If all these conditions are true for any relation (table), it
is said to have a multi-valued dependency.
For Example:
Below we have a college enrolment table with
columns S_ID, COURSE & HOBBY.
S_ID   COURSE   HOBBY
1      .NET     Cricket
1      RDBMS    Hockey
2      C#       Cricket
2      PHP      Hockey
As you can see in the table above, the student with S_ID 1
has opted for two courses, .NET & RDBMS, and has two
hobbies, Cricket and Hockey.
The two records for the student with S_ID 1 will give rise
to two more records, as shown below, because the student has
two hobbies and each hobby has to be specified along with
each of the courses.
S_ID   COURSE   HOBBY
1      .NET     Cricket
1      RDBMS    Hockey
1      .NET     Hockey
1      RDBMS    Cricket
In the table above, there is no relationship between the
columns COURSE and HOBBY. They are independent of
each other.
So there is a multi-valued dependency, which leads to
unnecessary repetition of data and to other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we
can decompose the table into 2 tables.
For Example:
Table Name: Course
S_ID   COURSE
1      .NET
1      RDBMS
2      C#
2      PHP
Table Name: Hobby
S_ID   HOBBY
1      Cricket
1      Hockey
2      Cricket
2      Hockey
Now these relations satisfy the fourth normal form.
A table can also have a functional dependency along with
a multi-valued dependency. In that case, the functionally
dependent columns are moved to a separate table and the
multi-valued dependent columns are moved to separate
tables.
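Whether courses and hobbies are independent for each student can be tested by splitting the rows into the two projections and joining them again. The multi-valued dependency holds exactly when the join returns the original rows. The Python sketch below is an illustration only, and the function names are assumptions.

```python
def join_on_student(course, hobby):
    """Join (S_ID, COURSE) and (S_ID, HOBBY) rows on S_ID."""
    return {(s, c, h) for s, c in course for s2, h in hobby if s == s2}


def mvd_holds(rows):
    """Return True if COURSE and HOBBY are independent for every S_ID."""
    course = {(s, c) for s, c, _hobby in rows}
    hobby = {(s, h) for s, _course, h in rows}
    return join_on_student(course, hobby) == set(rows)


# The two records first stored for student 1.
two_rows = {(1, ".NET", "Cricket"), (1, "RDBMS", "Hockey")}
# The same student after the two additional records are added.
four_rows = two_rows | {(1, ".NET", "Hockey"), (1, "RDBMS", "Cricket")}

print(mvd_holds(two_rows))   # False: two combinations are missing
print(mvd_holds(four_rows))  # True: every course is paired with every hobby
```

In the decomposed design the four combinations never have to be stored, because each fact about a course or a hobby appears once in its own table.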
3.2.7 FIFTH NORMAL FORM (5NF)
For a table to satisfy the Fifth Normal Form, it should satisfy
the following two conditions:
1. It is in 4NF.
2. If we can decompose the table further to eliminate
redundancy and anomalies, then re-joining the
decomposed tables by means of candidate keys should
neither lose any of the original data nor create any new
records.
In simple words, joining two or more decomposed tables
should not lose records nor create new records.
For Example:
Note: Assume that Semester 1 has Computer Science,
Mathematics and Physics, and that Semester 2 has only
Mathematics, in its academic year.
Table Name: Course
SUBJECT            LECTURER   CLASS
Computer Science   Ragu       Semester 1
Computer Science   Rose       Semester 1
Mathematics        Rose       Semester 1
Mathematics        John       Semester 2
Physics            Arun       Semester 1
In the above table, Rose takes both the Computer Science and
Mathematics classes for Semester 1, but she does not take
the Mathematics class for Semester 2. In this case, the
combination of all three fields is required to identify valid
data.
Imagine we want to add a new class, Semester 3, but do not
yet know which subject will be taught and who will be
teaching it. We would simply be inserting a new entry with
CLASS as Semester 3 and leaving LECTURER and SUBJECT as
NULL. As discussed above for the insert anomaly, such
entries are not good to have. Moreover, since all three
columns together act as the primary key, we cannot leave the
other two columns blank.
Hence we have to decompose the table in such a way that it
satisfies all the rules up to 4NF and, when the parts are
joined by using keys, yields the correct records. Here, we can
represent each lecturer's subject area and classes in a
better way. We can divide the above table into three:
(SUBJECT, LECTURER), (LECTURER, CLASS), (SUBJECT,
CLASS)
SUBJECT            LECTURER
Computer Science   Ragu
Computer Science   Rose
Mathematics        Rose
Mathematics        John
Physics            Arun
CLASS        LECTURER
Semester 1   Ragu
Semester 1   Rose
Semester 2   John
Semester 1   Arun
SUBJECT            CLASS
Computer Science   Semester 1
Mathematics        Semester 1
Mathematics        Semester 2
Physics            Semester 1
Now, each of the combinations is in a separate table. If we
need to identify who is teaching which subject to which
semester, we need to join the three tables on their key
columns.
For example, to find who teaches Mathematics to Semester 1,
we select Mathematics and Semester 1 from table 3 above
and join with table 1 using SUBJECT to get the candidate
lecturer names.
Then we join with table 2 using LECTURER and CLASS to get
the correct lecturer name. That is, we joined the key columns
of each table to get the correct data. Hence there is no loss
of data and no new data, which satisfies the 5NF condition.
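The claim that the three-way join neither loses nor adds records can be checked on the sample data. The Python sketch below is an illustration only. It builds the three projections from the Course table, joins them again and runs the Mathematics query from the text. The variable names are assumptions.

```python
# Rows of the Course table: (SUBJECT, LECTURER, CLASS).
course = {
    ("Computer Science", "Ragu", "Semester 1"),
    ("Computer Science", "Rose", "Semester 1"),
    ("Mathematics", "Rose", "Semester 1"),
    ("Mathematics", "John", "Semester 2"),
    ("Physics", "Arun", "Semester 1"),
}

# The three projections used in the decomposition.
subject_lecturer = {(s, l) for s, l, _c in course}
lecturer_class = {(l, c) for _s, l, c in course}
subject_class = {(s, c) for s, _l, c in course}

# Join all three: match on LECTURER, then keep only (SUBJECT, CLASS)
# pairs that also appear in the third projection.
rejoined = {
    (s, l, c)
    for s, l in subject_lecturer
    for l2, c in lecturer_class
    if l == l2 and (s, c) in subject_class
}
print(rejoined == course)  # True: no record lost, none created

# Who teaches Mathematics to Semester 1?
teachers = sorted(l for s, l, c in rejoined
                  if s == "Mathematics" and c == "Semester 1")
print(teachers)  # ['Rose']
```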