1|Unit-III
NORMALIZATION
Introduction : Normalization is the process of organizing data in a database. It involves
creating tables and establishing relationships between those tables according to rules designed
both to protect the data and to make the database more flexible by eliminating redundancy and
inconsistent dependencies.
Normalization is part of the database design process. It is used to design tables in which data
redundancy is minimized and storage space is saved. It can also be described as the step-by-step
decomposition of complex records into simpler records. Normalization works through a
series of stages called normal forms, so it is a theory built around the concept of
normal forms. The most commonly used normal forms are 1NF, 2NF, and 3NF. From a
structural point of view, a higher normal form is better than a lower one because it
yields fewer data redundancies in the database. So 3NF is better than 2NF, which is
better than 1NF. Most business database designs treat 3NF as the target normal form.
Unnormalized Relation : It is a relation or table in which a row contains repeating
groups, i.e., several values for the same attribute. The relational model does not support such
unnormalized relations.
For example, consider a construction company that manages several building projects. Each
project has its own project number, project name, employees assigned to the project, and so on.
The table looks as follows:
PNO  PNAME         EMPNO  ENAME       JOB         CH HOUR  HOURS WORKED
1    Evergreen     103    JE Arbough  Analyst     84.5     23
                   101    JG News     Designer    105.00   19
                   105    A. Johnson  Programmer  35.75    12
2    Rolling tide  114    A. Jones    Designer    105.00   24
                   104    K. Ramoras  Analyst     84.5     32
3    Star flight   114    A. Jones    Designer    105.00   33
                   101    JG News     Designer    105.00   56
                   112    M. Smith    Analyst     84.5     41
Functional Dependency (FD) : A functional dependency is a relationship that exists when
one attribute (or set of attributes) uniquely determines another attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes
is represented as X → Y, which specifies that Y is functionally dependent on X. Here X is the
determinant and Y is the dependent attribute. Each value of X is associated with exactly one
Y value.
A functional dependency serves as a constraint between two sets of attributes.
Identifying functional dependencies is an important part of relational database design and is
the basis of normalization.
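The definition above can be tried out on sample data. The following is a minimal sketch, assuming a table is held as a list of Python dictionaries; the helper name fd_holds and the column name CH_HOUR are illustrative choices, not part of the notes. Note that a sample can only disprove a functional dependency; whether it holds in general is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


# A few rows taken from the construction-company table above.
rows = [
    {"PNO": 1, "PNAME": "Evergreen", "EMPNO": 103, "JOB": "Analyst", "CH_HOUR": 84.5},
    {"PNO": 1, "PNAME": "Evergreen", "EMPNO": 101, "JOB": "Designer", "CH_HOUR": 105.0},
    {"PNO": 2, "PNAME": "Rolling tide", "EMPNO": 114, "JOB": "Designer", "CH_HOUR": 105.0},
    {"PNO": 3, "PNAME": "Star flight", "EMPNO": 114, "JOB": "Designer", "CH_HOUR": 105.0},
]

print(fd_holds(rows, ["PNO"], ["PNAME"]))    # True: each project number has one name
print(fd_holds(rows, ["JOB"], ["CH_HOUR"]))  # True: each job has one charge rate
print(fd_holds(rows, ["PNO"], ["EMPNO"]))    # False: project 1 has two employees
```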
First Normal Form(1NF) : To be stored in a relational database, a relation must be
at least in First Normal Form (1NF). A relational table is in 1NF when it does not
contain repeating groups and every value is atomic. It can be defined in various
ways:
Definition 1 : A table is in 1NF if and only if it satisfies the following rules.
a) All the key attributes are defined.
b) There are no repeating groups in the table, i.e., each row/column intersection contains
one and only one value rather than a set of values.
c) All attributes are dependent on the primary key.
Definition 2 : A table is in 1NF if and only if
a) It contains only atomic values.
b) Every non-key attribute is functionally dependent on the key.
After converting the above unnormalized table into 1NF, it becomes
Table name : PROJECTS (PNO, EMPNO, PNAME, ENAME, JOB, CH HOUR, HOURS WORKED)
PNO  PNAME         EMPNO  ENAME       JOB         CH HOUR  HOURS WORKED
1    Evergreen     103    JE Arbough  Analyst     84.5     23
1    Evergreen     101    JG News     Designer    105.00   19
1    Evergreen     105    A. Johnson  Programmer  35.75    12
2    Rolling tide  114    A. Jones    Designer    105.00   24
2    Rolling tide  104    K. Ramoras  Analyst     84.5     32
3    Star flight   114    A. Jones    Designer    105.00   33
3    Star flight   101    JG News     Designer    105.00   56
3    Star flight   112    M. Smith    Analyst     84.5     41
In 1NF, the table has the primary key (PNO, EMPNO), which
is a composite key. Even though the table is in 1NF, it still has redundancies, which lead to the
following anomalies.
1. Insertion Anomaly : We can't insert an employee until the employee is assigned to at
least one project, because PNO is part of the primary key.
2. Deletion Anomaly : If we delete the only row that assigns an employee to a project, we also
lose other vital data, for example the employee's name, job, and CH HOUR value.
3. Update Anomaly : Modifying an attribute for a particular employee requires many
alterations, one for each row in which the employee appears.
So even though a table is in 1NF, it can still contain both partial and transitive dependencies.
Partial Dependency : It is a dependency in which an attribute is functionally dependent on
only a part of a multi-attribute primary key. For example, the PNAME attribute depends on PNO only,
which is a part of the primary key in the above table. A partial dependency can't be tolerated
because a table that contains one is still subject to data redundancies and various
anomalies. Therefore this dependency should be removed. Removing all partial dependencies
converts a 1NF table into 2NF.
Fully Functional Dependency (FFD) : If R is a relation with attributes X and Y, a fully
functional dependency between the attributes is represented as X → Y, which specifies that Y is fully
functionally dependent on X: the value of Y depends on the entire set X and does not
depend on any proper subset of X.
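The difference between a partial and a fully functional dependency can be checked mechanically on sample rows. This is a sketch under the same assumption as before (rows as dictionaries); fd_holds and is_fully_dependent are hypothetical helper names, and the rows are a subset of the PROJECTS table.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values maps to exactly one combination of rhs values."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_fully_dependent(rows, key, attr):
    """attr is fully dependent on key if key -> attr holds and no proper subset of key determines attr."""
    if not fd_holds(rows, key, [attr]):
        return False
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            if fd_holds(rows, subset, [attr]):
                return False  # a part of the key is enough: partial dependency
    return True


rows = [
    {"PNO": 1, "EMPNO": 103, "PNAME": "Evergreen", "HOURS_WORKED": 23},
    {"PNO": 1, "EMPNO": 101, "PNAME": "Evergreen", "HOURS_WORKED": 19},
    {"PNO": 2, "EMPNO": 114, "PNAME": "Rolling tide", "HOURS_WORKED": 24},
    {"PNO": 3, "EMPNO": 114, "PNAME": "Star flight", "HOURS_WORKED": 33},
    {"PNO": 3, "EMPNO": 101, "PNAME": "Star flight", "HOURS_WORKED": 56},
]
key = ("PNO", "EMPNO")

print(is_fully_dependent(rows, key, "HOURS_WORKED"))  # True: needs both PNO and EMPNO
print(is_fully_dependent(rows, key, "PNAME"))         # False: PNO alone determines PNAME
```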
Alternative Examples for First Normal Form :
o A relation is in 1NF if every attribute contains only atomic values.
o An attribute of a table cannot hold multiple values. It must hold only
a single value.
o First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
The decomposition of the EMPLOYEE table into 1NF is shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab
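The conversion above amounts to emitting one row per value of the multi-valued attribute. A minimal sketch, assuming the unnormalized table stores EMP_PHONE as a Python list; the function name to_1nf is illustrative:

```python
# Unnormalized rows: EMP_PHONE holds a list of values (a repeating group).
employees = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": ["7272826385", "9064738238"], "EMP_STATE": "UP"},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": ["8574783832"], "EMP_STATE": "Bihar"},
    {"EMP_ID": 12, "EMP_NAME": "Sam", "EMP_PHONE": ["7390372389", "8589830302"], "EMP_STATE": "Punjab"},
]


def to_1nf(rows, multi_attr):
    """Return a new list with one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multi_attr] = value    # replace the list by one atomic value
            flat.append(new_row)
    return flat


flat_rows = to_1nf(employees, "EMP_PHONE")
for r in flat_rows:
    print(r["EMP_ID"], r["EMP_NAME"], r["EMP_PHONE"], r["EMP_STATE"])
```

In the flattened table EMP_ID alone no longer identifies a row; the pair (EMP_ID, EMP_PHONE) does.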
Second Normal Form(2NF) : For a table to be in 2NF, it must also
be in 1NF, and every non-key attribute must be fully functionally dependent on the primary key. This can
be defined in various ways.
Definition 1 : A table is said to be in 2NF if and only if
a) It is in 1NF.
b) It does not have any partial dependencies, i.e., no non-key attribute is dependent on a portion of
the primary key.
Definition 2 : A table is said to be in 2NF if and only if
a) It is in 1NF.
b) Every non-key attribute is fully functionally dependent on the primary key.
The conversion from 1NF to 2NF is simple and includes the following steps :
1. Write each key component on a separate line and then write the original key on the last
line.
2. Write the dependent attributes after each key component.
Each line then represents a new table that satisfies the requirements of 2NF. For example,
converting the above 1NF table into 2NF produces the following tables.
Table 1 : PROJECTS(PNO, PNAME)
PNO PNAME
Table 2 : EMP(EMPNO, ENAME, JOB, CH HOUR)
EMPNO ENAME JOB CH HOUR
Table 3 : ASSIGN(PNO, EMPNO, HOURS WORKED)
PNO EMPNO HOURS WORKED
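The three 2NF tables are projections of the 1NF table with duplicate rows removed. The sketch below assumes the rows are held as dictionaries and uses underscore column names (CH_HOUR, HOURS_WORKED) for convenience; project is an illustrative helper, not a library function.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(dict(zip(attrs, t)))
    return result


columns = ["PNO", "PNAME", "EMPNO", "ENAME", "JOB", "CH_HOUR", "HOURS_WORKED"]
data = [
    (1, "Evergreen", 103, "JE Arbough", "Analyst", 84.5, 23),
    (1, "Evergreen", 101, "JG News", "Designer", 105.0, 19),
    (1, "Evergreen", 105, "A. Johnson", "Programmer", 35.75, 12),
    (2, "Rolling tide", 114, "A. Jones", "Designer", 105.0, 24),
    (2, "Rolling tide", 104, "K. Ramoras", "Analyst", 84.5, 32),
    (3, "Star flight", 114, "A. Jones", "Designer", 105.0, 33),
    (3, "Star flight", 101, "JG News", "Designer", 105.0, 56),
    (3, "Star flight", 112, "M. Smith", "Analyst", 84.5, 41),
]
rows = [dict(zip(columns, t)) for t in data]

projects = project(rows, ["PNO", "PNAME"])                     # key: PNO
emp = project(rows, ["EMPNO", "ENAME", "JOB", "CH_HOUR"])      # key: EMPNO
assign = project(rows, ["PNO", "EMPNO", "HOURS_WORKED"])       # key: (PNO, EMPNO)

print(len(projects), len(emp), len(assign))  # 3 6 8
```

Each project name and each employee's details are now stored once, instead of once per assignment.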
Even though 2NF removes the partial dependencies from the database, the EMP table
can still contain a transitive dependency: JOB determines the CH HOUR attribute.
This can still lead to data anomalies, so the table should be converted into the next higher
normal form, i.e., 3NF.
Alternative Examples for Second Normal Form :
o For 2NF, the relation must be in 1NF.
o In second normal form, all non-key attributes are fully functionally dependent on
the primary key.
Example: Let's assume a school stores data about teachers and the subjects they teach.
In a school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, the candidate key is {TEACHER_ID, SUBJECT}. The non-prime attribute
TEACHER_AGE depends on TEACHER_ID alone, which is a proper subset of the candidate key.
That is why the table violates 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
Transitive Dependency(TD) : It is a dependency in which a non-key attribute is
functionally dependent on another non-key attribute, so that it depends on the key only
indirectly. For example, in the EMP table produced by the 2NF conversion, JOB is a
non-key attribute that determines another non-key attribute, CH HOUR.
Third Normal Form(3NF) : The data anomalies caused by transitive dependencies in
a table are eliminated by converting it into 3NF. 3NF can also be defined in various
ways.
Definition 1 : A table is said to be in 3NF if and only if
1. It is in 2NF.
2. It contains no transitive dependencies.
Definition 2 : A table is said to be in 3NF when
1. It is in 2NF.
2. Every non-key attribute is non-transitively dependent on the key.
In the example above, the EMP table has a transitive dependency between JOB
and CH HOUR. Eliminating this transitive dependency leaves the database with the
following tables.
Table 1 : PROJECTS(PNO, PNAME)
PNO PNAME
Table 2 : EMP(EMPNO, ENAME, JOB)
EMPNO ENAME JOB
Table 3 : ASSIGN(PNO, EMPNO, HOURS WORKED)
PNO EMPNO HOURS WORKED
Table 4 : JOB (JOB, CH HOUR)
JOB CH HOUR
Alternative Examples for Third Normal Form :
o A relation is in 3NF if it is in 2NF and does not contain any transitive
dependency of a non-prime attribute on a key.
o 3NF is used to reduce data duplication. It is also used to achieve data
integrity.
o If there is no transitive dependency for non-prime attributes, then the relation is
in third normal form.
A relation is in third normal form if it satisfies at least one of the following conditions for every
non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super keys in the table above:
1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively dependent on
the key (EMP_ID), which violates third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
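The 3NF condition stated earlier (for every non-trivial X → Y, either X is a super key or Y is prime) can be checked with an attribute-closure computation. The sketch below is illustrative only: the function names are hypothetical, and the dependency EMP_ID → {EMP_NAME, EMP_ZIP} is assumed from the fact that EMP_ID is the candidate key.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """Return the FDs X -> Y where X is not a super key and Y contains a non-prime attribute."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency
        is_superkey = closure(lhs, fds) >= attributes
        if not is_superkey and not (rhs - lhs) <= prime:
            bad.append((lhs, rhs))
    return bad


attributes = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),      # assumed: EMP_ID is the key
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]

# EMPLOYEE_DETAIL: the ZIP dependency breaks 3NF.
before = violations_3nf(attributes, fds, [{"EMP_ID"}])
print(len(before))  # 1

# After the decomposition each table only carries its own dependency.
employee = violations_3nf({"EMP_ID", "EMP_NAME", "EMP_ZIP"}, fds[:1], [{"EMP_ID"}])
employee_zip = violations_3nf({"EMP_ZIP", "EMP_STATE", "EMP_CITY"}, fds[1:], [{"EMP_ZIP"}])
print(len(employee), len(employee_zip))  # 0 0
```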
BOYCE CODD Normal Form (BCNF) :
Consider a table with four attributes A, B, C, D and the functional dependencies
given below :
A, B → C, D
C → B
In the second dependency, a non-key attribute C determines a part of the key. This type of
dependency has no name of its own; Boyce and Codd introduced a new normal form to deal
with it, the BOYCE-CODD Normal Form (BCNF).
Definition : A table is said to be in BCNF if and only if
1. It is in 3NF.
2. Every determinant in the table is a candidate key.
BCNF is stronger than 3NF. For example, consider the following table, which is in 3NF
but not in BCNF.
Table : (STU_ID, STAFF_ID, CLASS_ID, GRADE)
STU_ID STAFF_ID CLASS_ID GRADE
125 25 21334 A
125 20 32455 C
135 20 28458 B
144 25 27563 C
144 20 32455 B
We can observe the following from the above table.
1. Each course might generate many classes, and each class is identified by its unique class
code.
2. A student can take many classes. For example, student 125 has taken classes 21334
and 32455, earning the grades A and C respectively.
3. A staff member can teach many classes, but each class is taught by only one staff
member. For example, staff member 20 teaches classes 32455 and 28458.
From these points we obtain the following dependencies.
STU_ID, STAFF_ID → CLASS_ID, GRADE
CLASS_ID → STAFF_ID
In the second FD the determinant CLASS_ID is not a candidate key, so the table is in 3NF
but not in BCNF. Converting the table into BCNF splits it into the following
tables.
Table 1 : (CLASS_ID, STAFF_ID)
CLASS_ID STAFF_ID
Table 2 : (STU_ID, CLASS_ID, GRADE)
STU_ID CLASS_ID GRADE
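Why the original table is in 3NF but not in BCNF can be worked through with attribute closures. This is a sketch with hypothetical helper names; the second candidate key {STU_ID, CLASS_ID} is not stated in the notes but follows from the two dependencies, and the code confirms it.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attributes]


attributes = {"STU_ID", "STAFF_ID", "CLASS_ID", "GRADE"}
fds = [
    ({"STU_ID", "STAFF_ID"}, {"CLASS_ID", "GRADE"}),
    ({"CLASS_ID"}, {"STAFF_ID"}),
]

violations = bcnf_violations(attributes, fds)
print([sorted(lhs) for lhs, _ in violations])  # [['CLASS_ID']]

# Both of these sets determine every attribute, so both are keys.
candidate_keys = [{"STU_ID", "STAFF_ID"}, {"STU_ID", "CLASS_ID"}]
keys_ok = all(closure(k, fds) == attributes for k in candidate_keys)

# 3NF still holds: the offending FD only determines a prime attribute (STAFF_ID).
prime = set().union(*candidate_keys)
in_3nf = all((rhs - lhs) <= prime for lhs, rhs in violations)
print(keys_ok, in_3nf)  # True True

# After the split, Table 1 (CLASS_ID, STAFF_ID) has CLASS_ID as its key.
table1 = bcnf_violations({"CLASS_ID", "STAFF_ID"}, fds[1:])
print(table1)  # []
```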
Alternative Examples for BCNF :
o BCNF is an advanced version of 3NF. It is stricter than 3NF.
o A table is in BCNF if, for every functional dependency X → Y, X is a super key of the
table.
o For BCNF, the table should be in 3NF, and for every FD, the LHS must be a super key.
Example: Let's assume there is a company where employees work in more than one
department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
Candidate key: {EMP_ID, EMP_DEPT}
In the above table, the functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_ID nor EMP_DEPT alone is a key.
To convert the table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
364 UK
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now the tables are in BCNF because the left-hand side of each functional dependency is a key.
Multivalued Dependency (MVD) : Given a table with three attributes A, B, C, the multivalued
dependency A →→ B holds in the table if and only if the set of B values matching a given
(A-value, C-value) pair in that table depends only on the A-value and is independent of the
C-value.
Fourth Normal Form (4NF) : It is defined as follows :
A table is in 4NF if and only if
1. It is in BCNF.
2. It does not have multivalued dependencies.
For example, consider an unnormalized relation containing information about
COURSES, TEACHERS, and TEXT BOOKS. Each record in that relation consists of a COURSE name,
a repeating group of teachers, and a repeating group of text books. It is given as follows :
COURSE   TEACHER      TEXT
Physics  Prof. Brown  Basic Mechanics
         Prof. Green  Principles of Optics
Maths    Prof. White  Modern Algebra
                      Projective Geometry
After converting it into normalized form, it looks as follows :
COURSE TEACHER TEXT
Physics Prof. Brown Basic Mechanics
Physics Prof. Brown Principles of Optics
Physics Prof. Green Basic Mechanics
Physics Prof. Green Principles of Optics
Maths Prof. White Modern Algebra
Maths Prof. White Projective Geometry
Each tuple of the normalized relation means that the course can be
taught by the teacher and uses the text book as a reference, and also that the teacher uses all the
text books indicated for the course.
Although the table above is normalized, it contains a good deal of redundancy, which leads to
problems with update operations. This is because of multivalued dependencies. For example, to
add the information that Physics uses a new text book, it is necessary to create two new tuples,
one for each of the two teachers. For a course C and text book X, the set of
teachers T matching the pair (C, X) depends on C alone and not on X, i.e., it makes no difference
what the value of X is. This is a multivalued dependency. To eliminate this type of
dependency, the above table should be converted into the next higher normal form, i.e., 4NF.
Converting the above table into 4NF gives the following two tables :
COURSE TEACHER
Physics Prof. Brown
Physics Prof. Green
Maths Prof. White

COURSE TEXT
Physics Basic Mechanics
Physics Principles of Optics
Maths Modern Algebra
Maths Projective Geometry
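The multivalued dependency COURSE →→ TEACHER can be tested on the normalized six-row table: for each course, the (teacher, text) pairs must form the full cross product of that course's teachers and texts. A minimal sketch for a three-attribute relation, with mvd_holds as an illustrative helper name:

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->> Y in a relation with attributes X, Y, Z.

    For each X value, the set of (Y, Z) pairs must equal the cross product
    of the Y values and the Z values that occur with that X value.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


rows = [
    {"COURSE": "Physics", "TEACHER": "Prof. Brown", "TEXT": "Basic Mechanics"},
    {"COURSE": "Physics", "TEACHER": "Prof. Brown", "TEXT": "Principles of Optics"},
    {"COURSE": "Physics", "TEACHER": "Prof. Green", "TEXT": "Basic Mechanics"},
    {"COURSE": "Physics", "TEACHER": "Prof. Green", "TEXT": "Principles of Optics"},
    {"COURSE": "Maths", "TEACHER": "Prof. White", "TEXT": "Modern Algebra"},
    {"COURSE": "Maths", "TEACHER": "Prof. White", "TEXT": "Projective Geometry"},
]

print(mvd_holds(rows, "COURSE", "TEACHER", "TEXT"))  # True

# Dropping one Physics tuple breaks the cross product, so the MVD no longer holds.
without_row = rows[:3] + rows[4:]
print(mvd_holds(without_row, "COURSE", "TEACHER", "TEXT"))  # False

# The 4NF tables are the two projections of the original relation.
course_teacher = {(r["COURSE"], r["TEACHER"]) for r in rows}
course_text = {(r["COURSE"], r["TEXT"]) for r in rows}
print(len(course_teacher), len(course_text))  # 3 4
```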
Join Dependency (JD) : A join dependency, denoted by JD (R1, R2, …, Rn) and specified on a relation schema R,
is a constraint on the states r of R. The constraint states that every legal state r of R should have a
non-additive (lossless) join decomposition into R1, R2, …, Rn, i.e., for every such r we have
⋈ (πR1(r), πR2(r), …, πRn(r)) = r
o A join dependency is a further generalization of a multivalued dependency.
o If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists,
o where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relation R (A, B, C, D).
o Equivalently, R1 and R2 are a lossless decomposition of R.
o A JD ⋈ {R1, R2, ..., Rn} is said to hold over a relation R if R1, R2, ..., Rn is a lossless-join
decomposition of R.
o *((A, B, C), (C, D)) is a JD of R if the join of the two projections over their common
attribute C is equal to the relation R.
o Here, *(R1, R2, R3) is used to indicate that relations R1, R2, R3, and so on are a JD of R.
Fifth Normal Form (5NF) : A relation schema R is in 5NF with respect to a set F of
functional, multivalued, and join dependencies if, for every nontrivial join dependency
JD(R1, R2, …, Rn) in F, every Ri is a super key of R.
The main points are :
o A relation is in 5NF if it is in 4NF and contains no nontrivial join dependency other than
those implied by its candidate keys; any decomposition must be lossless.
o 5NF is reached when the tables have been broken into as many tables as possible, without
losing information, in order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't
take the Math class for Semester 2. In this case, the combination of all three fields is required to
identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or
who will be teaching it, so we would leave LECTURER and SUBJECT as NULL. But all three
columns together act as the primary key, so we can't leave the other two columns blank.
So to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
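For this particular instance, one can check that joining all three projections gives back exactly the original rows, while joining only two of them produces extra (spurious) rows. The sketch below uses illustrative helpers (project, natural_join, as_set); it verifies one sample state only, whereas 5NF is a statement about every legal state of the relation.

```python
def project(rel, attrs):
    """Projection with duplicate rows removed."""
    seen, out = set(), []
    for row in rel:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    joined = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in joined:
                    joined.append(row)
    return joined


def as_set(rel):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rel}


original = [
    {"SUBJECT": "Computer", "LECTURER": "Anshika", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Computer", "LECTURER": "John", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Math", "LECTURER": "John", "SEMESTER": "Semester 1"},
    {"SUBJECT": "Math", "LECTURER": "Akash", "SEMESTER": "Semester 2"},
    {"SUBJECT": "Chemistry", "LECTURER": "Praveen", "SEMESTER": "Semester 1"},
]

p1 = project(original, ["SEMESTER", "SUBJECT"])
p2 = project(original, ["SUBJECT", "LECTURER"])
p3 = project(original, ["SEMESTER", "LECTURER"])

two_way = natural_join(p1, p2)         # contains spurious rows
three_way = natural_join(two_way, p3)  # P3 filters them out again

print(len(two_way))                           # 7
print(as_set(three_way) == as_set(original))  # True
```

The two spurious rows in the two-way join are (Semester 1, Math, Akash) and (Semester 2, Math, John); neither pair of semester and lecturer appears in P3.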
Relational Decomposition
o When a relation in the relational model is not in an appropriate normal form,
decomposition of the relation is required.
o Decomposition breaks a table into multiple tables.
o If the relation is not decomposed properly, it may lead to problems such as loss of
information.
o Decomposition is used to eliminate some of the problems of bad design, such as
anomalies, inconsistencies, and redundancy.
Properties of Relational Decomposition
Lossless Decomposition
o If no information is lost from the relation that is decomposed, then the decomposition
is lossless.
o A lossless decomposition guarantees that the join of the decomposed relations results in the same
relation that was decomposed.
o A decomposition is lossless if the natural join of all the decomposed relations gives
the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
DEPARTMENT table
DEPT_ID EMP_ID DEPT_NAME
827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
Hence, the decomposition is Lossless join decomposition.
Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied by at least one
decomposed table.
o If a relation R is decomposed into relations R1 and R2, then each dependency of R
must either be a part of R1 or R2 or be derivable from the combination of the functional
dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with the functional dependency set
(A → BC). The relation R is decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is a part of relation R1(ABC).
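Whether a dependency survives a decomposition can be tested without computing all derived dependencies. The sketch below uses the standard textbook procedure: start from the left-hand side and repeatedly add whatever each Ri can derive from the attributes collected so far. The function names are illustrative, and the second example (A → B, B → C split into (A, B) and (A, C)) is a hypothetical case added for contrast, not taken from the notes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs can be enforced using only the decomposed relations."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # attributes of Ri derivable from the part of z that lies inside Ri
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


# Example from the text: R(A, B, C, D), A -> BC, decomposed into R1(ABC), R2(AD).
fds = [({"A"}, {"B", "C"})]
decomposition = [{"A", "B", "C"}, {"A", "D"}]
print(is_preserved({"A"}, {"B", "C"}, decomposition, fds))  # True

# Hypothetical contrast: R(A, B, C) with A -> B and B -> C, split into (A, B), (A, C).
fds2 = [({"A"}, {"B"}), ({"B"}, {"C"})]
decomposition2 = [{"A", "B"}, {"A", "C"}]
print(is_preserved({"B"}, {"C"}, decomposition2, fds2))  # False: B and C never share a table
```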
Multivalued Dependency
o A multivalued dependency occurs when two attributes in a table are independent of
each other but both depend on a third attribute.
o A multivalued dependency involves at least two attributes that depend on a
third attribute, which is why it always requires at least three attributes.
Example: Suppose a bike manufacturer produces two colors (white
and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2011 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black
Here the columns COLOR and MANUF_YEAR depend on BIKE_MODEL and are independent
of each other.
In this case, these two columns are said to be multivalued dependent on BIKE_MODEL.
These dependencies are represented as follows:
1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR
This can be read as "BIKE_MODEL multidetermines MANUF_YEAR" and "BIKE_MODEL
multidetermines COLOR".
Inclusion Dependency
o Multivalued dependencies and join dependencies can be used to guide database design,
although both are less common than functional dependencies.
o Inclusion dependencies are quite common, but they typically have little influence on
the design of the database.
o An inclusion dependency is a statement that the values in some columns of a relation are
contained in certain columns of another relation.
o A foreign key is an example of an inclusion dependency: the foreign key column(s) of the
referring relation are contained in the primary key column(s) of the referenced relation.
o Suppose we have two relations R and S obtained by translating two entity
sets such that every R entity is also an S entity.
o An inclusion dependency then holds if projecting R on its key attributes yields a
relation that is contained in the relation obtained by projecting S on its key
attributes.
o We should not split groups of attributes that participate in
an inclusion dependency.
o In practice, most inclusion dependencies are key-based, that is, they involve only keys.
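A key-based inclusion dependency such as a foreign key can be checked as a subset test between two projections. The sketch below reuses the ASSIGN and EMP tables from the earlier project example; inclusion_holds is an illustrative helper, and the row with EMPNO 999 is a made-up value used only to show a violation.

```python
def inclusion_holds(r_rows, r_cols, s_rows, s_cols):
    """R[r_cols] is included in S[s_cols]: every projected R tuple also appears in S."""
    s_values = {tuple(row[c] for c in s_cols) for row in s_rows}
    return all(tuple(row[c] for c in r_cols) in s_values for row in r_rows)


emp = [
    {"EMPNO": 101, "ENAME": "JG News"},
    {"EMPNO": 103, "ENAME": "JE Arbough"},
    {"EMPNO": 114, "ENAME": "A. Jones"},
]
assign = [
    {"PNO": 1, "EMPNO": 103, "HOURS_WORKED": 23},
    {"PNO": 1, "EMPNO": 101, "HOURS_WORKED": 19},
    {"PNO": 2, "EMPNO": 114, "HOURS_WORKED": 24},
]

# ASSIGN.EMPNO is a foreign key that refers to EMP.EMPNO.
print(inclusion_holds(assign, ["EMPNO"], emp, ["EMPNO"]))  # True

# A row that refers to a non-existent employee violates the dependency.
bad_assign = assign + [{"PNO": 3, "EMPNO": 999, "HOURS_WORKED": 5}]
print(inclusion_holds(bad_assign, ["EMPNO"], emp, ["EMPNO"]))  # False
```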
OTHER DEPENDENCIES :
o Template dependencies are a general way of specifying constraints on relations. There
are two types of templates:
tuple-generating templates and constraint-generating templates.
o A template consists of a number of hypothesis tuples that are meant to show an
example of the tuples that may appear in one or more relations. The other part of
the template is the template conclusion.
o For tuple-generating templates, the conclusion is a set of tuples that must also exist
in the relations if the hypothesis tuples are there. For constraint-generating
templates, the template conclusion is a condition that must hold on the hypothesis
tuples.
OTHER NORMAL FORMS :-
Domain Key Normal Forms(DKNF) :
o The process of normalization and the process of discovering undesirable
dependencies were carried through 5NF as a meaningful design activity, but it has
been possible to define stricter normal forms that take into account additional types
of dependencies and constraints.
o The idea behind domain-key normal form (DKNF) is to specify (theoretically, at least)
the "ultimate normal form" that takes into account all possible types of
dependencies and constraints.
o A relation is said to be in DKNF if all constraints and dependencies that should hold
on the relation can be enforced simply by enforcing the domain constraints and key
constraints on the relation. For a relation in DKNF, it becomes very straightforward to
enforce all database constraints by simply checking that each attribute value in a
tuple is of the appropriate domain and that every key constraint is enforced.