Functional Dependency and Normalization Guide

Chapter Four discusses functional dependency and normalization in database design. It explains how functional dependencies indicate relationships between attributes and outlines the normalization process to eliminate data redundancy through various normal forms, specifically focusing on the first three normal forms (1NF, 2NF, 3NF). The chapter emphasizes the importance of normalization in preventing anomalies that can corrupt databases.

Uploaded by

badanejarso063
Copyright
© © All Rights Reserved

Chapter Four

4: Functional Dependency and Normalization


4.1. Functional Dependency
A functional dependency is a particular relationship between two attributes. If a particular value
of one attribute (A) in a relation uniquely determines the value of another attribute (B) in the same
relation, then there is a functional dependency between attributes A and B.
This means that if we know the value of A, then we can determine a unique value for B. In this
case, B is functionally dependent on A. The dependency is written with an arrow, as A → B, which
is read as "A determines B".
EMPLOYEE1
Emp_ID  Name              Department              Salary  Course           Date_Completed
100     Sara Negash       Information Technology  100000  C++              30/11/2003
100     Sara Negash       Information Technology  100000  Report Writing   15/10/2003
101     Andy King         Administration          100000  Report Writing   20/6/2003
102     Terri O'Sullivan  Finance                 90000   Report Writing   20/6/2003
102     Terri O'Sullivan  Finance                 90000   Investments      15/7/2003
103     Tekle Haimanot    Information Technology  88000   Investments      15/7/2003
105     Terhas Girma      Marketing               120000  Surveys          30/8/2003

In the EMPLOYEE1 example shown in the table above, if we take a particular Emp_ID value, then we
know that there is only one possible value for the Name attribute. For example, the Emp_ID value
100 always has a Name value of 'Sara Negash'. We can express this as:
Emp_ID → Name
A given Emp_ID value has only one corresponding Name value.

An attribute can be functionally dependent on two or more attributes.


Take, for example, the EMP_COURSE relation shown in Figure 2. In this relation, the value of the
Date_Completed attribute cannot be determined by the Emp_ID alone, nor can it be determined by
the Course value alone, because the Date_Completed is a characteristic of an Employee taking a
Course. The Date_Completed depends on the combination of the Emp_ID and Course values:
Emp_ID, Course → Date_Completed
Note that the instances or rows in a relation (i.e. the sample data) do not prove the existence of a
functional dependency. Knowledge of the system, obtained through the requirements analysis, is
needed to understand the data and thus to identify functional dependencies.
Note also that a functional dependency A → B does not imply that the value of B can be computed
from the value of A; it means that there can be only one value of B for each value of A. In
addition, the reverse is not always true, i.e. there is not always only one value of A for each
value of B. For example, there can be two Employees with the same Name, and they would have
different Emp_ID values.
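
The point that sample data cannot prove a functional dependency, but can contradict one, can be illustrated with a short Python sketch. The function name fd_counterexamples and the row layout are assumptions made for this illustration; the rows are a subset of the EMPLOYEE1 sample data.

```python
from collections import defaultdict

# A few rows of the EMPLOYEE1 sample data (only the columns needed here).
ROWS = [
    {"Emp_ID": 100, "Name": "Sara Negash", "Course": "C++", "Date_Completed": "30/11/2003"},
    {"Emp_ID": 100, "Name": "Sara Negash", "Course": "Report Writing", "Date_Completed": "15/10/2003"},
    {"Emp_ID": 101, "Name": "Andy King", "Course": "Report Writing", "Date_Completed": "20/6/2003"},
    {"Emp_ID": 102, "Name": "Terri O'Sullivan", "Course": "Report Writing", "Date_Completed": "20/6/2003"},
    {"Emp_ID": 102, "Name": "Terri O'Sullivan", "Course": "Investments", "Date_Completed": "15/7/2003"},
]


def fd_counterexamples(rows, lhs, rhs):
    """Return the lhs values that appear with more than one rhs value.

    An empty result means the rows do not contradict lhs -> rhs.
    It does not prove that the dependency holds in the real system.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return {key: values for key, values in seen.items() if len(values) > 1}


# Emp_ID -> Name: not contradicted by the sample.
print(fd_counterexamples(ROWS, ["Emp_ID"], ["Name"]))
# Emp_ID -> Date_Completed: contradicted, employees 100 and 102 each have two dates.
print(fd_counterexamples(ROWS, ["Emp_ID"], ["Date_Completed"]))
# Emp_ID, Course -> Date_Completed: not contradicted by the sample.
print(fd_counterexamples(ROWS, ["Emp_ID", "Course"], ["Date_Completed"]))
```

A check like this is only a screening aid: the decision that Emp_ID → Name really holds still comes from the requirements analysis.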
4.2. Normal Forms
Normalization
• Normalization is a process used to convert complex data structures into simple, stable data
structures that do not have data redundancies. There are well-accepted principles and rules
that are used to normalise a data model. The process involves ensuring that the data
structures are in first normal form, then putting them into second normal form, and then
into third normal form.
• There are higher normal forms (fourth and fifth), but in practice these are not always used
and we will not cover them in this course, as it aims to provide a good introduction to
normalisation.
• For each normal form, there are additional constraints on the attributes and data in the
relation that must be fulfilled.
• The underlying ideas in normalization are simple enough. Through normalization we want
to design for our relational database a set of files that (1) contain all the data necessary for
the purposes that the database is to serve, (2) have as little redundancy as possible, (3)
accommodate multiple values for types of data that require them, (4) permit efficient
updates of the data in the database, and (5) avoid the danger of losing data unknowingly.
• The primary reason for normalizing databases to at least the level of the 3rd Normal Form
(the levels are explained below) is that normalization is a potent weapon against the
possible corruption of databases stemming from what are called "insertion anomalies,"
"deletion anomalies," and "update anomalies." These types of error can creep into
databases that are insufficiently normalized.
• An "insertion anomaly" is a failure to place information about a new database entry into
all the places in the database where information about that new entry needs to be stored. In
a properly normalized database, information about a new entry needs to be inserted into
only one place in the database; in an inadequately normalized database, information about
a new entry may need to be inserted into more than one place, and, human fallibility being
what it is, some of the needed additional insertions may be missed.
• A "deletion anomaly" is a failure to remove information about an existing database entry
when it is time to remove that entry. In a properly normalized database, information about
an entry that is to be removed needs to be deleted from only one place in the database;
in an inadequately normalized database, information about that old entry may need to be
deleted from more than one place, and, human fallibility being what it is, some of the
needed additional deletions may be missed.
• An update of a database involves modifications that may be additions, deletions, or both.
Thus "update anomalies" can be either of the kinds of anomalies discussed above.
• All three kinds of anomalies are highly undesirable, since their occurrence constitutes
corruption of the database. Properly normalized databases are much less susceptible to
corruption than are unnormalized databases.
• Normalization can be viewed as a series of steps (i.e., levels) designed, one after another,
to deal with ways in which tables can be "too complicated for their own good". The purpose
of normalization is to reduce the chances for anomalies to occur in a database.
• The definitions of the various levels of normalization illustrate complications to be
eliminated in order to reduce the chances of anomalies.
• At all levels and in every case of a table with a complication, the resolution of the problem
turns out to be the establishment of two or more simpler tables which, as a group, contain
the same information as the original table but which, because of their simpler individual
structures, lack the complication.
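
As a rough illustration of how such an anomaly arises, the Python sketch below stores a salary in two rows, updates only one of them, and then shows the same change made in a structure where the fact is stored once. The variable names and the new salary value 110000 are assumptions made for this illustration; they are not part of the chapter's sample data.

```python
# Insufficiently normalized storage: the salary of employee 100 is repeated in two rows.
employee1 = [
    {"Emp_ID": 100, "Salary": 100000, "Course": "C++"},
    {"Emp_ID": 100, "Salary": 100000, "Course": "Report Writing"},
    {"Emp_ID": 101, "Salary": 100000, "Course": "Report Writing"},
]

# A careless update changes only the first matching row (110000 is a made-up value).
for row in employee1:
    if row["Emp_ID"] == 100:
        row["Salary"] = 110000
        break

# The table now holds two different salaries for the same employee.
salaries_100 = {row["Salary"] for row in employee1 if row["Emp_ID"] == 100}
print(salaries_100)

# Normalized storage: each fact is stored in exactly one place.
employee = {100: {"Salary": 100000}, 101: {"Salary": 100000}}
emp_course = [(100, "C++"), (100, "Report Writing"), (101, "Report Writing")]

# One assignment is the whole update; there is no second copy to forget.
employee[100]["Salary"] = 110000
print(employee[100])
```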

4.2.1. First Normal Form


1st Normal Form (1NF)
Def: A table (relation) is in 1NF if
1. There are no duplicated rows in the table.
2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
3. Entries in a column (attribute, field) are of the same kind.
Solution:
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.
Note: The order of the rows is immaterial; the order of the columns is immaterial.
Note: The requirement that there be no duplicated rows in the table means that the table has a key
(although the key might be made up of more than one column—even, possibly, of all the columns).
Example: First Normal Form (1NF)
First Normal Form (1NF)
Consider the example of the EMPLOYEE relation – now let us say that employees complete
training Courses. This information might be stored in a relation named EMPLOYEE, that looks
like the table below.
Emp_ID  Name              Department              Salary  Courses
                                                          Course           Date Completed
100     Sara Negash       Information Technology  100000  C++              30/11/2003
                                                          Report Writing   15/10/2003
101     Andy King         Administration          100000  Report Writing   20/6/2003
102     Terri O'Sullivan  Finance                 90000   Report Writing   20/6/2003
                                                          Investments      15/7/2003
103     Tekle Haimanot    Information Technology  88000   Investments      15/7/2003
105     Terhas Girma      Marketing               120000  Surveys          30/8/2003
                                                          Survey Analysis  10/9/2003

This relation is not in any normal form because a normal-form relation must have only simple
values in each row-column intersection. A simple value is one single value that does not have
further components.
The values in the Courses attribute are made up of one or more rows in another relation. This is
not a simple value.
To make the relation into a first normal form relation, we must make each value in each column
be a simple value. To do this, we take each row in the non-normalised relation and look at the
relation in the non-simple column. For each relation row in the non-simple column, we make a
row in a new relation.
So, for example, the row for Emp_ID 100 becomes two rows:
Emp_ID  Name              Department              Salary  Course           Date_Completed
100     Sara Negash       Information Technology  100000  C++              30/11/2003
100     Sara Negash       Information Technology  100000  Report Writing   15/10/2003
If we do the same for each row in the non-normalised relation above, the resulting relation
will look like the one shown in Figure 1 below. All the values in each column are simple
values, so this relation is in first normal form.
EMPLOYEE1
Emp_ID  Name              Department              Salary  Course           Date_Completed
100     Sara Negash       Information Technology  100000  C++              30/11/2003
100     Sara Negash       Information Technology  100000  Report Writing   15/10/2003
101     Andy King         Administration          100000  Report Writing   20/6/2003
102     Terri O'Sullivan  Finance                 90000   Report Writing   20/6/2003
102     Terri O'Sullivan  Finance                 90000   Investments      15/7/2003
103     Tekle Haimanot    Information Technology  88000   Investments      15/7/2003
105     Terhas Girma      Marketing               120000  Surveys          30/8/2003
105     Terhas Girma      Marketing               120000  Survey Analysis  10/9/2003
Figure 1 – EMPLOYEE1 relation with sample data – in first normal form
It is always possible to normalise a non-normal relation in this way.
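
The flattening step described above can be sketched in Python as follows. This is a minimal illustration, assuming the non-simple Courses value is held as a list of (Course, Date_Completed) pairs; the function name to_1nf is an assumption for this sketch.

```python
# Non-normalised rows: the "Courses" value is itself a small relation (a list of pairs).
unnormalised = [
    {"Emp_ID": 100, "Name": "Sara Negash", "Department": "Information Technology",
     "Salary": 100000,
     "Courses": [("C++", "30/11/2003"), ("Report Writing", "15/10/2003")]},
    {"Emp_ID": 101, "Name": "Andy King", "Department": "Administration",
     "Salary": 100000,
     "Courses": [("Report Writing", "20/6/2003")]},
]


def to_1nf(rows):
    """Emit one flat row per (employee, course) pair so every value is simple."""
    flat = []
    for row in rows:
        # Copy the simple-valued columns, leaving out the nested one.
        simple = {name: value for name, value in row.items() if name != "Courses"}
        for course, date_completed in row["Courses"]:
            flat.append({**simple, "Course": course, "Date_Completed": date_completed})
    return flat


employee1 = to_1nf(unnormalised)
for flat_row in employee1:
    print(flat_row)
```

Note that in this sketch an employee with an empty Courses list would produce no rows at all, which is one more reason to keep employee data in its own relation.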
In table EMPLOYEE1, each row is unique for the combination of Emp_ID and Course.
However, for some employees (Emp_ID 100, 102 and 105), the Emp_ID, Name, Department and Salary
values appear in more than one row. So, if the salary for Emp_ID 100 changes, it needs to be
recorded in 2 rows. Hence, there is still redundant data in this relation.
Moving onto the higher normal forms will remove the data redundancy.
However, we can see at this stage that the reason there is redundant data in this relation is that it
contains data about two different entities – Employee and Course. The principles of normalisation
can be used to divide the relation into two relations – EMPLOYEE and EMP_COURSE.
The EMPLOYEE relation contains the Emp_ID, Name, Department and Salary attributes. The other
relation, EMP_COURSE, is shown below. In the EMP_COURSE relation, the primary key is a
composite of the Emp_ID and the Course.
EMP_COURSE
Emp_ID  Course           Date_Completed
100     C++              30/11/2003
100     Report Writing   15/10/2003
101     Report Writing   20/6/2003
102     Report Writing   20/6/2003
102     Investments      15/7/2003
103     Investments      15/7/2003
105     Surveys          30/8/2003
105     Survey Analysis  10/9/2003
Figure 2 – EMP_COURSE relation, showing the Course and Date_Completed for each
Employee-Course combination
4.2.2. Second Normal Form
2nd Normal Form (2NF)
Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on the entire key.
(Functional dependency: a particular relationship between two attributes. For any relation R,
attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of
A uniquely determines the value of B. The functional dependence of B on A is represented as A → B.)
Solution:
• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.
• Records should not depend on anything other than a table's primary key (a compound
key, if necessary).

Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of
the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in
1NF and if it has no partial dependencies."
Example: Second Normal Form (2NF)
Second Normal Form (2NF)
A relation is in second normal form (2NF) if every non-primary-key attribute is functionally
dependent on the whole primary key. Thus no non-primary-key attribute is functionally dependent
on part, but not all, of the primary key.
The relation EMPLOYEE1, in Figure 1, is in 1NF but is not in 2NF. This relation can be written
as follows; its primary key is the combination of Emp_ID and Course.
EMPLOYEE1 (Emp_ID, Name, Department, Salary, Course, Date_Completed)
The functional dependencies here are:
Emp_ID → Name,Department,Salary
Emp_ID,Course → Date_Completed
Name, Department and Salary are functionally dependent on Emp_ID, which is part of the primary
key. So, these are non-primary-key attributes that are functionally dependent on part of the primary
key.
We can say that second normal form is satisfied if any of the following apply:
1. The primary key consists of only one attribute (for example, the attribute Emp_ID in the
EMPLOYEE relation).
2. No non-primary-key attributes exist in the relation.
3. Every non-primary-key attribute is functionally dependent on the full set of primary key
attributes.
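
A minimal Python sketch of this check is given below. It assumes the functional dependencies are supplied as a list of (determinant, dependents) pairs and only inspects the dependencies it is given; the function name partial_dependencies is an assumption for this illustration.

```python
def partial_dependencies(primary_key, fds):
    """Return the listed FDs that violate 2NF.

    An FD violates 2NF when its determinant is a proper subset of the
    primary key and it determines at least one non-primary-key attribute.
    """
    key = set(primary_key)
    violations = []
    for lhs, rhs in fds:
        determinant_is_part_of_key = set(lhs) < key   # proper subset only
        has_non_key_dependent = bool(set(rhs) - key)
        if determinant_is_part_of_key and has_non_key_dependent:
            violations.append((lhs, rhs))
    return violations


# The functional dependencies of EMPLOYEE1 as stated in the text.
FDS = [
    (("Emp_ID",), ("Name", "Department", "Salary")),
    (("Emp_ID", "Course"), ("Date_Completed",)),
]

# With the composite key (Emp_ID, Course), the first FD is a partial dependency.
print(partial_dependencies(("Emp_ID", "Course"), FDS))
# With a single-attribute key there can be no partial dependency (rule 1 above).
print(partial_dependencies(("Emp_ID",), FDS[:1]))
```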
To convert a relation to second normal form, we need to remove the functional dependencies that
violate the rules. To do this, the relation needs to be decomposed into new relations.
The new relations are based on the attributes that determine other attributes – also called the
determinants – the determinants are the primary keys of the new relations.
So, if we take the functional dependencies, the new relations are as follows, with the primary
key noted after each relation:
EMPLOYEE (Emp_ID, Name, Department, Salary), primary key: Emp_ID
EMP_COURSE (Emp_ID, Course, Date_Completed), primary key: Emp_ID, Course

Another way to look at this is to say that we take the functional dependency that violated the 2NF
constraint i.e.
Emp_ID → Name,Department,Salary
and make a new relation from its attributes - EMPLOYEE. We then remove the attributes on the
right-hand-side of the dependency from the original relation to make the second new relation i.e.
EMP_COURSE.
The relations EMPLOYEE and EMP_COURSE are now in second normal form because all non-
primary-key attributes are functionally dependent on the whole primary key.
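
The decomposition can be sketched in Python as two projections of EMPLOYEE1. This is an illustration under the assumption that rows are stored as tuples in the column order shown; the helper name project is introduced here and is not from the chapter.

```python
COLUMNS = ("Emp_ID", "Name", "Department", "Salary", "Course", "Date_Completed")

# A few rows of EMPLOYEE1 from Figure 1.
EMPLOYEE1 = [
    (100, "Sara Negash", "Information Technology", 100000, "C++", "30/11/2003"),
    (100, "Sara Negash", "Information Technology", 100000, "Report Writing", "15/10/2003"),
    (101, "Andy King", "Administration", 100000, "Report Writing", "20/6/2003"),
]


def project(rows, columns, attributes):
    """Keep only the named attributes and drop duplicate tuples."""
    positions = [columns.index(a) for a in attributes]
    result = []
    for row in rows:
        projected = tuple(row[i] for i in positions)
        if projected not in result:
            result.append(projected)
    return result


# One relation per determinant: Emp_ID, and (Emp_ID, Course).
employee = project(EMPLOYEE1, COLUMNS, ("Emp_ID", "Name", "Department", "Salary"))
emp_course = project(EMPLOYEE1, COLUMNS, ("Emp_ID", "Course", "Date_Completed"))

# Joining the two relations on Emp_ID reconstructs the original rows,
# so no information was lost by the decomposition.
rejoined = [e + c[1:] for e in employee for c in emp_course if e[0] == c[0]]
print(sorted(rejoined) == sorted(EMPLOYEE1))
```

In the result, employee 100 appears once in employee and twice in emp_course, so the Name, Department and Salary values are no longer repeated.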
4.2.3. Third Normal Form
3rd Normal Form (3NF)
Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
Transitive dependency: a type of functional dependency where an attribute is functionally
dependent on an attribute other than the primary key. Thus its value is only indirectly determined
by the primary key.
Solution:
• Eliminate fields that do not depend on the key.
• Values in a record that are not part of that record's key do not belong in the table. In
general, any time the contents of a group of fields may apply to more than a single
record in the table, consider placing those fields in a separate table.
For example, in an Employee Recruitment table, a candidate's university name and address may
be included. But you need a complete list of universities for group mailings. If university
information is stored in the Candidates table, there is no way to list universities with no current
candidates. Create a separate Universities table and link it to the Candidates table with a university
code key.

Example: Third Normal Form (3NF)

Third Normal Form (3NF)


Relations in second normal form can still contain data redundancies. To eliminate these
redundancies, further constraints must be satisfied.
A relation is in third normal form (3NF) if it is in second normal form and there are no functional
dependencies between two (or more) non-primary-key attributes.
A functional dependency between non-primary-key attributes is also called a transitive
dependency.
The EMPLOYEE and EMP_COURSE relations formed above are already in 3NF because there
are no functional dependencies between the non-primary-key attributes.
Take for example, a relation named VEHICLE, where the Registration_No is the primary key:
VEHICLE (Registration_No, Owner, Model, Manufacturer, Engine_Size)
This relation is in 2NF because Registration_No alone uniquely determines the values of
Owner, Model, Manufacturer and Engine_Size: a registration number is for one vehicle
only, and a vehicle can have only one owner. Some sample data for this relation is shown in
Figure 3 below.
Registration_No  Owner               Model            Manufacturer  Engine_Size
10234            TDA                 Land Cruiser I   Toyota        2.5
10545            REST                Land Cruiser II  Toyota        2.9
45454            TDA                 Jeep             Nissan        2.5
46765            REST                Land Cruiser I   Toyota        2.5
54098            Mekelle University  Jeep             Nissan        2.5
Figure 3 – VEHICLE relation – in 2NF

However, there are functional dependencies between some of the non-primary-key attributes.
Assuming that every Model name is unique (i.e. different Manufacturers do not use the same model
names), the Manufacturer and the Engine_Size are dependent on the Model, i.e.
Model → Manufacturer, Engine_Size
This means that some facts are stored more than once – for example, the Manufacturer and
Engine_Size for a Land Cruiser I are stored twice in the sample data shown above. In fact, the
Engine_Size and Manufacturer for any given Model could be stored more than once in the table
above, if the same Model is owned by more than one person.
Because of this functional dependency, the VEHICLE relation is not in 3NF. To convert the
relation to 3NF, we need to remove this functional dependency. This is done in a similar way to
the way in which we made a 1NF relation into a 2NF relation - by creating a new relation using
the attributes in the dependency, with the determinant attribute(s) being the primary key of the new
relation:
VEHICLE1 (Model, Manufacturer, Engine_Size)

We then remove the dependent attributes from the original relation to create another new relation:
REGISTRATION (Registration_No, Owner, Model)
Both the VEHICLE1 and REGISTRATION relations are in 3NF because they are in 2NF (every
non-primary-key attribute is functionally dependent on the whole primary key) and because there
are no dependencies between non-primary-key attributes.
Also, the attributes in each relation are facts only about the primary key of that relation.
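
The effect of the decomposition can be sketched in Python using the Figure 3 sample data. The dictionary layout and the helper name manufacturer_of are assumptions made for this illustration; the lookup follows the transitive chain Registration_No → Model → Manufacturer.

```python
# VEHICLE rows from Figure 3:
# (Registration_No, Owner, Model, Manufacturer, Engine_Size)
VEHICLE = [
    (10234, "TDA", "Land Cruiser I", "Toyota", 2.5),
    (10545, "REST", "Land Cruiser II", "Toyota", 2.9),
    (45454, "TDA", "Jeep", "Nissan", 2.5),
    (46765, "REST", "Land Cruiser I", "Toyota", 2.5),
    (54098, "Mekelle University", "Jeep", "Nissan", 2.5),
]

vehicle1 = {}       # VEHICLE1: Model -> (Manufacturer, Engine_Size)
registration = {}   # REGISTRATION: Registration_No -> (Owner, Model)
for reg_no, owner, model, manufacturer, engine_size in VEHICLE:
    vehicle1[model] = (manufacturer, engine_size)
    registration[reg_no] = (owner, model)


def manufacturer_of(reg_no):
    """Look up the manufacturer indirectly, through the Model attribute."""
    _owner, model = registration[reg_no]
    manufacturer, _engine_size = vehicle1[model]
    return manufacturer


# Five vehicle rows, but each Model fact is now stored only once.
print(len(VEHICLE), len(vehicle1))
print(manufacturer_of(46765))
```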

4.2.4 Boyce–Codd Normal Form (BCNF)

Boyce–Codd Normal Form (BCNF) is a stronger version of the Third Normal Form (3NF).
A relation R is in BCNF if, for every non-trivial functional dependency (X → Y), the
determinant X is a superkey of the relation.

Formally:
A relation R is in BCNF if, for every non-trivial FD X → Y that holds in R,
X is a superkey of R.

Purpose

BCNF eliminates all redundancy caused by functional dependencies.

It fixes anomalies that can still exist in a 3NF relation, especially when:

• A determinant that is not a superkey determines part of a candidate key.
• There are overlapping candidate keys or complex dependencies.

Key Concepts

• Superkey: An attribute or set of attributes that uniquely identifies a tuple in a relation.


• Violation of BCNF: Happens when an attribute (or group of attributes) that is not a
superkey determines another attribute.
• BCNF ensures: No dependency exists where a non-superkey determines a key or part of
a key.

Example

Relation:
R(StudentID, Course, Instructor)

Functional Dependencies:

1. StudentID, Course → Instructor


2. Instructor → Course

Candidate Keys: (StudentID, Course) and, because Instructor → Course, (StudentID, Instructor)

Check BCNF condition:

• FD 1: Left side (StudentID, Course) is a key → OK


• FD 2: Left side (Instructor) is not a key → violates BCNF

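Decomposition to achieve BCNF:

1. R1(Instructor, Course)
2. R2(StudentID, Instructor)

Now, both resulting relations are in BCNF. Note that FD 1 (StudentID, Course → Instructor)
involves attributes from both relations, so it can no longer be enforced within a single relation.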
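
The BCNF check in this example can be sketched in Python using attribute closure. This is a simplified illustration: the function names closure and bcnf_violations are assumptions, and the check only examines the listed functional dependencies that fall entirely inside the relation being tested, which is sufficient for this small example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the listed FDs inside relation whose determinant is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        if not (lhs | rhs) <= relation:
            continue  # this FD mentions attributes outside the relation
        if rhs <= lhs:
            continue  # trivial dependency
        if not closure(lhs, fds) >= relation:
            violations.append((lhs, rhs))
    return violations


R = frozenset({"StudentID", "Course", "Instructor"})
FDS = [
    (frozenset({"StudentID", "Course"}), frozenset({"Instructor"})),  # FD 1
    (frozenset({"Instructor"}), frozenset({"Course"})),               # FD 2
]

print(bcnf_violations(R, FDS))                                   # FD 2 is reported
print(bcnf_violations(frozenset({"Instructor", "Course"}), FDS))  # R1: none
print(bcnf_violations(frozenset({"StudentID", "Instructor"}), FDS))  # R2: none
```

The same closure function confirms that (StudentID, Instructor) determines every attribute of R, which is why it is also a key.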

Comparison: 3NF vs BCNF

Feature               Third Normal Form (3NF)                       Boyce–Codd Normal Form (BCNF)
Condition             For every FD X → Y, either X is a superkey,   For every FD X → Y, X must be a superkey
                      or Y is a prime attribute
Redundancy            Some redundancy may remain                    Redundancy removed more strictly
Complexity            Easier to achieve                             More restrictive; sometimes requires
                                                                    decomposition
Example of Violation  When a non-key attribute determines           Any FD where the determinant isn't a
                      another non-key attribute                     superkey
