Chapter 4: Functional Dependency and Normalization
4.1. Functional Dependency (FD)
Before moving to the definition and application of normalization, it is important to have an
understanding of "functional dependency."
Data Dependency
The logical associations between data items that point the database designer in the direction of a
good database design are referred to as determinant or dependent relationships.
Two data items A and B are said to be in a determinant or dependent relationship if certain
values of data item B always appear with certain values of data item A. If data item A is the
determinant and B the dependent data item, then the direction of the association is from
A to B and not vice versa.
The essence of this idea is that if the existence of something, call it A, implies that B must exist
and have a certain value, then we say that "B is functionally dependent on A." We also
often express this idea by saying that "A determines B," or that "B is a function of A," or that "A
functionally governs B." Often, the notions of functionality and functional dependency are
expressed briefly by the statement, "If A, then B." It is important to note that the value B must be
unique for a given value of A, i.e., any given value of A must imply just one and only one value
of B, in order for the relationship to qualify for the name "function."
(However, this does not necessarily prevent different values of A from implying the same value
of B.)
X → Y holds if, whenever two tuples have the same value for X, they also have the same value
for Y.
The notation is A → B, which is read as "B is functionally dependent on A."
In general, a functional dependency is a relationship among attributes. In relational databases,
we can have a determinant that governs one other attribute or several other attributes.
FDs are derived from the real-world constraints on the attributes.
Example:
Since the type of Wine served depends on the type of Dinner, we say Wine is functionally
dependent on Dinner.
Dinner → Wine
Since both Wine type and Fork type are determined by the Dinner type, we say Wine is
functionally dependent on Dinner and Fork is functionally dependent on Dinner.
Dinner → Wine
Dinner → Fork
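The tuple-based definition of X → Y can be checked mechanically against sample rows. The sketch below is a minimal Python illustration; the function name `fd_holds` and the dinner rows are hypothetical, not taken from the source. Because FDs come from real-world constraints, a sample can only refute a dependency: passing the check on a few rows does not prove it holds in general.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if lhs -> rhs holds in rows: any two rows that agree on
    every lhs attribute also agree on every rhs attribute."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # same X value paired with two different Y values
        seen[x] = y
    return True


# Hypothetical sample rows (not from the source).
dinners = [
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Steak", "Wine": "Red", "Fork": "Dinner fork"},
    {"Dinner": "Fish", "Wine": "White", "Fork": "Fish fork"},
    {"Dinner": "Chicken", "Wine": "White", "Fork": "Dinner fork"},
]

print(fd_holds(dinners, ["Dinner"], ["Wine"]))  # True
print(fd_holds(dinners, ["Dinner"], ["Fork"]))  # True
print(fd_holds(dinners, ["Wine"], ["Dinner"]))  # False: White goes with Fish and Chicken
```

The last call also illustrates the earlier remark that different values of A may imply the same value of B: Fish and Chicken both map to White, so Dinner → Wine holds while Wine → Dinner does not.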
Partial Dependency
If an attribute that is not a member of the primary key depends on only part of the primary key
(which is possible only when the primary key is composite), then that attribute is partially
functionally dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C and also B → C or A → C,
then C is partially functionally dependent on {A, B}.
Full Dependency
If an attribute that is not a member of the primary key depends on the whole primary key and not
on any part of it (for a composite primary key), then that attribute is fully functionally
dependent on the primary key.
Let {A, B} be the primary key and C a non-key attribute.
If {A, B} → C, and neither B → C nor A → C holds (B alone cannot determine C and A alone
cannot determine C),
then C is fully functionally dependent on {A, B}.
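The partial/full distinction can be tested with the attribute closure, i.e., the set of all attributes that a given set of attributes determines under the FDs. This is a minimal Python sketch assuming the FDs are supplied by hand; `fd`, `closure`, and `dependency_kind` are hypothetical helpers, and D is a hypothetical extra non-key attribute added next to the text's A, B, C.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build a functional dependency lhs -> rhs from attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(key: Set[str], attr: str, fds: List[FD]) -> str:
    """Classify attr as 'full' or 'partial' dependent on a composite key."""
    if attr not in closure(key, fds):
        return "not dependent"
    # Closure is monotone, so testing every subset that drops one key
    # attribute also covers all smaller proper subsets of the key.
    for dropped in key:
        part = key - {dropped}
        if part and attr in closure(part, fds):
            return "partial"
    return "full"


key = {"A", "B"}
fds = [fd(["A", "B"], ["C", "D"]), fd(["B"], ["C"])]
print(dependency_kind(key, "C", fds))  # partial: B -> C
print(dependency_kind(key, "D", fds))  # full: neither A nor B alone gives D
```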
Transitive Dependency
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C."
Example:
If Mr X is a Human, and if every Human is an Animal, then Mr X must be an Animal.
A generalized way of describing transitive dependency is:
If A functionally governs B, AND
If B functionally governs C
THEN A functionally governs C
Provided that neither B nor C determines A, i.e., B ↛ A and C ↛ A.
In the usual notation:
{(A → B) AND (B → C)} ⇒ A → C, provided that B ↛ A and C ↛ A
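The transitive-dependency condition can be expressed with the same attribute-closure idea. Using closures here is an assumption of this illustration, not a method given in the text, and the helper names are hypothetical. The second call shows how B → A disqualifies the chain:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build a functional dependency lhs -> rhs from attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_transitive(a: Set[str], b: Set[str], c: Set[str], fds: List[FD]) -> bool:
    """True if A -> B and B -> C hold while neither B nor C determines A,
    i.e. A -> C holds transitively through B."""
    return (
        b <= closure(a, fds)          # A -> B
        and c <= closure(b, fds)      # B -> C
        and not a <= closure(b, fds)  # B does not determine A
        and not a <= closure(c, fds)  # C does not determine A
    )


chain = [fd(["A"], ["B"]), fd(["B"], ["C"])]
print(is_transitive({"A"}, {"B"}, {"C"}, chain))  # True

both_ways = chain + [fd(["B"], ["A"])]
print(is_transitive({"A"}, {"B"}, {"C"}, both_ways))  # False: B -> A
```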
4.2. Normal Forms
A relational database is merely a collection of data, organized in a particular manner. As the
father of the relational database approach, Codd created a series of rules called normal forms
that help define that organization. One of the best ways to determine what information should be
stored in a database is to clarify what questions will be asked of it and what data would be
included in the answers.
A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
Making relations very large.
It is difficult to maintain and update data, as it would involve searching many records in the
relation.
Wastage and poor utilization of disk space and resources.
The likelihood of errors and inconsistencies increases.
So, to handle these problems, we should analyze the relations with redundant data and decompose
them into smaller, simpler, well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is the process of organizing the data in the database.
Normalization is used to minimize redundancy in a relation or set of relations. It is
also used to eliminate undesirable characteristics like Insertion, Update, and Deletion
Anomalies.
Normalization divides larger tables into smaller ones and links them using relationships.
Normal forms are used to reduce redundancy in database tables.
Why do we need Normalization?
The main reason for normalizing relations is to remove these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that help you create a good
database structure.
Database normalization is a series of steps followed to obtain a database design that allows for
consistent storage and efficient access of data in a relational database. These steps reduce data
redundancy and the risk of data becoming inconsistent.
Normalization is the process of identifying the logical associations between data items and
designing a database that will represent such associations without suffering from the update
anomalies, which are:
Insertion Anomalies
Deletion Anomalies
Modification Anomalies
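Normalization may reduce system performance, since queries must cross-reference data from many
tables. Thus, denormalization is sometimes used to improve performance, at the cost of reduced
consistency guarantees.
A normalization is generally considered good only if it is a lossless decomposition, that is, if
joining the decomposed relations gives back exactly the original relation, with no rows lost and
no spurious rows added.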
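One way to see what "lossless" means is to project an instance onto the decomposed attribute sets, join the pieces back, and compare with the original. The Python sketch below does exactly that; the student rows are hypothetical (they follow the Year → Dormitory assumption used in this chapter's 3NF example), and `project`, `natural_join`, and `is_lossless` are illustrative helpers.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection: keep only attrs and drop duplicate rows."""
    seen: Set[Tuple] = set()
    out: List[Row] = []
    for r in rows:
        t = tuple((a, r[a]) for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(t))
    return out


def natural_join(r1: List[Row], r2: List[Row]) -> List[Row]:
    """Combine rows that agree on every attribute the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**x, **y} for x in r1 for y in r2 if all(x[a] == y[a] for a in common)]


def as_set(rows: List[Row]) -> Set[FrozenSet]:
    """Order-independent view of a relation, for comparison."""
    return {frozenset(r.items()) for r in rows}


def is_lossless(rows: List[Row], attrs1: Sequence[str], attrs2: Sequence[str]) -> bool:
    """True if joining the two projections gives back exactly the original rows."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(joined) == as_set(rows)


# Hypothetical data that obeys Year -> Dormitory (not from the source).
students = [
    {"StudID": "S1", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S2", "Year": 1, "Dormitory": "D1"},
    {"StudID": "S3", "Year": 2, "Dormitory": "D2"},
    {"StudID": "S4", "Year": 3, "Dormitory": "D1"},
]

print(is_lossless(students, ["StudID", "Year"], ["Year", "Dormitory"]))       # True
print(is_lossless(students, ["StudID", "Dormitory"], ["Year", "Dormitory"]))  # False
```

In the second split, joining on Dormitory pairs S1 with both Year 1 and Year 3, producing a spurious row. A check on one instance can only expose a lossy split; at the schema level, a decomposition into two relations is lossless when their common attributes form a super key of at least one of them. Year is a key of (Year, Dormitory), whereas Dormitory is a key of neither part of the second split.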
Applying the normalization rules eventually removes the update anomalies that may arise during
data manipulation after implementation. The problems that can occur in an insufficiently
normalized table are called update anomalies, and they include:
(1) Insertion anomalies
An "insertion anomaly" is a failure to place information about a new database entry into all the
places in the database where information about that new entry needs to be stored. In a properly
normalized database, information about a new entry needs to be inserted into only one place in
the database; in an inadequately normalized database, information about a new entry may need to
be inserted into more than one place and, human fallibility being what it is, some of the needed
additional insertions may be missed.
(2) Deletion anomalies
A "deletion anomaly" is a failure to remove information about an existing database entry when it
is time to remove that entry. In a properly normalized database, information about an old, to-be-
gotten-rid-of entry needs to be deleted from only one place in the database; in an inadequately
normalized database, information about that old entry may need to be deleted from more than
one place, and, human fallibility being what it is, some of the needed additional deletions may be
missed.
(3) Modification anomalies
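A modification of a database involves changing the value of an attribute in a table. In an
inadequately normalized table, the same fact may be stored in several rows, and a change that
misses some of them leaves the database inconsistent. In a properly normalized table, each fact is
stored in only one place, so a change made by the user takes effect everywhere it is used.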
The purpose of normalization is to reduce the chances for anomalies to occur in a database.
Examples of problems related to anomalies
Deletion Anomalies:
If the employee with ID 16 is deleted, then all information about the skill C++ and its skill type
is deleted as well. We will then have no information about C++ and its skill type.
Insertion Anomalies:
What if we have a new employee with a skill called Pascal? We cannot decide whether Pascal is
allowed as a value for skill and we have no clue about the type of skill that Pascal should be
categorized as.
Modification Anomalies:
What if the address for Helico is changed from Piazza to Mexico? We need to look for every
occurrence of Helico and change the value of School_Add from Piazza to Mexico, which is
prone to error.
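The modification anomaly can be simulated in a few lines of Python. The rows below are hypothetical and are not the chapter's example table; only Helico, Piazza, Mexico, School_Add, and employee ID 16 come from the text, and `update_first_match` is an illustrative helper that mimics an update which misses some occurrences.

```python
from typing import Dict, List

# Hypothetical rows: the school address is stored once per employee,
# so the same fact is repeated in several rows.
employees: List[Dict[str, object]] = [
    {"EmpID": 12, "School": "Helico", "School_Add": "Piazza"},
    {"EmpID": 16, "School": "Helico", "School_Add": "Piazza"},
    {"EmpID": 28, "School": "Helico", "School_Add": "Piazza"},
]


def update_first_match(rows: List[Dict[str, object]], school: str, new_address: str) -> None:
    """Simulate a careless edit that changes only the first matching row."""
    for row in rows:
        if row["School"] == school:
            row["School_Add"] = new_address
            return


update_first_match(employees, "Helico", "Mexico")
print(sorted({r["School_Add"] for r in employees if r["School"] == "Helico"}))
# ['Mexico', 'Piazza']: Helico now has two addresses

# Normalized alternative: each school's address is stored exactly once.
school_address = {"Helico": "Piazza"}
emp_school = [{"EmpID": 12, "School": "Helico"}, {"EmpID": 16, "School": "Helico"}]
school_address["Helico"] = "Mexico"  # one update applies to every employee
print([school_address[r["School"]] for r in emp_school])  # ['Mexico', 'Mexico']
```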
A database management system can work only with the information that we put explicitly into its
tables for a given database and into its rules for working with those tables, where such rules are
appropriate and possible.
Steps of Normalization:
We have various levels or steps in normalization, called normal forms. The complexity, the
strength of the rules, and the degree of decomposition increase as we move from a lower normal
form to a higher one.
A table in a relational database is said to be in a certain normal form if it satisfies certain
constraints.
Each normal form below represents a stronger condition than the previous one.
Normalization towards a logical design consists of the following steps:
Unnormalized Form:
Identify all data elements
First Normal Form:
Find the key with which you can find all data
Second Normal Form:
Remove part-key dependencies. Make all data dependent on the whole key.
Third Normal Form
Remove non-key dependencies. Make all data dependent on nothing but the key.
For most practical purposes, databases are considered normalized if they adhere to third normal
form.
4.2.1. First Normal Form
A relation is in 1NF if every attribute contains only atomic values.
It states that an attribute of a table cannot hold multiple values; it must hold only a
single value.
First normal form disallows multi-valued attributes, composite attributes, and their
combinations.
It requires that all column values in a table be atomic (e.g., a number is an atomic value, while a
list or a set is not).
We have two ways of achieving this:
1. Putting each repeating group into a separate table and connecting them with a primary key-
foreign key relationship
2. Moving this repeating group to new rows by repeating the common attributes. If so, then find
the key with which you can find all data.
Definition: a table (relation) is in 1NF
If
There are no duplicated rows in the table (each row has a unique identifier).
Each cell is single-valued (i.e., there are no repeating groups).
Entries in a column (attribute, field) are of the same kind.
Example for First Normal form (1NF)
UNNORMALIZED
FIRST NORMAL FORM (1NF)
Remove all repeating groups. Distribute the multi-valued attributes into different rows and
identify a unique identifier for the relation, so that it can be called a relation in a relational
database.
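Method 2 above (moving each value of a repeating group to its own row and repeating the common attributes) can be sketched in Python as follows. The records, the `Skills`/`Skill` attribute names, and the helper functions are hypothetical:

```python
from typing import Dict, List, Sequence

# Hypothetical unnormalized records: "Skills" is a repeating group.
unnormalized = [
    {"EmpID": 1, "EmpName": "Ann", "Skills": ["SQL", "C++"]},
    {"EmpID": 2, "EmpName": "Bob", "Skills": ["Java"]},
]


def to_1nf(records: List[Dict], group: str, single: str) -> List[Dict]:
    """Give each value of a repeating group its own row, repeating the common attributes."""
    rows = []
    for rec in records:
        common = {k: v for k, v in rec.items() if k != group}
        for value in rec[group]:
            rows.append({**common, single: value})
    return rows


def is_unique_key(rows: List[Dict], key: Sequence[str]) -> bool:
    """True if no two rows share the same values for the key attributes."""
    values = [tuple(r[a] for a in key) for r in rows]
    return len(values) == len(set(values))


rows = to_1nf(unnormalized, "Skills", "Skill")
for r in rows:
    print(r)
print(is_unique_key(rows, ["EmpID"]))           # False: EmpID 1 now appears twice
print(is_unique_key(rows, ["EmpID", "Skill"]))  # True
```

After flattening, EmpID alone no longer identifies a row, so {EmpID, Skill} becomes the key. A record whose repeating group is empty would produce no rows at all; method 1 (a separate table linked by a primary key-foreign key relationship) avoids losing such records.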
4.2.2. Second Normal Form
No partial dependency of a non-key attribute on part of the primary key. Removing such
dependencies results in a set of relations in Second Normal Form.
Any table that is in 1NF and has a single-attribute (i.e., non-composite) primary key is
automatically in 2NF.
For a relation to be in 2NF, it must first be in 1NF.
In the second normal form, all non-key attributes are fully functionally dependent on the
primary key.
Definition: a table (relation) is in 2NF If
It is in 1NF and
If all non-key attributes are dependent on the entire primary key.
i.e. no partial dependency.
Example for 2NF:
EMP_PROJ
Business rule: Whenever an employee participates in a project, he/she will be entitled to an
incentive.
This schema is in its 1NF since we don’t have any repeating groups or attributes with multi-
valued property. To convert it to a 2NF we need to remove all partial dependencies of non key
attributes on part of the primary key.
{EmpID, ProjNo} → EmpName, ProjName, ProjLoc, ProjFund, ProjMangID, Incentive. But in
addition to this, we have the following dependencies:
FD1: {EmpID} → EmpName
FD2: {ProjNo} → ProjName, ProjLoc, ProjFund, ProjMangID
FD3: {EmpID, ProjNo} → Incentive
As we can see, some non-key attributes are partially dependent on part of the primary key.
This can be seen by analyzing the first two functional dependencies (FD1 and FD2). Thus,
each functional dependency, together with its dependent attributes, should be moved to a new
relation in which the determinant becomes the primary key.
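The decomposition rule just stated can be applied mechanically to FD1-FD3. This Python sketch, with a hypothetical `decompose_2nf` helper, flags the determinants that are a proper part of the key and builds one relation per determinant, listing the determinant's attributes first as that relation's primary key:

```python
from typing import Dict, FrozenSet, Set

# FD1-FD3 from the EMP_PROJ example, as determinant -> dependents.
key: FrozenSet[str] = frozenset({"EmpID", "ProjNo"})
fds: Dict[FrozenSet[str], Set[str]] = {
    frozenset({"EmpID"}): {"EmpName"},                                         # FD1
    frozenset({"ProjNo"}): {"ProjName", "ProjLoc", "ProjFund", "ProjMangID"},  # FD2
    frozenset({"EmpID", "ProjNo"}): {"Incentive"},                             # FD3
}


def decompose_2nf(key, fds):
    """Return the partial determinants and one relation per determinant,
    with the determinant's attributes listed first as its primary key."""
    partial = sorted(sorted(lhs) for lhs in fds if lhs < key)  # proper part of the key
    relations = [sorted(lhs) + sorted(rhs) for lhs, rhs in fds.items()]
    return partial, relations


partial, relations = decompose_2nf(key, fds)
print(partial)  # [['EmpID'], ['ProjNo']]
for rel in relations:
    print(rel)
# ['EmpID', 'EmpName']
# ['ProjNo', 'ProjFund', 'ProjLoc', 'ProjMangID', 'ProjName']
# ['EmpID', 'ProjNo', 'Incentive']
```

The result corresponds to an employee relation keyed by EmpID, a project relation keyed by ProjNo, and a relation (EmpID, ProjNo, Incentive) keyed by the original composite key. The sketch assumes each FD is listed with its complete right-hand side, as FD1-FD3 are, and does not look for transitive dependencies; that is the task of 3NF.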
4.2.3. Third Normal Form
Eliminate Columns Dependent on another non-Primary Key: if attributes do not contribute to a
description of the key, remove them to a separate table.
A relation is in 3NF if it is in 2NF and does not contain any transitive dependency.
3NF is used to reduce data duplication. It is also used to achieve data integrity.
A relation in 2NF in which no non-prime attribute is transitively dependent on a key is in
third normal form.
A relation is in third normal form if at least one of the following conditions holds for every
non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
This level removes the update and deletion anomalies caused by transitive dependencies.
Definition: a Table (Relation) is in 3NF If
It is in 2NF and
There are no transitive dependencies between a primary key and non-primary key
attributes.
Example for (3NF)
Assumption: Students of the same batch (same year) live in one building or dormitory.
This schema is in 2NF since the primary key is a single attribute. Let's take StudID, Year and
Dormitory and see the dependencies.
StudID → Year AND Year → Dormitory
And Year cannot determine StudID, and Dormitory cannot determine StudID.
Then, transitively, StudID → Dormitory.
To convert it to 3NF, we need to remove all transitive dependencies of non-key attributes on
another non-key attribute.
The non-primary-key attributes that depend on each other are moved to another table and
linked with the main table using a candidate key-foreign key relationship.
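The 3NF test given earlier (for every non-trivial X → Y, X is a super key or Y is prime) can be applied to the student example. In this Python sketch the candidate keys are supplied by hand, the three attributes StudID, Year, and Dormitory are assumed to be the whole relation, only the listed FDs are checked, and the helper names are hypothetical:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build a functional dependency lhs -> rhs from attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema: Set[str], fds: List[FD], keys: List[Set[str]]) -> List[FD]:
    """FDs X -> Y where X is not a super key and some attribute of Y
    (outside X) is not prime, i.e. not part of any candidate key."""
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the trivial part of an FD never violates 3NF
        if extra and not schema <= closure(lhs, fds) and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


student = {"StudID", "Year", "Dormitory"}
fds = [fd(["StudID"], ["Year"]), fd(["Year"], ["Dormitory"])]
print(third_nf_violations(student, fds, [{"StudID"}]))  # one violation: Year -> Dormitory

# After moving the transitive dependency to its own table:
print(third_nf_violations({"StudID", "Year"}, [fd(["StudID"], ["Year"])], [{"StudID"}]))   # []
print(third_nf_violations({"Year", "Dormitory"}, [fd(["Year"], ["Dormitory"])], [{"Year"}]))  # []
```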
Generally, even though there are four additional levels of normalization beyond 3NF, a table is
said to be normalized if it reaches 3NF. A database whose tables are all in 3NF is said to be a
normalized database.
A mnemonic for remembering the rationale for normalization up to 3NF could be the following:
1. No Repeating or Redundancy: no repeating fields in the table.
2. The Fields Depend upon the Key: every field in the table should depend on the key.
3. The Whole Key: no partial key dependency.
4.2.4. Boyce-Codd Normal Form
Boyce-Codd Normal Form or BCNF is an extension of the third normal form, and is also known
as 3.5 Normal Form.
Rules for BCNF
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. And, for any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that the left-hand side of
every non-trivial dependency A → B must be a super key. In particular, unlike 3NF, BCNF does not
allow A → B when A is not a super key, even if B is a prime attribute.
BCNF is an advanced version of 3NF; it is stricter than 3NF.
A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the
table.
For BCNF, the table should be in 3NF, and for every FD, the LHS is a super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
In the above table Functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key, yet each is the
left-hand side of a functional dependency.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID  EMP_COUNTRY
264     India
364     UK
EMP_DEPT table:
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549
EMP_DEPT_MAPPING table:
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
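The BCNF rule (the left-hand side of every non-trivial FD must be a super key) can be checked for the EMPLOYEE table and for its decomposition. This is a minimal Python sketch with hypothetical helper names; for each decomposed table, the FDs that apply to its own columns are passed by hand:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build a functional dependency lhs -> rhs from attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema: Set[str], fds: List[FD]) -> List[FD]:
    """Non-trivial FDs whose left-hand side is not a super key of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]


employee = {"EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}
fd_country = fd(["EMP_ID"], ["EMP_COUNTRY"])
fd_dept = fd(["EMP_DEPT"], ["DEPT_TYPE", "EMP_DEPT_NO"])

print(len(bcnf_violations(employee, [fd_country, fd_dept])))  # 2: both FDs violate BCNF

# Decomposed tables, each with the FDs that apply to its own columns.
print(bcnf_violations({"EMP_ID", "EMP_COUNTRY"}, [fd_country]))              # []
print(bcnf_violations({"EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"}, [fd_dept]))  # []
print(bcnf_violations({"EMP_ID", "EMP_DEPT"}, []))                           # []
```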