Theory of Database Systems
Lecture 10. The process of normalization
I. Normalization
• Normalization is a technique for producing a set of
suitable relations that support the data
requirements of an enterprise.
Suitable set of relations
• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to
support the data requirements of the enterprise;
– attributes with a close logical relationship are found in
the same relation;
– minimal redundancy with each attribute represented
only once with the important exception of attributes
that form all or part of foreign keys.
Benefits of suitable set of relations
• A database that has a suitable set of relations:
– is easier for the user to access and maintain;
– takes up minimal storage space on the computer.
How Normalization Supports
Database Design
• Normalization is a bottom-up approach to DB design that begins by
examining the relationships between attributes.
• However, a top-down approach can also be used: it begins by
identifying the main entities and relationships and uses normalization
as a validation technique.
The Process of Normalization
• Normalization is a formal technique for analyzing
a relation based on its primary key and the
functional dependencies between the attributes of
that relation.
• Normalization is often executed as a series of steps.
Each step corresponds to a specific normal form, which has
known properties.
Normalization
• The four most commonly used normal forms are first
(1NF), second (2NF), and third (3NF) normal
forms, and Boyce–Codd normal form (BCNF).
• Normalization is based on functional
dependencies among the attributes of a relation.
• A relation can be normalized to a specific form to
prevent the possible occurrence of update anomalies.
The Process of Normalization
The relationship between the normal forms.
It shows that some 1NF relations are also in
2NF, some 2NF relations are also in 3NF,
and so on.
Unnormalized Form (UNF)
• Before discussing first normal form, we first
define the state that precedes it.
• A table in unnormalized form contains one or
more repeating groups.
• To create an unnormalized table
– Transform the data from the information source
(e.g. form) into table format with columns and rows.
• In this format, the table is in unnormalized form
(UNF).
Repeating group
• A repeating group is an attribute, or group of
attributes, within a table that occurs with multiple
values for a single occurrence of the nominated
key attribute(s) of that table.
• Nominated key: refers to the attribute(s) that
uniquely identify each row within the
unnormalized table.
Example: Form
Collection of DreamHome leases.
In the example it is assumed that a client rents a given
property only once and cannot rent more than one
property at any one time.
UNF example
• Sample data is taken from two leases for two different
clients and is transferred into table format with rows and
columns.
• This is an unnormalized table.
ClientRental unnormalized table.
UNF example
• We identify the key attribute for the ClientRental
unnormalized table as clientNo.
• Next we identify the repeating group in the
unnormalized table:
Repeating Group = (propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)
• As a consequence, there are multiple values at
the intersection of certain rows and columns.
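A minimal Python sketch of what a repeating group looks like, assuming the unnormalized table is held as a list of rows with the rentals nested inside each client row. All values are hypothetical placeholders, not the DreamHome sample data, and `has_repeating_group` is an illustrative helper rather than part of any library.

```python
# Hypothetical placeholder rows; the values are NOT the DreamHome sample data.
# Each client row nests its rentals, which is what makes the table unnormalized.
unf_client_rental = [
    {
        "clientNo": "C1",
        "cName": "Client One",
        "rentals": [  # repeating group: two values for a single clientNo
            {"propertyNo": "P1", "pAddress": "Address 1", "rentStart": "2020-01-01",
             "rentFinish": "2020-12-31", "rent": 350, "ownerNo": "O1", "oName": "Owner One"},
            {"propertyNo": "P2", "pAddress": "Address 2", "rentStart": "2021-01-01",
             "rentFinish": "2021-12-31", "rent": 450, "ownerNo": "O2", "oName": "Owner Two"},
        ],
    },
    {
        "clientNo": "C2",
        "cName": "Client Two",
        "rentals": [
            {"propertyNo": "P1", "pAddress": "Address 1", "rentStart": "2019-01-01",
             "rentFinish": "2019-12-31", "rent": 350, "ownerNo": "O1", "oName": "Owner One"},
        ],
    },
]


def has_repeating_group(rows, group_field):
    """Return True if any row holds more than one value in group_field."""
    return any(len(row[group_field]) > 1 for row in rows)


print(has_repeating_group(unf_client_rental, "rentals"))
```

The row for client C1 holds two rentals at the intersection of one row and one column, so the table is not yet in 1NF.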
First Normal Form (1NF)
• A relation in which the intersection of each row
and column contains one and only one value.
UNF to 1NF
• To transform the unnormalized table to first
normal form we identify and remove repeating
groups within the table.
– Nominate an attribute or group of attributes to act
as the key for the unnormalized table.
– Identify the repeating group(s) in the unnormalized
table that repeat for the key attribute(s).
• There are two common approaches to removing
repeating groups from unnormalized tables.
Method 1
• We remove the repeating group by entering
appropriate data into the empty columns of rows
containing the repeating data (‘flattening’ the
table). We fill in the blanks by duplicating the
nonrepeating data.
• The resulting relation contains atomic values at
the intersection of each row and column, and is
therefore in 1NF.
• With this approach redundancy is introduced
into the resulting relation.
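The flattening step can be sketched in Python as follows. This is only an illustration under the assumption that the repeating group is stored as a nested list; the function name `flatten` and the sample values are hypothetical, and the attribute list is trimmed for brevity.

```python
def flatten(unf_rows, group_field):
    """Copy the non-repeating attributes into one row per repeating-group member."""
    flat = []
    for row in unf_rows:
        fixed = {k: v for k, v in row.items() if k != group_field}
        for member in row[group_field]:
            flat.append({**fixed, **member})  # the non-repeating data is duplicated here
    return flat


# Hypothetical placeholder data, trimmed to a few attributes.
unf = [
    {"clientNo": "C1", "cName": "Client One",
     "rentals": [{"propertyNo": "P1", "rent": 350},
                 {"propertyNo": "P2", "rent": 450}]},
    {"clientNo": "C2", "cName": "Client Two",
     "rentals": [{"propertyNo": "P1", "rent": 350}]},
]

client_rental = flatten(unf, "rentals")
for row in client_rental:
    print(row)
```

Every value in the result is atomic, but the client name for C1 now appears in two rows, which is the redundancy this method introduces.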
Method 1 example
• Remove the repeating group by entering the
appropriate client data into each row.
• The resulting relation ClientRental is in 1NF as there is
a single value at the intersection of each row and
column.
Method 1 example
• We identify the candidate keys for the
ClientRental relation as being composite keys:
– (clientNo, propertyNo)
– (clientNo, rentStart)
– (propertyNo, rentStart)
• We select (clientNo, propertyNo) as the primary
key.
• The relation contains data describing clients,
property rented, and property owners, which is
repeated several times. As a result, the relation
contains significant data redundancy.
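One way to sanity-check a proposed candidate key against sample rows is to test whether its values are unique. The sketch below uses hypothetical placeholder rows and an illustrative helper `is_unique`. Note that uniqueness in a sample can only rule a key out; whether a key really holds depends on the business rules (here, that a client rents a given property only once and rents one property at a time).

```python
def is_unique(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical placeholder rows of a flattened ClientRental relation.
client_rental = [
    {"clientNo": "C1", "propertyNo": "P1", "rentStart": "2020-01-01"},
    {"clientNo": "C1", "propertyNo": "P2", "rentStart": "2021-01-01"},
    {"clientNo": "C2", "propertyNo": "P1", "rentStart": "2019-01-01"},
]

for attrs in [("clientNo",), ("clientNo", "propertyNo"),
              ("clientNo", "rentStart"), ("propertyNo", "rentStart")]:
    print(attrs, is_unique(client_rental, attrs))
```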
Method 2
• We remove the repeating group by placing the
repeating data along with a copy of the original
key attribute(s) into a separate relation.
• A primary key is identified for the new relation.
• This approach produces relations in at least 1NF
with less redundancy.
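A minimal Python sketch of this second approach, again assuming the repeating group is stored as a nested list. The function name `split_repeating_group` and the sample values are hypothetical, and the attribute list is trimmed for brevity.

```python
def split_repeating_group(unf_rows, key, group_field):
    """Move the repeating group into its own relation, carrying a copy of the key."""
    parent, child = [], []
    for row in unf_rows:
        # The parent relation keeps only the non-repeating attributes.
        parent.append({k: v for k, v in row.items() if k != group_field})
        for member in row[group_field]:
            # The child relation gets the repeating data plus the original key.
            child.append({key: row[key], **member})
    return parent, child


# Hypothetical placeholder data, trimmed to a few attributes.
unf = [
    {"clientNo": "C1", "cName": "Client One",
     "rentals": [{"propertyNo": "P1", "rent": 350},
                 {"propertyNo": "P2", "rent": 450}]},
    {"clientNo": "C2", "cName": "Client Two",
     "rentals": [{"propertyNo": "P1", "rent": 350}]},
]

client, property_rental_owner = split_repeating_group(unf, "clientNo", "rentals")
print(client)
print(property_rental_owner)
```

Each client name is now stored once, while the copied key lets the two relations be joined back together.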
Method 2 example
• Using the second approach, we remove the repeating group by
placing the repeating data along with a copy of the original key
attribute (clientNo) into a separate table, called
PropertyRentalOwner.
Method 2 example
• Then we identify a primary key for the new table:
(clientNo, propertyNo).
• The format of the resulting 1NF relations is as follows:
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress,
rentStart, rentFinish, rent, ownerNo, oName)
• Both the Client and PropertyRentalOwner tables are in
1NF, but the PropertyRentalOwner table contains
significant redundancy.
Second Normal Form (2NF)
• Second normal form is based on the concept of
full functional dependency.
• Full functional dependency indicates that if
– A and B are attributes of a relation,
– B is fully functionally dependent on A if B is
functionally dependent on A but not on any proper
subset of A.
• A functional dependency A → B is a full functional
dependency if removal of any attribute from A
results in the dependency no longer holding.
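The definition can be tried out on sample rows with a short Python sketch. The helpers `holds` and `is_full` and the data are hypothetical illustrations. A check against sample data can only show that a dependency fails; a real functional dependency is a statement about every legal state of the relation.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full(rows, lhs, rhs):
    """Return True if lhs -> rhs holds but no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # a smaller determinant suffices: partial dependency
    return True


# Hypothetical placeholder rows.
rows = [
    {"clientNo": "C1", "propertyNo": "P1", "cName": "Client One", "rentStart": "2020-01-01"},
    {"clientNo": "C1", "propertyNo": "P2", "cName": "Client One", "rentStart": "2021-01-01"},
    {"clientNo": "C2", "propertyNo": "P1", "cName": "Client Two", "rentStart": "2019-01-01"},
]

key = ("clientNo", "propertyNo")
print(is_full(rows, key, ("rentStart",)))
print(is_full(rows, key, ("cName",)))
```

In this sample, rentStart needs both key attributes, while cName is already determined by clientNo alone, so its dependency on the composite key is not full.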
Second Normal Form (2NF)
• A relation that is in 1NF and in which every
non-primary-key attribute is fully functionally dependent
on the primary key.
– Second normal form applies to relations with
composite keys (that is, a primary key composed of two
or more attributes).
– A relation with a single attribute primary key is
automatically in at least 2NF.
1NF to 2NF
• Identify the primary key for the 1NF relation.
• Identify the functional dependencies in the
relation.
• If partial dependencies exist on the primary key,
remove them by placing the partially dependent attributes
in a new relation along with a copy of their determinant.
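These three steps can be sketched in Python for the ClientRental relation. The function `decompose_to_2nf` is a hypothetical helper, and the list of functional dependencies is an assumption reconstructed from the surrounding text (cName depends on clientNo alone, the property and owner attributes on propertyNo alone); it is not a complete list of the dependencies in the relation.

```python
def decompose_to_2nf(attributes, primary_key, fds):
    """Split off attributes that depend on only part of the primary key.

    fds is a list of (determinant, dependents) pairs of attribute tuples.
    Returns the attributes left in the original relation and a dict that maps
    each partial determinant to the attributes of its new relation.
    """
    pk = set(primary_key)
    new_relations = {}
    for determinant, dependents in fds:
        if set(determinant) < pk:  # proper subset of the key: partial dependency
            target = new_relations.setdefault(tuple(determinant), list(determinant))
            target.extend(a for a in dependents if a not in target)
    moved = {a for attrs in new_relations.values() for a in attrs if a not in pk}
    remaining = [a for a in attributes if a not in moved]
    return remaining, new_relations


attributes = ["clientNo", "propertyNo", "cName", "pAddress",
              "rentStart", "rentFinish", "rent", "ownerNo", "oName"]
primary_key = ("clientNo", "propertyNo")

# Assumed dependencies, inferred from the lecture text.
fds = [
    (("clientNo", "propertyNo"), ("rentStart", "rentFinish")),
    (("clientNo",), ("cName",)),
    (("propertyNo",), ("pAddress", "rent", "ownerNo", "oName")),
]

remaining, new_relations = decompose_to_2nf(attributes, primary_key, fds)
print(remaining)
for determinant, attrs in new_relations.items():
    print(determinant, attrs)
```

Under these assumptions the original relation keeps the key and the rental dates, and two new relations are split off, one keyed on clientNo and one keyed on propertyNo.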
Partial dependency
• A functional dependency A → B is a partial
dependency if there is some attribute that can be
removed from A and the dependency still holds.
2NF example
Consider the ClientRental relation.
• This ClientRental table is in 1NF. The primary key of the
table is (clientNo, propertyNo).
• In order to move this table to a 2NF solution, we must
identify and remove the partial dependencies from the
table.
Functional dependencies in
ClientRental relation
• The functional dependencies (fd) for the
ClientRental relation are as follows:
• The presence of partial dependencies shows that
the table is not in 2NF.
– cName is partially dependent on the primary key:
it depends on the clientNo attribute alone.
– The property attributes are also partially dependent on
the primary key: they depend on propertyNo alone.
Transform the ClientRental relation
into 2NF
• To remove the partial dependencies, we move the
partially dependent non-primary-key columns into new
tables, along with a copy of the part of the
primary key on which they are fully functionally
dependent.
• This results in the creation of three new relations
called Client, Rental, and PropertyOwner.
2NF relations derived from
ClientRental relation
• The three tables Client, Rental, and PropertyOwner are
in 2NF because every non-primary-key column is fully
functionally dependent on the primary key of the table.
Remarks
• Although 2NF relations have less redundancy than those
in 1NF, they may still suffer from update anomalies.
• For example, if we want to update the name of an owner,
such as Tony Diamond, we have to update two tuples in the
PropertyOwner relation.
• If we update only one tuple and not the other, the
database would be in an inconsistent state.
• This update anomaly is caused by a transitive
dependency.
• We need to remove such dependencies by progressing
to third normal form.
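The update anomaly can be reproduced with a small Python sketch. The owner name Tony Diamond comes from the example above; the property numbers, owner numbers, the second owner, the replacement name, and both helper functions are hypothetical.

```python
# Hypothetical PropertyOwner rows, trimmed to the attributes that matter here.
property_owner = [
    {"propertyNo": "P1", "ownerNo": "O1", "oName": "Tony Diamond"},
    {"propertyNo": "P2", "ownerNo": "O1", "oName": "Tony Diamond"},
    {"propertyNo": "P3", "ownerNo": "O2", "oName": "Owner Two"},
]


def owner_names_consistent(rows):
    """Return True if every ownerNo is paired with exactly one oName."""
    names = {}
    for row in rows:
        if names.setdefault(row["ownerNo"], row["oName"]) != row["oName"]:
            return False
    return True


def rename_owner(rows, owner_no, new_name, limit=None):
    """Rename an owner in at most `limit` tuples (None means all of them)."""
    changed = 0
    for row in rows:
        if row["ownerNo"] == owner_no and (limit is None or changed < limit):
            row["oName"] = new_name
            changed += 1
    return changed


rows = [dict(r) for r in property_owner]   # work on a copy
rename_owner(rows, "O1", "T. Diamond", limit=1)
print(owner_names_consistent(rows))        # only one of two tuples was updated
rename_owner(rows, "O1", "T. Diamond")
print(owner_names_consistent(rows))        # both tuples now agree
```

Because oName is stored once per property rather than once per owner, a partial update leaves two different names for the same ownerNo.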