Understanding Functional Dependency and Normalization

The document discusses functional dependency and normalization in relational databases, explaining how functional dependencies define relationships between attributes and the importance of normalization in reducing data redundancy. It outlines various normal forms (1NF, 2NF, 3NF, BCNF) and their requirements, along with examples to illustrate the concepts. Additionally, it highlights the need for normalization to eliminate anomalies such as insertion, update, and deletion anomalies in database design.

Uploaded by

solaiprabha7
Copyright
© © All Rights Reserved

NORMALIZATION

3.1 FUNCTIONAL DEPENDENCY


A functional dependency is a property of the information
represented by a relation. It defines the most commonly
encountered type of relatedness between data items of a
database.
It is a constraint between two attributes or two sets of
attributes. A functional dependency is a property of the
semantics, or meaning, of the attributes in a relation. The
semantics indicate how attributes relate to one another and
specify the functional dependencies between attributes.
Main Use:
Functional dependencies further describe a relation schema R
by specifying constraints on its attributes.
3.1.1 FUNCTIONAL DEPENDENCY DIAGRAM
In a functional dependency diagram, attributes are
represented by rectangles and a dependency is shown by a
heavy arrow.
The simplest functional dependency has the form
FD: Y → X, read as "Y determines X". The left-hand side of
the functional dependency is called the determinant and the
right-hand side is called the dependent. The determinant and
the dependent are both sets of attributes.

Fig: Functional Dependency Diagram


For Example:
Let us consider a functional dependency of the relation
R1: BUDGET
FD: {PROJECT} → {PROJECT_BUDGET}
It means that in the BUDGET relation, PROJECT_BUDGET is
functionally dependent on PROJECT, because each project
has one given budget value.
Thus, once a project name is known, a unique value of
PROJECT_BUDGET is determined by the functional dependency.
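The definition above can be tested against stored rows. The following is a minimal sketch, not a standard routine: the helper name fd_holds and the sample BUDGET rows are assumptions made for illustration. Note that a table instance can only refute a functional dependency; whether the dependency truly holds is a matter of the semantics of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this determinant
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows for the BUDGET relation (not taken from the text).
budget = [
    {"PROJECT": "P1", "PROJECT_BUDGET": 5000},
    {"PROJECT": "P2", "PROJECT_BUDGET": 8000},
    {"PROJECT": "P3", "PROJECT_BUDGET": 5000},
]

print(fd_holds(budget, ["PROJECT"], ["PROJECT_BUDGET"]))  # True
print(fd_holds(budget, ["PROJECT_BUDGET"], ["PROJECT"]))  # False
```

The second call fails because the budget value 5000 appears with two different projects, so PROJECT_BUDGET does not determine PROJECT in this sample.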
3.1.2 FULL FUNCTIONAL DEPENDENCY (FFD)
Full functional dependency is used to indicate the
minimum set of attributes in the determinant of a functional
dependency.
The set of attributes X is fully functionally dependent on
the set of attributes Y if the following conditions are
satisfied:
• X is functionally dependent on Y, and
• X is not functionally dependent on any proper subset of Y.

Relation: EmployeeProject
EmpID   ProjectID   Days
E088    001         320
E065    002         190

Relation: ProjectCost
ProjectID   ProjectCost
002         5000
002         8000

In the above relations, Days is the number of days an
employee has spent on a project.
FD: {EmpID, ProjectID, ProjectCost} → {Days}
This dependency holds, but Days is not fully functionally
dependent on this determinant, because the proper subset
{EmpID, ProjectID} already determines the {Days} spent on
the project by the employee.
Dropping the unnecessary attribute gives the full functional
dependency:
FD: {EmpID, ProjectID} → {Days}
3.1.3 ARMSTRONG’S AXIOMS IN FUNCTIONAL
DEPENDENCY
A good relational design works with non-redundant sets of
functional dependencies and with complete sets (closures) of
functional dependencies.
Both notions arise because new FDs can be derived from
existing FDs.
For Example:
If X → Y and Y → Z, then it is also true that X → Z.
This derivation holds because, if a given value of X
determines a unique value of Y and this value of Y in turn
determines a unique value of Z, then the value of X also
determines this value of Z. Conversely, a set of FDs may
contain redundant functional dependencies, that is, FDs
that can be derived from the others.
Armstrong’s axioms:
Consider a table T and assume that all sets of attributes
X, Y, Z, W are contained in the heading of T. The following
set of inference rules is commonly presented as Armstrong’s
axioms. Rules 1 to 3 are the axioms proper; the remaining
rules can be derived from them.
They are used to derive one functional dependency from other
functional dependencies:
Rule 1: Reflexivity: If Y ⊆ X, then X → Y.
Rule 2: Augmentation: If X → Y, then XZ → YZ.
Rule 3: Transitivity: If X → Y and Y → Z, then X → Z.
Rule 4: Self-determination: X → X.
Rule 5: Pseudo-transitivity: If X → Y and YW → Z,
then XW → Z.
Rule 6: Union or additive: If X → Z and X → Y, then
X → YZ.
Rule 7: Decomposition or projective: If X → YZ, then
X → Y and X → Z.
Rule 8: Composition: If X → Y and Z → W, then
XZ → YW.
Rule 9: Self-accumulation: If X → YZ and Z → W,
then X → YZW.
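Whether one FD can be derived from a set of others is usually decided with the attribute closure, which applies the rules above repeatedly until nothing new is added. The sketch below is one possible way to write it; the helper names fd, closure, implies and is_fully_dependent are assumptions chosen for this illustration. The same closure also tests the full functional dependency of section 3.1.2.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD as a pair of frozensets (determinant, dependent)."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return frozenset(rhs) <= closure(lhs, fds)


def is_fully_dependent(fds, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs = list(lhs)
    if not implies(fds, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if implies(fds, subset, rhs):
                return False
    return True


# Transitivity (Rule 3) and pseudo-transitivity (Rule 5).
chain = [fd("X", "Y"), fd("Y", "Z")]
pseudo = [fd("X", "Y"), fd("YW", "Z")]
print(implies(chain, "X", "Z"))    # True
print(implies(pseudo, "XW", "Z"))  # True
print(implies(chain, "Z", "X"))    # False

# Full functional dependency from section 3.1.2.
project = [fd(["EmpID", "ProjectID"], ["Days"])]
print(is_fully_dependent(project, ["EmpID", "ProjectID", "ProjectCost"], ["Days"]))  # False
print(is_fully_dependent(project, ["EmpID", "ProjectID"], ["Days"]))                 # True
```

Single-letter attributes can be passed as strings such as "XW"; longer attribute names have to be passed as lists so that they are not split into characters.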
3.2 NORMALIZATION
INTRODUCTION
Relational database tables derived from ER models, or
from some other design method, can suffer from serious
problems in terms of performance, integrity and
maintainability. A large database defined as a single table
results in a large amount of redundant data.
Storing large numbers of redundant values can result in
lengthy search operations for just a small number of target
rows. It can also result in long and expensive updates.
For Example:
Relation: STUDENT_INFO

STU_NAME  COURSE_ID  SUBJECT                          PHONE_NO
Abi       CS-101     Mobile Computing                 9841758596
John      BCA-201    Data Communication & Networking  8855446633
Rajesh    CS-102     Operating System                 9944557722
John      BCA-201    Software Engineering             8855446633
Ragu      BCA-205    Mobile Computing                 9003344778
Abi       CS-101     Data Mining                      9841758596

The above table STUDENT_INFO is not a good design. For
example, the rows for STU_NAME "Abi" and "John" repeat the
COURSE_ID and PHONE_NO information. This data redundancy
wastes storage space and can lead to a loss of data
integrity in the database.
A good database design keeps only the minimum redundancy
necessary to represent the semantics of the database, which
minimizes the storage needed to store it.
Normalization is the process of decomposing a set of
relations with anomalies to produce smaller, well-structured
relations that contain minimal or no redundancy.
Definition: Normalization
• Database normalization is the process of removing
redundant data from your tables to improve storage
efficiency, data integrity, and scalability.
• In the relational model, the degree to which a design
avoids redundancy is classified by normal forms (NF), and
there are algorithms for converting a given database from
one normal form to another.
• Normalization generally involves splitting existing tables
into multiple ones, which must be re-joined or linked
each time a query needs the combined data.
Need of Normalization
• Eliminates redundant data
• Reduces the chance of data errors
• Reduces disk space
• Improves data integrity, scalability and data consistency.
Anomalies in DBMS
There are three types of anomalies that occur when the
database is not normalized: insertion, update and deletion
anomalies.
Example: Suppose a manufacturing company stores
employee details in a table named employee that has four
attributes: emp_id for the employee's id, emp_name for the
employee's name, emp_address for the employee's address
and emp_dept for the department in which the employee
works.
At some point in time the table looks like this:
emp_id  emp_name  emp_address  emp_dept
201     Ram       Delhi        D001
201     Ram       Delhi        D002
123     Sunil     Mumbai       D890
177     Iniyha    Chennai      D900
177     Iniyha    Chennai      D004

The above table is not normalized. We will see the problems
that we face when a table is not normalized.
Update anomaly: In the above table we have two rows for
employee Ram, as he belongs to two departments of the
company. If we want to update the address of Ram, we have
to update it in both rows or the data will become
inconsistent.
If the correct address gets updated in one row but not in
the other, then as per the database Ram would have two
different addresses, which is not correct and leads to
inconsistent data.
Insert anomaly: Suppose a new employee joins the company
who is under training and currently not assigned to any
department. We would not be able to insert the data into
the table if the emp_dept field doesn't allow nulls.
Delete anomaly: Suppose that at some point the company
closes the department D890. Deleting the rows that have
emp_dept as D890 would also delete the information of
employee Sunil, since Sunil is assigned only to this
department.
To overcome these anomalies we need to normalize the data.
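The update and delete anomalies can be reproduced on the employee rows with a few lines of Python. This is only an illustrative sketch: the rows are held in plain dictionaries rather than in a DBMS, the helper addresses_of is a name made up here, and the new address "Pune" is a hypothetical value.

```python
employee = [
    {"emp_id": 201, "emp_name": "Ram", "emp_address": "Delhi", "emp_dept": "D001"},
    {"emp_id": 201, "emp_name": "Ram", "emp_address": "Delhi", "emp_dept": "D002"},
    {"emp_id": 123, "emp_name": "Sunil", "emp_address": "Mumbai", "emp_dept": "D890"},
    {"emp_id": 177, "emp_name": "Iniyha", "emp_address": "Chennai", "emp_dept": "D900"},
    {"emp_id": 177, "emp_name": "Iniyha", "emp_address": "Chennai", "emp_dept": "D004"},
]


def addresses_of(rows, emp_id):
    """Set of distinct addresses stored for one employee."""
    return {r["emp_address"] for r in rows if r["emp_id"] == emp_id}


# Update anomaly: only the first of Ram's two rows is changed.
partially_updated = [dict(r) for r in employee]
for row in partially_updated:
    if row["emp_id"] == 201:
        row["emp_address"] = "Pune"  # hypothetical new address
        break
print(len(addresses_of(partially_updated, 201)))  # 2, the data is inconsistent

# Delete anomaly: closing department D890 also removes Sunil.
after_delete = [r for r in employee if r["emp_dept"] != "D890"]
print(any(r["emp_name"] == "Sunil" for r in after_delete))  # False
```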
The most commonly used normal forms are listed below.

TYPES OF NORMALIZATION:

Fig: Types of Normalization
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce and Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
3.2.1 FIRST NORMAL FORM (1NF)
A relation R is in first normal form (1NF) if and only if it
does not contain any composite or multi-valued attributes
or their combinations.
Table Name: Employee
Example: Suppose a company wants to store the names and
contact details of its employees. It creates a table that looks
like this:
emp_id  emp_name  emp_address  emp_mobile
101     Ragu      Delhi        8912312390
102     John      Mumbai       8812121212
                               9900012222
103     Babu      Chennai      7778881212
104     Ashok     Bangalore    9990000123
                               8123450987
Two employees (John and Ashok) have two mobile numbers
each, so the company stored both numbers in the same field,
as you can see in the table above.
This table is not in 1NF. The rule says "each attribute
of a table must have atomic (single) values", and the
emp_mobile values for employees John and Ashok violate
that rule.
To make the table comply with 1NF we should have the data
like this:
emp_id  emp_name  emp_address  emp_mobile
101     Ragu      Delhi        8912312390
102     John      Mumbai       8812121212
102     John      Mumbai       9900012222
103     Babu      Chennai      7778881212
104     Ashok     Bangalore    9990000123
104     Ashok     Bangalore    8123450987
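The conversion from the first table to the second can be sketched as follows. The function name to_1nf is hypothetical, and the multi-valued field is modelled here as a Python list, which is an assumption about how the non-atomic values might be held.

```python
employees = [
    {"emp_id": 101, "emp_name": "Ragu", "emp_address": "Delhi",
     "emp_mobile": ["8912312390"]},
    {"emp_id": 102, "emp_name": "John", "emp_address": "Mumbai",
     "emp_mobile": ["8812121212", "9900012222"]},
    {"emp_id": 103, "emp_name": "Babu", "emp_address": "Chennai",
     "emp_mobile": ["7778881212"]},
    {"emp_id": 104, "emp_name": "Ashok", "emp_address": "Bangalore",
     "emp_mobile": ["9990000123", "8123450987"]},
]


def to_1nf(rows, multi_attr):
    """Emit one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_attr]:
            new_row = dict(row)          # copy the other attributes
            new_row[multi_attr] = value  # store a single atomic value
            flat.append(new_row)
    return flat


for row in to_1nf(employees, "emp_mobile"):
    print(row["emp_id"], row["emp_name"], row["emp_address"], row["emp_mobile"])
```

The result has six rows, one per mobile number, which matches the 1NF table above.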

3.2.2 SECOND NORMAL FORM (2NF)

For a table to be in the Second Normal Form:
• It should be in the First Normal Form.
• It should not have any partial dependency.
Table Name: School
Example: Suppose a school wants to store the data of
teachers and the subjects they teach. Since a teacher can
teach more than one subject, the table can have multiple
rows for the same teacher. They create a table that looks
like this:
teacher_id  subject           teacher_age
111         Maths             38
111         Physics           38
222         Computer science  38
333         Physics           40
333         Chemistry         40

Candidate Keys: {teacher_id, subject}
Non-prime attribute: teacher_age
The table is in 1NF because each attribute has atomic
values. However, it is not in 2NF because the non-prime
attribute teacher_age is dependent on teacher_id alone,
which is a proper subset of the candidate key. This violates
the rule for 2NF, which says "no non-prime attribute is
dependent on a proper subset of any candidate key of the
table".
To make the table comply with 2NF we can break it into two
tables like this:
teacher_details table:
teacher_id  teacher_age
111         38
222         38
333         40

teacher_subject table:
teacher_id  subject
111         Maths
111         Physics
222         Computer science
333         Physics
333         Chemistry
Now the tables comply with Second normal form(2NF).
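The split into teacher_details and teacher_subject is a pair of projections of the original table. A minimal sketch, assuming the rows are held as Python dictionaries and using a hypothetical helper named project:

```python
school = [
    {"teacher_id": 111, "subject": "Maths", "teacher_age": 38},
    {"teacher_id": 111, "subject": "Physics", "teacher_age": 38},
    {"teacher_id": 222, "subject": "Computer science", "teacher_age": 38},
    {"teacher_id": 333, "subject": "Physics", "teacher_age": 40},
    {"teacher_id": 333, "subject": "Chemistry", "teacher_age": 40},
]


def project(rows, attrs):
    """Keep only attrs and drop duplicate rows, preserving first-seen order."""
    seen = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.append(values)
    return [dict(zip(attrs, values)) for values in seen]


teacher_details = project(school, ["teacher_id", "teacher_age"])
teacher_subject = project(school, ["teacher_id", "subject"])

print(len(teacher_details))  # 3
print(len(teacher_subject))  # 5
```

After the projection each teacher's age is stored once, so the partial dependency of teacher_age on teacher_id no longer causes repetition.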
3.2.3 THIRD NORMAL FORM (3NF)
A table is said to be in the Third Normal Form when:
• It is in the Second Normal Form.
• It doesn't have any transitive dependency.
Example:
Suppose a company wants to store the complete address of
each employee. They create a table named employee_details
that looks like this:
employee_details table:
emp_id  emp_name  emp_zip  emp_state  emp_city  emp_district
1001    John      382005   UP         Agra      DayalBagh
1002    Anu       322008   TN         Chennai   M-City
1006    Geetha    382007   TN         Chennai   Urrapakkam
1101    Priya     392008   UK         Pauri     Bhagwan
1201    Sai       322999   MP         Gwalior   Ratan

Super Keys: {emp_id}, {emp_id, emp_name}, {emp_id,
emp_name, emp_zip} ... and so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are
non-prime as they are not part of any candidate key.
Here, emp_state, emp_city and emp_district are dependent on
emp_zip, and emp_zip is dependent on emp_id. That makes the
non-prime attributes (emp_state, emp_city and emp_district)
transitively dependent on the candidate key (emp_id). This
violates the rule of 3NF.
To make this table comply with 3NF we have to break the
table into two tables to remove the transitive dependency:
employee table:
emp_id  emp_name  emp_zip
1001    John      382005
1002    Anu       322008
1006    Geetha    382007
1101    Priya     392008
1201    Sai       322999

employee_zip table:
emp_zip  emp_state  emp_city  emp_district
382005   UP         Agra      DayalBagh
322008   TN         Chennai   M-City
382007   TN         Chennai   Urrapakkam
392008   UK         Pauri     Bhagwan
322999   MP         Gwalior   Ratan
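A decomposition like this is only useful if joining the two tables on emp_zip gives back the original rows. The sketch below checks that; it assumes the zip codes of the employee_details table, and natural_join is a hypothetical helper written for this illustration.

```python
columns = ["emp_id", "emp_name", "emp_zip", "emp_state", "emp_city", "emp_district"]
data = [
    (1001, "John", "382005", "UP", "Agra", "DayalBagh"),
    (1002, "Anu", "322008", "TN", "Chennai", "M-City"),
    (1006, "Geetha", "382007", "TN", "Chennai", "Urrapakkam"),
    (1101, "Priya", "392008", "UK", "Pauri", "Bhagwan"),
    (1201, "Sai", "322999", "MP", "Gwalior", "Ratan"),
]
employee_details = [dict(zip(columns, row)) for row in data]

# The two tables of the 3NF design, built as projections.
employee = [{a: r[a] for a in ("emp_id", "emp_name", "emp_zip")}
            for r in employee_details]
employee_zip = [{a: r[a] for a in ("emp_zip", "emp_state", "emp_city", "emp_district")}
                for r in employee_details]


def natural_join(left, right):
    """Join two lists of dict rows on all attribute names they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                joined.append({**l_row, **r_row})
    return joined


rejoined = natural_join(employee, employee_zip)
print(rejoined == employee_details)  # True
```

Because emp_zip is the key of employee_zip, every employee row matches exactly one zip row and no spurious rows appear.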

3.2.4 BOYCE AND CODD NORMAL FORM (BCNF)

Boyce and Codd Normal Form is a stricter version of the
Third Normal Form. It deals with a certain type of anomaly
that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in
BCNF. For a table to be in BCNF, the following conditions
must be satisfied:
• It must be in Third Normal Form
• and, for each non-trivial functional dependency (X→Y),
X should be a super key.
Example: Suppose there is a company in which employees
work in more than one department. They store the data like
this:
emp_id  emp_nationality  emp_dept                      dept_type  dept_no_of_emp
1001    Indian           Production and planning       D001       300
1001    Indian           Stores                        D001       350
1002    American         Design and technical support  D134       200
1002    American         Purchasing department         D134       800
Functional dependencies in the table above:
• emp_id → emp_nationality
• emp_dept → {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF because neither emp_id nor
emp_dept alone is a key, yet each of them is the determinant
of a functional dependency.
To make the table comply with BCNF we can break the table
into three tables like this:
emp_nationality table:
emp_id  emp_nationality
1001    Indian
1002    American

emp_dept table:
emp_dept                      dept_type  dept_no_of_emp
Production and planning       D001       300
Stores                        D001       350
Design and technical support  D134       200
Purchasing department         D134       800

emp_dept_mapping table:
emp_id  emp_dept
1001    Production and planning
1001    Stores
1002    Design and technical support
1002    Purchasing department

Functional dependencies:
• emp_id → emp_nationality
• emp_dept → {dept_type, dept_no_of_emp}
Candidate keys:
• For the first table: emp_id
• For the second table: emp_dept
• For the third table: {emp_id, emp_dept}
The design is now in BCNF, as in both functional
dependencies the left-hand side is a key.
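The BCNF condition "every determinant is a super key" can be checked mechanically with the attribute closure. The following is a simplified sketch: the function names closure and bcnf_violations are assumptions, and only the FDs stated in the text that lie entirely inside a table are tested, which is sufficient for this example but is not a complete test for arbitrary decompositions.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Non-trivial FDs inside relation whose left side is not a super key."""
    relation = frozenset(relation)
    bad = []
    for lhs, rhs in fds:
        applies = lhs <= relation and rhs <= relation
        trivial = rhs <= lhs
        if applies and not trivial and not relation <= closure(lhs, fds):
            bad.append((lhs, rhs))
    return bad


fds = [
    (frozenset({"emp_id"}), frozenset({"emp_nationality"})),
    (frozenset({"emp_dept"}), frozenset({"dept_type", "dept_no_of_emp"})),
]

original = {"emp_id", "emp_nationality", "emp_dept", "dept_type", "dept_no_of_emp"}
decomposed = [
    {"emp_id", "emp_nationality"},
    {"emp_dept", "dept_type", "dept_no_of_emp"},
    {"emp_id", "emp_dept"},
]

print(len(bcnf_violations(original, fds)))               # 2
print([bcnf_violations(t, fds) for t in decomposed])     # [[], [], []]
```

Both FDs violate BCNF in the original table, while none of the three decomposed tables has a violating FD.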

3.2.5 COMPARISON BETWEEN 3NF AND BCNF

Proposed by
  3NF:  Edgar F. Codd.
  BCNF: Raymond F. Boyce and Edgar F. Codd, jointly.
Prerequisite
  3NF:  The relation must already be in 2NF, that is, in 1NF
        with no partial dependency.
  BCNF: The relation must already be in 3NF, and every
        determinant must be a candidate key.
Focus
  3NF:  It concentrates on the primary key.
  BCNF: It concentrates on all candidate keys.
Redundancy
  3NF:  Some redundancy may remain, so it is higher than in
        BCNF.
  BCNF: Redundancy is lower.
Dependency preservation
  3NF:  A decomposition into 3NF can preserve all functional
        dependencies.
  BCNF: A decomposition into BCNF may not preserve all
        functional dependencies.
Description
  3NF:  A normal form used in normalizing a database design
        to reduce the duplication of data and ensure that
        the entity is in 2NF and all the non-prime attributes
        in a table are determined only by the candidate keys
        of that relation and not by any other non-prime
        attributes.
  BCNF: A normal form used in database normalization which
        is a slightly stronger version of 3NF.
Requirement
  3NF:  The table should be in 2NF, and there shouldn't be
        any transitive dependencies.
  BCNF: The prime attributes of the table should not depend
        on the non-prime attributes of the table.
Lossless join
  3NF:  A lossless-join decomposition that also preserves
        all dependencies can always be achieved.
  BCNF: A lossless-join decomposition can always be
        achieved, but sometimes not together with
        dependency preservation.

3.2.6 FOURTH NORMAL FORM (4NF)

Tables cannot have multi-valued dependencies on a primary
key.
For a table to satisfy the Fourth Normal Form, it should
satisfy the following two conditions:
1. It should be in the Boyce-Codd Normal Form.
2. And, the table should not have any multi-valued
dependency.
What is Multi-valued Dependency?
A table is said to have a multi-valued dependency if the
following conditions are true:
1. For a dependency A → B, if for a single value of A
multiple values of B exist, then the table may have a
multi-valued dependency.
2. Also, a table should have at least 3 columns for it to
have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued
dependency between A and B, then B and C should be
independent of each other.
If all these conditions are true for any relation (table), it
is said to have a multi-valued dependency.
For Example:
Below we have a college enrolment table with the
columns S_ID, COURSE and HOBBY.
S_ID COURSE HOBBY
1 .NET Cricket
1 RDBMS Hockey
2 C# Cricket
2 PHP Hockey
As you can see in the table above, the student with S_ID 1
has opted for two courses, .NET and RDBMS, and has two
hobbies, Cricket and Hockey.
The two records for the student with S_ID 1 give rise to
two more records, as shown below, because the student has
two hobbies and each hobby has to be specified along with
each course.
S_ID  COURSE  HOBBY
1     .NET    Cricket
1     RDBMS   Hockey
1     .NET    Hockey
1     RDBMS   Cricket

And, in the table above, there is no relationship between the
columns COURSE and HOBBY. They are independent of
each other.
So there is a multi-valued dependency, which leads to
unnecessary repetition of data and other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we
can decompose the table into 2 tables.

For Example:
Table Name: Course
S_ID  COURSE
1     .NET
1     RDBMS
2     C#
2     PHP
Table Name: Hobby
S_ID  HOBBY
1     Cricket
1     Hockey
2     Cricket
2     Hockey
Now these relations satisfy the fourth normal form.
A table can also have a functional dependency along with a
multi-valued dependency. In that case, the functionally
dependent columns are moved to a separate table and the
multi-valued dependent columns are moved to separate
tables.
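The effect of the multi-valued dependency can be made concrete by joining the Course and Hobby tables back together on S_ID. This is an illustrative sketch only; join_on_student and satisfies_mvd are hypothetical helper names, and rows are modelled as tuples.

```python
from itertools import product

course = [(1, ".NET"), (1, "RDBMS"), (2, "C#"), (2, "PHP")]
hobby = [(1, "Cricket"), (1, "Hockey"), (2, "Cricket"), (2, "Hockey")]


def join_on_student(course_rows, hobby_rows):
    """Pair every course of a student with every hobby of the same student."""
    return sorted(
        (s, c, h)
        for (s, c), (s2, h) in product(course_rows, hobby_rows)
        if s == s2
    )


def satisfies_mvd(rows):
    """Check S_ID ->> COURSE: each student's rows equal courses x hobbies."""
    rows = set(rows)
    for s in {r[0] for r in rows}:
        courses = {c for (sid, c, _) in rows if sid == s}
        hobbies = {h for (sid, _, h) in rows if sid == s}
        if {(s, c, h) for c in courses for h in hobbies} - rows:
            return False
    return True


combined = join_on_student(course, hobby)
print(len(combined))            # 8
print(satisfies_mvd(combined))  # True

# The four rows of the enrolment table as first shown are incomplete.
enrolment = [(1, ".NET", "Cricket"), (1, "RDBMS", "Hockey"),
             (2, "C#", "Cricket"), (2, "PHP", "Hockey")]
print(satisfies_mvd(enrolment))  # False
```

The single combined table needs eight rows to state what the two 4NF tables state in four rows each without any repeated course and hobby pairing.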

3.2.7 FIFTH NORMAL FORM (5NF)

For a table to satisfy the Fifth Normal Form, it should satisfy
the following two conditions:
1. It is in 4NF.
2. If we decompose the table further to eliminate
redundancy and anomalies, then re-joining the
decomposed tables by means of candidate keys should
neither lose original data nor produce any new
records.
In simple words, joining two or more decomposed tables
should not lose records nor create new records.
For Example:
Note: Please consider that Semester 1 has Computer
Science, Mathematics and Physics and Semester 2 has only
Mathematics in its academic year.
Table Name: Course
SUBJECT           LECTURER  CLASS
Computer Science  Ragu      Semester 1
Computer Science  Rose      Semester 1
Mathematics       Rose      Semester 1
Mathematics       John      Semester 2
Physics           Arun      Semester 1

In the above table, Rose takes both the Computer Science and
Mathematics classes for Semester 1, but she does not take
the Mathematics class for Semester 2. In this case, the
combination of all three fields is required to identify a
valid row.
Imagine we want to add a new class, Semester 3, but do not
know which subject will be taught and who will be taking it.
We would simply be inserting a new entry with CLASS as
Semester 3 and leaving LECTURER and SUBJECT as
NULL. It is not good to have such entries. Moreover, since
all three columns together act as the primary key, we
cannot leave the other two columns blank.
Hence we have to decompose the table in such a way that it
satisfies all the rules up to 4NF and, when the parts are
joined by using keys, it yields the correct records. Here,
we can represent each lecturer's subject area and their
classes in a better way.
We can divide the above table into three: (SUBJECT,
LECTURER), (LECTURER, CLASS), (SUBJECT,
CLASS)

Table 1:
SUBJECT           LECTURER
Computer Science  Ragu
Computer Science  Rose
Mathematics       Rose
Mathematics       John
Physics           Arun

Table 2:
CLASS       LECTURER
Semester 1  Ragu
Semester 1  Rose
Semester 2  John
Semester 1  Arun

Table 3:
SUBJECT           CLASS
Computer Science  Semester 1
Mathematics       Semester 1
Mathematics       Semester 2
Physics           Semester 1

Now each combination is stored in its own table. To identify
who is teaching which subject to which semester, we need
to join the three tables on their key columns.
For example, to find who teaches Mathematics to Semester 1,
we select Mathematics and Semester 1 from table 3 above
and join with table 1 on SUBJECT to get the candidate
lecturer names.
Then we join with table 2 on LECTURER and CLASS to get the
correct lecturer name. That is, we joined the key columns of
each table to get the correct data. Hence no data is lost
and no new data is created, which satisfies the 5NF
condition.
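The 5NF condition can be checked directly for this example by building the three projections and joining them again. The sketch below models each table as a Python set of tuples, which is an assumption made for brevity; the variable names are chosen for this illustration.

```python
course = {
    ("Computer Science", "Ragu", "Semester 1"),
    ("Computer Science", "Rose", "Semester 1"),
    ("Mathematics", "Rose", "Semester 1"),
    ("Mathematics", "John", "Semester 2"),
    ("Physics", "Arun", "Semester 1"),
}

# The three projections; sets drop duplicate rows automatically.
subject_lecturer = {(s, l) for s, l, c in course}
lecturer_class = {(l, c) for s, l, c in course}
subject_class = {(s, c) for s, l, c in course}

# Join all three: keep (s, l, c) only if every pair is present.
rejoined = {
    (s, l, c)
    for (s, l) in subject_lecturer
    for (l2, c) in lecturer_class
    if l == l2 and (s, c) in subject_class
}

print(rejoined == course)  # True: nothing lost, nothing new

# Who teaches Mathematics to Semester 1?
who = sorted(l for (s, l, c) in rejoined
             if s == "Mathematics" and c == "Semester 1")
print(who)  # ['Rose']
```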