Chapter 7
Normalization
What is an Anomaly in DBMS?
An anomaly in DBMS means an error or inconsistency that occurs when data is stored in an
unorganized or redundant manner — typically in un-normalized tables.
When the same data is repeated in many places, or not properly linked, then
problems occur during insertion, updating, or deletion.
These problems are called Data Anomalies.
Anomalies occur mainly due to:
1. Data redundancy (repetition of data)
2. Poor database design (storing everything in one big table)
3. Lack of normalization (no proper separation of related data)
Example Table — STUDENT_COURSE
Student_ID Student_Name Course Instructor Instructor_Phone
1 Rahul DBMS Meena 98765
2 Neha Java Ramesh 99887
3 Kiran DBMS Meena 98765
Here, the instructor’s name and phone number are repeated for each student enrolled in
the same course.
Need to Overcome Anomalies
To ensure:
1. Data consistency (no contradictions)
2. Data integrity (data remains accurate)
3. No redundancy (no repeated data)
4. Ease of maintenance (update once, everywhere correct)
How to Overcome Anomalies?
➡ By applying Normalization
Normalization is the process of organizing data into multiple related tables based on rules
(1NF, 2NF, 3NF, BCNF…).
Anomalies in DBMS
There are three types of anomalies that occur when a database is not normalized: insertion, update, and deletion anomalies.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing
the department in which the employee works. At some point in time the table looks
like this:
emp_id emp_name emp_address emp_dept
101 Rick Delhi D001
101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004
The above table is not normalized. We will see the problems that we face when a table is
not normalized.
Update anomaly: If the same data is stored in multiple places, updating it in one place and
forgetting another leads to inconsistency.
In the above table we have two rows for employee Rick as he belongs to two departments
of the company. If we want to update the address of Rick then we have to update the same
in two rows, or the data will become inconsistent. If the correct address gets updated in
one row but not in the other, then as per the database Rick would have two different
addresses, which is incorrect and leads to inconsistent data.
Insert anomaly: You cannot add a new record because some information is missing.
Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if
emp_dept field doesn’t allow nulls.
Delete anomaly: Deleting a record causes unintended loss of other data.
Suppose at some point the company closes department D890. Deleting the rows that have
emp_dept as D890 would also delete the information of employee Maggie, since she is
assigned only to this department.
To overcome these anomalies we need to normalize the data.
Summary of Anomalies
Type | When it Occurs | Example | Problem
Insertion Anomaly | Cannot insert data without other data | Add a new course without a student | Missing dependency
Update Anomaly | Same data stored at multiple places | Change instructor phone | Inconsistent data
Deletion Anomaly | Deleting one record removes important data | Delete student → lose course info | Data loss
Normalized Structure: The tables after Normalization
1. EMPLOYEE
emp_id emp_name emp_address
101 Rick Delhi
123 Maggie Agra
166 Glenn Chennai
2. DEPARTMENT
emp_dept dept_name location
D001 HR Delhi
D002 Finance Mumbai
D004 Marketing Chennai
D890 IT Agra
D900 Sales Chennai
3. EMP_DEPARTMENT (relationship table)
emp_id emp_dept
101 D001
101 D002
123 D890
166 D900
166 D004
How Normalization Solves Anomalies
Anomaly | Problem in Original Table | Solved in Normalized Tables
Update Anomaly | Change address → must update multiple rows | Employee address stored once in EMPLOYEE table
Insert Anomaly | Cannot insert employee with no dept | Can add employee in EMPLOYEE table first
Delete Anomaly | Delete dept → lose employee info | Department and employee stored separately
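The normalized design above can be exercised with a small sketch using Python's built-in sqlite3 module. Table and column names follow the chapter's tables; the address change to "Mumbai" is an invented update used only to show that exactly one row is touched, no matter how many departments Rick belongs to.

```python
import sqlite3

# Build the normalized EMPLOYEE and EMP_DEPARTMENT tables from the chapter.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_address TEXT)")
cur.execute("CREATE TABLE emp_department (emp_id INTEGER, emp_dept TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(101, "Rick", "Delhi"), (123, "Maggie", "Agra"), (166, "Glenn", "Chennai")])
cur.executemany("INSERT INTO emp_department VALUES (?, ?)",
                [(101, "D001"), (101, "D002"), (123, "D890"), (166, "D900"), (166, "D004")])

# Update anomaly gone: the address lives in exactly one row of employee.
cur.execute("UPDATE employee SET emp_address = 'Mumbai' WHERE emp_id = 101")
print(cur.rowcount)  # 1

# A join still reconstructs the combined employee/department view.
rows = cur.execute("""SELECT e.emp_name, e.emp_address, d.emp_dept
                      FROM employee e JOIN emp_department d ON e.emp_id = d.emp_id
                      WHERE e.emp_id = 101 ORDER BY d.emp_dept""").fetchall()
print(rows)  # [('Rick', 'Mumbai', 'D001'), ('Rick', 'Mumbai', 'D002')]
```

The same update against the original single-table design would have had to touch two rows.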
What is Normalization?
Normalization is the process of organizing the data in a database to avoid data redundancy
and the insertion, update, and deletion anomalies.
In Normalization:
A large table is broken down into smaller, well-structured tables.
These smaller tables are connected with each other using well-defined
relationships (like primary key, foreign key).
In a well-normalized database, any modification to a piece of data requires changing
only a single table.
Goals of Normalization
1. Eliminate Data Redundancy (Avoid Repetition): Store each piece of data only once in the
database.
2. Avoid Data Anomalies: Normalization helps prevent three major anomalies:
Type Description
Update Anomaly Changing data in one place but not in others leads to inconsistency.
Insert Anomaly Cannot insert data because some unrelated value is missing.
Delete Anomaly Deleting data unintentionally removes important related data.
After normalization, these problems are minimized because each table focuses on one
concept (like Employee or Department).
3. Ensure Data Integrity and Consistency: Keep the data accurate and consistent across the
database.
4. Simplify Maintenance and Updates: Make it easier to modify or update data.
5. Improve Query Performance (in some cases): Organize data so queries retrieve accurate
and relevant results.
6. Promote Logical Data Independence: Allow changes in data structure (like adding new
attributes) without affecting existing programs or queries.
Example:
o You can add a dept_manager column to the DEPARTMENT table without
altering the EMPLOYEE table.
7. Establish Clear Relationships Between Data: Define how data in one table relates to data
in another using keys and foreign keys.
8. Optimize Storage: Reduce the amount of data stored unnecessarily.
Summary Table
Goal | Description | Example
1. Eliminate redundancy | Remove duplicate data | Store employee info once
2. Avoid anomalies | Prevent insert, update, delete issues | Each table focuses on one concept
3. Ensure integrity | Maintain valid data relationships | Use primary & foreign keys
4. Simplify updates | Make data maintenance easy | Update address in one place
5. Improve accuracy | Reduce risk of inconsistent data | Consistent employee info
6. Logical independence | Allow structure changes easily | Add column without affecting others
7. Define relationships | Clarify data connections | Employee ↔ Department
8. Optimize storage | Use disk space efficiently | No repeated values
Properties of Normalization
Normalization aims to structure the data properly so that:
It avoids redundancy,
Maintains data consistency, and
Prevents anomalies during insertion, deletion, and updates.
Five key properties of a normalized relation
1. No Data Value Should Be Unnecessarily Duplicated in Different Rows
Each piece of information should be stored only once in the database.
Duplicate data causes inconsistency and waste of storage.
Example (Before Normalization):
emp_id emp_name emp_dept dept_location
101 Ravi HR Delhi
102 Meena HR Delhi
103 Arjun HR Delhi
Here, “Delhi” is repeated for every HR employee — redundancy.
After Normalization:
Split into two tables:
Employee Table
emp_id emp_name emp_dept
101 Ravi HR
102 Meena HR
103 Arjun HR
Department Table
emp_dept dept_location
HR Delhi
Now, “Delhi” appears only once — redundancy removed.
2. Every Attribute Must Have a Valid Value in Each Row
Each column (attribute) should contain valid, atomic values (not derived, and not NULL
where NULLs are disallowed).
Each cell stores only one value (no multi-valued or missing data).
Example (Before Normalization):
emp_id emp_name emp_phone
101 Ravi 9876543210, 9988776655
102 Meena — (missing)
Here,
Ravi has two phone numbers in one cell (multi-valued).
Meena’s phone is missing (invalid).
After Normalization:
Employee Table
emp_id emp_name emp_phone
101 Ravi 9876543210
101 Ravi 9988776655
102 Meena 9123456789
Now each value is valid and atomic.
3. Each Relation Should Be Self-Contained
Each table (relation) should contain data related to only one concept or entity.
It should not depend on columns of other tables for understanding.
Example (Before Normalization):
emp_id emp_name dept_id dept_name dept_location
101 Ravi D1 HR Delhi
102 Meena D2 IT Bangalore
This table mixes employee and department data → violates the rule.
After Normalization:
Employee Table
emp_id emp_name dept_id
101 Ravi D1
102 Meena D2
Department Table
dept_id dept_name dept_location
D1 HR Delhi
D2 IT Bangalore
Now each table represents a single concept — self-contained.
4. Adding a Row in One Relation Should Not Disturb Other Relations
Inserting a new record in one table should not require inserting unrelated data in
another.
It helps avoid insert anomalies.
Example (Before Normalization):
emp_id emp_name dept_id dept_name
101 Ravi D1 HR
If a new department “Finance” (D3) is created but has no employee yet, we cannot insert it
— because there’s no emp_id or emp_name.
After Normalization:
Department Table
dept_id dept_name
D1 HR
D2 IT
D3 Finance
Now, we can insert Finance department even without any employee — one table change
doesn’t affect another.
5. Updating the Value of an Attribute in One Row Should Not Affect Unrelated Tuples or
Other Tables
Changing one record should not cause side effects or require unnecessary updates
elsewhere.
This avoids update anomalies.
Example (Before Normalization):
emp_id emp_name emp_dept dept_location
101 Ravi HR Delhi
102 Meena HR Delhi
If HR department shifts to “Noida”, you must change both rows → risk of inconsistency.
After Normalization:
Department Table
emp_dept dept_location
HR Delhi
Change “Delhi” → “Noida” in one place only — all employees under HR reflect the change
automatically.
Summary Table
Property | Description | Example Problem | Normalized Solution
1. No duplication | Avoid repeating same data | Same dept location repeated | Store in separate dept table
2. Valid values | Atomic and not null | Multiple phones / missing values | Split rows, fill valid values
3. Self-contained | Each table = one concept | Employee & dept in one table | Separate tables
4. Insert safe | Add row without affecting others | Cannot insert dept without employee | Separate Dept table
5. Update safe | Update one row only | Change in one affects many | Centralize related data
Advantages of Normalization
No. | Advantage | Explanation | Example / Impact
1 | Reduces Data Redundancy | Same data is not stored multiple times, saving storage space. | Department name stored once instead of repeating for every employee.
2 | Improves Data Consistency | When data appears in only one place, updates automatically reflect everywhere. | Updating “HR” to “Human Resources” in one table updates all linked records.
3 | Avoids Anomalies | Eliminates update, insert, and delete anomalies. | Prevents issues like data loss when deleting a department.
4 | Ensures Data Integrity | Enforces relationships using keys and constraints. | Foreign key ensures valid department IDs for employees.
5 | Enhances Query Efficiency | Small, structured tables make data retrieval faster. | Easier to search or filter data from smaller, related tables.
6 | Facilitates Easier Maintenance | Modifying the schema or updating records becomes simpler. | Changing one table doesn’t affect unrelated ones.
7 | Improves Data Security | Sensitive data can be stored separately and accessed with proper control. | Salary details can be stored in a secure table accessible only to HR.
8 | Better Scalability | Supports future expansion or design changes without much restructuring. | New attributes or tables can be added easily.
Disadvantages of Normalization
No. | Disadvantage | Explanation | Example / Impact
1 | Complex Queries (More Joins) | Data spread across multiple tables requires complex joins for retrieval. | Querying an employee with department details needs a JOIN operation.
2 | Reduced Performance for Large Joins | Too many joins can slow down performance, especially for large databases. | Reports combining 5–6 tables may take longer to run.
3 | Difficult for Beginners | Understanding normalized design and relationships can be challenging. | Beginners may find it hard to trace data through multiple tables.
4 | Overhead in Query Processing | Breaking data into multiple tables increases query execution time in some cases. | SELECT queries involving aggregates over multiple tables.
5 | Not Always Practical for Small Databases | Small systems may not need high normalization; it may add unnecessary complexity. | A small library database can work fine without full normalization.
6 | Data Retrieval Becomes Indirect | To get complete information, you need to access multiple tables instead of one. | Fetching a student’s marks and class info needs data from different tables.
7 | Maintenance of Foreign Keys | More foreign keys mean more dependency between tables. | Changes in one table require validation in related tables.
Normalization forms
The most commonly used normal forms:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce & Codd normal form (BCNF)
Fourth Normal form(4NF)
Fifth Normal form(5NF)
First normal form (1NF)
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple
values. It should hold only atomic values.
Atomic- It is the smallest piece of data that cannot be further divided.
Example: Suppose a company wants to store the names and contact details of its
employees. It creates a table that looks like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers stored in the same field in the
table above.
This table is not in 1NF: the rule says “each attribute of a table must have atomic (single)
values”, and the emp_mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF, we should store the data like this:
emp_id emp_name emp_address emp_mobile
101 Herschel New Delhi 8912312390
102 Jon Kanpur 8812121212
102 Jon Kanpur 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123
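The 1NF conversion above is mechanical enough to sketch in Python. The sample rows mirror the chapter's table; the comma-split rule is an assumption about how the multi-valued cell is stored.

```python
# Un-normalized rows: emp_mobile may hold several comma-separated numbers.
raw = [
    (101, "Herschel", "New Delhi", "8912312390"),
    (102, "Jon", "Kanpur", "8812121212, 9900012222"),
    (103, "Ron", "Chennai", "7778881212"),
    (104, "Lester", "Bangalore", "9990000123, 8123450987"),
]

# 1NF: one atomic phone number per row.
atomic = []
for emp_id, name, addr, mobiles in raw:
    for mobile in (m.strip() for m in mobiles.split(",")):
        atomic.append((emp_id, name, addr, mobile))

for row in atomic:
    print(row)  # 6 rows: Jon and Lester each appear twice
```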
Second normal form (2NF)
A table is said to be in 2NF if both the following conditions hold:
Table is in 1NF (First normal form)
Every non-key attribute is fully functionally dependent on the whole of the primary
key (i.e., there are no partial dependencies).
(Or: no non-prime attribute is dependent on a proper subset of any candidate key of the
table.)
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a school wants to store the data of teachers and the subjects they teach.
They create a table that looks like this. Since a teacher can teach more than one subject,
the table can have multiple rows for the same teacher.
teacher_id subject teacher_age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
Candidate Keys: {teacher_id, subject}
Non prime attribute: teacher_age
The table is in 1NF because each attribute has atomic values. However, it is not in 2NF
because the non-prime attribute teacher_age is dependent on teacher_id alone, which is a
proper subset of the candidate key. This violates the rule for 2NF: “no non-prime
attribute is dependent on a proper subset of any candidate key of the table”.
To make the table comply with 2NF, we can break it into two tables like this:
teacher_details table:
teacher_id teacher_age
111 38
222 38
333 40
teacher_subject table:
teacher_id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
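The partial dependency that forced this split can be detected on the sample data with a small sketch. fd_holds is an illustrative helper, not a standard function: an FD X → Y holds in a table when no two rows agree on X but differ on Y.

```python
# Sample rows from the teacher table: (teacher_id, subject, teacher_age).
rows = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

def fd_holds(rows, lhs, rhs):
    """True if the attributes at positions `lhs` determine the attribute at `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# teacher_id alone (a proper subset of the key {teacher_id, subject})
# determines teacher_age -> partial dependency, so the table is not in 2NF.
print(fd_holds(rows, lhs=(0,), rhs=2))  # True
# subject alone does not determine teacher_age (Physics maps to 38 and 40).
print(fd_holds(rows, lhs=(1,), rhs=2))  # False
```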
Key Terms Recap for reference
Term | Meaning
Functional Dependency (FD) | Relationship between attributes (e.g., Roll_No → Name)
Determinant | The attribute(s) on the left side of the FD (X in X → Y)
Candidate Key | Minimal set of attributes that can uniquely identify each row
Super Key | A set of attributes that uniquely identifies a row (may include extra attributes)
Third Normal form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
Table must be in 2NF
Transitive functional dependency of non-prime attributes on any super key should be
removed.
i.e., no non-key column depends on another non-key column: all non-prime attributes of
a relation must be non-transitively functionally dependent on a key of the relation.
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Suppose a company wants to store the complete address of each employee, they
create a table named employee_details that looks like this:
emp_id emp_name emp_zip emp_state emp_city emp_district
1001 John 282005 UP Agra Dayal Bagh
1002 Ajeet 222008 TN Chennai M-City
1006 Lora 282007 TN Chennai Urrapakkam
1101 Lilly 292008 UK Pauri Bhagwan
1201 Steve 222999 MP Gwalior Ratan
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, emp_zip}…so on
Candidate Keys: {emp_id}
Non-prime attributes: all attributes except emp_id are non-prime as they are not part of
any candidate keys.
Here, emp_state, emp_city & emp_district are dependent on emp_zip. And emp_zip is
dependent on emp_id, which makes the non-prime attributes (emp_state, emp_city &
emp_district) transitively dependent on the super key (emp_id). This violates the rule of 3NF.
To make this table comply with 3NF, we have to break the table into two tables to remove
the transitive dependency:
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
emp_zip emp_state emp_city
282005 UP Agra
222008 TN Chennai
282007 TN Chennai
292008 UK Pauri
222999 MP Gwalior
Boyce Codd normal form (BCNF)
It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is
stricter than 3NF.
A relation R is in BCNF if and only if:
it is in 3NF
and for every functional dependency (FD) in the form X → Y,
X must be a super key of the relation(table).
BCNF was developed by Raymond Boyce and Edgar F. Codd to remove certain anomalies
that 3NF could not eliminate completely.
Why do we need BCNF?
Even after applying 3NF, some redundancy and anomalies (insertion, update, deletion) can
still occur when:
A non-key attribute determines part of a candidate key.
Or when multiple candidate keys exist and they overlap.
BCNF solves this problem by ensuring that every determinant is a super key.
Example 1 : Suppose there is a company wherein employees work in more than one
department. They store the data like this:
emp_id emp_nationality emp_dept dept_type dept_no_of_emp
1001 Austrian Production and planning D001 200
1001 Austrian Stores D001 250
1002 American design and technical support D134 100
1002 American Purchasing department D134 600
Functional dependencies in the table above:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF because the left-hand side of each FD (emp_id alone, emp_dept alone) is not a super key.
To make the table comply with BCNF we can break the table in three tables like this:
emp_nationality table:
emp_id emp_nationality
1001 Austrian
1002 American
emp_dept table:
emp_dept dept_type dept_no_of_emp
Production and planning D001 200
Stores D001 250
design and technical support D134 100
Purchasing department D134 600
emp_dept_mapping table:
emp_id emp_dept
1001 Production and planning
1001 Stores
1002 design and technical support
1002 Purchasing department
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as the left-hand side of each functional dependency is a super key of its table.
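As a sanity check, the BCNF condition on the decomposed emp_nationality table can be sketched in Python. is_key and the uniqueness test are illustrative helpers: on sample data they can only confirm that no two rows share the determinant's value.

```python
# Decomposed emp_nationality table: (emp_id, emp_nationality).
rows = [(1001, "Austrian"), (1002, "American")]

def is_key(rows, positions):
    """True if the attributes at `positions` are unique across all rows."""
    keys = [tuple(r[i] for i in positions) for r in rows]
    return len(set(keys)) == len(rows)

# The determinant of emp_id -> emp_nationality is emp_id, and emp_id is
# unique here, i.e. a (super)key of the table -> the FD satisfies BCNF.
print(is_key(rows, (0,)))  # True
```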
Fourth and Fifth Normal Forms are not in the syllabus; given for reference.
Fourth normal form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if a single value of A is associated with multiple values of B,
then A → B is a multi-valued dependency (written A ↠ B).
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent attributes:
there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth normal form (5NF)
o A relation is in 5NF if it is in 4NF, contains no join dependency, and every join is
lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he
doesn't take the Math class for Semester 2. In this case, the combination of all three
fields is required to identify valid data.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be
taking it, so we would leave Lecturer and Subject as NULL. But all three columns together
act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
What is De-normalization?
De-normalization is the process of intentionally adding redundancy (duplicate data) to a
database by combining normalized tables to improve query performance.
Why is De-normalization Needed?
De-normalization is mainly used to improve the performance of database queries when:
Too many joins make queries slow.
The database is read-heavy (more SELECTs, fewer INSERT/UPDATEs).
We need faster access to summary or aggregated data.
The system has to serve real-time reporting or dashboards.
It’s a trade-off between performance and data redundancy.
Example of De-normalization
Normalized Tables (3NF form)
We have two tables:
1. EMPLOYEE Table
Emp_ID Emp_Name Dept_ID
E1 John D1
E2 Mary D2
E3 Raj D1
2. DEPARTMENT Table
Dept_ID Dept_Name Manager
D1 Sales Mr. Raj
D2 IT Ms. Neha
➡ To get employee and department info together, we must join:
SELECT Emp_ID, Emp_Name, Dept_Name, Manager
FROM Employee e
JOIN Department d ON e.Dept_ID = d.Dept_ID;
This join may become expensive when the tables are large.
Denormalized Table
Emp_ID Emp_Name Dept_ID Dept_Name Manager
E1 John D1 Sales Mr. Raj
E2 Mary D2 IT Ms. Neha
E3 Raj D1 Sales Mr. Raj
Now, you can fetch all details in one query without joins:
SELECT Emp_ID, Emp_Name, Dept_Name, Manager FROM Employee_Details;
Faster reads, but now department information is repeated (redundant).
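The trade-off can be sketched with Python's sqlite3: build the normalized tables, materialize the denormalized Employee_Details once with CREATE TABLE ... AS SELECT, then read it back without a join. Table and column names follow the chapter's example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Employee (Emp_ID TEXT, Emp_Name TEXT, Dept_ID TEXT)")
cur.execute("CREATE TABLE Department (Dept_ID TEXT, Dept_Name TEXT, Manager TEXT)")
cur.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                [("E1", "John", "D1"), ("E2", "Mary", "D2"), ("E3", "Raj", "D1")])
cur.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                [("D1", "Sales", "Mr. Raj"), ("D2", "IT", "Ms. Neha")])

# Materialize the denormalized table once; later reads need no join.
cur.execute("""CREATE TABLE Employee_Details AS
               SELECT e.Emp_ID, e.Emp_Name, e.Dept_ID, d.Dept_Name, d.Manager
               FROM Employee e JOIN Department d ON e.Dept_ID = d.Dept_ID""")

# Join-free read: department info is now repeated per employee (redundant).
rows = cur.execute("""SELECT Emp_ID, Emp_Name, Dept_Name, Manager
                      FROM Employee_Details ORDER BY Emp_ID""").fetchall()
print(rows)
```

Note that after this step, renaming a manager means updating every copied row in Employee_Details, which is exactly the update anomaly the next table lists.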
Advantages of Denormalization
Advantage | Explanation
1. Faster Query Execution | Reduces the number of joins needed to fetch data, improving response time.
2. Simpler Queries | Queries become easier to write and understand, as related data is in a single table.
3. Better Performance for Reports | Aggregated data or frequently accessed summaries can be pre-stored.
4. Useful in Data Warehouses | Denormalization is common in OLAP systems for analytical performance.
Disadvantages of Denormalization
Disadvantage | Explanation
1. Data Redundancy | Duplicate data increases storage requirements.
2. Update Anomalies | A change in one record (like a manager name) must be updated in multiple rows.
3. Risk of Inconsistency | If not carefully maintained, duplicate data can become inconsistent.
4. More Complex Data Modification | INSERT, UPDATE, DELETE operations become slower and error-prone.
Normalization Vs De-normalization
Normalization | Denormalization
Removes redundancy | Adds redundancy
Increases data integrity | Improves query performance
Data is stored in multiple related tables | Data is combined into fewer tables with redundancy
Slower reads for complex queries (due to joins) | Faster reads (fewer joins)
Good for OLTP (transactional systems such as banking: frequent updates) | Good for OLAP (analytical systems such as data warehouses: reporting, analytics)
What is Decomposition?
Decomposition in DBMS means breaking a relation (table) into two or more smaller
relations.
The goal is to remove redundancy, avoid anomalies (update/insert/delete problems),
and still keep the information correct.
For example:
Suppose we have a table:
STUDENT(Student_ID, Name, Course, Instructor, Dept)
Here:
Each course belongs to a department.
Instructor also depends on the course.
This table has repeated information (redundancy).
We can decompose it into smaller relations to make it better.
The above table stores which student is taking which course, who teaches it, and in which
department.
Example Data:
Student_ID Name Course Instructor Dept
S1 John DBMS Dr. Sharma CS
S2 Mary Networks Dr. Singh IT
S3 John Networks Dr. Singh IT
S4 Raj DBMS Dr. Sharma CS
Properties of a Good Decomposition
To ensure decomposition is useful, it must satisfy two main properties:
1. Lossless-Join Property
Ensures no data is lost or wrongly created after decomposition.
Rule: the common attributes shared by the decomposed relations must form a key of
at least one of them.
Example:
R(A, B, C) with FD A → B.
Decompose into R1(A, B) and R2(A, C).
A is key in R1.
Hence, lossless join.
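The same rule can be checked mechanically: project a sample relation onto R1(A, B) and R2(A, C), natural-join them back on A, and compare with the original. The tuples below are hypothetical sample data satisfying A → B.

```python
# R(A, B, C) with FD A -> B: each A value pairs with exactly one B value.
R = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")}

R1 = {(a, b) for (a, b, c) in R}  # projection onto (A, B)
R2 = {(a, c) for (a, b, c) in R}  # projection onto (A, C)

# Natural join of R1 and R2 on the common attribute A.
joined = {(a, b, c) for (a, b) in R1 for (a2, c) in R2 if a == a2}

# A is a key of R1, so the join reproduces R exactly: lossless.
print(joined == R)  # True
```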
2. Dependency Preservation Property
Ensures we can still check all functional dependencies without joining tables.
Makes constraint checking easier.
When we split (decompose) a big table into smaller tables to remove redundancy
(normalization), we must make sure that all the functional dependencies (FDs) — i.e., the
rules or relationships among columns — are still valid and checkable in the new smaller
tables.
A decomposition is dependency preserving if we can check all the rules (FDs) without
joining the tables again.
Step 1: Start with one table and some FDs (rules).
Example:
We have one table — Student
RollNo Name Course
101 Ravi DBMS
102 Meena Python
103 Ravi AI
Functional Dependencies (FDs):
1. RollNo → Name
2. Name → Course
That means:
Each RollNo has one unique Name.
Each Name belongs to one Course.
Step 2: Decompose (split) the table
Suppose we split Student into two smaller tables:
Table 1: R1(RollNo, Name)
Table 2: R2(Name, Course)
Step 3: Find which FDs belong to which table
We look at each FD and see if all its columns are in one table:
RollNo → Name ✅ (belongs completely to R1)
Name → Course ✅ (belongs completely to R2)
Both FDs can be checked in their own tables.
Step 4: Check if we can still find all original FDs
Since both FDs are already present in R1 and R2,
we don’t lose any dependency.
So this decomposition is dependency preserving.
We can enforce the same rules (FDs) just by checking R1 and R2 separately — no need to
join them again.
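Step 3's containment check — every FD's attributes fit entirely inside one decomposed schema — can be sketched as below. (This simple test is sufficient but not the complete dependency-preservation test, which in general requires attribute closures.)

```python
# FDs from the Student example and the two decomposed schemas.
fds = [({"RollNo"}, {"Name"}), ({"Name"}, {"Course"})]
schemas = [{"RollNo", "Name"}, {"Name", "Course"}]

def preserved(fd, schemas):
    """True if all attributes of the FD sit inside a single schema."""
    lhs, rhs = fd
    return any((lhs | rhs) <= s for s in schemas)

# Both FDs fit in R1 or R2, so this decomposition is dependency preserving.
print(all(preserved(fd, schemas) for fd in fds))  # True
```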
Why Dependency Preservation Is Important
If dependencies are not preserved, then:
We cannot check all rules directly from the smaller tables.
We’ll have to join them again — which is time-consuming and expensive.
So we always try to make dependency-preserving decompositions during normalization.
Summary Table
Term | Meaning | Example
Dependency Preservation | All original functional dependencies can still be checked in the decomposed tables | RollNo → Name and Name → Course both checkable separately
Not Dependency Preserving | Some dependencies are lost and need a join to check | B → C lost after decomposing R(A,B,C) into R1(A,B) and R2(A,C)
Types of Decomposition
1. Lossless Decomposition
2. Lossy Decomposition
1. Lossless-Join Decomposition
A decomposition is lossless if no data is lost when we join the decomposed tables back
together.
It means:
The original table can be perfectly reconstructed by joining the smaller tables.
After decomposition, if we join the smaller relations back, we should get the exact
original relation (no extra/missing tuples).
This is the good kind of decomposition.
Example:
STUDENT(Student_ID, Name, Course, Instructor, Dept)
Decompose into:
1. STUDENT_COURSE(Student_ID, Name, Course)
2. COURSE_INFO(Course, Instructor, Dept)
If we join them back on Course, we get the original table.
This is lossless-join.
Example of Lossless Decomposition
We decompose the STUDENT table into two smaller tables:
Table 1: COURSE_INFO (Course, Instructor, Dept)
Course Instructor Dept
DBMS Dr. Sharma CS
Networks Dr. Singh IT
Table 2: STUDENT_COURSE (Student_ID, Name, Course)
Student_ID Name Course
S1 John DBMS
S2 Mary Networks
S3 John Networks
S4 Raj DBMS
Now, if we perform a natural join on the common attribute Course, we get:
Student_ID Name Course Instructor Dept
S1 John DBMS Dr. Sharma CS
S2 Mary Networks Dr. Singh IT
S3 John Networks Dr. Singh IT
S4 Raj DBMS Dr. Sharma CS
The same as the original STUDENT table → Hence, Lossless.
How Do We Know It is Lossless?
Mathematically, the decomposition of R into R₁ and R₂ is lossless if:
The common attribute between R₁ and R₂ is a key for at least one of them.
Here,
Common attribute = Course, and in COURSE_INFO, Course → {Instructor, Dept} (Course is a
key).
Hence, Lossless Decomposition.
2. Lossy Decomposition
A decomposition is lossy if some information is lost (or extra meaningless data
appears) when we join the decomposed tables back together.
After decomposition, if we join back the relations, we may get extra (wrong)
tuples.
This is bad decomposition.
Example:
If we decompose the same STUDENT table into:
1. STUDENT_INFO(Student_ID, Name)
2. COURSE_INFO(Course, Instructor, Dept)
When we join, we don’t know which student took which course → we may
generate wrong data (cartesian product).
Example of Lossy Decomposition
Let’s decompose the STUDENT table differently:
Table 1: STUDENT_INFO (Student_ID, Name, Dept)
Table 2: COURSE_INFO (Course, Instructor)
Table 1:
Student_ID Name Dept
S1 John CS
S2 Mary IT
S3 John IT
S4 Raj CS
Table 2:
Course Instructor
DBMS Dr. Sharma
Networks Dr. Singh
Now, if we join these two tables, there is no proper common key linking them (Dept and
Course are unrelated), so the DBMS effectively produces a Cartesian product, leading to
extra combinations that never existed in the original table.
Example of incorrect results after join:
Student_ID Name Dept Course Instructor
S1 John CS DBMS Dr. Sharma
S1 John CS Networks Dr. Singh
S2 Mary IT DBMS Dr. Sharma
S2 Mary IT Networks Dr. Singh
... ... ... ... ...
We get wrong extra rows → This means Lossy Decomposition.
Comparison Table
Feature | Lossless Decomposition | Lossy Decomposition
Definition | No data lost when the decomposed tables are joined | Some data is lost or extra data appears
Join Result | Same as original table | Different from original table
Condition | Common attribute must be a key in one of the relations | Common attribute is not a key
Example Common Attribute | Course (key in COURSE_INFO) | Dept (not a key in either table)
Result | Accurate reconstruction | Incorrect or redundant data
Functional Dependency
Functional Dependency (FD)
In a database table, a functional dependency means that one attribute (or a group of
attributes) determines another attribute.
A functional dependency is a constraint that specifies the relationship between two sets of
attributes, where one set can accurately determine the values of the other.
If we know the value of one attribute, we can uniquely find the value of another
attribute.
Notation:
X → Y means X functionally determines Y (or Y is functionally dependent on X).
A functional dependency is denoted by X → Y, where X and Y are sets of attributes of a
relation scheme R. The dependency specifies that for any two tuples t1 and t2 in a relation
r(R), if t1[X]= t2[X], then it must also hold that t1[Y]= t2[Y]. This means that the value of
attribute X uniquely determines the value of attribute Y.
Example:
In a table of Students(Student_ID, Name, Course, Department):
Student_ID → Name
(Because if you know Student_ID, you can find exactly one Name.)
Course → Department
(If you know the Course, you know which Department it belongs to.)
But Name → Student_ID is usually not true, because two students can have the same name.
What is an Irreducible (or Minimal / Canonical) Set of FDs?
When we have a big set of Functional Dependencies (FDs), some of them might be
repeated, unnecessary, or too long.
So, we try to simplify the set — without changing its meaning or the information it gives.
The simplified (shortest) version of the set is called the:
Irreducible / Minimal / Canonical set of FDs.
An irreducible set of FDs is the smallest possible set that gives the same information as the
original one — no FD can be removed or shortened further.
It is a smallest, simplest set of FDs that tells us the same things as before —
with no unnecessary FDs, no extra attributes, and only one attribute on the RHS.
Steps / Rules to Find Irreducible Set (3 Rules)
There are three main rules (or conditions) for a set of FDs to be minimal.
Rule 1: Each FD must have only one attribute on the right-hand side (RHS).
If any FD has more than one attribute on the RHS, we must split it into multiple FDs — one
for each attribute.
Example: Suppose we have:
A → BC
This means: A determines both B and C.
We split it into:
A→B
A→C
Now, each FD has only one attribute on the right-hand side.
Rule 2: No FD should have an extraneous (unnecessary) attribute on the left-hand side
(LHS).
If removing one attribute from the LHS still gives the same dependency,
then that attribute was extra and should be removed.
Example:
Suppose we have:
AB → C
We need to check whether A alone can determine C.
If, using the other FDs, we can derive A → C (i.e., C is in the closure of A), then B is
extraneous, and we can reduce the FD to:
A→C
Rule 3: No FD should be redundant (removable).
If one FD can be derived from others, then it is not needed and should be removed.
Example:
Suppose we have:
A→B
B→C
A→C
Now, A → C can already be derived using the first two FDs (A → B → C).
So A → C is redundant.
Minimal set is:
A→B
B→C
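Rule 3 can be checked mechanically using an attribute closure (covered later in this chapter). A small Python sketch — the set-of-pairs encoding of FDs is our own, purely for illustration:

```python
def closure(attrs, fds):
    """All attributes derivable from `attrs` using the FDs (i.e., X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already derivable, the RHS is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]

# An FD X -> Y is redundant if Y is still derivable from X
# using only the remaining FDs.
for i, (lhs, rhs) in enumerate(fds):
    rest = fds[:i] + fds[i + 1:]
    if rhs <= closure(lhs, rest):
        print(lhs, "->", rhs, "is redundant")
```

Only A → C is flagged: dropping it still leaves C in the closure of A via A → B and B → C, matching the minimal set {A → B, B → C} above.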
Summary Table
Step | Rule                             | What to Do                                          | Example
1    | One attribute on RHS             | Simplify the RHS                                    | Split A → BC into A → B, A → C
2    | Remove extraneous LHS attributes | Simplify the LHS                                    | If A → C already holds, B in AB → C is extraneous; reduce to A → C
3    | Remove redundant FDs             | If one FD can be derived from the others, remove it | A → C is derived from A → B and B → C
Summary Table of Steps
Step     | Action                | FD Set After Step
Original | FDs given             | { Stud_ID → Course_ID, Course_ID → (Instructor, Dept), Instructor → Dept }
Step 1   | Split RHS             | { Stud_ID → Course_ID, Course_ID → Instructor, Course_ID → Dept, Instructor → Dept }
Step 2   | Remove extraneous LHS | No change
Step 3   | Remove redundant FDs  | Removed Course_ID → Dept (derivable from Course_ID → Instructor and Instructor → Dept)
✅ Final | Simplified & minimal  | Canonical set: { Stud_ID → Course_ID, Course_ID → Instructor, Instructor → Dept }
We split any FD with multiple attributes on the RHS into single-attribute FDs.
We checked for unnecessary (extra) attributes.
We removed any dependencies that can be derived from others.
What remains is the smallest, cleanest, and most efficient set — the irreducible (canonical)
set of functional dependencies.
Properties of FD or Armstrong’s Axioms
These are basic rules used to derive all possible functional dependencies from a given set of
FDs.
1. Reflexivity Rule
If Y is a subset of X, then X → Y.
Anything determines itself or its part.
Example:
(Student_ID, Name) → Student_ID
2. Augmentation Rule
If X → Y, then XZ → YZ (add the same attribute to both sides).
Example:
If Student_ID → Name,
Then (Student_ID, Course) → (Name, Course)
3. Transitivity Rule
If X → Y and Y → Z, then X → Z.
Example:
Student_ID → Course
Course → Department
Therefore, Student_ID → Department
These three are the main axioms.
From them, other useful rules are derived (Union, Decomposition, Pseudotransitivity, etc.),
but they are just extensions.
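As a quick sanity check, the transitivity example can be verified in Python by repeatedly applying the FDs until nothing new is derivable (this is the attribute-closure idea introduced later in this chapter; the set-based encoding of FDs is our own):

```python
def closure(attrs, fds):
    """Repeatedly apply FDs whose LHS is already derivable (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Student_ID"}, {"Course"}),
       ({"Course"}, {"Department"})]

# Transitivity: Department appears in Student_ID's closure even though
# no single given FD states Student_ID -> Department directly.
print("Department" in closure({"Student_ID"}, fds))  # True
```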
These are inference rules (Armstrong’s Axioms) used to derive new Functional
Dependencies from existing ones.
They help in simplifying or inferring FDs during normalization.
1. Union Rule
Rule
If we have:
X→Y
X→Z
Then we can combine them as:
X → YZ
This is called the Union rule, because we take the union of the right-hand sides.
Example
Given:
Student_ID → Student_Name
Student_ID → Student_Address
By Union rule, we can combine them:
Student_ID → Student_Name, Student_Address
Meaning:
A single student ID determines both the name and address.
Table Example
Student_ID Student_Name Student_Address
S1 Rahul Delhi
S2 Neha Mumbai
Here, knowing Student_ID automatically gives you both Student_Name and
Student_Address.
2. Decomposition Rule
Rule
If we have:
X → YZ
Then we can split it into:
X→Y
X→Z
This is the opposite of Union — we decompose the right-hand side.
Example
Given:
Emp_ID → Emp_Name, Emp_Salary
By Decomposition, we can derive:
Emp_ID → Emp_Name
Emp_ID → Emp_Salary
Meaning:
An employee’s ID determines both their name and salary separately.
Table Example
Emp_ID Emp_Name Emp_Salary
E1 Raj 50000
E2 Neha 60000
Here, knowing Emp_ID gives both the name and salary.
3. Pseudotransitivity Rule
Rule
If:
X→Y
YZ → W
Then we can infer:
XZ → W
It’s called Pseudotransitivity because it extends the Transitivity rule with extra attributes.
Example
Given:
Course_ID → Instructor
Instructor, Room_No → Schedule
Then by Pseudotransitivity:
Course_ID, Room_No → Schedule
Explanation:
From Course_ID, we get Instructor.
Combine it with Room_No → we can find Schedule.
Table Example
Course_ID Instructor Room_No Schedule
C1 Mehta 101 9 AM
C2 Roy 102 10 AM
From the above:
Course_ID → Instructor
Instructor, Room_No → Schedule
So, Course_ID, Room_No → Schedule
Summary Table
Rule               | Given          | Inferred FD     | Meaning
Union              | X → Y, X → Z   | X → YZ          | Combine FDs with the same LHS
Decomposition      | X → YZ         | X → Y and X → Z | Split an FD with multiple RHS attributes
Pseudotransitivity | X → Y, YZ → W  | XZ → W          | Substitute X for Y in the second FD
Note:
Union → Combine results from same source.
Decomposition → Split results into smaller parts.
Pseudotransitivity → Chain dependencies that involve extra attributes.
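The pseudotransitivity inference above can be confirmed with the same closure idea: Schedule should be derivable from {Course_ID, Room_No} even though no single given FD says so (illustrative sketch; the set-based FD encoding is our own):

```python
def closure(attrs, fds):
    """Repeatedly apply FDs whose LHS is already derivable (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Course_ID"}, {"Instructor"}),
       ({"Instructor", "Room_No"}, {"Schedule"})]

# Course_ID gives Instructor; combined with Room_No, that gives Schedule.
print("Schedule" in closure({"Course_ID", "Room_No"}, fds))  # True
# Course_ID alone is not enough:
print("Schedule" in closure({"Course_ID"}, fds))             # False
```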
What is a Cover of Functional Dependencies?
In DBMS normalization, you may have many functional dependencies (FDs) describing how
attributes depend on each other.
Some of them might be redundant or repeated.
A “cover” of a set of FDs means:
A simplified version of the FD set that represents exactly the same information as the
original.
Two Important Terms
1. Set of FDs (F) — The original set of functional dependencies.
2. Cover (G) — Another set of FDs that is equivalent to F if both imply the same
dependencies.
Formally:
G is a cover of F if:
Every FD in F can be inferred from G, and
Every FD in G can be inferred from F.
Why Covers Are Needed
Covers are used to:
Remove redundant dependencies
Simplify normalization steps
Make it easier to check equivalence between FD sets
Improve database design clarity
Types of Covers
1. Equivalent Cover → Two sets of FDs that imply the same dependencies.
2. Canonical / Minimal Cover → The simplest form of an FD set (no redundant parts).
Example: Basic Cover or Equivalent cover
Let’s take an example FD set:
F = { A → BC, B → C, A → B }
Let’s check if we can simplify this.
Step 1: Decompose FDs (split right-hand side)
Break multi-attribute RHS (right-hand side) into single attributes.
A → BC becomes:
A→B
A→C
Now F becomes:
F = { A → B, A → C, B → C, A → B }
Remove duplicates:
F = { A → B, A → C, B → C }
Now this is the cover of the original FDs (same meaning, simpler form).
Table Example
Let’s take a table Employee:
A (Emp_ID) B (Dept) C (Manager)
E1 HR Raj
E2 IT Neha
E3 HR Raj
Here:
A → B (Emp_ID determines Department)
B → C (Department determines Manager)
By Transitivity,
A → C (Emp_ID determines Manager)
So the cover of {A → BC, B → C} is equivalent to {A → B, A → C, B → C}.
Steps to Find a Canonical (Minimal) Cover
1. Split the RHS (so each FD has only one attribute on the right).
2. Remove redundant attributes from the LHS (check if they are unnecessary).
3. Remove redundant FDs (check if one FD can be derived from others).
Example for Canonical Cover
Given:
F = { A → BC, B → C, A → B }
Step 1: Split RHS →
A → B, A → C, B → C
Step 2: Remove redundant FDs
Check if any FD can be derived from others.
A → C can be derived from A → B and B → C
→ So A → C is redundant.
Final Canonical Cover:
G = { A → B, B → C }
Meaning:
Knowing A gives B, and B gives C → So knowing A gives C indirectly.
Hence, {A → B, B → C} is the minimal cover (simplest equivalent set).
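All three canonical-cover steps can be combined in one short Python sketch for F = { A → BC, B → C, A → B } (a set-based illustration, not a general-purpose tool; Step 2 is a no-op here because every LHS is a single attribute):

```python
def closure(attrs, fds):
    """All attributes derivable from `attrs` using the FDs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [(frozenset("A"), frozenset("BC")),
     (frozenset("B"), frozenset("C")),
     (frozenset("A"), frozenset("B"))]

# Step 1: split every RHS into single attributes (duplicates collapse in the set).
G = sorted({(lhs, frozenset(a)) for lhs, rhs in F for a in rhs},
           key=lambda fd: (min(fd[0]), min(fd[1])))

# Step 2: every LHS is already a single attribute, so nothing is extraneous.

# Step 3: drop any FD whose RHS is still derivable from the remaining FDs.
minimal = list(G)
for fd in G:
    rest = [f for f in minimal if f != fd]
    if fd[1] <= closure(fd[0], rest):
        minimal.remove(fd)

print([(min(l), min(r)) for l, r in minimal])  # [('A', 'B'), ('B', 'C')]
```

The sketch reproduces the hand calculation: A → C is dropped as redundant, leaving the canonical cover {A → B, B → C}.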
Summary
Term                    | Meaning                                               | Example
Functional Dependency   | A → B means A determines B                            | Student_ID → Name
Cover                   | A simplified set of FDs representing the same meaning | {A → B, A → C, B → C} covers {A → BC, B → C, A → B}
Equivalent Cover        | Two FD sets that imply each other                     | F and G imply the same dependencies
Canonical/Minimal Cover | Simplest, non-redundant version of the FDs            | {A → B, B → C} for {A → BC, B → C, A → B}
Closure of an Attribute (Attribute Closure) in DBMS
What is Closure of an Attribute?
Closure of an attribute means Finding all the attributes that can be determined (found)
from a given attribute or set of attributes using functional dependencies.
It is written as: X+
Where:
X → attribute or set of attributes
X⁺ → closure of X
Why do we need Attribute Closure?
Attribute closure is used to:
Check if a functional dependency is valid
Find candidate keys
Check normal forms (2NF, 3NF, BCNF)
Test dependency preservation
Steps to Find Attribute Closure
Step-by-Step Procedure
1. Start with the given attribute(s).
2. Add all attributes that are directly dependent on them.
3. Repeat the process until no new attributes can be added.
4. The final set is the closure.
Example (Student Database)
Relation: STUDENT
Student_ID Name Dept_ID Dept_Name
Functional Dependencies (FDs)
1. Student_ID → Name
2. Student_ID → Dept_ID
3. Dept_ID → Dept_Name
Find Closure of Student_ID
We write:
(Student_ID)+
Step 1: Start with Student_ID
Step 2: Apply FD: Student_ID → Name
Add Name
Step 3: Apply FD: Student_ID → Dept_ID
Add Dept_ID
Step 4: Apply FD: Dept_ID → Dept_Name
Add Dept_Name
Step 5: Stop
No more attributes can be added.
Final Closure
(Student_ID)⁺ = {Student_ID, Name, Dept_ID, Dept_Name}
Since (Student_ID)⁺ contains all attributes of the relation, Student_ID is a Candidate Key
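The step-by-step procedure above translates directly into a short Python function (an illustrative implementation; FDs are encoded as pairs of attribute-name sets, which is our own representation, not a DBMS feature):

```python
def closure(attrs, fds):
    """Compute X+ exactly as in the steps above: keep applying any FD
    whose whole LHS is already in the result until nothing new is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Student_ID"}, {"Name"}),
       ({"Student_ID"}, {"Dept_ID"}),
       ({"Dept_ID"}, {"Dept_Name"})]
all_attrs = {"Student_ID", "Name", "Dept_ID", "Dept_Name"}

print(closure({"Student_ID"}, fds))               # all four attributes
print(closure({"Student_ID"}, fds) == all_attrs)  # True -> candidate key
print(closure({"Dept_ID"}, fds))                  # only Dept_ID and Dept_Name
```

The candidate-key test at the end is exactly the rule stated above: X is a candidate key when X⁺ contains every attribute of the relation.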
Closure of a Relation in DBMS
What is Closure of a Relation?
Closure of a relation means finding all the functional dependencies (FDs) that can be derived
from a given set of functional dependencies using inference rules.
It is denoted as: F+
Where:
F → Given set of functional dependencies
F⁺ → Closure of the relation (all implied FDs)
Why Do We Need Relation Closure?
Relation closure is used to:
Find all implied dependencies
Check normal forms (3NF, BCNF)
Check dependency preservation
Verify correctness of database design
Difference Between Attribute Closure and Relation Closure
Attribute Closure Relation Closure
Finds attributes Finds dependencies
Written as X⁺ Written as F⁺
Used to find keys Used to find implied FDs
Steps to Find Closure of a Relation
Step-by-Step Method
1. Start with the given set of FDs (F).
2. Apply Armstrong's Axioms.
3. Generate new FDs.
4. Continue until no new FD can be derived.
5. The complete set is F⁺.
Example (Student Relation)
Relation: STUDENT
Attributes:
(Student_ID, Name, Dept_ID, Dept_Name)
Given Functional Dependencies (F)
1. Student_ID → Name
2. Student_ID → Dept_ID
3. Dept_ID → Dept_Name
Find Closure of the Relation (F⁺)
Step 1: Start with given FDs
Step 2: Apply Transitivity
From:
Student_ID → Dept_ID
Dept_ID → Dept_Name
We get:
Student_ID → Dept_Name
Step 3: Apply Union Rule
From:
Student_ID → Name
Student_ID → Dept_ID
We get:
Student_ID → {Name, Dept_ID}
Step 4: Apply Union + Transitivity
Student_ID → {Name, Dept_ID, Dept_Name}
Final Closure of the Relation (F⁺)
F⁺ = {
Student_ID → Name,
Student_ID → Dept_ID,
Dept_ID → Dept_Name,
Student_ID → Dept_Name,
Student_ID → {Name, Dept_ID},
Student_ID → {Name, Dept_ID, Dept_Name}
}
What We Learn from Relation Closure
Some dependencies are implied, not directly written
Helps in normalization
Helps identify keys and redundancies
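For a small schema, F⁺ can be enumerated with attribute closures, because X → Y holds exactly when Y ⊆ X⁺; listing X⁺ for every subset X of the attributes therefore describes all of F⁺. A sketch (the subset enumeration is exponential, so this is only practical for tiny schemas):

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes derivable from `attrs` using the FDs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

attrs = ["Student_ID", "Name", "Dept_ID", "Dept_Name"]
fds = [({"Student_ID"}, {"Name"}),
       ({"Student_ID"}, {"Dept_ID"}),
       ({"Dept_ID"}, {"Dept_Name"})]

# Print the non-trivial part of X+ for every subset X.
for r in range(1, len(attrs) + 1):
    for X in combinations(attrs, r):
        print(sorted(X), "->", sorted(closure(set(X), fds) - set(X)))
```

Among the printed lines you can read off the implied FD from the example, Student_ID → Dept_Name, which was never written in F directly.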
Types of Functional Dependency in DBMS
What is Functional Dependency (FD)?
A functional dependency shows the relationship between attributes in a table.
It is written as:
X→Y
Meaning: If we know the value of X, we can uniquely determine the value of Y.
Types of Functional Dependency
There are four important types:
1. Trivial Functional Dependency
2. Non-Trivial Functional Dependency
3. Partial Functional Dependency
4. Full Functional Dependency
1. Trivial Functional Dependency
A functional dependency X → Y is trivial if:
Y⊆X
That means Y is already subset or part of X.
Example
STUDENT Table
Student_ID Name Dept
FD: (Student_ID, Name) → Name
✔ Name is already included in the left side
✔ So this is a trivial FD
Why is it called trivial?
Because it is always true and does not give new information.
Key Points
✔ Always valid
✔ Does not cause redundancy
✔ Not useful for normalization
2. Non-Trivial Functional Dependency
A functional dependency X → Y is non-trivial if:
Y⊈X
Y is not subset or part of X.
Example
STUDENT Table
Student_ID Name Dept
FD: Student_ID → Name
✔ Name is not part of Student_ID
✔ Hence, a non-trivial FD
Key Points
✔ Gives meaningful information
✔ Important for normalization
✔ May cause redundancy if not handled properly
3. Partial Functional Dependency
A functional dependency is partial when:
A non-prime attribute depends on only part of a composite key instead of the
whole key.
Example
ENROLLMENT Table
Student_ID Course_ID Student_Name Course_Name
Composite Primary Key: (Student_ID, Course_ID)
FDs:
Student_ID → Student_Name
Course_ID → Course_Name
✔ Student_Name depends on only Student_ID
✔ Not on full key
✔ Hence Partial Dependency
Why is it a problem?
Because it causes:
❌ Data redundancy
❌ Update anomalies
Where is it removed?
✔ Removed in Second Normal Form (2NF)
4. Full Functional Dependency
A functional dependency is full when:
A non-prime attribute depends on the entire composite key
Not on any subset of the key
Example
ENROLLMENT Table
Student_ID Course_ID Grade
Composite Key: (Student_ID, Course_ID)
FD: (Student_ID, Course_ID) → Grade
✔ Grade depends on both attributes together
✔ Not on Student_ID alone
✔ Not on Course_ID alone
Hence Full Functional Dependency
Why is it good?
✔ No redundancy
✔ Desired in normalized tables
Comparison Table
Type Definition Example
Trivial FD RHS ⊆ LHS (A, B) → A
Non-Trivial FD RHS not in LHS A→B
Partial FD Depends on part of composite key Student_ID → Name
Full FD Depends on full composite key (Student_ID, Course_ID) → Grade
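Partial versus full dependency can also be tested mechanically with attribute closures: a dependency on a composite key is partial if its RHS is already derivable from a proper subset of the key. A sketch using the ENROLLMENT example (the helper name `is_partial` is our own):

```python
def closure(attrs, fds):
    """All attributes derivable from `attrs` using the FDs (X+)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"Student_ID"}, {"Student_Name"}),
       ({"Student_ID", "Course_ID"}, {"Grade"})]
key = {"Student_ID", "Course_ID"}

def is_partial(rhs):
    # Partial if some proper subset of the key (key minus one attribute
    # here, since the key has two attributes) already determines rhs.
    return any(rhs <= closure(key - {a}, fds) for a in key)

print(is_partial({"Student_Name"}))  # True  -> depends on Student_ID alone
print(is_partial({"Grade"}))         # False -> full functional dependency
```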