Unit 3
UNIT – 3
SQL JOINS AND VIEWS
SQL Joins and Views: Inner Join, Natural Join, Full Outer Join, Left Outer Join, Right Outer Join, Equi Join, Definition of View, Creating a View, Managing Views (Listing, Updating, Deleting).
Normalization: Anomalies in relational database design. Functional dependencies: Axioms. Decomposition, Transitive Dependency. Data Normalization: First normal form, Second normal form, Third normal form, Boyce-Codd normal form.
SQL joins:
A join is an operation in a DBMS (Database Management System) that combines the rows of two or more tables based on related columns between them. The main purpose of a join is to retrieve data from multiple tables; in other words, a join is used to perform multi-table queries. It is denoted by ⨝.
Syntax
R3 <- ⨝(R1) <join_condition> (R2)
where R1 and R2 are two relations to be joined and R3 is a relation that will hold the result of
the join operation.
Example
Temp <- ⨝(student) [Link]=[Link](Exam)
where S and E are aliases for the Student and Exam relations respectively.
JOIN Example
Consider the two tables below as follows:
1) Inner Join
Inner Join is a join operation in DBMS that combines two or more tables based on related columns and returns only the rows that have matching values in the tables. Inner join has the following types:
Theta Join
Conditional Join
Equi Join
Natural Join
a) Theta Join
Theta join is more flexible than an equi join. It allows us to join tables based on any condition, not just equality.
We can use any comparison operator such as >, <, >=, <=, or !=.
Example:
Consider two tables, Employees and Departments:
Employees Table:
emp_id name salary dept_id
1 Divyansh 50000 10
2 Krish 60000 20
3 Neha 55000 30
Departments Table:
dept_id dept_name min_salary
10 HR 45000
20 IT 55000
30 Sales 52000
SQL Query:
SELECT Employees.name, Departments.dept_name
FROM Employees
JOIN Departments
ON Employees.dept_id = Departments.dept_id
AND Employees.salary >= Departments.min_salary;
Output:
name dept_name
Divyansh HR
Krish IT
Neha Sales
Here, we get employees who meet or exceed the minimum salary requirement for their
department.
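This theta join can be tried with a minimal sketch using Python's built-in sqlite3 module and an in-memory database (an assumption: the notes use MySQL, but this query runs unchanged in SQLite). The join condition pairs each employee with their own department and then applies the salary comparison, which reproduces the output shown; the sketch also counts the pairs matched by the salary comparison alone.

```python
import sqlite3

# In-memory database holding the Employees and Departments tables from the notes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (emp_id INTEGER, name TEXT, salary INTEGER, dept_id INTEGER);
CREATE TABLE Departments (dept_id INTEGER, dept_name TEXT, min_salary INTEGER);
INSERT INTO Employees VALUES (1, 'Divyansh', 50000, 10), (2, 'Krish', 60000, 20), (3, 'Neha', 55000, 30);
INSERT INTO Departments VALUES (10, 'HR', 45000), (20, 'IT', 55000), (30, 'Sales', 52000);
""")

# Theta join: the ON clause combines an equality test with a >= comparison.
theta_rows = conn.execute("""
SELECT Employees.name, Departments.dept_name
FROM Employees
JOIN Departments
  ON Employees.dept_id = Departments.dept_id
 AND Employees.salary >= Departments.min_salary
ORDER BY Employees.emp_id
""").fetchall()

# With only the >= condition, every department whose minimum is met is matched.
salary_only = conn.execute("""
SELECT COUNT(*) FROM Employees JOIN Departments
  ON Employees.salary >= Departments.min_salary
""").fetchone()[0]

print(theta_rows)   # [('Divyansh', 'HR'), ('Krish', 'IT'), ('Neha', 'Sales')]
print(salary_only)  # 7
```

Without the dept_id test, Krish and Neha would each be paired with all three departments, giving 7 rows instead of 3.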
b) Conditional Join
Conditional join or Theta join is a type of inner join in which tables are combined based on
the specified condition.
In conditional join, the join condition can include <, >, <=, >=, ≠ operators in addition to the
'=' operator.
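Example: Suppose there are two tables, A and B.
Table A
R S
10 5
7 20
Table B
T U
10 12
17 6
A ⨝ S<T B
Output
R S T U
10 5 10 12
10 5 17 6
Explanation: This query joins tables A and B and returns the attributes R, S, T, U for every pair of rows where the condition S < T is satisfied. Only the row (10, 5) of A qualifies, and it qualifies twice, since 5 < 10 and 5 < 17. For the row (7, 20), S = 20 is not less than either value of T.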
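A conditional (theta) join is, by definition, a Cartesian product followed by a selection. The sketch below follows that definition in plain Python; the helper name theta_join is illustrative, not a library function. Note that both rows of B satisfy 5 < T for the row (10, 5) of A.

```python
# Tables A(R, S) and B(T, U) from the example, stored as lists of tuples.
table_a = [(10, 5), (7, 20)]
table_b = [(10, 12), (17, 6)]

def theta_join(left, right, condition):
    """Pair every left row with every right row and keep pairs satisfying condition."""
    return [l + r for l in left for r in right if condition(l, r)]

# Condition S < T: S is position 1 of a row of A, T is position 0 of a row of B.
result = theta_join(table_a, table_b, lambda a, b: a[1] < b[0])
print(result)  # [(10, 5, 10, 12), (10, 5, 17, 6)]
```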
c) Equi Join
Equi Join is a type of inner join where the join condition uses the equality operator ('=')
between columns.
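Example: Suppose there are two tables, Table A and Table C.
Table A
Column A Column B
a a
a b
Table C
Column A Column B
a a
Output
Column A Column B
a a
Explanation: The join condition compares Column B of the two tables for equality (A.Column B = C.Column B). The value "a" appears in Column B of both tables, so only the row (a, a) appears in the output; the row (a, b) of Table A has no match in Table C.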
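A minimal SQLite sketch of this equi join, assuming the condition is on Column B (A.Column B = C.Column B). The columns are renamed col_a and col_b here only to avoid quoting names that contain spaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Column A" / "Column B" are renamed col_a / col_b (an assumption for brevity).
conn.executescript("""
CREATE TABLE A (col_a TEXT, col_b TEXT);
CREATE TABLE C (col_a TEXT, col_b TEXT);
INSERT INTO A VALUES ('a', 'a'), ('a', 'b');
INSERT INTO C VALUES ('a', 'a');
""")

# Equi join: the only operator in the join condition is '='.
equi_rows = conn.execute("""
SELECT A.col_a, A.col_b
FROM A JOIN C ON A.col_b = C.col_b
""").fetchall()
print(equi_rows)  # [('a', 'a')]
```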
d) Natural Join
Natural join is a type of inner join in which we do not need any comparison operators. In
natural join, columns should have the same name and domain. There should be at least one
common attribute between the two tables.
Example: Suppose there are two tables Table A and Table B
Table A
Number Square
2 4
3 9
Table B
Number Cube
2 8
3 27
A⨝B
Output
Number Square Cube
2 4 8
3 9 27
Explanation: The column Number is present in both tables, so rows are matched on equal Number values and the Number column appears only once in the result.
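The same natural join can be run in SQLite, which supports the NATURAL JOIN keyword. This sketch checks both the matched rows and that the shared column Number appears only once in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Number INTEGER, Square INTEGER);
CREATE TABLE B (Number INTEGER, Cube INTEGER);
INSERT INTO A VALUES (2, 4), (3, 9);
INSERT INTO B VALUES (2, 8), (3, 27);
""")

# NATURAL JOIN matches on every column name shared by A and B (here: Number)
# and keeps that shared column only once.
cursor = conn.execute("SELECT * FROM A NATURAL JOIN B ORDER BY Number")
columns = [d[0] for d in cursor.description]
natural_rows = cursor.fetchall()
print(columns)       # ['Number', 'Square', 'Cube']
print(natural_rows)  # [(2, 4, 8), (3, 9, 27)]
```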
2) Outer Join
Outer join is a type of join that retrieves matching as well as non-matching records from related tables. There are three types of outer join:
Left outer join
Right outer join
Full outer join
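(a) Left Outer Join
It is also called a left join. This type of outer join retrieves all records from the left table and the matching records from the right table. For left-table records that have no match in the right table, the right table's columns are marked NULL in the result set.
Example: Suppose there are two tables, Table A and Table B.
Table A
Number Square
2 4
3 9
4 16
Table B
Number Cube
2 8
3 27
5 125
A⟕B
Output
Number Square Cube
2 4 8
3 9 27
4 16 NULL
Explanation: In a left outer join we keep all the rows of the left table (here Table A). Table B has no row with Number 4, so the Cube value for Number 4 is marked NULL. The row (5, 125) of Table B has no match in Table A, so it does not appear.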
(b) Right Outer Join
It is also called a right join. This type of outer join retrieves all records from the right table and the matching records from the left table. For right-table records that have no match in the left table, the left table's columns are marked NULL in the result set.
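A minimal SQLite sketch of the three outer joins, assuming Table A holds (2, 4), (3, 9), (4, 16) and Table B holds (2, 8), (3, 27), (5, 125). SQLite gained RIGHT and FULL OUTER JOIN only in version 3.39.0, so to run on older builds the right join is written as a left join with the tables swapped, and the full outer join as a left join plus the unmatched rows of B. NULL comes back as Python's None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Number INTEGER, Square INTEGER);
CREATE TABLE B (Number INTEGER, Cube INTEGER);
INSERT INTO A VALUES (2, 4), (3, 9), (4, 16);
INSERT INTO B VALUES (2, 8), (3, 27), (5, 125);
""")

# Left outer join: every row of A, with NULL where B has no matching Number.
left = conn.execute("""
SELECT A.Number, A.Square, B.Cube
FROM A LEFT OUTER JOIN B ON A.Number = B.Number
ORDER BY A.Number
""").fetchall()

# Right outer join of A and B, written as a left join with the tables swapped.
right = conn.execute("""
SELECT B.Number, A.Square, B.Cube
FROM B LEFT OUTER JOIN A ON A.Number = B.Number
ORDER BY B.Number
""").fetchall()

# Full outer join: all left-join rows plus the rows of B that match nothing in A.
full = conn.execute("""
SELECT A.Number, A.Square, B.Cube
FROM A LEFT OUTER JOIN B ON A.Number = B.Number
UNION ALL
SELECT B.Number, NULL, B.Cube
FROM B WHERE B.Number NOT IN (SELECT Number FROM A)
ORDER BY 1
""").fetchall()

print(left)   # [(2, 4, 8), (3, 9, 27), (4, 16, None)]
print(right)  # [(2, 4, 8), (3, 9, 27), (5, None, 125)]
print(full)   # [(2, 4, 8), (3, 9, 27), (4, 16, None), (5, None, 125)]
```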
Definition of View
A view is a table whose rows are not explicitly stored: it is a virtual table based on the result set of an SQL statement. A view can contain all rows of a table or selected rows from a table. A view can be created from one or many tables, depending on the SQL query used to create it.
A view is created to show end users only the data they need, according to their specific requirements, rather than the complete information in the table.
Creating Views
Database views are created using the CREATE VIEW statement. Views can be created from a
single table, multiple tables or another view.
To create a view, a user must have the appropriate system privilege according to the specific
implementation.
Syntax in MySQL
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
CREATE VIEW Students_CSE AS
SELECT Roll_no,Name
FROM Students
WHERE Branch = 'CSE';
Key Terms:
view_name: Name for the View
table_name: Name of the table
condition: Condition to select rows
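The Students_CSE view above can be tried in SQLite, where CREATE VIEW has the same form. The Students rows below are hypothetical, made up only for the demonstration. The sketch also shows that a view stores no rows of its own: a student inserted into the base table appears in the view immediately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Students rows, used only to illustrate the view.
conn.executescript("""
CREATE TABLE Students (Roll_no INTEGER, Name TEXT, Branch TEXT);
INSERT INTO Students VALUES (1, 'Asha', 'CSE'), (2, 'Ravi', 'ECE'), (3, 'Meera', 'CSE');

CREATE VIEW Students_CSE AS
SELECT Roll_no, Name
FROM Students
WHERE Branch = 'CSE';
""")

before = conn.execute("SELECT * FROM Students_CSE ORDER BY Roll_no").fetchall()
print(before)  # [(1, 'Asha'), (3, 'Meera')]

# The view is recomputed from Students on every query, so new CSE rows show up.
conn.execute("INSERT INTO Students VALUES (4, 'Kiran', 'CSE')")
cse_rows = conn.execute("SELECT * FROM Students_CSE ORDER BY Roll_no").fetchall()
print(cse_rows)  # [(1, 'Asha'), (3, 'Meera'), (4, 'Kiran')]
```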
Example 1: Creating a Simple View from a Single Table
Example 1.1: In this example, we will create a View named DetailsView from the
table StudentDetails.
Query:
CREATE VIEW DetailsView AS
SELECT NAME, ADDRESS
FROM StudentDetails
WHERE S_ID < 5;
Use the below query to retrieve the data from this view
SELECT * FROM DetailsView;
Output:
Name Address
Harsh Kolkata
Ashish Durgapur
Pratik Delhi
Dhanraj Bihar
Example 1.2: Here, we will create a view named StudentNames from the table
StudentDetails.
Query:
CREATE VIEW StudentNames AS
SELECT S_ID, NAME
FROM StudentDetails
ORDER BY NAME;
If we now query the view as,
SELECT * FROM StudentNames;
Output:
S_ID Name
2 Ashish
4 Dhanraj
1 Harsh
3 Pratik
5 Ram
Querying a View
We can query a view as follows:
Syntax in MySQL
SELECT * FROM view_name;
Example:
SELECT * FROM Students_CSE;
Dropping a View
To delete a view from a database, we use the DROP VIEW statement.
Syntax in MySQL
DROP VIEW view_name;
Example:
DROP VIEW Students_CSE;
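Listing, updating and deleting views can be sketched in SQLite as well, with two differences from MySQL worth stating plainly: MySQL lists views with SHOW FULL TABLES WHERE Table_type = 'VIEW' and redefines them with CREATE OR REPLACE VIEW or ALTER VIEW, while SQLite has neither, so this sketch reads its sqlite_master catalog and redefines a view by dropping and re-creating it. The helper list_views is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (Roll_no INTEGER, Name TEXT, Branch TEXT);
CREATE VIEW Students_CSE AS SELECT Roll_no, Name FROM Students WHERE Branch = 'CSE';
""")

def list_views(connection):
    """List view names; SQLite stores view definitions in sqlite_master with type = 'view'."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")
    return [name for (name,) in rows]

views_before = list_views(conn)
print(views_before)  # ['Students_CSE']

# Updating the view definition: drop it and re-create it with an extra column.
conn.executescript("""
DROP VIEW IF EXISTS Students_CSE;
CREATE VIEW Students_CSE AS
SELECT Roll_no, Name, Branch FROM Students WHERE Branch = 'CSE';
""")
view_columns = [d[0] for d in conn.execute("SELECT * FROM Students_CSE").description]

# Deleting the view removes only its definition; the Students table is untouched.
conn.execute("DROP VIEW Students_CSE")
views_after_drop = list_views(conn)
print(view_columns, views_after_drop)  # ['Roll_no', 'Name', 'Branch'] []
```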
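Normalization:
Anomalies in relational database design:
a) Insertion anomalies: Insertion anomalies occur when a new record cannot be added to a database without also supplying unrelated data. For example, if customer and order details are stored in one table, a new customer cannot be recorded until that customer places an order.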
b) Deletion anomalies: Deletion anomalies occur when deleting a record from a database
and can result in the unintentional loss of data. For example, if a database contains
information about customers and orders, deleting a customer record may also delete all
the orders associated with that customer.
c) Updation anomalies: Updation anomalies occur when modifying data in a database and
can result in inconsistencies or errors. For example, if a database contains information
about employees and their salaries, updating an employee’s salary in one record but not in
all related records could lead to incorrect calculations and reporting.
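The updation anomaly can be reproduced in a few lines of Python. The rows below are hypothetical: each employee's salary is repeated once per project, so an update that touches only one of those rows leaves the data contradicting itself.

```python
# Hypothetical denormalized rows: (emp_id, name, salary, project), with the
# salary repeated once per project the employee works on.
rows = [
    (1, "Divyansh", 50000, "P01"),
    (1, "Divyansh", 50000, "P02"),
    (2, "Krish", 60000, "P03"),
]

# A careless update changes the salary in only the first matching row.
for i, (emp_id, name, salary, project) in enumerate(rows):
    if emp_id == 1:
        rows[i] = (emp_id, name, 55000, project)
        break

# The same employee now has two different salaries: an updation anomaly.
salaries_for_emp_1 = {salary for emp_id, _, salary, _ in rows if emp_id == 1}
print(sorted(salaries_for_emp_1))  # [50000, 55000]
```

Storing the salary once, in a table keyed by emp_id, removes the possibility of this inconsistency.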
Functional dependencies:
In relational database management, functional dependency is a concept that specifies the
relationship between two sets of attributes where one attribute determines the value of
another attribute. It is denoted as X → Y, where the attribute set on the left side of the arrow,
X is called Determinant, and Y is called the Dependent.
A functional dependency (FD) is a relationship between two attributes, typically between the
PK and other non-key attributes within a table. For any relation R, attribute Y is functionally
dependent on attribute X (usually the PK), if for every valid instance of X, that value of X
uniquely determines the value of Y. This relationship is indicated by the representation
below :
X → Y
The left side of the above FD diagram is called the determinant, and the right side is
the dependent. Here are a few examples.
In the first example, SIN determines Name, Address and Birthdate: given a SIN, we can determine any of the other attributes within the table.
In the second example, SIN and Course together determine the date completed (Date Completed). This shows that a functional dependency also works with a composite PK.
In an FD diagram, the arrow points to the dependent attribute(s), and the origin of the arrow marks the determinant set.
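The definition can be turned into a check on a table instance: X → Y is violated as soon as two rows agree on X but differ on Y. A minimal sketch follows; the function fd_holds and the enrolment rows are hypothetical. A single instance can only refute an FD, never prove that it holds for every valid instance, so a True result means "not contradicted by this data".

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on every lhs attribute but differ on rhs.

    rows is a list of dicts, one per tuple; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical enrolment rows (SIN, Course, Name, DateCompleted).
enrolments = [
    {"SIN": 1, "Course": "DB", "Name": "Asha", "DateCompleted": "2024-05-01"},
    {"SIN": 1, "Course": "OS", "Name": "Asha", "DateCompleted": "2024-06-10"},
    {"SIN": 2, "Course": "DB", "Name": "Ravi", "DateCompleted": "2024-05-03"},
]

print(fd_holds(enrolments, ["SIN"], ["Name"]))                     # True
print(fd_holds(enrolments, ["SIN", "Course"], ["DateCompleted"]))  # True
print(fd_holds(enrolments, ["SIN"], ["DateCompleted"]))            # False
```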
Armstrong’s Axioms/Properties of Functional Dependency in DBMS
Armstrong's Axioms refer to a set of inference rules, introduced by William W. Armstrong,
that are used to test the logical implication of functional dependencies. Given a set of
functional dependencies F, the closure of F (denoted as F+) is the set of all functional
dependencies logically implied by F. Armstrong's Axioms, when applied repeatedly, help
generate the closure of functional dependencies.
These axioms are fundamental in determining functional dependencies in databases and are
used to derive conclusions about the relationships between attributes.
Axioms
1. Axiom of Reflexivity
The Axiom of Reflexivity is the foundational principle stating that if you have a set of
attributes, a functional dependency exists between that set and itself. In simpler terms, it
means that any set of attributes functionally determines itself.
Example: In a student database, if we have an attribute 'Student_ID,' it is trivially true that
'Student_ID' determines 'Student_ID.'
If A is a set of attributes and B is a subset of A, then the functional dependency A → B holds
true.
For example, { Employee_Id, Name } → Name is valid.
2. Axiom of Augmentation
The Axiom of Augmentation tells us that if a functional dependency exists between two sets of attributes, adding the same attributes to both sides of the dependency preserves it.
If a functional dependency A → B holds true, then appending the same set of attributes to both sides of the dependency doesn't affect it; the dependency remains true.
o For example, if X → Y holds true, then ZX → ZY also holds true.
o For example, if { Employee_Id, Name } → { Name } holds true, then { Employee_Id, Name, Age } → { Name, Age } also holds true.
Example:
If 'Student_ID' determines 'Student_Name,' then it also implies that 'Student_ID,
Course_Code' determines 'Student_Name, Course_Code.'
3. Axiom of Transitivity
The Axiom of Transitivity states that if we have two dependencies, where one attribute set
determines another, and the second set determines a third set, then we can infer that the first
set determines the third set.
If two functional dependencies X → Y and Y → Z hold true, then X → Z also holds true by the rule of Transitivity.
o For example, if { Employee_Id } → { Name } holds true and { Name } → { Department } holds true, then { Employee_Id } → { Department } also holds true.
Example:
If 'Student_ID' determines 'Course_Code' and 'Course_Code' determines 'Course_Name,' then
'Student_ID' determines 'Course_Name.'
Example:
Let’s assume the following functional dependencies:
{A} → {B}
{B} → {C}
{A, C} → {D}
1. Reflexivity: Since any set of attributes determines its subset, we can immediately infer the
following:
{A} → {A} (A set always determines itself).
{B} → {B}.
{A, C} → {A}.
2. Augmentation: If we know that {A} → {B}, we can add the same attribute (or set of
attributes) to both sides:
From {A} → {B}, we can augment both sides with {C}: {A, C} → {B, C}.
From {B} → {C}, we can augment both sides with {A}: {A, B} → {C, B}.
3. Transitivity: If we know {A} → {B} and {B} → {C}, we can infer that:
{A} → {C} (Using transitivity: {A} → {B} and {B} → {C}).
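Applying the axioms repeatedly is usually mechanized as the attribute closure algorithm: start from X (reflexivity) and keep adding the right side of any FD whose left side is already inside the result (each step is justified by augmentation and transitivity). Then X → Y follows from F exactly when Y is contained in X+. A minimal sketch for the FDs of this example; the function name closure is hypothetical.

```python
def closure(attrs, fds):
    """Compute the attribute closure attrs+ under fds.

    fds is a list of (lhs, rhs) pairs of frozensets. Fire every FD whose left
    side is already covered until a full pass adds nothing new.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [
    (frozenset("A"), frozenset("B")),   # {A} -> {B}
    (frozenset("B"), frozenset("C")),   # {B} -> {C}
    (frozenset("AC"), frozenset("D")),  # {A, C} -> {D}
]

a_plus = closure({"A"}, F)
print(sorted(a_plus))  # ['A', 'B', 'C', 'D']
```

Besides A → C, the closure also shows A → D: from A → C, augmentation with A gives A → AC, and transitivity with {A, C} → {D} gives A → D.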
Although Armstrong's axioms are sound and complete, there are additional rules for
functional dependencies that are derived from them. These rules are introduced to simplify
operations and make the process easier.
Secondary Rules
In addition to the primary axioms, Armstrong also introduced several secondary rules:
1) Union
This rule suggests that if two tables are separate and have the same PK, you may want to consider putting them together. It states that if X determines Y and X determines Z, then X also determines Y and Z together (X → YZ).
2) Decomposition
Decomposition is the reverse of the Union rule. If you have a table that appears to contain two entities that are determined by the same PK, consider breaking them up into two tables. This rule states that if X determines Y and Z, then X determines Y and X determines Z separately.
3) Pseudo Transitivity
If A → B holds and BC → D holds, then AC → D holds. In general, if X → Y and YZ → W, then XZ → W.
Example:
Let’s assume we have the following functional dependencies in a relation schema:
{A} → {B}
{A} → {C}
{X} → {Y}
{Y, Z} → {W}
Now, let's apply the Secondary Rules to derive new functional dependencies.
1. Union Rule: If A → B and A → C, then by the Union Rule, we can infer:
A → BC This means if A determines both B and C, it also determines their
combination, BC.
2. Composition Rule: If A → B and X → Y hold, then by the Composition Rule, we can
infer:
AX → BY
3. Decomposition Rule: If A → BC holds, then by the Decomposition Rule, we can
infer:
A → B and A → C
4. Pseudo Transitivity Rule: If X → Y and YZ → W hold, then by the Pseudo Transitivity Rule, we can infer:
XZ → W
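Each derived dependency can be cross-checked with attribute closures: F implies X → Y exactly when Y ⊆ X+. A minimal sketch for the four FDs of this example, with single-letter attributes written as strings; the helpers closure and implies are hypothetical names. The last check is a dependency that does not follow from F.

```python
def closure(attrs, fds):
    """Attribute closure: keep adding the rhs of any FD whose lhs is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F implies lhs -> rhs exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)

# The example's FDs: A -> B, A -> C, X -> Y, YZ -> W.
F = [("A", "B"), ("A", "C"), ("X", "Y"), ("YZ", "W")]

checks = {
    "union A -> BC": implies(F, "A", "BC"),
    "composition AX -> BY": implies(F, "AX", "BY"),
    "decomposition A -> B": implies(F, "A", "B"),
    "pseudo-transitivity XZ -> W": implies(F, "XZ", "W"),
    "not derivable: X -> W": implies(F, "X", "W"),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```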
Decomposition:
Decomposition in the context of database design refers to the process of breaking down a single table into multiple tables in order to eliminate redundancy, reduce data anomalies, and achieve normalization. Decomposition is typically done using rules defined by the normal forms.
However, while decomposition can be helpful, it is not without challenges. Done incorrectly, decomposition can lead to its own set of problems.
Decomposition in DBMS involves dividing a table into multiple tables, aiming to eradicate redundancy, inconsistencies, and anomalies. The original relation is replaced by a set of relations {X1, X2, ……, Xn}, and a good decomposition should be lossless and, ideally, dependency-preserving. When a relation in a relational model is not in an appropriate normal form, decomposition becomes necessary to address issues like anomalies and redundancy, ultimately improving the overall design quality and efficiency of the database.
Types of Decomposition
Decomposition is of two major types in DBMS:
Lossless
Lossy
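1. Lossless Decomposition
Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn. This decomposition is called lossless join decomposition when the natural join of the sub relations results in exactly the original relation R.
For lossless decomposition, we must have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
Example: Consider the relation R( A , B , C ) with the tuples (1, 2, 1), (2, 5, 3) and (3, 3, 3), decomposed into two sub relations R1( A , B ) and R2( B , C )-
A B
1 2
2 5
3 3
R1( A , B )
B C
2 1
5 3
3 3
R2( B , C )
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2, we get-
A B C
1 2 1
2 5 3
3 3 3
This is exactly the original relation, so R1 ⋈ R2 = R and the decomposition is lossless.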
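The lossless test can be carried out mechanically: project R onto each sub relation, natural-join the projections, and compare with R. A minimal sketch in plain Python, with relations as sets of tuples; the helper project is a hypothetical name.

```python
def project(rows, schema, attrs):
    """Project tuples over schema onto attrs; using a set removes duplicate tuples."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}          # R(A, B, C)
R1 = project(R, ["A", "B", "C"], ["A", "B"])  # R1(A, B)
R2 = project(R, ["A", "B", "C"], ["B", "C"])  # R2(B, C)

# Natural join on the common attribute B, producing tuples in (A, B, C) order.
joined = {(a, b1, c) for (a, b1) in R1 for (b2, c) in R2 if b1 == b2}

print(sorted(joined))            # [(1, 2, 1), (2, 5, 3), (3, 3, 3)]
print("lossless:", joined == R)  # lossless: True
```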
2. Lossy Decomposition
Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
This decomposition is called lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
A B C
1 2 1
2 5 3
3 3 3
R( A , B , C )
Consider this relation decomposed into two sub relations R1( A , C ) and R2( B , C )-
A C
1 1
2 3
3 3
R1( A , C )
B C
2 1
5 3
3 3
R2( B , C )
For lossy decomposition, we must have-
R1 ⋈ R2 ⊃ R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2, we get-
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
Clearly, R1 ⋈ R2 ⊃ R.
This relation is not same as the original relation R and contains some extraneous tuples.
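The same mechanical check exposes the lossy case: joining R1( A , C ) and R2( B , C ) on C produces the two extraneous tuples (2, 3, 3) and (3, 5, 3). A minimal sketch in plain Python; the helper project is a hypothetical name.

```python
def project(rows, schema, attrs):
    """Project tuples over schema onto attrs; using a set removes duplicate tuples."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}          # R(A, B, C)
R1 = project(R, ["A", "B", "C"], ["A", "C"])  # R1(A, C)
R2 = project(R, ["A", "B", "C"], ["B", "C"])  # R2(B, C)

# Natural join on the only common attribute, C; emit tuples in (A, B, C) order.
joined = {(a, b, c1) for (a, c1) in R1 for (b, c2) in R2 if c1 == c2}

spurious = joined - R
print(sorted(joined))    # [(1, 2, 1), (2, 3, 3), (2, 5, 3), (3, 3, 3), (3, 5, 3)]
print(sorted(spurious))  # [(2, 3, 3), (3, 5, 3)]
print("lossy:", joined > R)  # lossy: True  (proper superset of R)
```

C = 3 is shared by two rows of R, so every A with C = 3 gets paired with every B with C = 3; that cross-pairing is where the extra tuples come from.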
Data Normalization
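Normal Forms
There are four types of normal forms that are usually used in relational databases: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF).
First Normal Form (1NF)
A relation is in 1NF if every attribute holds only atomic (single, indivisible) values. Consider the following table:
<EmployeeDetail>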
Employee Code Employee Name Employee Phone Number
101 John 98765623,998234123
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452
Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.
To convert this table into 1NF, we make new rows with each Employee Phone Number as a
new row as shown below:
<EmployeeDetail>
Employee Code Employee Name Employee Phone Number
101 John 998234123
101 John 98765623
101 John 89023467
102 Ryan 76213908
103 Stephanie 98132452
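The conversion to 1NF amounts to splitting each multi-valued cell and emitting one row per value. A minimal sketch in plain Python, assuming the multiple phone numbers are stored as a comma-separated string as in the unnormalized table.

```python
# Rows of the unnormalized <EmployeeDetail> table; the phone column may hold
# several comma-separated numbers (a multi-valued attribute).
unnormalized = [
    (101, "John", "98765623,998234123"),
    (101, "John", "89023467"),
    (102, "Ryan", "76213908"),
    (103, "Stephanie", "98132452"),
]

# 1NF: emit one row per phone number so that every cell holds a single value.
first_nf = [
    (code, name, phone.strip())
    for code, name, phones in unnormalized
    for phone in phones.split(",")
]
for row in first_nf:
    print(row)
```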
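Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent on the whole primary key, that is, there are no partial dependencies. Consider an <EmployeeProjectDetail> table with the columns Employee Code, Employee Name, Project ID and Project Name, whose primary key is {Employee Code, Project ID}. Employee Name depends only on Employee Code, and Project Name depends only on Project ID, so both are partial dependencies.
To remove these partial dependencies and normalize the table into second normal form, we can decompose the <EmployeeProjectDetail> table into the following three tables: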
<EmployeeDetail>
Employee Code Employee Name
101 John
102 Ryan
103 Stephanie
<EmployeeProject>
Employee Code Project ID
101 P03
101 P01
102 P04
103 P02
<ProjectDetail>
Project ID Project Name
P03 Project103
P01 Project101
P04 Project104
P02 Project102
Thus, we’ve converted the <EmployeeProjectDetail> table into 2NF by decomposing it into the <EmployeeDetail>, <ProjectDetail> and <EmployeeProject> tables. These tables satisfy both rules of 2NF: they are in 1NF, and every non-prime attribute is fully dependent on the primary key of its table.
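The 2NF decomposition is a set of projections. In the sketch below, the <EmployeeProjectDetail> rows are an assumption, rebuilt by joining the three tables above (columns Employee Code, Employee Name, Project ID, Project Name). Projecting into sets removes duplicates, so John is stored only once in <EmployeeDetail>.

```python
# <EmployeeProjectDetail> rows, rebuilt here by joining the three 2NF tables
# (assumed columns: Employee Code, Employee Name, Project ID, Project Name).
detail = [
    (101, "John", "P03", "Project103"),
    (101, "John", "P01", "Project101"),
    (102, "Ryan", "P04", "Project104"),
    (103, "Stephanie", "P02", "Project102"),
]

# Each projection pairs a determinant with the attributes that depend on it alone,
# removing the partial dependencies on half of the key {Employee Code, Project ID}.
employee_detail = sorted({(code, name) for code, name, _, _ in detail})
employee_project = sorted({(code, pid) for code, _, pid, _ in detail})
project_detail = sorted({(pid, pname) for _, _, pid, pname in detail})

print(employee_detail)  # [(101, 'John'), (102, 'Ryan'), (103, 'Stephanie')]
print(employee_project)
print(project_detail)
```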
The relations in 2NF are clearly less redundant than relations in 1NF. However, the
decomposed relations may still suffer from one or more anomalies due to the transitive
dependency. We will remove the transitive dependencies in the Third Normal Form.
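Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on the primary key. Consider an orders table with the columns Order ID (the primary key), Customer ID, Customer Name, Customer City, Order Date and Order Total.
In this example, the non-primary key column "Customer City" is transitively dependent on the primary key. That is, it depends on "Customer ID", which is not part of the primary key, instead of depending directly on the primary key "Order ID". To bring this table to 3NF, we can split it into two tables: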
Table 1: Customers
Customer ID Customer Name Customer City
100 John Smith New York
101 Jane Doe Los Angeles
102 Bob Johnson San Francisco
Table 2: Orders
Order ID Customer ID Order Date Order Total
1 100 2022-01-01 100
2 101 2022-01-02 200
3 102 2022-01-03 300
Now, the "Customer City" column is stored in the Customers table, where it depends directly on that table's primary key, Customer ID, and the Orders table refers to customers only through Customer ID. No non-prime attribute is transitively dependent on a primary key any more, so both tables are in 3NF.
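A minimal SQLite sketch of the 3NF design, with spaces in column names replaced by underscores (an assumption for brevity). It shows the practical effect: a city change is a single-row update in Customers, and the city for each order is recovered with a join. The new city 'Boston' is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Customer_Name TEXT, Customer_City TEXT);
CREATE TABLE Orders (
    Order_ID INTEGER PRIMARY KEY,
    Customer_ID INTEGER REFERENCES Customers(Customer_ID),
    Order_Date TEXT,
    Order_Total INTEGER
);
INSERT INTO Customers VALUES (100, 'John Smith', 'New York'),
                             (101, 'Jane Doe', 'Los Angeles'),
                             (102, 'Bob Johnson', 'San Francisco');
INSERT INTO Orders VALUES (1, 100, '2022-01-01', 100),
                          (2, 101, '2022-01-02', 200),
                          (3, 102, '2022-01-03', 300);
""")

# The city is stored once per customer, so changing it touches exactly one row.
conn.execute("UPDATE Customers SET Customer_City = 'Boston' WHERE Customer_ID = 100")

# Order ID -> Customer ID -> Customer City is followed with a join when needed.
order_cities = conn.execute("""
SELECT o.Order_ID, c.Customer_City
FROM Orders AS o JOIN Customers AS c ON o.Customer_ID = c.Customer_ID
ORDER BY o.Order_ID
""").fetchall()
print(order_cities)  # [(1, 'Boston'), (2, 'Los Angeles'), (3, 'San Francisco')]
```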
Boyce-Codd Normal Form (BCNF)
For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:
1. The table must be in the third normal form.
2. For every non-trivial functional dependency X → Y, X must be a super key of the table. In particular, unlike 3NF, BCNF does not allow X to be a non-super-key even when Y is a prime attribute.
A super key is a set of one or more attributes that can uniquely identify a row in a database
table.
Let us take an example of the following <EmployeeProjectLead> table to understand how to
normalize the table to the BCNF:
<EmployeeProjectLead>
Employee Code Project ID Project Leader
101 P03 Grey
101 P01 Christian
102 P04 Hudson
103 P02 Petro
The above table satisfies all the normal forms up to 3NF, but it violates the rules of BCNF because the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial functional dependency Project Leader → Project ID, Project ID is a prime attribute but Project Leader is a non-prime attribute, so the determinant is not a super key. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:
<EmployeeProject>
Employee Code Project ID
101 P03
101 P01
102 P04
103 P02
<ProjectLead>
Project Leader Project ID
Grey P03
Christian P01
Hudson P04
Petro P02
Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into the <EmployeeProject> and <ProjectLead> tables.
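The BCNF test can be checked with attribute closures: a non-trivial FD X → Y violates BCNF when X+ does not cover the whole schema. The check has to use the FDs rather than the sample rows, because in these four rows Project Leader happens to be unique even though it is not a key. A minimal sketch, assuming F = {Employee Code, Project ID → Project Leader; Project Leader → Project ID}, with the FDs of each decomposed table projected by hand; bcnf_violations and closure are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial FDs whose left side is not a super key of schema."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not schema <= closure(lhs, fds)
    ]

EC, PID, PL = "Employee Code", "Project ID", "Project Leader"
F = [
    (frozenset({EC, PID}), frozenset({PL})),
    (frozenset({PL}), frozenset({PID})),
]

original = bcnf_violations({EC, PID, PL}, F)  # <EmployeeProjectLead>
# FDs projected onto each decomposed table (worked out by hand).
project_lead = bcnf_violations({PL, PID}, [(frozenset({PL}), frozenset({PID}))])
employee_project = bcnf_violations({EC, PID}, [])

print(len(original), project_lead, employee_project)  # 1 [] []
```

The single violation reported for the original table is Project Leader → Project ID; after decomposition, Project Leader is a key of <ProjectLead> and <EmployeeProject> has no non-trivial FD left.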
Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is stored in
only one place, reducing the risk of data inconsistency and making it easier to maintain
data accuracy.
By breaking down data into smaller, more specific tables, normalization helps ensure that
each table stores only relevant data, which improves the overall data integrity of the
database.
Normalization simplifies the process of updating data, as it only needs to be changed in
one place rather than in multiple places throughout the database.
Normalization enables users to query the database using a variety of different criteria, as
the data is organized into smaller, more specific tables that can be joined together as
needed.
Normalization can help ensure that data is consistent across different applications that use
the same database, making it easier to integrate different applications and ensuring that all
users have access to accurate and consistent data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for
additional join operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across multiple
tables and require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database design
and the normalization process.
Normalization can increase the complexity of a database design, especially if the data
model is not well understood or if the normalization process is not carried out correctly.