
Database Normalization and SQL Basics


Database Management Systems (BCS403) - 2023-24 - Module 3
Dr. Narender M
Department of CS&E
The National Institute of Engineering
Introduction to Normalization using Functional and Multivalued Dependencies
Topics
• Informal design guidelines for relation schema
• Functional Dependencies
• Normal Forms based on Primary Keys
• Second and Third Normal Forms
• Boyce-Codd Normal Form
• Multivalued Dependency and Fourth Normal Form
• Join Dependencies and Fifth Normal Form
SQL
Topics
• SQL data definition and data types
• Schema change statements in SQL
• Specifying constraints in SQL
• Retrieval queries in SQL
• INSERT, DELETE, and UPDATE statements in SQL
• Additional features of SQL
Introduction to Normalization using Functional and Multivalued Dependencies
• We have assumed that attributes are grouped to form a relation schema by
using the common sense of the database designer or by mapping a database
schema design from a conceptual data model.
• We need some formal way of analyzing why one grouping of attributes into
a relation schema may be better than another.
• The goodness of relation schemas can be discussed at two levels. The
first is the logical (or conceptual) level: how users interpret the
relation schemas and the meaning of their attributes.
• The second is the implementation (or physical storage) level: how the
tuples in a base relation are stored and updated.
Introduction to Normalization using Functional and Multivalued Dependencies
• Database design may be performed using two approaches: bottom-up or
top-down.
• A bottom-up design methodology (also called design by synthesis)
considers the basic relationships among individual attributes as the
starting point and uses those to construct relation schemas.
• This approach is not very popular in practice because it suffers from
the problem of having to collect a large number of binary relationships
among attributes as the starting point.
Introduction to Normalization using Functional and Multivalued Dependencies
• A top-down design methodology (also called design by analysis) starts
with a number of groupings of attributes into relations that exist
together naturally.
• The relations are then analyzed individually and collectively, leading
to further decomposition until all desirable properties are met.
• The implicit goals of the design activity are information preservation
and minimum redundancy.
• Information preservation means maintaining all concepts, including
attribute types, entity types, relationship types, and
generalization/specialization relationships.
Introduction to Normalization using Functional and Multivalued Dependencies
• The relational design must preserve all of these concepts, which are
originally captured in the conceptual design.
• Minimizing redundancy implies minimizing redundant storage of the same
information and reducing the need for multiple updates to maintain
consistency across multiple copies of the same information.
Informal design guidelines for relation schema
• Four informal guidelines may be used as measures to determine the
quality of relation schema design:
• Making sure that the semantics of the attributes is clear in the schema
• Reducing the redundant information in tuples
• Reducing the NULL values in tuples
• Disallowing the possibility of generating spurious tuples
Informal design guidelines for relation schema
1. Imparting Clear Semantics to Attributes in Relations
• Whenever we group attributes to form a relation schema, we assume that
attributes belonging to one relation have certain real-world meaning.
• The semantics of a relation refers to its meaning resulting from the
interpretation of attribute values in a tuple.
• The meaning of the EMPLOYEE relation schema is simple: each tuple
represents an employee, with values for the employee’s name (Ename),
Social Security number (Ssn), birth date (Bdate), and address (Address),
and the number of the department that the employee works for (Dnumber).
Informal design guidelines for relation schema
• The ease with which the meaning of a relation’s attributes can be
explained is an informal measure of how well the relation is designed.
Informal design guidelines for relation schema
Guideline 1
• Design a relation schema so that it is easy to explain its meaning.
• Do not combine attributes from multiple entity types and relationship
types into a single relation.
• If a relation schema corresponds to one entity type or one relationship
type, it is straightforward to explain its meaning.
• Otherwise, if the relation corresponds to a mixture of multiple entities
and relationships, semantic ambiguities will result, and the relation
cannot be easily explained.
Informal design guidelines for relation schema
• A tuple in the EMP_DEPT relation schema represents a single employee but
includes, along with Dnumber, additional information about the department
that the employee works for.
• EMP_DEPT and EMP_PROJ violate Guideline 1 by mixing attributes from
distinct real-world entities: EMP_DEPT mixes attributes of employees and
departments, and EMP_PROJ mixes attributes of employees and projects and
the WORKS_ON relationship.
Informal design guidelines for relation schema
• Hence, they fare poorly against the above measure of design quality.
• They may be used as views, but they cause problems when used as base
relations.
Informal design guidelines for relation schema
2. Redundant Information in Tuples and Update Anomalies
• One goal of schema design is to minimize the storage space used by the
base relations.
• Grouping attributes into relation schemas has a significant effect on
storage space.
• Storing natural joins of base relations leads to an additional problem
referred to as update anomalies.
• These can be classified into insertion anomalies, deletion anomalies,
and modification anomalies.
Informal design guidelines for relation schema
• Insertion Anomalies
• To insert a new employee tuple into EMP_DEPT, we must include either the
attribute values for the department that the employee works for, or
NULLs.
• It is difficult to insert a new department that has no employees as yet
in the EMP_DEPT relation. The only way to do this is to place NULL values
in the attributes for employee. This violates the entity integrity for
EMP_DEPT because its primary key Ssn cannot be null.
• Deletion Anomalies
• If we delete from EMP_DEPT an employee tuple that happens to represent
the last employee working for a particular department, the information
concerning that department is lost inadvertently from the database.
Informal design guidelines for relation schema
• Modification Anomalies
• In EMP_DEPT, if we change the value of one of the attributes of a
particular department (say, the manager of department 5), we must update
the tuples of all employees who work in that department; otherwise, the
database will become inconsistent.
• These three anomalies are undesirable: they make it difficult to
maintain consistency of data and they require unnecessary updates.
Informal design guidelines for relation schema
Guideline 2
• Design the base relation schemas so that no insertion, deletion, or
modification anomalies are present in the relations. If any anomalies are
present, note them clearly and make sure that the programs that update the
database will operate correctly.
• These guidelines may sometimes have to be violated in order to improve
the performance of certain queries.
Informal design guidelines for relation schema
3. NULL Values in Tuples
• If many of the attributes in a ‘fat’ relation do not apply to all tuples
in the relation, we end up with many NULLs in those tuples.
• This can waste space at the storage level and may also lead to problems
with understanding the meaning of the attributes.
• Another problem with NULLs is how to account for them when aggregate
operations such as COUNT or SUM are applied.
• SELECT and JOIN operations involve comparisons; if NULL values are
present, the results may become unpredictable.
Informal design guidelines for relation schema
• NULLs can have multiple interpretations, such as the following:
• The attribute does not apply to this tuple. For example, Visa_status may
not apply to U.S. students.
• The attribute value for this tuple is unknown. For example, the
Date_of_birth may be unknown for an employee.
• The value is known but absent; that is, it has not been recorded yet.
For example, the Home_Phone_Number for an employee may exist but may not
be available and recorded yet.
Guideline 3
• As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL. If NULLs are unavoidable, make sure that
they apply in exceptional cases only and do not apply to a majority of
tuples in the relation.
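The effect of NULLs on aggregates and comparisons can be seen in a small experiment. The sketch below is only an illustration: it uses Python's standard sqlite3 module, and the EMPLOYEE columns and rows are hypothetical sample data, not taken from the slides. Other DBMSs follow the same SQL rules here, but details may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary REAL, Dno INTEGER)")
# Hypothetical rows: one unknown salary, one unknown department.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 30000, 5), ("222", None, 5), ("333", 40000, None)],
)

# COUNT(*) counts rows; COUNT(Salary), SUM and AVG skip NULL salaries.
count_all, count_salary, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()

# A comparison with NULL is neither true nor false, so the row whose
# Dno is NULL satisfies neither Dno = 5 nor Dno <> 5.
matched = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE Dno = 5 OR Dno <> 5"
).fetchone()[0]

print(count_all, count_salary, total, average, matched)
```

Note that the average is computed over two salaries, not three, which is easy to overlook when many values are NULL.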
Informal design guidelines for relation schema
4. Generation of Spurious Tuples
• Because of bad schema design, we might not be able to recover the
information that was originally in the relations.
• Suppose EMP_PROJ is decomposed into EMP_PROJ1 and EMP_LOCS. If we
attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result
produces many more tuples than the original set of tuples in EMP_PROJ.
• Additional tuples that were not in EMP_PROJ are called spurious tuples
because they represent spurious information that is not valid.
Informal design guidelines for relation schema
• Spurious tuples arise here because Plocation is the attribute that
relates EMP_LOCS and EMP_PROJ1, and Plocation is neither a primary key
nor a foreign key in either EMP_LOCS or EMP_PROJ1.
Guideline 4
• Design relation schemas so that they can be joined with equality
conditions on attributes that are appropriately related (primary key,
foreign key) pairs in a way that guarantees that no spurious tuples are
generated.
• Avoid relations that contain matching attributes that are not (foreign
key, primary key) combinations because joining on such attributes may
produce spurious tuples.
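A small experiment makes the spurious-tuple problem concrete. This is a sketch under assumptions: the three EMP_PROJ rows are hypothetical sample data, and the relations are modelled as Python sets of tuples rather than stored tables.

```python
# Hypothetical EMP_PROJ rows: (Ssn, Ename, Pnumber, Pname, Plocation, Hours)
emp_proj = {
    ("111", "Smith", 1, "ProductX", "Bellaire", 32.5),
    ("222", "Wong", 3, "ProductZ", "Houston", 40.0),
    ("333", "Zelaya", 20, "Reorganization", "Houston", 10.0),
}

# Decompose by projection into EMP_LOCS(Ename, Plocation) and
# EMP_PROJ1(Ssn, Pnumber, Pname, Plocation, Hours).
emp_locs = {(ename, ploc) for (_, ename, _, _, ploc, _) in emp_proj}
emp_proj1 = {
    (ssn, pnum, pname, ploc, hours)
    for (ssn, _, pnum, pname, ploc, hours) in emp_proj
}

# Natural join on the only shared attribute, Plocation.
joined = {
    (ssn, ename, pnum, pname, ploc, hours)
    for (ename, loc) in emp_locs
    for (ssn, pnum, pname, ploc, hours) in emp_proj1
    if loc == ploc
}

# Tuples produced by the join that were never in EMP_PROJ.
spurious = joined - emp_proj
print(len(emp_proj), len(joined), sorted(spurious))
```

The join returns every original tuple plus extra ones that pair an employee with a project only because both happen to share the location Houston.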
Functional Dependencies
• A formal tool for analysis of relational schemas that enables us to
detect and describe some of the above-mentioned problems in
precise terms.
• The single most important concept in relational schema design
theory is that of a functional dependency.
Definition of Functional Dependency
• A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on
the possible tuples that can form a relation state r of R. The
constraint is that, for any two tuples t1 and t2 in r that have t1[X] =
t2[X], they must also have t1[Y] = t2[Y].
Functional Dependencies
• This means that the values of the Y component of a tuple in r
depend on, or are determined by, the values of the X component.
• Alternatively, the values of the X component of a tuple uniquely (or
functionally) determine the values of the Y component.
• We also say that there is a functional dependency from X to Y, or
that Y is functionally dependent on X.
• Relation extensions r(R) that satisfy the functional dependency
constraints are called legal relation states (or legal extensions) of
R.
Functional Dependencies
• Consider the relation schema EMP_PROJ; from the semantics of the
attributes and the relation, we know that the following functional
dependencies should hold:
a. Ssn → Ename
b. Pnumber → {Pname, Plocation}
c. {Ssn, Pnumber} → Hours
Functional Dependencies
• These functional dependencies specify that
(a) the value of an employee’s Social Security
number (Ssn) uniquely determines the
employee name (Ename)
(b) the value of a project’s number (Pnumber)
uniquely determines the project name
(Pname) and location (Plocation)
(c) a combination of Ssn and Pnumber values
uniquely determines the number of hours the
employee currently works on the project per
week (Hours).
Functional Dependencies
• In the TEACH relation, we may think that Text → Course holds, but we
cannot confirm this unless we know that it is true for all possible legal
states of TEACH.
• It is sufficient to demonstrate a single counterexample to disprove a
functional dependency.
• For example, because ‘Smith’ teaches both ‘Data Structures’ and
‘Database Systems,’ we can conclude that Teacher does not functionally
determine Course.
Functional Dependencies
• The following FDs may hold because the four tuples in the current
extension have no violation of these constraints: B → C; C → B;
{A, B} → C; {A, B} → D; and {C, D} → B.
• The following do not hold because we already have violations of them in
the given extension:
A → B (tuples 1 and 2 violate this constraint);
B → A (tuples 2 and 3 violate this constraint);
D → C (tuples 3 and 4 violate it).
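The test "no two tuples agree on X but differ on Y" is easy to mechanize. The following is a minimal sketch; the helper name fd_holds is hypothetical, and the four-tuple extension of R(A, B, C, D) is an assumed example chosen to be consistent with the statements above, since the slide's own figure is not reproduced here.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Assumed extension of R(A, B, C, D); tuple numbers follow list order.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c4", "D": "d3"},
]

not_violated = [
    fd_holds(r, ["B"], ["C"]),
    fd_holds(r, ["C"], ["B"]),
    fd_holds(r, ["A", "B"], ["C"]),
    fd_holds(r, ["A", "B"], ["D"]),
    fd_holds(r, ["C", "D"], ["B"]),
]
violated = [
    fd_holds(r, ["A"], ["B"]),
    fd_holds(r, ["B"], ["A"]),
    fd_holds(r, ["D"], ["C"]),
]
print(not_violated, violated)
```

A True result only says that this particular state does not violate the FD; whether the FD really holds depends on the semantics of the attributes across all legal states.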
Normal Forms based on Primary Keys
Normalization of Relations
• The normalization process, as first proposed by Codd, takes a relation
schema through a series of tests to certify whether it satisfies a certain
normal form.
• The process, which proceeds in a top-down fashion by evaluating each
relation against the criteria for normal forms and decomposing relations
as necessary, can thus be considered as relational design by analysis.
• Codd proposed three normal forms, which he called first, second, and
third normal form.
Normal Forms based on Primary Keys
• A stronger definition of 3NF—called Boyce-Codd normal form
(BCNF)—was proposed later by Boyce and Codd.
• All these normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation.
• Later, a fourth normal form (4NF) and a fifth normal form (5NF)
were proposed, based on the concepts of multivalued
dependencies and join dependencies.
Normal Forms based on Primary Keys
• Normalization of data can be considered a process of analyzing
the given relation schemas based on their FDs and primary keys to
achieve the desirable properties of
(1) minimizing redundancy
(2) minimizing the insertion, deletion, and update anomalies
• A relation schema that does not meet the condition for a normal
form is decomposed into smaller relation schemas that contain a
subset of the attributes and meet the test that was otherwise not
met by the original relation.
Normal Forms based on Primary Keys
• The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has
been normalized.
• The process of normalization through decomposition must also
confirm the existence of additional properties that the relational
schemas, taken together, should possess.
• These would include two properties:
• The nonadditive join or lossless join property, which guarantees
that the spurious tuple generation problem does not occur with
respect to the relation schemas created after decomposition.
• The dependency preservation property, which ensures that each
functional dependency is represented in some individual relation
resulting after decomposition.
Normal Forms based on Primary Keys
Practical Use of Normal Forms
• Normalization is carried out in practice so that the resulting
designs are of high quality and meet the desirable properties.
• The practical utility of these normal forms becomes questionable
when the constraints on which they are based are hard to
understand or to detect.
• Database designers need not normalize to the highest possible
normal form; relations are usually normalized up to 3NF, BCNF, or 4NF.
• Denormalization: The process of storing the join of higher normal
form relations as a base relation—which is in a lower normal form.
Normal Forms based on Primary Keys
Definitions of Keys and Attributes Participating in Keys
• A superkey of a relation schema R = {A1, A2, ..., An} is a set of
attributes S ⊆ R with the property that no two tuples t1
and t2 in any legal relation state r of R will have t1[S] = t2[S].
• A key K is a superkey with the additional property that removal of
any attribute from K will cause K not to be a superkey anymore.
• If a relation schema has more than one key, each is called a
candidate key.
• One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys.
Normal Forms based on Primary Keys
• A prime attribute is an attribute that is a member of some candidate key.
• A Nonprime attribute is not a prime attribute—that is, it is not a member of
any candidate key.
First Normal Form
• Disallows
• composite attributes
• multivalued attributes
• nested relations; attributes whose values for an individual tuple are non-
atomic
• Considered to be part of the definition of relation and all relations in a
relational schema are in 1NF by default.
Second Normal Form
• Uses the concepts of FDs and primary key.
• Definition:
• Full functional dependency: a FD Y → Z where removal of any
attribute from Y means the FD does not hold any more.
• A functional dependency X → Y is a partial dependency if some
attribute A ∈ X can be removed from X and the dependency still
holds; that is, for some A ∈ X, (X − {A}) → Y.
• Examples:
• {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS
nor PNUMBER -> HOURS hold
• {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial
dependency ) since SSN -> ENAME also holds.
Second Normal Form
• A relation schema R is in second normal form (2NF) if every non-
prime attribute A in R is fully functionally dependent on the
primary key.
• R can be decomposed into 2NF relations via the process of 2NF
normalization.
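One common way to test for partial dependencies is to compute attribute closures under the given FDs. The sketch below assumes the three EMP_PROJ dependencies listed earlier; the closure algorithm is the standard textbook procedure rather than something stated on these slides, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD whenever its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs of EMP_PROJ as (left side, right side) pairs.
fds = [
    (frozenset({"Ssn"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
]

def is_partial(lhs, attr, fds):
    """True if attr is still determined after dropping one attribute of lhs."""
    return any(attr in closure(lhs - {a}, fds) for a in lhs)

key = frozenset({"Ssn", "Pnumber"})
nonprime = ["Ename", "Pname", "Plocation", "Hours"]
partial = {a: is_partial(key, a, fds) for a in nonprime}
print(partial)
```

Under these assumptions only Hours is fully dependent on the key, so EMP_PROJ is not in 2NF; a 2NF design would move Ename into a relation keyed by Ssn, and Pname and Plocation into one keyed by Pnumber.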
Third Normal Form
• Definition:
• Transitive functional dependency: a FD X -> Z that can be
derived from two FDs X -> Y and Y -> Z
• Examples:
• SSN -> DMGRSSN is a transitive FD
• Since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
• SSN -> ENAME is non-transitive
• Since there is no set of attributes X where SSN -> X and X ->
ENAME
Third Normal Form
• A relation schema R is in third normal form (3NF) if it is in 2NF, and no
non-prime attribute A in R is transitively dependent on the primary key
• R can be decomposed into 3NF relations via the process of 3NF
normalization
• NOTE:
• In X -> Y and Y -> Z, with X as the primary key, we consider this a
problem only if Y is not a candidate key.
• When Y is a candidate key, there is no problem with the transitive
dependency .
• E.g., Consider EMP (SSN, Emp#, Salary ).
• Here, SSN -> Emp# -> Salary and Emp# is a candidate key.
Informally speaking
• 1st normal form: all attributes depend on the key
• 2nd normal form: all attributes depend on the whole key
• 3rd normal form: all attributes depend on nothing but the key
Boyce-Codd Normal Form
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if
whenever a nontrivial FD X -> A holds in R, then X is a superkey of R
• Each normal form is strictly stronger than the previous one
• Every 2NF relation is in 1NF
• Every 3NF relation is in 2NF
• Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)
Boyce-Codd Normal Form
• Two FDs exist in the relation TEACH:
• fd1: { student, course} -> instructor
• fd2: instructor -> course
• {student, course} is a candidate key for this relation, and the
dependencies shown follow the pattern in Figure 10.12 (b).
• So this relation is in 3NF but not in BCNF.
• A relation NOT in BCNF should be decomposed so as to meet this
property, while possibly forgoing the preservation of all functional
dependencies in the decomposed relations.
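The BCNF test for TEACH can be checked mechanically: an FD violates BCNF when it is nontrivial and its left side is not a superkey. This is a sketch that uses the standard attribute-closure procedure (not given on the slides) and the two FDs stated above; the helper name closure is hypothetical.

```python
def closure(attrs, fds):
    """Set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

teach = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

# Keep the nontrivial FDs whose left side does not determine all of TEACH.
violations = [
    (lhs, rhs)
    for lhs, rhs in fds
    if not rhs <= lhs and closure(lhs, fds) != teach
]
print(violations)
```

Only fd2 is reported, because instructor alone does not determine student. It does not break 3NF, since course is a prime attribute.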
Multivalued Dependency and Fourth Normal
Form
• A relation is in Fourth Normal Form (4NF) if, for every non-trivial
multivalued dependency X →→ Y that holds in it, X is a superkey.
• It builds on the first three normal forms (1NF, 2NF, and 3NF) and
the Boyce-Codd Normal Form (BCNF).
• Informally, in addition to meeting the requirements of BCNF, a relation
must not store two or more independent multivalued facts about the same
entity.
Multivalued Dependency and Fourth Normal
Form
• A multivalued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute.
• A multivalued dependency therefore always involves at least three
attributes: two that are independent of each other and a third on which
both depend.
• An MVD between a and b is denoted by a →→ b (also written a ->> b).
Multivalued Dependency and Fourth Normal
Form
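• Condition for the MVD a →→ b in a relation with attributes a, b, and c,
stated for four tuples R1, R2, R3, and R4:
1. R1[a], R2[a], R3[a], and R4[a] should contain the same value.
• R1[a] = R2[a] = R3[a] = R4[a]
2. The value of R1[b] should be equal to the value of R3[b], and the
value of R2[b] should be equal to the value of R4[b].
• R1[b] = R3[b], R2[b] = R4[b]
3. The values of the remaining attribute c are exchanged accordingly.
• R1[c] = R4[c], R2[c] = R3[c]
• Example:
Name   Course_work  Hobby
Rahul  C++          Painting
Rahul  Python       Music
Rahul  C++          Music
Rahul  Python       Painting
• Name →→ Course_work and Name →→ Hobby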
Multivalued Dependency and Fourth Normal Form

Original relation:
Name   Course_work  Hobby
Rahul  C++          Painting
Rahul  Python       Music
Rahul  C++          Music
Rahul  Python       Painting

Decomposed relations:
Name   Course_work
Rahul  C++
Rahul  Python

Name   Hobby
Rahul  Painting
Rahul  Music
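That this decomposition loses nothing can be checked by projecting the table and joining the projections again. The sketch below models the relation as a Python set of tuples, using the rows from the table above; the variable names are illustrative only.

```python
# Rows of the original relation: (Name, Course_work, Hobby)
r = {
    ("Rahul", "C++", "Painting"),
    ("Rahul", "Python", "Music"),
    ("Rahul", "C++", "Music"),
    ("Rahul", "Python", "Painting"),
}

# Project onto (Name, Course_work) and (Name, Hobby).
name_course = {(n, c) for (n, c, _) in r}
name_hobby = {(n, h) for (n, _, h) in r}

# Natural join of the two projections on Name.
rejoined = {
    (n, c, h)
    for (n, c) in name_course
    for (m, h) in name_hobby
    if n == m
}
print(sorted(name_course), sorted(name_hobby), rejoined == r)
```

The four original rows shrink to two rows in each projection, and the join reproduces the original exactly, which is what the MVDs Name →→ Course_work and Name →→ Hobby predict.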
Join Dependencies and Fifth Normal Form
• After 4NF, a given relation R is decomposed into three relations
R1, R2, and R3.
• We then perform a join of R1, R2, and R3 in any order. If we get back
the original R without loss of data and without spurious tuples, the
decomposition is lossless and we say it is in 5NF.
• A relation R is in 5NF if and only if it satisfies the following
conditions:
1. R is already in 4NF.
2. R cannot be losslessly decomposed any further (join dependency).
Join Dependencies and Fifth Normal Form
• A join dependency exists if we get the original table R back when the
decomposed tables are joined.
Agent  Company  Product
A1     PQR      Nut
A1     PQR      Bolt
A1     XYZ      Nut
A1     XYZ      Bolt
A2     PQR      Nut
Join Dependencies and Fifth Normal Form

R1:
Agent  Company
A1     PQR
A1     XYZ
A2     PQR

R2:
Agent  Product
A1     Nut
A1     Bolt
A2     Nut

R3:
Company  Product
PQR      Nut
PQR      Bolt
XYZ      Nut
XYZ      Bolt
Join Dependencies and Fifth Normal Form
Agent  Company  Product
A1     PQR      Nut
A1     PQR      Bolt
A1     XYZ      Nut
A1     XYZ      Bolt
A2     PQR      Nut
• When R1, R2, and R3 are joined, we get the original table R with no
spurious tuples. Hence the 5NF condition described above is met.
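The three-way join can be verified directly. This is a sketch that models R as a Python set of tuples with the rows shown above; R1, R2, and R3 are its projections onto (Agent, Company), (Agent, Product), and (Company, Product).

```python
# Rows of R: (Agent, Company, Product)
r = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

r1 = {(a, c) for (a, c, _) in r}  # (Agent, Company)
r2 = {(a, p) for (a, _, p) in r}  # (Agent, Product)
r3 = {(c, p) for (_, c, p) in r}  # (Company, Product)

# Join R1 and R2 on Agent, then keep only the combinations whose
# (Company, Product) pair also appears in R3.
joined = {
    (a, c, p)
    for (a, c) in r1
    for (a2, p) in r2
    if a == a2 and (c, p) in r3
}
print(len(r1), len(r2), len(r3), joined == r)
```

For this particular data the join returns exactly the five original rows, so no information is lost and no spurious tuples appear.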
Module 3 Chapter 2
SQL Introduction
• SQL language
• Considered one of the major reasons for the commercial success of
relational databases
• SQL
• SQL originates in the relational predicate calculus known as tuple
calculus, which was proposed initially as the language SQUARE.
• SQL actually comes from the word “SEQUEL”, which was the original term
used in the paper “SEQUEL TO SQUARE” by Chamberlin and Boyce. IBM
could not copyright that term, so they abbreviated it to SQL and
copyrighted the term SQL.
• Now popularly known as “Structured Query language”.
• SQL is an informal or practical rendering of the relational data model with
syntax.
SQL data definition and data types
• Terminology:
• Table, row, and column used for relational model terms relation, tuple,
and attribute
• CREATE statement
• Main SQL command for data definition.
• SQL schema
• Identified by a schema name
• Includes an authorization identifier and descriptors for each element
• Schema elements include
• Tables, constraints, views, domains, and other constructs
• Each statement in SQL ends with a semicolon
SQL data definition and data types
• CREATE SCHEMA statement
• CREATE SCHEMA COMPANY AUTHORIZATION 'Jsmith';
• Catalog
• Named collection of schemas in an SQL environment
• SQL also has the concept of a cluster of catalogs.
CREATE TABLE Command
• Specifying a new relation
• Provide name of table
• Specify attributes, their types and initial constraints
SQL data definition and data types
• Can optionally specify schema:
• CREATE TABLE COMPANY.EMPLOYEE ...
• or
• CREATE TABLE EMPLOYEE ...
• Base tables (base relations)
• Relation and its tuples are actually created and stored as a file
by the DBMS
• Virtual relations (views)
• Created through the CREATE VIEW statement. Do not
correspond to any physical file.
SQL data definition and data types
Attribute Data Types and Domains in SQL
• Numeric data types
• Integer numbers: INTEGER, INT, and SMALLINT
• Floating-point (real) numbers: FLOAT or REAL, and DOUBLE
PRECISION
• Character-string data types
• Fixed length: CHAR(n), CHARACTER(n)
• Varying length: VARCHAR(n), CHAR VARYING(n), CHARACTER
VARYING(n)
• Bit-string data types
• Fixed length: BIT(n)
• Varying length: BIT VARYING(n)
SQL data definition and data types
• Boolean data type
• Values of TRUE or FALSE or NULL
• DATE data type
• Ten positions
• Components are YEAR, MONTH, and DAY in the form YYYY-
MM-DD
• Multiple mapping functions available in RDBMSs to change
date formats
SQL data definition and data types
Additional data types
• Timestamp data type
• Includes the DATE and TIME fields
• Plus a minimum of six positions for decimal fractions of seconds
• Optional WITH TIME ZONE qualifier
• INTERVAL data type
• Specifies a relative value that can be used to increment or
decrement an absolute value of a date, time, or timestamp
• DATE, TIME, Timestamp, INTERVAL data types can be cast or
converted to string formats for comparison.
SQL data definition and data types
• Domain
• Name used with the attribute specification
• Makes it easier to change the data type for a domain that is
used by numerous attributes
• Improves schema readability
• Example: CREATE DOMAIN SSN_TYPE AS CHAR(9);
• TYPE
• User Defined Types (UDTs) are supported for object-oriented
applications. Uses the command CREATE TYPE
Schema change statements in SQL
The DROP Command
• The DROP command can be used to drop named schema elements, such as
tables, domains, or constraints.
• One can also drop a schema. For example, if a whole schema is no longer
needed, the DROP SCHEMA command can be used.
• There are two drop behavior options: CASCADE and RESTRICT.
• For example, to remove the COMPANY database schema and all its tables,
domains, and other elements, the CASCADE option is used as follows:
• DROP SCHEMA COMPANY CASCADE;
Schema change statements in SQL
• If the RESTRICT option is chosen in place of CASCADE, the schema is
dropped only if it has no elements in it; otherwise, the DROP command
will not be executed.
• To use the RESTRICT option, the user must first individually drop each
element in the schema, then drop the schema itself.
• The DROP TABLE command not only deletes all the records in the table if
successful, but also removes the table definition from the catalog.
Schema change statements in SQL
The ALTER Command
• The definition of a base table or of other named schema elements can be
changed by using the ALTER command.
• For base tables, the possible alter table actions include adding or
dropping a column (attribute), changing a column definition, and adding
or dropping table constraints.
• ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN Job VARCHAR(12);
Schema change statements in SQL
• To drop a column, we must choose either CASCADE or RESTRICT for drop
behavior.
• If CASCADE is chosen, all constraints and views that reference the
column are dropped automatically from the schema, along with the column.
• If RESTRICT is chosen, the command is successful only if no views or
constraints (or other schema elements) reference the column.
• ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address CASCADE;
Specifying constraints in SQL
Basic constraints:
• Relational Model has 3 basic constraint types that are
supported in SQL:
• Key constraint: A primary key value cannot be duplicated
• Entity Integrity Constraint: A primary key value cannot be null
• Referential integrity constraints: The “foreign key” must have a
value that is already present as a primary key or may be null.
Specifying constraints in SQL
Attribute Constraints
• Default value of an attribute
• DEFAULT <value>
• NULL is not permitted for a particular attribute (NOT NULL)
• CHECK clause
• Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber <
21);
Specifying constraints in SQL
Specifying Key and Referential Integrity Constraints
• PRIMARY KEY clause
• Specifies one or more attributes that make up the primary key
of a relation
• Dnumber INT PRIMARY KEY;
• UNIQUE clause
• Specifies alternate (secondary) keys (called CANDIDATE keys
in the relational model).
• Dname VARCHAR(15) UNIQUE;
Specifying constraints in SQL
• FOREIGN KEY clause
• Default operation: reject update on violation
• Attach referential triggered action clause
• Options include SET NULL, CASCADE, and SET DEFAULT
• Action taken by the DBMS for SET NULL or SET DEFAULT is the same
for both ON DELETE and ON UPDATE
• CASCADE option suitable for “relationship” relations
Giving Names to Constraints
• Using the Keyword CONSTRAINT
• Name a constraint
• Useful for later altering
Specifying constraints in SQL
• Additional Constraints on individual tuples within a relation are
also possible using CHECK
• CHECK clauses at the end of a CREATE TABLE statement
• Apply to each tuple individually
• CHECK (Dept_create_date <= Mgr_start_date);
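The constraint clauses above can be tried out together. The sketch below is an illustration under assumptions: it runs the SQL through Python's standard sqlite3 module, the constraint names (DEPTPK, DEPTSK, EMPDEPTFK) and the sample rows are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Other DBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE DEPARTMENT (
    Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21),
    Dname   VARCHAR(15) NOT NULL,
    CONSTRAINT DEPTPK PRIMARY KEY (Dnumber),
    CONSTRAINT DEPTSK UNIQUE (Dname)
);
CREATE TABLE EMPLOYEE (
    Ssn CHAR(9) PRIMARY KEY,
    Dno INT DEFAULT 1,
    CONSTRAINT EMPDEPTFK FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dnumber)
        ON DELETE SET NULL ON UPDATE CASCADE
);
INSERT INTO DEPARTMENT VALUES (5, 'Research');
INSERT INTO EMPLOYEE (Ssn, Dno) VALUES ('123456789', 5);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

check_rejected = rejected("INSERT INTO DEPARTMENT VALUES (25, 'Sales')")
fk_rejected = rejected("INSERT INTO EMPLOYEE (Ssn, Dno) VALUES ('987654321', 9)")

# ON UPDATE CASCADE propagates the new department number to EMPLOYEE.
conn.execute("UPDATE DEPARTMENT SET Dnumber = 6 WHERE Dnumber = 5")
dno_after_update = conn.execute("SELECT Dno FROM EMPLOYEE").fetchone()[0]

# ON DELETE SET NULL clears the reference when the department is deleted.
conn.execute("DELETE FROM DEPARTMENT WHERE Dnumber = 6")
dno_after_delete = conn.execute("SELECT Dno FROM EMPLOYEE").fetchone()[0]

print(check_rejected, fk_rejected, dno_after_update, dno_after_delete)
```

Naming the constraints with the CONSTRAINT keyword, as here, is what makes it possible to refer to them later when altering the table.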
Retrieval queries in SQL
• SELECT statement
• One basic statement for retrieving information from a database
• SQL allows a table to have two or more tuples that are identical in
all their attribute values
• Unlike relational model (relational model is strictly set-theory
based)
• Multiset or bag behavior
• Tuple-id may be used as a key
Retrieval queries in SQL
• Basic form of the SELECT statement:
SELECT <attribute list>
FROM <table list>
WHERE <condition>;
Retrieval queries in SQL
• Logical comparison operators
• =, <, <=, >, >=, and <>
• Projection attributes
• Attributes whose values are to be retrieved
• Selection condition
• Boolean condition that must be true for any retrieved tuple.
Selection conditions include join conditions when multiple
relations are involved.
Retrieval queries in SQL
Ambiguous Attribute Names
• Same name can be used for two (or more) attributes in different
relations
• As long as the attributes are in different relations
• Must qualify the attribute name with the relation name to
prevent ambiguity
Retrieval queries in SQL
Aliasing and Renaming
• Aliases or tuple variables
• Declare alternative relation names E and S to refer to the EMPLOYEE
relation twice in a query:
• For each employee, retrieve the employee’s first and last name and
the first and last name of his or her immediate supervisor.
SELECT E.Fname, E.Lname, S.Fname, S.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
• Recommended practice to abbreviate names and to prefix same or
similar attribute from multiple tables.
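The supervisor query can be run on a tiny table to see how the two aliases behave. This is a sketch only: it uses Python's standard sqlite3 module, assumes the column names Fname, Lname, Ssn, and Super_ssn, and the three rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY, Super_ssn TEXT
);
INSERT INTO EMPLOYEE VALUES ('James', 'Borg', '888', NULL);
INSERT INTO EMPLOYEE VALUES ('Franklin', 'Wong', '333', '888');
INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '123', '333');
""")

# E ranges over employees, S over the same table in the role of supervisor.
pairs = conn.execute("""
    SELECT E.Fname, E.Lname, S.Fname, S.Lname
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()
print(pairs)
```

The employee whose Super_ssn is NULL does not appear in the result, because the join condition is never true for that row.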
Retrieval queries in SQL
• The attribute names can also be renamed
• EMPLOYEE AS E(Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)
• Note that the relation EMPLOYEE now has a variable name E
which corresponds to a tuple variable
• The “AS” may be dropped in most SQL implementations
Retrieval queries in SQL
Unspecified WHERE Clause and Use of the Asterisk
• Missing WHERE clause
• Indicates no condition on tuple selection
• Effect is a CROSS PRODUCT
• Result is all possible tuple combinations (or the Algebra
operation of Cartesian Product)
Retrieval queries in SQL
• Specify an asterisk (*)
• Retrieve all the attribute values of the selected tuples
• The * can be prefixed by the relation name
Retrieval queries in SQL
Tables as Sets in SQL
• SQL does not automatically eliminate duplicate tuples in query results
• For aggregate operations duplicates must be accounted for
• Use the keyword DISTINCT in the SELECT clause
• Only distinct tuples should remain in the result
Retrieval queries in SQL
Set operations
• UNION, EXCEPT (difference), INTERSECT
• Corresponding multiset operations: UNION ALL, EXCEPT ALL,
INTERSECT ALL
• Type compatibility is needed for these operations to be valid
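The set operations can be compared side by side on two type-compatible results. The sketch below uses Python's sqlite3 module and a reduced WORKS_ON table with assumed sample rows; project_numbers is a hypothetical helper introduced only for this example. SQLite supports UNION, UNION ALL, INTERSECT and EXCEPT but not EXCEPT ALL or INTERSECT ALL, so those two are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
    INSERT INTO WORKS_ON VALUES
        ('123456789', 1), ('123456789', 2),
        ('333445555', 2), ('333445555', 3);
""")

def project_numbers(set_operator):
    """Combine the project lists of the two employees with the given operator."""
    query = f"""
        SELECT Pno FROM WORKS_ON WHERE Essn = '123456789'
        {set_operator}
        SELECT Pno FROM WORKS_ON WHERE Essn = '333445555'
        ORDER BY 1
    """
    return [row[0] for row in conn.execute(query)]

union_rows = project_numbers("UNION")          # duplicates removed
union_all_rows = project_numbers("UNION ALL")  # duplicates kept
intersect_rows = project_numbers("INTERSECT")  # projects both work on
except_rows = project_numbers("EXCEPT")        # only the first employee's

print(union_rows, union_all_rows, intersect_rows, except_rows)
```

Both operands return a single integer column, which is what makes them type compatible. Project 2 appears once under UNION and twice under UNION ALL.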
Substring Pattern Matching and Arithmetic Operators
• LIKE comparison operator
• Used for string pattern matching
• % matches an arbitrary string of zero or more characters
• underscore (_) matches exactly one character
• Examples: WHERE Address LIKE '%Houston,TX%';
• WHERE Ssn LIKE '__1__8901';
• BETWEEN comparison operator (both bounds are inclusive)
• WHERE (Salary BETWEEN 30000 AND 40000) AND Dno = 5;
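The three conditions above can be tested against a few rows. This sketch uses Python's sqlite3 module; the reduced EMPLOYEE schema, the sample tuples (including the Ssn chosen to fit the underscore pattern) and the last_names helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Lname TEXT, Ssn TEXT PRIMARY KEY, Address TEXT, Salary INTEGER, Dno INTEGER
    );
    INSERT INTO EMPLOYEE VALUES
        ('Smith', '123456789', '731 Fondren, Houston,TX', 30000, 5),
        ('Wong', '333445555', '638 Voss, Houston,TX', 40000, 5),
        ('Zelaya', '551238901', '3321 Castle, Spring,TX', 25000, 4);
""")

def last_names(condition):
    """Return the last names of the employees that satisfy the WHERE condition."""
    query = f"SELECT Lname FROM EMPLOYEE WHERE {condition} ORDER BY Lname"
    return [row[0] for row in conn.execute(query)]

# % absorbs whatever precedes and follows the substring.
in_houston = last_names("Address LIKE '%Houston,TX%'")

# Each underscore stands for exactly one character of the nine-character Ssn.
ssn_match = last_names("Ssn LIKE '__1__8901'")

# BETWEEN includes both end points, so 30000 and 40000 qualify.
mid_salary = last_names("(Salary BETWEEN 30000 AND 40000) AND Dno = 5")

print(in_houston, ssn_match, mid_salary)
```

Smith and Wong satisfy both the Houston pattern and the salary range, while only Zelaya's Ssn fits the underscore pattern.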
• Standard arithmetic operators:
• Addition (+), subtraction (–), multiplication (*), and division (/)
may be included as a part of SELECT
• Show the resulting salaries if every employee working on the
‘ProductX’ project is given a 10 percent raise.
SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
WHERE E.Ssn = W.Essn AND W.Pno = P.Pnumber AND
P.Pname = 'ProductX';
Ordering of Query Results
• Use ORDER BY clause
• Keyword DESC to see result in a descending order of values
• Keyword ASC to specify ascending order explicitly
• Typically placed at the end of the query
• ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC
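A multi-key ordering is easiest to follow on actual rows. The sketch below, using Python's sqlite3 module, sorts employees by department name in descending order and then by last and first name in ascending order; the reduced schemas and the five employees are sample data assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Dno INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES
        ('John', 'Smith', 5), ('Franklin', 'Wong', 5), ('Joyce', 'English', 5),
        ('Alicia', 'Zelaya', 4), ('Ahmad', 'Jabbar', 4);
""")

# The first key decides the order of the departments; the later keys only
# break ties among employees of the same department.
ordered = conn.execute("""
    SELECT D.Dname, E.Lname, E.Fname
    FROM DEPARTMENT AS D, EMPLOYEE AS E
    WHERE D.Dnumber = E.Dno
    ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC
""").fetchall()

for row in ordered:
    print(row)
```

Research is listed before Administration because of DESC, and within each department the employees appear alphabetically by last name.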
Complete syntax of SELECT
SELECT attribute list
FROM table list
WHERE conditions
GROUP BY grouping attribute
HAVING condition
ORDER BY attribute
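A single query that uses all six clauses in the order listed above can look as follows. This is a sketch run through Python's sqlite3 module; the query itself, the thresholds and the sample salaries are assumptions made up to exercise every clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Narayan', 38000, 5),
        ('Zelaya', 25000, 4), ('Wallace', 43000, 4), ('Jabbar', 15000, 4),
        ('Borg', 55000, 1);
""")

# WHERE filters tuples first, GROUP BY forms one group per department,
# HAVING filters whole groups, and ORDER BY sorts the surviving groups.
summary = conn.execute("""
    SELECT Dno, COUNT(*) AS Num_emps, MAX(Salary) AS Top_salary
    FROM EMPLOYEE
    WHERE Salary > 20000
    GROUP BY Dno
    HAVING COUNT(*) >= 2
    ORDER BY Num_emps DESC
""").fetchall()

print(summary)
```

Jabbar is removed by WHERE before grouping, and department 1 is removed by HAVING because only one of its employees remains in the group.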
INSERT, DELETE, and UPDATE statements in
SQL
• Three commands used to modify the database:
• INSERT, DELETE, and UPDATE
• INSERT typically inserts a tuple (row) in a relation (table)
• UPDATE may update a number of tuples (rows) in a relation (table)
that satisfy the condition
• DELETE may also remove a number of tuples (rows) from a relation
(table) that satisfy the condition
INSERT
• In its simplest form, it is used to add one or more tuples to a
relation
• Attribute values should be listed in the same order as the
attributes were specified in the CREATE TABLE command
• Constraints on data types are observed automatically
• Any integrity constraints as a part of the DDL specification are
enforced
• Specify the relation name and a list of values for the tuple. All
values, including nulls, are supplied.
• A variation inserts multiple tuples at once: a table is loaded with
the values returned by a query.
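The INSERT variations described above can be sketched as follows, using Python's sqlite3 module. The reduced schemas, the DEPTS_INFO summary table and all sample values are assumptions for illustration; the duplicate-key insert is included to show an integrity constraint being enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (
        Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY, Salary INTEGER, Dno INTEGER
    );
    CREATE TABLE DEPTS_INFO (Dept_name TEXT, No_of_emps INTEGER, Total_sal INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
""")

# Form 1: all values listed in the order of the CREATE TABLE attributes.
conn.execute("INSERT INTO EMPLOYEE VALUES ('Richard', 'Marini', '653298653', 37000, 4)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('Alicia', 'Zelaya', '999887777', 25000, 4)")

# Form 2: explicit attribute list in any order; the omitted Salary becomes NULL.
conn.execute(
    "INSERT INTO EMPLOYEE (Lname, Fname, Dno, Ssn) "
    "VALUES ('English', 'Joyce', 5, '453453453')"
)

# A second tuple with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('Robert', 'Hatcher', '653298653', 30000, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Form 3: load a table with the tuples produced by a query.
conn.execute("""
    INSERT INTO DEPTS_INFO (Dept_name, No_of_emps, Total_sal)
    SELECT D.Dname, COUNT(*), SUM(E.Salary)
    FROM DEPARTMENT AS D, EMPLOYEE AS E
    WHERE D.Dnumber = E.Dno
    GROUP BY D.Dname
""")

depts_info = conn.execute("SELECT * FROM DEPTS_INFO ORDER BY Dept_name").fetchall()
english_salary = conn.execute(
    "SELECT Salary FROM EMPLOYEE WHERE Ssn = '453453453'").fetchone()[0]
print(duplicate_rejected, english_salary, depts_info)
```

Note that DEPTS_INFO is a snapshot: later changes to EMPLOYEE are not reflected in it unless the table is reloaded.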
BULK LOADING OF TABLES
• Another variation of INSERT is used for bulk-loading of several tuples
into tables
• A new table TNEW can be created with the same attributes as an
existing table T; using LIKE and WITH DATA in the syntax, it can also
be loaded with the data selected by a query.
• EXAMPLE:
CREATE TABLE D5EMPS LIKE EMPLOYEE
(SELECT E.*
FROM EMPLOYEE AS E
WHERE E.Dno = 5) WITH DATA;
DELETE
• Removes tuples from a relation
• Includes a WHERE-clause to select the tuples to be deleted
• Referential integrity should be enforced
• Tuples are deleted from only one table at a time (unless CASCADE is
specified on a referential integrity constraint)
• A missing WHERE-clause specifies that all tuples in the relation are
to be deleted; the table then becomes an empty table
• The number of tuples deleted depends on the number of tuples in
the relation that satisfy the WHERE-clause
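The points above can be observed by deleting with different WHERE clauses and counting the affected tuples. The sketch uses Python's sqlite3 module; the reduced EMPLOYEE and DEPENDENT tables, the ON DELETE CASCADE constraint and the sample rows are assumptions for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE EMPLOYEE (Lname TEXT, Ssn TEXT PRIMARY KEY, Dno INTEGER);
    CREATE TABLE DEPENDENT (
        Essn TEXT REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,
        Dependent_name TEXT
    );
    INSERT INTO EMPLOYEE VALUES
        ('Smith', '123456789', 5), ('Wong', '333445555', 5),
        ('Narayan', '666884444', 5), ('Zelaya', '999887777', 4);
    INSERT INTO DEPENDENT VALUES ('123456789', 'Alice'), ('333445555', 'Joy');
""")

# rowcount reports how many EMPLOYEE tuples each DELETE removed directly.
deleted_none = conn.execute("DELETE FROM EMPLOYEE WHERE Lname = 'Brown'").rowcount
deleted_one = conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '123456789'").rowcount
deleted_dept5 = conn.execute("DELETE FROM EMPLOYEE WHERE Dno = 5").rowcount

# CASCADE has removed the dependents of the deleted employees as well.
dependents_left = conn.execute("SELECT COUNT(*) FROM DEPENDENT").fetchone()[0]

# No WHERE clause: every remaining tuple is deleted, but the table still exists.
conn.execute("DELETE FROM EMPLOYEE")
employees_left = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

print(deleted_none, deleted_one, deleted_dept5, dependents_left, employees_left)
```

The same statement form removes zero, one or several tuples depending only on how many satisfy the condition.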
• Includes a WHERE clause to select the tuples to be deleted. The
number of tuples deleted will vary.
UPDATE
• Used to modify attribute values of one or more selected tuples
• A WHERE-clause selects the tuples to be modified
• An additional SET-clause specifies the attributes to be modified
and their new values
• Each UPDATE command modifies tuples in a single relation
• Referential integrity specified as part of DDL specification is
enforced
• Example: Change the location and controlling department number of
project number 10 to 'Bellaire' and 5, respectively
UPDATE PROJECT
SET Plocation = 'Bellaire', Dnum = 5
WHERE Pnumber = 10;
• Several tuples can be modified with a single UPDATE command. An
example is to give all employees in the ‘Research’ department a 10%
raise in salary.
UPDATE EMPLOYEE
SET Salary = Salary * 1.1
WHERE Dno = 5;
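Both UPDATE statements can be run and checked with a short script. This sketch uses Python's sqlite3 module; the reduced PROJECT and EMPLOYEE schemas and the sample rows are assumptions, with department 5 standing for 'Research'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT (
        Pname TEXT, Pnumber INTEGER PRIMARY KEY, Plocation TEXT, Dnum INTEGER
    );
    CREATE TABLE EMPLOYEE (Lname TEXT, Salary REAL, Dno INTEGER);
    INSERT INTO PROJECT VALUES
        ('ProductX', 1, 'Bellaire', 5), ('Computerization', 10, 'Stafford', 4);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('Zelaya', 25000, 4);
""")

# One tuple is selected by its key; SET assigns two attributes at once.
conn.execute("UPDATE PROJECT SET Plocation = 'Bellaire', Dnum = 5 WHERE Pnumber = 10")
project_10 = conn.execute(
    "SELECT Plocation, Dnum FROM PROJECT WHERE Pnumber = 10").fetchone()

# Several tuples are modified by a single command; the new value of Salary
# is computed from the old value of the same tuple.
raised = conn.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = 5").rowcount
salaries = dict(conn.execute("SELECT Lname, Salary FROM EMPLOYEE"))

print(project_10, raised, salaries)
```

Only the two employees of department 5 receive the raise; Zelaya's salary is unchanged because her tuple does not satisfy the WHERE clause.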
Additional features of SQL
• Techniques for specifying complex retrieval queries
• Writing programs in various programming languages that include
SQL statements: Embedded and dynamic SQL, SQL/CLI (Call
Level Interface) and its predecessor ODBC, SQL/PSM (Persistent
Stored Modules)
• Set of commands for specifying physical database design
parameters, file structures for relations, and access paths, e.g.,
CREATE INDEX
• Transaction control commands
• Specifying the granting and revoking of privileges to users
• Constructs for creating triggers
• Enhanced relational systems known as object-relational define
relations as classes. Abstract data types (called User-Defined
Types, UDTs) are supported with CREATE TYPE
• New technologies such as XML and OLAP are added to versions of
SQL
End of Module 3
