DB DESIGN
UNIT III DATABASE DESIGN
Functional dependencies – Normalization – Normal forms based on primary keys
(1NF, 2NF, 3NF, BCNF, 4NF, 5NF) – Triggers – Cursor
1. Functional Dependencies
Definition
A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are
subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The
constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have
t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by,
the values of the X component; alternatively, the values of the X component of a tuple uniquely
(or functionally) determine the values of the Y component. We also say that there is a functional
dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for
functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD,
and Y is called the right-hand side.
Thus, X functionally determines Y in a relation schema R if, and only if, whenever two tuples of
r(R) agree on their X-value, they must necessarily agree on their Y-value. Note the following:
• If a constraint on R states that there cannot be more than one tuple with a given X-value
in any relation instance r(R) (that is, X is a candidate key of R), this implies
that X → Y for any subset of attributes Y of R (because the key constraint implies that
no two tuples in any legal state r(R) will have the same value of X). If X is a
candidate key of R, then X → R.
• If X → Y in R, this does not say whether or not Y → X in R.
Consider the relation schema EMP_PROJ in Figure 2.14(b); from the semantics of the
attributes and the relation, we know that the following functional dependencies should hold:
a. Ssn->Ename
b. Pnumber ->{Pname, Plocation}
c. {Ssn, Pnumber}->Hours
These functional dependencies specify that (a) the value of an employee’s Social Security
number (Ssn) uniquely determines the employee name (Ename), (b) the value of a project’s
number (Pnumber) uniquely determines the project name (Pname) and location (Plocation), and
(c) a combination of Ssn and Pnumber values uniquely determines the number of hours the
employee currently works on the project per week (Hours). Alternatively, we say that Ename
is functionally determined by (or functionally dependent on) Ssn, or given a value of Ssn, we
know the value of Ename, and so on.
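The definition can be checked mechanically against a single relation state. The following is a minimal Python sketch; the helper name fd_holds and the sample tuples are hypothetical, chosen only to mirror the EMP_PROJ attributes. Note that a check on one state can only refute an FD: an FD is a constraint on every legal state, so a passing check does not prove that the FD holds in the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first tuple with a given X-value fixes the expected Y-value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample state (not taken from the figure).
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong"},
]

print(fd_holds(emp_proj, ["Ssn"], ["Ename"]))             # True
print(fd_holds(emp_proj, ["Ssn"], ["Hours"]))             # False
print(fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"]))  # True
```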
2. Normalization
Definition
The normalization process takes a relation schema through a series of tests.
• The tests certify whether the schema satisfies a certain normal form.
• The process proceeds in a top-down fashion.
Normal form tests
First Normal Form (1NF)
• First Normal Form is defined to disallow multivalued attributes, composite
attributes, and their combinations.
• It states that the domain of an attribute must include only atomic (simple, indivisible)
values and that the value of any attribute in a tuple must be a single value from the
domain of that attribute.
• The only attribute values permitted by 1NF are single atomic (or indivisible) values.
Example
There are two ways we can look at the Dlocations attribute:
1. The domain of Dlocations contains atomic values, but some tuples can have a set of
these values. In this case, Dlocations is not functionally dependent on the primary key
Dnumber.
2. The domain of Dlocations contains sets of values and hence is nonatomic. In this case,
Dnumber→Dlocations because each set is considered a single member of the attribute
domain.
In either case, the DEPARTMENT relation in Figure is not in 1NF. There are three main
techniques to achieve first normal form for such a relation:
1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The primary
key of this relation is the combination {Dnumber, Dlocation}, as shown in Figure. A distinct
tuple in DEPT_LOCATIONS exists for each location of a department. This decomposes the
non-1NF relation into two 1NF relations.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT
relation for each location of a DEPARTMENT, as shown in Figure. In this case, the primary
key becomes the combination {Dnumber, Dlocation}. This solution has the disadvantage of
introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute (for example, if it is known that at
most three locations can exist for a department), replace the Dlocations attribute by three
atomic attributes: Dlocation1, Dlocation2, and Dlocation3. This solution has the disadvantage
of introducing NULL values if most departments have fewer than three locations. It further
introduces spurious semantics about the ordering among the location values that is not
originally intended. Querying on this attribute becomes more difficult; for example, consider
how you would write the query: List the departments that have ‘Bellaire’ as one of their
locations in this design.
DEPARTMENT
Dname | Dnumber | Dmgr_ssn | Dloc1 | Dloc2 | Dloc3 | Dloc4
Of the three solutions above, the first is generally considered best because it does not suffer
from redundancy and it is completely general, having no limit placed on a maximum number
of values.
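The first technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name to_1nf and the sample department rows are hypothetical, and the set-valued Dlocations column stands in for the non-1NF attribute.

```python
# Hypothetical non-1NF state: Dlocations holds a set of values per tuple.
department = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": {"Stafford"}},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888665555",
     "Dlocations": {"Houston"}},
]


def to_1nf(rows):
    """Split off Dlocations into DEPT_LOCATIONS keyed by {Dnumber, Dlocation}."""
    dept = [{k: v for k, v in r.items() if k != "Dlocations"} for r in rows]
    dept_locations = sorted(
        (r["Dnumber"], loc) for r in rows for loc in r["Dlocations"]
    )
    return dept, dept_locations


dept, dept_locations = to_1nf(department)

# The 'Bellaire' query becomes a simple selection on DEPT_LOCATIONS.
bellaire = [dnumber for dnumber, loc in dept_locations if loc == "Bellaire"]
print(bellaire)  # [5]
```

With the three-column design (Dlocation1, Dlocation2, Dlocation3), the same query would have to test every location column separately.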
Second Normal Form (2NF)
Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X → Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute A ∈ X,
(X – {A}) does not functionally determine Y.
A functional dependency X → Y is a partial dependency if some attribute A ∈ X can be
removed from X and the dependency still holds; that is, for some A ∈ X, (X – {A}) → Y.
In Figure, {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor
Pnumber → Hours holds). However, the dependency {Ssn, Pnumber} → Ename is partial because
Ssn → Ename holds.
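The full/partial distinction can be tested with attribute closure. The sketch below is a minimal Python illustration; the function names closure and is_full_dependency are hypothetical, and the FD set is the one listed earlier for EMP_PROJ.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_dependency(lhs, rhs, fds):
    """True if lhs -> rhs holds and stops holding once any attribute leaves lhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("lhs -> rhs does not follow from the given FDs")
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


emp_proj_fds = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

print(is_full_dependency({"Ssn", "Pnumber"}, {"Hours"}, emp_proj_fds))  # True
print(is_full_dependency({"Ssn", "Pnumber"}, {"Ename"}, emp_proj_fds))  # False
```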
Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully
functionally dependent on the primary key of R.
Example
The EMP_PROJ relation in Figure is in 1NF but is not in 2NF. The nonprime attribute Ename
violates 2NF because of FD2, as do the nonprime attributes Pname and Plocation because of
FD3. The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially
dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.
Third Normal Form (3NF)
Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X → Y in a relation schema R is a transitive dependency if there exists a set of
attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X → Z and
Z → Y hold.
➢ A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Definition.
A relation schema R is in third normal form (3NF) if, whenever a nontrivial functional
dependency X → A holds in R,
either (a) X is a superkey of R, or (b) A is a prime attribute of R.
Example:
The dependency Ssn → Dmgr_ssn is transitive through Dnumber in EMP_DEPT in Figure,
because both the dependencies Ssn → Dnumber and
Dnumber → Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of the
key of EMP_DEPT.
• In the FD Dnumber → Dmgr_ssn, Dnumber is not a superkey of EMP_DEPT and
Dmgr_ssn is not a prime attribute, so the FD violates the 3NF rule. We therefore
decompose EMP_DEPT in order to normalize it.
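The general 3NF definition translates directly into a check over a list of FDs. The following Python sketch is an illustration under assumptions: the function names are hypothetical, the EMP_DEPT schema is reduced to four attributes, and only the listed FDs are tested (with right-hand sides split into single attributes).

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is all of R."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a proper superset of a key is not minimal
            if closure(candidate, fds) == attrs:
                keys.append(candidate)
    return keys


def violations_3nf(attrs, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if closure(lhs, fds) != attrs and a not in prime:
                bad.append((frozenset(lhs), a))
    return bad


# Reduced EMP_DEPT schema (an assumption; the figure has more attributes).
emp_dept = {"Ssn", "Ename", "Dnumber", "Dmgr_ssn"}
emp_dept_fds = [
    ({"Ssn"}, {"Ename", "Dnumber"}),
    ({"Dnumber"}, {"Dmgr_ssn"}),
]

print(candidate_keys(emp_dept, emp_dept_fds))
print(violations_3nf(emp_dept, emp_dept_fds))
```

Under these assumptions the only reported violation is Dnumber → Dmgr_ssn, which matches the argument above.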
Boyce-Codd Normal Form (BCNF)
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF.
Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
Example:
According to the definition of 3NF, the above state of the relation LOTS1A is in
a good normal form. Now we add one more dependency to this state, as shown in
figure.
➢ Area → Country-name is the new dependency added to LOTS1A.
LOTS1A now violates BCNF. For FD5 (Area → Country-name), the definition of BCNF
requires Area to be a superkey of LOTS1A, but it is not. So we decompose LOTS1A
to normalize it.
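A BCNF test only needs the superkey condition, so it is shorter than the 3NF test. The Python sketch below is illustrative: the function names are hypothetical, and the LOTS1A attribute list and its first two FDs are assumptions made for the example (only Area → Country-name comes from the text above; it is written Country_name in the code).

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the nontrivial listed FDs whose left-hand side is not a superkey."""
    attrs = set(attrs)
    return [
        (set(lhs), set(rhs) - set(lhs))
        for lhs, rhs in fds
        if set(rhs) - set(lhs) and closure(lhs, fds) != attrs
    ]


def decompose_on(attrs, lhs, rhs):
    """Split R on a violating FD X -> A into (X union A) and (R minus A)."""
    r1 = set(lhs) | set(rhs)
    r2 = set(attrs) - (set(rhs) - set(lhs))
    return r1, r2


# Assumed schema and FDs for LOTS1A.
lots1a = {"Property_id", "Country_name", "Lot_no", "Area"}
lots1a_fds = [
    ({"Property_id"}, {"Country_name", "Lot_no", "Area"}),
    ({"Country_name", "Lot_no"}, {"Property_id", "Area"}),
    ({"Area"}, {"Country_name"}),  # FD5, the newly added dependency
]

violations = bcnf_violations(lots1a, lots1a_fds)
print(violations)
print(decompose_on(lots1a, *violations[0]))
```

Because the two resulting schemas share Area, and Area determines Country_name, this split satisfies the binary lossless-join condition.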
Fourth Normal Form (4NF)
Multivalued Dependency:
Definition. A multivalued dependency X →→ Y specified on relation schema R,
where X and Y are both subsets of R, specifies the following constraint on any
relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then
two tuples t3 and t4 should also exist in r with the following properties, where we use
Z to denote (R – (X ∪ Y)).
■ t3[X] = t4[X] = t1[X] = t2[X].
■ t3[Y] = t1[Y] and t4[Y] = t2[Y].
■ t3[Z] = t2[Z] and t4[Z] = t1[Z].
Definition. A relation schema R is in 4NF with respect to a set of dependencies F
(that includes functional dependencies and multivalued dependencies) if, for every
nontrivial multivalued dependency X →→ Y in F+, X is a superkey for R.
In the EMP relation of Figure, the values ‘X’ and ‘Y’ of Pname are repeated with
each value of Dname (or, by symmetry, the values ‘John’ and ‘Anna’ of Dname are
repeated with each value of Pname). This redundancy is clearly
undesirable. However, the EMP schema is in BCNF because no functional
dependencies hold in EMP. Therefore, we need to define a fourth normal form that
is stronger than BCNF and disallows relation schemas such as EMP. Notice that
relations containing nontrivial MVDs tend to be all-key relations; that is, their key
is all their attributes taken together. Furthermore, it is rare that such all-key relations
with a combinatorial occurrence of repeated values would be designed in practice.
However, recognition of MVDs as a potentially problematic dependency is essential in
relational design.
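The tuple-swapping condition in the MVD definition can be checked on a relation state as follows. This is a minimal Python sketch: the function name mvd_holds is hypothetical, X and Y are assumed to be disjoint, and the employee name 'Smith' is an assumed value (the text above gives only the Pname and Dname values).

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on one relation state (X and Y assumed disjoint)."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    state = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and Z from t2; t4 is covered
                # when the loop visits the pair in the opposite order.
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if tuple(t3[a] for a in attrs) not in state:
                    return False
    return True


emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

print(mvd_holds(emp, ["Ename"], ["Pname"]))      # True
print(mvd_holds(emp[:3], ["Ename"], ["Pname"]))  # False
```

Dropping any one of the four tuples breaks the MVD, which shows why every project must be repeated with every dependent.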
Join Dependencies and Fifth Normal Form
In our discussion so far, we have pointed out the problematic functional dependencies
and showed how they were eliminated by a process of repeated binary decomposition
to remove them during the process of normalization to achieve 1NF, 2NF, 3NF and
BCNF. These binary decompositions must obey the NJB property from Section
16.2.4 that we referenced while discussing the decomposition to achieve BCNF.
Achieving 4NF typically involves eliminating MVDs by repeated binary
decompositions as well. However, in some cases there may be no nonadditive join
decomposition of R into two relation schemas, but there may be a nonadditive join
decomposition into more than two relation schemas. Moreover, there may be no
functional dependency in R that violates any normal form up to BCNF, and there
may be no nontrivial MVD present in R either that violates 4NF. We then resort to
another dependency called the join dependency and, if it is present, carry out a
multiway decomposition into fifth normal form (5NF). It is important to note that such a
dependency is a very peculiar semantic constraint that is very difficult to detect in
practice; therefore, normalization into 5NF is very rarely done in practice.
For an example of a JD, consider once again the SUPPLY all-key relation in Figure.
Suppose that the following additional constraint always holds: Whenever a supplier s
supplies part p, and a project j uses part p, and the supplier s supplies at least one
part to project j, then supplier s will also be
supplying part p to project j. This constraint can be restated in other ways and
specifies a join dependency JD(R1, R2, R3) among the three projections R1(Sname,
Part_name), R2(Sname, Proj_name), and R3(Part_name, Proj_name) of SUPPLY. If
this constraint holds, the tuples below the dashed line in Figure must exist in any
legal state of the SUPPLY relation that also contains the tuples above the dashed
line. Figure shows how the SUPPLY relation with the join dependency is
decomposed into three relations R1, R2, and R3 that are each in 5NF. Notice that
applying a natural join to any two of these relations produces spurious tuples, but
applying a natural join to all three together does not. The reader should verify this
on the sample relation in Figure and its projections in Figure. This is because only
the JD exists, but no MVDs are specified. Notice, too, that the JD(R1, R2, R3) is
specified on all legal relation states, not just on the one shown in Figure.
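The verification suggested above can be carried out with a small Python sketch. Everything here is illustrative: the helper names are hypothetical, and the seven SUPPLY tuples are assumed sample data, since the figure is not reproduced in these notes. Each tuple is stored as a frozenset of (attribute, value) pairs.

```python
def rel(attrs, tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(r, attrs):
    """Projection with duplicate elimination."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def natural_join(r, s):
    """Join tuples that agree on every attribute they have in common."""
    out = set()
    for t in r:
        t_map = dict(t)
        for u in s:
            if all(t_map.get(a, v) == v for a, v in u):
                out.add(t | u)
    return out


# Assumed sample state; the last two tuples are the ones forced by the JD.
supply = rel(
    ["Sname", "Part_name", "Proj_name"],
    [
        ("Smith", "Bolt", "ProjX"),
        ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"),
        ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"),
        ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ],
)

r1 = project(supply, {"Sname", "Part_name"})
r2 = project(supply, {"Sname", "Proj_name"})
r3 = project(supply, {"Part_name", "Proj_name"})

two_way = natural_join(r1, r2)
three_way = natural_join(two_way, r3)

print(len(two_way), len(three_way), three_way == supply)
```

On this sample the two-way join contains spurious tuples, while joining all three projections reproduces SUPPLY exactly.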
a. Lossless-Join Decomposition
We first present a criterion for determining whether a decomposition is lossy. Let R be
a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2
form a decomposition of R. This decomposition is a lossless-join decomposition of
R if at least one of the following functional dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition
of R is a lossless-join decomposition. We can use attribute closure to efficiently test
for superkeys, as we have seen earlier. We now demonstrate that our decomposition
of Lending-schema is a lossless-join decomposition by showing a sequence of steps
that generate the decomposition.
We begin by decomposing Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
• Since branch-name → branch-city assets, the augmentation rule for
functional dependencies implies that branch-name → branch-name branch-city
assets
• Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows
that our initial decomposition is a lossless-join decomposition.
Next, we decompose Loan-info-schema into
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This step results in a lossless-join decomposition, since loan-number is a common
attribute and loan-number → amount branch-name. For the general case of
decomposition of a relation into multiple parts at once, the test for lossless join
decomposition is more complicated. See the bibliographical notes for references on
the topic. While the test for binary decomposition is clearly a sufficient condition for
lossless join, it is a necessary condition only if all constraints are functional
dependencies. We shall see other types of constraints later (in particular, a type of
constraint called multivalued dependencies), that can ensure that a decomposition is
lossless join even if no functional dependencies are present.
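The binary lossless-join test reduces to one attribute-closure computation. The following Python sketch uses the Lending-schema dependencies stated above; the function names and the underscore spelling of attribute names are illustrative choices, not part of the original notes.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 functionally determines all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


lending_fds = [
    ({"branch_name"}, {"branch_city", "assets"}),
    ({"loan_number"}, {"amount", "branch_name"}),
]

branch = {"branch_name", "branch_city", "assets"}
loan_info = {"branch_name", "customer_name", "loan_number", "amount"}
loan = {"loan_number", "branch_name", "amount"}
borrower = {"customer_name", "loan_number"}

print(is_lossless_binary(branch, loan_info, lending_fds))  # True
print(is_lossless_binary(loan, borrower, lending_fds))     # True

# A split of Loan-info-schema on branch_name alone fails the test.
print(is_lossless_binary({"branch_name", "customer_name"},
                         {"branch_name", "loan_number", "amount"},
                         lending_fds))                     # False
```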
b. Dependency Preservation
There is another goal in relational-database design: dependency preservation.
When an update is made to the database, the system should be able to check that the
update will not create an illegal relation—that is, one that does not satisfy all the
given functional dependencies. If we are to check updates efficiently, we should
design relational-database schemas that allow update validation without the
computation of joins.
To decide whether joins must be computed to check an update, we need to determine
what functional dependencies can be tested by checking each relation individually.
Let F be a set of functional dependencies on a schema R, and let R1, R2, . . . , Rn be
a decomposition of R. The restriction of F to Ri is the set Fi of all functional
dependencies in F+ that include only attributes of Ri. Since all functional
dependencies in a restriction involve attributes of only one relation schema, it is
possible to test such a dependency for satisfaction by checking only one relation. Note
that the definition of restriction uses all dependencies in F+, not just those in F.
For instance, suppose F = {A → B, B → C}, and we have a decomposition into AC
and AB. The restriction of F to AC is then A → C, since A → C is in F+, even though
it is not in F.
The set of restrictions F1, F2, . . . , Fn is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the restrictions is sufficient. Let
F′ = F1 ∪ F2 ∪ · · · ∪ Fn. F′ is a set of functional dependencies on schema R, but, in
general, F′ ≠ F. However, even if F′ ≠ F, it may be that F′+ = F+. If the latter is
true, then every dependency in F is logically implied by F′, and, if we verify that F′
is satisfied, we have verified that F is satisfied. We say that a decomposition having
the property F′+ = F+ is a dependency-preserving decomposition.
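The dependency-preservation test can be sketched in Python by computing each restriction with attribute closure and then checking that every FD in F follows from the union of the restrictions. The function names are hypothetical, and the subset enumeration is exponential in the size of each Ri, so this is only suitable for small teaching examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Compute the closure of attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def restriction(ri, fds):
    """FDs of F+ that mention only attributes of Ri, one per left-hand side."""
    ri = set(ri)
    restricted = []
    for size in range(1, len(ri) + 1):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            rhs = (closure(x, fds) & ri) - x
            if rhs:
                restricted.append((x, rhs))
    return restricted


def preserves_dependencies(decomposition, fds):
    """True if every FD in F is implied by the union of the restrictions."""
    f_prime = [fd for ri in decomposition for fd in restriction(ri, fds)]
    return all(set(rhs) <= closure(lhs, f_prime) for lhs, rhs in fds)


f = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(restriction({"A", "C"}, f))                               # [({'A'}, {'C'})]
print(preserves_dependencies([{"A", "C"}, {"A", "B"}], f))      # False
print(preserves_dependencies([{"A", "B"}, {"B", "C"}], f))      # True
```

For the decomposition into AC and AB, the dependency B → C cannot be derived from the restrictions, so checking it would require a join; the decomposition into AB and BC avoids this.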
TRIGGERS:
• A trigger is a special type of stored procedure that automatically runs when an event
occurs in the database server.
• DML triggers run when a user tries to modify data through a data manipulation
language (DML) event. DML events are INSERT, UPDATE, or DELETE
statements on a table or view.
• A trigger is a named PL/SQL block stored in the Oracle Database and executed
automatically when a triggering event takes place.
• The event can be any of the following:
o A data manipulation language (DML) statement executed against a table
e.g., INSERT, UPDATE, or DELETE. For example, if you define a trigger
that fires before an INSERT statement on the customers table, the trigger
will fire once before a new row is inserted into the customers table.
o A data definition language (DDL) statement, e.g., a CREATE or ALTER
statement. These triggers are often used for auditing purposes to
record changes to the schema.
o A system event such as startup or shutdown of the Oracle Database.
o A user event such as login or logout.
• The act of executing a trigger is also known as firing a trigger. We say that the
trigger is fired.
Uses of Triggers:
Triggers are useful in many cases such as the following:
• Enforcing complex business rules that cannot be established using integrity
constraints such as UNIQUE, NOT NULL, and CHECK.
• Preventing invalid transactions.
• Gathering statistical information on table accesses.
• Generating values automatically for derived columns.
• Auditing sensitive data.
To create a new trigger, use the following CREATE TRIGGER statement:
CREATE [OR REPLACE] TRIGGER trigger_name
{BEFORE | AFTER} triggering_event ON table_name
[FOR EACH ROW]
[{FOLLOWS | PRECEDES} another_trigger]
[ENABLE | DISABLE]
[WHEN (condition)]
[DECLARE
declaration statements]
BEGIN
executable statements
[EXCEPTION
exception_handling statements]
END;
The above trigger has two parts:
1. Trigger header
2. Trigger body
Example:
CREATE OR REPLACE TRIGGER customers_audit_trg
AFTER UPDATE OR DELETE ON customers
FOR EACH ROW
DECLARE
l_transaction VARCHAR2(10);
BEGIN
-- determine the transaction type
l_transaction := CASE
WHEN UPDATING THEN 'UPDATE'
WHEN DELETING THEN 'DELETE'
END;
-- insert a row into the audit table
INSERT INTO audits (table_name, transaction_name, by_user, transaction_date)
VALUES('CUSTOMERS', l_transaction, USER, SYSDATE);
END;
/
The following statement updates the credit limit of the customer 10 to 2000.
UPDATE customers
SET credit_limit = 2000
WHERE customer_id =10;
Now, check the contents of the table audits to see if the trigger was fired:
SELECT * FROM audits;
The audits table should now contain a row for this statement with transaction_name 'UPDATE'.
This DELETE statement deletes a row from the customers table.
DELETE FROM customers
WHERE customer_id = 10;
And view the data of the audits table:
SELECT * FROM audits;
The audits table should now also contain a row with transaction_name 'DELETE'.
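The same auditing idea can be tried without an Oracle installation. The following is a rough Python sketch using the standard sqlite3 module; it is an analogue, not the Oracle trigger above. SQLite does not accept UPDATE OR DELETE in one trigger and has no USER or SYSDATE, so the sketch assumes two separate triggers, omits the by_user column, and uses datetime('now'). The trigger names and the initial credit limit are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, credit_limit NUMERIC);
    CREATE TABLE audits (table_name TEXT, transaction_name TEXT, transaction_date TEXT);

    -- SQLite needs one trigger per event type.
    CREATE TRIGGER customers_audit_upd AFTER UPDATE ON customers
    FOR EACH ROW
    BEGIN
        INSERT INTO audits VALUES ('CUSTOMERS', 'UPDATE', datetime('now'));
    END;

    CREATE TRIGGER customers_audit_del AFTER DELETE ON customers
    FOR EACH ROW
    BEGIN
        INSERT INTO audits VALUES ('CUSTOMERS', 'DELETE', datetime('now'));
    END;

    INSERT INTO customers VALUES (10, 500);
""")

# Each statement fires the matching trigger once for the affected row.
conn.execute("UPDATE customers SET credit_limit = 2000 WHERE customer_id = 10")
conn.execute("DELETE FROM customers WHERE customer_id = 10")

logged = [row[0] for row in
          conn.execute("SELECT transaction_name FROM audits ORDER BY rowid")]
print(logged)  # ['UPDATE', 'DELETE']
```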
CURSORS:
• Oracle creates a memory area, known as the context area, for processing an SQL
statement, which contains all the information needed for processing the statement;
for example, the number of rows processed, etc.
• A cursor is a pointer to this context area. PL/SQL controls the context area through
a cursor.
• A cursor is a pointer that points to a result of a query.
• PL/SQL has two types of cursors:
o Implicit cursors
▪ Whenever Oracle executes an SQL statement such as SELECT
INTO, INSERT, UPDATE, and DELETE, it automatically creates
an implicit cursor.
▪ Oracle internally manages the whole execution cycle of implicit
cursors and reveals only the cursor’s information and statuses such
as SQL%ROWCOUNT, SQL%ISOPEN, SQL%FOUND, and
SQL%NOTFOUND.
▪ The implicit cursor is not elegant when the query returns zero or
multiple rows which cause NO_DATA_FOUND or
TOO_MANY_ROWS exception respectively.
o Explicit cursors
▪ An explicit cursor is a SELECT statement declared explicitly in the
declaration section of the current block or a package specification.
▪ For an explicit cursor, you have control over its execution cycle
from OPEN, FETCH, and CLOSE.
▪ Oracle defines an execution cycle that executes an SQL statement
and associates a cursor with it.
The following illustration shows the execution cycle of an explicit
cursor:
➢ Declare a Cursor
o Before using an explicit cursor, you need to declare it in the declaration section of a
block or package as follows:
CURSOR cursor_name IS query;
➢ Open a Cursor
o Before you start fetching rows from the cursor, you must open it. To open a cursor,
use the following syntax:
OPEN cursor_name;
o When opening a cursor, Oracle parses the query, binds variables, and executes the
associated SQL statement.
o Oracle also determines an execution plan, associates host variables and cursor
parameters with the placeholders in the SQL statement, determines the result set,
and sets the cursor to the first row in the result set.
➢ Fetch from a Cursor
o The FETCH statement places the contents of the current row into variables. The
syntax of FETCH statement is as follows:
FETCH cursor_name INTO variable_list;
o To retrieve all rows in a result set, you need to fetch each row till the last one.
➢ Closing a Cursor
o After fetching all rows, you need to close the cursor with the CLOSE statement:
CLOSE cursor_name;
o Closing a cursor instructs Oracle to release allocated memory at an appropriate
time.
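The declare, open, fetch, and close steps have a close analogue in other database APIs. The following Python sketch uses the standard sqlite3 module to walk through the same cycle; it is not PL/SQL, and the table contents are hypothetical sample rows. Here execute plays the role of OPEN, fetchone the role of FETCH, and a None result the role of %NOTFOUND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, total NUMERIC);
    INSERT INTO sales VALUES (1, 300), (2, 900), (3, 500);
""")

cur = conn.cursor()  # create the cursor object
# Running the query corresponds to OPEN: the result set is determined here.
cur.execute("SELECT customer_id, total FROM sales ORDER BY total DESC")

fetched = []
while True:
    row = cur.fetchone()  # FETCH the next row
    if row is None:       # no more rows, like cursor%NOTFOUND
        break
    fetched.append(row)

cur.close()               # CLOSE releases the cursor's resources
print(fetched)            # [(2, 900), (3, 500), (1, 300)]
```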
Explicit Cursor Attributes
A cursor has four attributes that you can reference in the following format:
cursor_name%attribute
1. %ISOPEN - attribute is TRUE if the cursor is open or FALSE if it is not
2. %FOUND - This attribute has three possible values:
NULL before the first fetch
TRUE if a record was fetched successfully
FALSE if no row was returned
Referencing it raises the INVALID_CURSOR exception if the cursor is not open
3. %NOTFOUND - This attribute has three possible values:
NULL before the first fetch
FALSE if a record was fetched successfully
TRUE if no row was returned
Referencing it raises the INVALID_CURSOR exception if the cursor is not open
4. %ROWCOUNT - The %ROWCOUNT attribute returns the number of rows fetched from
the cursor so far. If the cursor is not open, referencing this attribute raises INVALID_CURSOR.
PL/SQL Cursor Example:
The following statement creates a view that returns the sales revenues by customers:
CREATE VIEW sales AS
SELECT customer_id,
SUM(unit_price * quantity) total,
ROUND(SUM(unit_price * quantity) * 0.05) credit
FROM order_items
INNER JOIN orders USING (order_id)
WHERE status = 'Shipped'
GROUP BY customer_id;
The values of the credit column are 5% of the total sales revenues.
Suppose you need to develop an anonymous block that:
o Resets the credit limits of all customers to zero.
o Fetches customers sorted by sales in descending order and gives them new credit limits from a
budget of 1 million.
The following anonymous block illustrates the logic:
DECLARE
l_budget NUMBER := 1000000;
-- cursor
CURSOR c_sales IS
SELECT * FROM sales
ORDER BY total DESC;
-- record
r_sales c_sales%ROWTYPE;
BEGIN
-- reset credit limit of all customers
UPDATE customers SET credit_limit = 0;
OPEN c_sales;
LOOP
FETCH c_sales INTO r_sales;
EXIT WHEN c_sales%NOTFOUND;
-- update credit for the current customer
UPDATE customers
SET credit_limit =
CASE WHEN l_budget > r_sales.credit
THEN r_sales.credit
ELSE l_budget
END
WHERE
customer_id = r_sales.customer_id;
-- reduce the budget for credit limit
l_budget := l_budget - r_sales.credit;
DBMS_OUTPUT.PUT_LINE( 'Customer id: ' || r_sales.customer_id || ' Credit: ' || r_sales.credit ||
' Remaining Budget: ' || l_budget );
-- check the budget
EXIT WHEN l_budget <= 0;
END LOOP;
CLOSE c_sales;
END;
/