Database Design Guidelines and Anomalies
EMPLOYEE
SSN   NAME     ADDRESS   BDATE   DNUMBER
111   John                       1
222   Ram                        1
333   Sita                       2
444   Kishan                     2
555   Mary                       3

DEPARTMENT
DNAME      DNUMBER   MGR_SSN
Research   1         101
Admin      2         104
Headqtrs   3         105
Figure A
EMP_DEPT
SSN   NAME     ADDRESS   DNUMBER   DNAME      MGR_SSN
111   John               1         Research   101
222   Ram                1         Research   101
333   Sita               2         Admin      104
444   Kishan             2         Admin      104
555   Mary               3         Headqtrs   105
Figure B
In Figure A, each department's information appears only once, in the DEPARTMENT relation. Only the DNUMBER is repeated in the EMPLOYEE relation, as a foreign key, for each employee who works in that department. In Figure B, by contrast, DNAME and MGR_SSN are repeated in the tuple of every employee of a department.
Another serious problem with using the relation in Figure B as a base relation is the occurrence of update anomalies.
Explain the different types of update anomalies. (5 marks)
Update anomalies can be classified into:
❖ Insertion anomalies
❖ Deletion anomalies
❖ Modification anomalies
Insertion anomalies
Insertion anomalies can be differentiated into two types:
1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the
department that the employee works for, or NULLs (if the employee does not yet work for a department).
o Example: To insert a new tuple for an employee who works in department number 5, the
attribute values of department 5 must be entered correctly so that they are consistent with the
values for department 5 in other tuples in EMP_DEPT.
o In Figure A we need not worry about this, because we enter only the department number in the
employee tuple; all other attribute values of department 5 are recorded only once in the database,
as a single tuple in the DEPARTMENT relation.
2. It is difficult to insert a new department that has no employees as yet into the EMP_DEPT relation.
The only way to do this is to place NULL values in the employee attributes. This causes a problem because
SSN is the primary key of EMP_DEPT (so it cannot be NULL), and each tuple is supposed to represent an employee entity, not a department.
o This problem does not occur in the design of Figure A, because a department is entered in the
DEPARTMENT relation whether or not any employees work for it, and whenever an employee
is assigned to that department, a corresponding tuple is inserted in EMPLOYEE.
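The second insertion anomaly can be reproduced with Python's built-in sqlite3 module. This is only an illustrative sketch: column types are simplified, ADDRESS is omitted, and department 4 ('Sales') is a hypothetical new department. SQLite, unlike the SQL standard, allows NULL in a non-integer PRIMARY KEY column unless NOT NULL is also declared, so the sketch declares both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Figure B design: employee and department facts in one relation.
# NOT NULL is spelled out because SQLite does not imply it from PRIMARY KEY here.
cur.execute("""CREATE TABLE EMP_DEPT (
    SSN TEXT NOT NULL PRIMARY KEY,
    NAME TEXT, DNUMBER INTEGER, DNAME TEXT, MGR_SSN TEXT)""")

# Figure A design: department facts in a relation of their own.
cur.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT, MGR_SSN TEXT)")

def try_insert(sql, params):
    """Run one INSERT; return False if a constraint rejects it."""
    try:
        cur.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# A department with no employees forces SSN = NULL into EMP_DEPT: rejected.
emp_dept_ok = try_insert("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)",
                         (None, None, 4, "Sales", None))

# In Figure A the new department is simply a DEPARTMENT tuple: accepted.
department_ok = try_insert("INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
                           (4, "Sales", None))

print(emp_dept_ok, department_ok)  # False True
```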
Deletion Anomalies
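o If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working
for a particular department, the information concerning that department is lost from the database. For example, deleting employee 555 (Mary) from Figure B removes all information about department 3 (Headqtrs).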
o This problem does not occur in the database of Figure A because DEPARTMENT tuples are stored
separately.
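The deletion anomaly can be checked the same way by loading the rows of Figure B into both designs (ADDRESS is omitted because the figure shows no values for it). This is a minimal sqlite3 sketch, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Rows of Figure B: (SSN, NAME, DNUMBER, DNAME, MGR_SSN)
rows = [("111", "John", 1, "Research", "101"),
        ("222", "Ram", 1, "Research", "101"),
        ("333", "Sita", 2, "Admin", "104"),
        ("444", "Kishan", 2, "Admin", "104"),
        ("555", "Mary", 3, "Headqtrs", "105")]
cur.execute("CREATE TABLE EMP_DEPT (SSN TEXT PRIMARY KEY, NAME TEXT, "
            "DNUMBER INTEGER, DNAME TEXT, MGR_SSN TEXT)")
cur.executemany("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)", rows)

# Figure A keeps each department once, in DEPARTMENT.
cur.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT, MGR_SSN TEXT)")
cur.executemany("INSERT INTO DEPARTMENT VALUES (?, ?, ?)",
                {(d, n, m) for _, _, d, n, m in rows})

# Mary (555) is the last employee of department 3.
cur.execute("DELETE FROM EMP_DEPT WHERE SSN = '555'")

lost = cur.execute("SELECT DNAME FROM EMP_DEPT WHERE DNUMBER = 3").fetchall()
kept = cur.execute("SELECT DNAME FROM DEPARTMENT WHERE DNUMBER = 3").fetchall()
print(lost, kept)  # [] [('Headqtrs',)]
```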
Modification anomalies
o In EMP_DEPT, if we change the value of one of the attributes of a particular department (say, the manager of
department 1), we must update the tuples of all employees who work in that department; otherwise, the database will become
inconsistent.
o If we fail to update some tuples, the same department will be shown to have two different values for that attribute (for example, two different managers), which
would be wrong.
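A minimal sqlite3 sketch of the modification anomaly, using the rows of Figure B; the new manager SSN '999' is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMP_DEPT (SSN TEXT PRIMARY KEY, NAME TEXT, "
            "DNUMBER INTEGER, DNAME TEXT, MGR_SSN TEXT)")
cur.executemany("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)", [
    ("111", "John", 1, "Research", "101"),
    ("222", "Ram", 1, "Research", "101"),
    ("333", "Sita", 2, "Admin", "104"),
    ("444", "Kishan", 2, "Admin", "104"),
    ("555", "Mary", 3, "Headqtrs", "105"),
])

# Department 1 gets a new manager, but only John's tuple is updated.
cur.execute("UPDATE EMP_DEPT SET MGR_SSN = '999' WHERE SSN = '111'")
managers = cur.execute("SELECT DISTINCT MGR_SSN FROM EMP_DEPT "
                       "WHERE DNUMBER = 1 ORDER BY MGR_SSN").fetchall()
print(managers)  # [('101',), ('999',)]: two managers for one department

# The consistent update must touch every tuple of department 1.
cur.execute("UPDATE EMP_DEPT SET MGR_SSN = '999' WHERE DNUMBER = 1")
fixed = cur.execute("SELECT DISTINCT MGR_SSN FROM EMP_DEPT WHERE DNUMBER = 1").fetchall()
print(fixed)  # [('999',)]
```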
Guideline 2:
o Design a base relation schema such that no insertion, deletion, or modification anomalies are present
in the relations. If any anomalies are present, note them and make sure that the programs that update
the database will operate correctly.
3. NULL values in tuples
o If many of the attributes do not apply to all tuples in the relation, we end up with many NULLs in
those tuples
- this can waste space at the storage level
- may lead to problems with understanding the meaning of the attributes
- may also lead to problems with specifying JOIN operations
- may lead to uncertainty about how to account for them when aggregate operations such as COUNT or SUM are
applied
o The nulls can have multiple interpretations, such as the following:
• The attribute does not apply to this tuple.
• The attribute value for this tuple is unknown.
• The value is known but absent; that is, it has not been recorded yet.
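The JOIN and aggregate problems listed above can be seen in a small sqlite3 sketch with hypothetical rows in which Ram's salary and department are NULL. In SQL, COUNT(*) counts every tuple, while COUNT(Salary), SUM and AVG skip NULLs, and a join condition involving NULL is never true, so such tuples drop out of the join result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical rows: Ram's SALARY and DNO are unknown (NULL).
cur.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, NAME TEXT, SALARY INTEGER, DNO INTEGER)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("111", "John", 30000, 1),
    ("222", "Ram", None, None),
    ("333", "Sita", 40000, 2),
])
cur.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT)")
cur.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", [(1, "Research"), (2, "Admin")])

# COUNT(*) sees 3 tuples; the other aggregates only see the 2 non-NULL salaries.
count_rows, count_salary, total, average = cur.execute(
    "SELECT COUNT(*), COUNT(SALARY), SUM(SALARY), AVG(SALARY) FROM EMPLOYEE").fetchone()
print(count_rows, count_salary, total, average)  # 3 2 70000 35000.0

# The equijoin silently drops Ram, because NULL = DNUMBER is never true.
joined = cur.execute("SELECT NAME FROM EMPLOYEE JOIN DEPARTMENT "
                     "ON DNO = DNUMBER ORDER BY NAME").fetchall()
print(joined)  # [('John',), ('Sita',)]
```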
Guideline #3:
Avoid placing attributes in a base relation whose values may frequently be NULL. If NULLs are unavoidable,
make sure that they apply in exceptional cases only and do not apply to a majority of tuples in the relation.
4. Generation of spurious (fake) tuples
Guideline #4: Design relation schemas so that they can be joined with equality conditions on attributes
that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated.
As shown in Figure 1, suppose we use EMP_PROJ1 and EMP_LOCS as the base relations instead of EMP_PROJ.
Figure 1
If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result produces many more tuples
than the original set of tuples in EMP_PROJ. The additional tuples that were not in EMP_PROJ are called spurious
tuples because they represent spurious (wrong) information that is not valid.
➢ Decomposing EMP_PROJ into EMP_PROJ1 and EMP_LOCS is undesirable because, when we join them
back using natural join, we do not get the correct original information. The two relations are joined on Plocation, which is neither a primary key nor a foreign key in either relation, which is exactly what Guideline #4 warns against.
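Figure 1 is not reproduced here, so the following Python sketch uses a small hypothetical EMP_PROJ state (attribute layout as in the usual textbook example, values invented) to show how projecting onto EMP_LOCS and EMP_PROJ1 and joining back on Plocation adds spurious tuples.

```python
from itertools import product

# Hypothetical EMP_PROJ state: (Ssn, Pnumber, Hours, Ename, Pname, Plocation)
EMP_PROJ = {
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
    ("453", 2, 20.0, "English", "ProductY", "Sugarland"),
    ("333", 2, 10.0, "Wong", "ProductY", "Sugarland"),
}

def project(rows, positions):
    """Relational projection: keep the given columns and drop duplicate tuples."""
    return {tuple(row[i] for i in positions) for row in rows}

EMP_LOCS = project(EMP_PROJ, (3, 5))            # (Ename, Plocation)
EMP_PROJ1 = project(EMP_PROJ, (0, 1, 2, 4, 5))  # (Ssn, Pnumber, Hours, Pname, Plocation)

def natural_join(locs, proj1):
    """Join on the only common attribute, Plocation, in EMP_PROJ's column order."""
    return {
        (ssn, pno, hours, ename, pname, loc)
        for (ename, loc), (ssn, pno, hours, pname, loc2) in product(locs, proj1)
        if loc == loc2
    }

joined = natural_join(EMP_LOCS, EMP_PROJ1)
spurious = joined - EMP_PROJ
print(len(EMP_PROJ), len(joined), len(spurious))  # 5 13 8
```

Every original tuple reappears in the join; the problem is the 8 extra ones, such as English working 32.5 hours on ProductX.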
A functional dependency (FD), denoted X → Y, between two sets of attributes X and Y of a relation schema R specifies a constraint on every relation state r of R: for any two tuples t1 and t2 in r, if t1[X] = t2[X], then t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X
component.
Since the values of the X component of a tuple uniquely determine the values of the Y component, we say that there is a
functional dependency from X to Y, or that Y is functionally dependent on X.
An FD X → Y is a full functional dependency if removal of any attribute from X means that the dependency does
not hold any more; otherwise, it is a partial functional dependency.
Example:-
a) fd1:SSN→Ename
b) fd2:Pnumber → {Pname, Plocation}
c) fd3 :{SSN, Pnumber} →hours
These functional dependencies specify that:
(a) The value of an employee's SSN uniquely determines the employee name (Ename).
(b) The value of a Pnumber uniquely determines the Pname and Plocation.
(c) A combination of SSN and Pnumber values uniquely determines the number of hours the employee currently works
on the project per week.
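The FD definition above can be turned into a test on a relation state. One limitation applies: a given state can only show that an FD is violated, because whether an FD holds is a property of the meaning of the attributes. The sketch below uses hypothetical rows whose attribute names follow fd1 to fd3, and also checks the full versus partial distinction.

```python
from itertools import combinations

# Hypothetical EMP_PROJ rows; attribute names follow fd1-fd3.
rows = [
    {"SSN": "123", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"SSN": "123", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith", "Pname": "ProductY", "Plocation": "Sugarland"},
    {"SSN": "453", "Pnumber": 1, "Hours": 20.0, "Ename": "English", "Pname": "ProductX", "Plocation": "Bellaire"},
    {"SSN": "453", "Pnumber": 2, "Hours": 20.0, "Ename": "English", "Pname": "ProductY", "Plocation": "Sugarland"},
]

def fd_holds(rows, lhs, rhs):
    """X -> Y holds in this state if no two tuples agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def is_full(rows, lhs, rhs):
    """Full FD: X -> Y holds and no proper subset of X (including {}) still determines Y."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return not any(fd_holds(rows, subset, rhs)
                   for k in range(len(lhs))
                   for subset in combinations(lhs, k))

print(fd_holds(rows, ["SSN"], ["Ename"]))                    # True  (fd1)
print(fd_holds(rows, ["Pnumber"], ["Pname", "Plocation"]))  # True  (fd2)
print(is_full(rows, ["SSN", "Pnumber"], ["Hours"]))         # True  (fd3 is full)
print(is_full(rows, ["SSN", "Pnumber"], ["Ename"]))         # False (partial: SSN alone suffices)
```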
Normalization of Relations
Normalization is a process of analysing the given relation schemas based on their FDs and primary keys to achieve
the desirable properties of:
1. Minimizing redundancy
2. Minimizing the insertion, deletion and update anomalies
The normalization procedure provides database designers with:
1. A formal framework for analyzing relation schemas based on their keys and on the functional dependencies
among their attributes.
2. A series of normal form tests that can be carried out on individual relation schemas so that the relational
database can be normalized to any desired degree.
The process of normalization through decomposition must also confirm the existence of additional properties that
the relation schemas, taken together, should possess. These include two properties:
1. The lossless join or non-additive join property, which guarantees that the spurious tuple generation problem
does not occur with respect to the relation schemas created after decomposition.
2. The dependency preservation property, which ensures that each FD is represented in some individual relation
resulting after decomposition.
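For a decomposition of R into just two relations R1 and R2, the lossless join property can be tested with attribute closures: the decomposition is lossless if the common attributes functionally determine all of R1 - R2 or all of R2 - R1. The sketch applies this binary test to the EMP_LOCS/EMP_PROJ1 decomposition and to a split of EMP_DEPT at DNUMBER; the attribute sets of ED1 and ED2 are an assumption, since the figure is not reproduced.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common

# EMP_PROJ with fd1-fd3 from these notes
emp_proj_fds = [({"SSN"}, {"Ename"}),
                ({"Pnumber"}, {"Pname", "Plocation"}),
                ({"SSN", "Pnumber"}, {"Hours"})]
EMP_LOCS = {"Ename", "Plocation"}
EMP_PROJ1 = {"SSN", "Pnumber", "Hours", "Pname", "Plocation"}

# EMP_DEPT of Figure B, split at DNUMBER (assumed attribute sets for ED1 and ED2)
emp_dept_fds = [({"SSN"}, {"NAME", "ADDRESS", "DNUMBER"}),
                ({"DNUMBER"}, {"DNAME", "MGR_SSN"})]
ED1 = {"SSN", "NAME", "ADDRESS", "DNUMBER"}
ED2 = {"DNUMBER", "DNAME", "MGR_SSN"}

print(is_lossless_binary(EMP_LOCS, EMP_PROJ1, emp_proj_fds))  # False
print(is_lossless_binary(ED1, ED2, emp_dept_fds))             # True
```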
The database designer need not normalize to the highest possible normal form; relations may be left in a lower
normal form for performance reasons. The process of storing the join of higher normal form relations as a base relation, which is in a lower normal form, is known
as denormalization.
Definition:
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples
t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]. A key K is a superkey with the additional property
that removal of any attribute from K will cause K not to be a superkey any more.
The difference between a key and a superkey is that a key has to be minimal; that is, if we have a key K = {A1,
A2, ..., Ak} of R, then K − {Ai} is not a superkey of R for any Ai, 1 ≤ i ≤ k.
For example {SSN} is a key for EMPLOYEE, whereas {SSN}, {SSN, ENAME}, {SSN, ENAME, BDATE},
and any set of attributes that includes SSN are all superkeys.
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily
designated to be the primary key, and the others are called secondary keys or alternate keys.
Definition:
An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R.
An attribute is called nonprime if it is not a prime attribute-that is, if it is not a member of any candidate key.
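Keys, superkeys and prime attributes can be computed from the FDs with attribute closure: S is a superkey when its closure contains every attribute of R. The brute-force sketch below (exponential in the number of attributes, so suitable only for small schemas like these examples) finds the candidate keys of EMP_PROJ from fd1 to fd3.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute force: the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # a superkey that contains a smaller key is not minimal
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys

EMP_PROJ = {"SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [({"SSN"}, {"Ename"}),                   # fd1
       ({"Pnumber"}, {"Pname", "Plocation"}),  # fd2
       ({"SSN", "Pnumber"}, {"Hours"})]        # fd3

keys = candidate_keys(EMP_PROJ, fds)
prime = set().union(*keys)
nonprime = EMP_PROJ - prime
is_superkey = closure({"SSN", "Pnumber", "Ename"}, fds) >= EMP_PROJ

print([sorted(k) for k in keys])  # [['Pnumber', 'SSN']]
print(sorted(nonprime))           # ['Ename', 'Hours', 'Plocation', 'Pname']
print(is_superkey)                # True: a superkey, but not a key
```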
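Fig 5.4 Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Example state of relation
DEPARTMENT. (c) 1NF version of the same relation with redundancy.
First normal form (1NF) requires that the domain of every attribute include only atomic (simple, indivisible) values, and that the value of any attribute in a tuple be a single value from that domain. Because each department can have any number of locations, DLOCATIONS is not an atomic attribute, so the DEPARTMENT relation of Figure 5.4(a) does not satisfy 1NF.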
There are three main techniques to achieve first normal form for such a relation:
1. Remove the attribute DLOCATIONS that violates 1NF and place it in a separate relation DEPT_LOCATION
along with the primary key DNUMBER of DEPARTMENT. The primary key of DEPT_LOCATION is the
combination {DNUMBER, DLOCATION}, where each tuple holds a single location.
2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for each location of a
department. The primary key becomes the combination {DNUMBER, DLOCATION}, as shown in Figure
5.4(c). This solution has the disadvantage of introducing redundancy in the relation.
3. If a maximum number of values is known for the attribute (for example, if it is known that at most three
locations can exist for a department), replace the DLOCATIONS attribute by three atomic attributes:
DLOCATION1, DLOCATION2, and DLOCATION3. This solution has the disadvantage of introducing NULL
values if most departments have fewer than three locations.
Among the three solutions above, the first is generally considered best because it does not suffer from redundancy.
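The first technique can be sketched in Python. The department numbers, names and manager SSNs come from Figure A; the locations are hypothetical.

```python
# Unnormalized DEPARTMENT: DLOCATIONS is a list of values, which violates 1NF.
department = [
    {"DNAME": "Research", "DNUMBER": 1, "MGR_SSN": "101",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Admin", "DNUMBER": 2, "MGR_SSN": "104", "DLOCATIONS": ["Stafford"]},
    {"DNAME": "Headqtrs", "DNUMBER": 3, "MGR_SSN": "105", "DLOCATIONS": ["Houston"]},
]

def to_1nf(rows):
    """Technique 1: drop DLOCATIONS from DEPARTMENT and move each location
    into its own (DNUMBER, DLOCATION) tuple of DEPT_LOCATION."""
    dept = [{k: v for k, v in r.items() if k != "DLOCATIONS"} for r in rows]
    dept_location = sorted({(r["DNUMBER"], loc) for r in rows for loc in r["DLOCATIONS"]})
    return dept, dept_location

dept, dept_location = to_1nf(department)
for t in dept_location:
    print(t)
```

Each department still appears once in DEPARTMENT, so nothing is repeated apart from DNUMBER, which now acts as a foreign key in DEPT_LOCATION.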
A relation schema is in second normal form (2NF) if every nonprime attribute is fully functionally dependent on the primary key. In Figure 5.6, {SSN, PNUMBER} → HOURS is a full dependency because neither SSN → HOURS nor
PNUMBER → HOURS holds. However, {SSN, PNUMBER} → ENAME is partial because SSN → ENAME holds, and {SSN, PNUMBER} → PNAME is partial because PNUMBER → PNAME holds;
ENAME and PNAME depend on only part of the primary key.
➢ The nonprime attribute ENAME violates 2NF because of FD2 (SSN → ENAME), as do the nonprime attributes PNAME and
PLOCATION because of FD3 (PNUMBER → {PNAME, PLOCATION}).
➢ The functional dependencies FD2 and FD3 make ENAME, PNAME, and PLOCATION partially
dependent on the primary key {SSN, PNUMBER} of EMP_PROJ, thus violating 2NF.
➢ The functional dependencies FD1, FD2, and FD3 in Figure 5.6 hence lead to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure 5.6, each of which is in
2NF.
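The partial dependencies can also be found mechanically by computing the closure of every proper, non-empty subset of the primary key. This sketch works on the EMP_PROJ example with fd1 to fd3 from earlier in these notes; its last step, which groups each determinant with the attributes it determines, reproduces the shape of EP1, EP2 and EP3 here but is not a general 2NF algorithm.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, nonprime, fds):
    """Map each proper, non-empty subset of the key to the nonprime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[subset] = sorted(determined)
    return found

fds = [({"SSN"}, {"Ename"}),                   # fd1
       ({"Pnumber"}, {"Pname", "Plocation"}),  # fd2
       ({"SSN", "Pnumber"}, {"Hours"})]        # fd3
key = {"SSN", "Pnumber"}
nonprime = {"Hours", "Ename", "Pname", "Plocation"}

partial = partial_dependencies(key, nonprime, fds)
print(partial)  # {('Pnumber',): ['Plocation', 'Pname'], ('SSN',): ['Ename']}

# Each determinant goes with what it determines; the rest stays with the full key.
moved = {a for attrs in partial.values() for a in attrs}
relations = [sorted(key | (nonprime - moved))]
relations += [sorted(set(lhs) | set(rhs)) for lhs, rhs in partial.items()]
print(relations)
```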
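➢ A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold. A relation schema is in third normal form (3NF) if it is in 2NF and no nonprime attribute is transitively dependent on the primary key.
➢ Example: The dependency SSN → DMGRSSN is transitive through DNUMBER in relation EMP_DEPT
of Figure 5.7 because both the dependencies SSN → DNUMBER and DNUMBER → DMGRSSN hold
and DNUMBER is neither a key itself nor a subset of the key of EMP_DEPT.
➢ EMP_DEPT is in 2NF, since its primary key {SSN} is a single attribute and so no partial dependencies can exist. However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRSSN (and also
DNAME) on SSN via DNUMBER.
➢ We can normalize EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2 shown
in Figure 5.7. A NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT
without generating spurious tuples.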
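The claim that the natural join of ED1 and ED2 recovers EMP_DEPT can be checked on the data of Figure B with sqlite3. This sketch assumes ED1 keeps the employee attributes plus DNUMBER and ED2 keeps DNUMBER, DNAME and MGR_SSN; ADDRESS is omitted because the figure shows no values for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Rows of Figure B (ADDRESS omitted).
cur.execute("CREATE TABLE EMP_DEPT (SSN TEXT PRIMARY KEY, NAME TEXT, "
            "DNUMBER INTEGER, DNAME TEXT, MGR_SSN TEXT)")
cur.executemany("INSERT INTO EMP_DEPT VALUES (?, ?, ?, ?, ?)", [
    ("111", "John", 1, "Research", "101"),
    ("222", "Ram", 1, "Research", "101"),
    ("333", "Sita", 2, "Admin", "104"),
    ("444", "Kishan", 2, "Admin", "104"),
    ("555", "Mary", 3, "Headqtrs", "105"),
])

# Decompose by projection; SELECT DISTINCT plays the role of relational projection.
cur.execute("CREATE TABLE ED1 AS SELECT DISTINCT SSN, NAME, DNUMBER FROM EMP_DEPT")
cur.execute("CREATE TABLE ED2 AS SELECT DISTINCT DNUMBER, DNAME, MGR_SSN FROM EMP_DEPT")

cols = "SSN, NAME, DNUMBER, DNAME, MGR_SSN"
original = set(cur.execute(f"SELECT {cols} FROM EMP_DEPT").fetchall())
rejoined = set(cur.execute(f"SELECT {cols} FROM ED1 NATURAL JOIN ED2").fetchall())
ed2_count = cur.execute("SELECT COUNT(*) FROM ED2").fetchone()[0]

print(original == rejoined, ed2_count)  # True 3
```

ED2 stores each department's name and manager once (3 tuples instead of 5), and the join adds no spurious tuples because DNUMBER is the key of ED2.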
The LOTS relation of Figure 5.8(a) has two candidate keys, PROPERTY_ID# and {COUNTY_NAME, LOT#}. The functional dependencies
FD1 and FD2, whose left-hand sides are these candidate keys, satisfy 2NF.
FIGURE 5.8 Normalization into 2NF. (a) The LOTS relation with its functional dependencies FD1 through FD4. (b)
Decomposing into the 2NF relations LOTS1 and LOTS2.
In the LOTS relation schema, FD3 violates 2NF because TAX_RATE is partially dependent on the candidate key
{COUNTY_NAME, LOT#}: it depends on COUNTY_NAME alone. To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and
LOTS2, shown in Figure 5.8(b).
We construct LOTS1 by removing the attribute TAX_RATE that violates 2NF from LOTS and placing it with
COUNTY_NAME into another relation LOTS2. Both LOTS1 and LOTS2 are in 2NF. Notice that FD4 does not
violate 2NF and is carried over to LOTS1.
FIGURE 5.9 Normalization into 3NF. (c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
(d) Summary of the progressive normalization of LOTS.
As shown in Figure 5.9(b), LOTS2 is in 3NF. However, FD4 in LOTS1 violates 3NF because AREA is not a
superkey and PRICE is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF, we decompose it into the
relation schemas LOTS1A and LOTS1B shown in Figure 5.9(c).
We construct LOTS1A by removing the attribute PRICE that violates 3NF from LOTS1 and placing it with AREA
(the left-hand side of FD4 that causes the transitive dependency) into another relation LOTS1B. Both LOTS1A and
LOTS1B are in 3NF.
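A sketch of the general 3NF test (for every FD X → A, either X is a superkey or A is a prime attribute) applied to LOTS1, LOTS1A and LOTS1B. The candidate keys are taken from the text rather than computed, and FD1 and FD2 are assumed to be key dependencies whose right-hand sides are all remaining attributes.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def third_nf_violations(attributes, fds, keys):
    """Return (X, offending A's) for FDs where X is not a superkey and A is nonprime."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey
        offending = set(rhs) - set(lhs) - prime
        if offending:
            violations.append((sorted(lhs), sorted(offending)))
    return violations

LOTS1 = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE"}
lots1_fds = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE"}),  # FD1 (assumed form)
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE"}),  # FD2 (assumed form)
    ({"AREA"}, {"PRICE"}),                                          # FD4
]
lots1_keys = [{"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#"}]
print(third_nf_violations(LOTS1, lots1_fds, lots1_keys))  # [(['AREA'], ['PRICE'])]

# After the decomposition of Figure 5.9(c)
LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
lots1a_fds = [({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),
              ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"})]
LOTS1B = {"AREA", "PRICE"}
lots1b_fds = [({"AREA"}, {"PRICE"})]
print(third_nf_violations(LOTS1A, lots1a_fds, lots1_keys))  # []
print(third_nf_violations(LOTS1B, lots1b_fds, [{"AREA"}]))  # []
```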
FIGURE 5.10 Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with the functional dependency FD2 being
lost in the decomposition. (b) A schematic relation with FDs; it is in 3NF, but not in BCNF.
A relation schema R is in Boyce-Codd normal form (BCNF) if whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
As shown in Figure 5.10(a), FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A. However,
FD5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute (condition (b) of the general definition of 3NF), and this condition
does not exist in the definition of BCNF.
We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, as shown in Figure 5.10(a); FD2 is lost in this decomposition.
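The BCNF condition is simpler to check than 3NF because it never consults prime attributes. This sketch assumes FD5 is AREA → COUNTY_NAME (the text gives AREA as its left-hand side and COUNTY_NAME as the prime attribute involved) and assumes the attribute sets of LOTS1AX and LOTS1AY from the usual textbook version of Figure 5.10(a). It also confirms that no relation of the decomposition contains all the attributes of FD2, consistent with the caption's note that FD2 is lost.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Left-hand sides of nontrivial FDs whose left-hand side is not a superkey."""
    return [sorted(lhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= attributes]

LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
lots1a_fds = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),  # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),  # FD2
    ({"AREA"}, {"COUNTY_NAME"}),                           # FD5 (assumed form)
]
print(bcnf_violations(LOTS1A, lots1a_fds))  # [['AREA']]

# BCNF decomposition (assumed attribute sets)
LOTS1AX = {"AREA", "COUNTY_NAME"}
lots1ax_fds = [({"AREA"}, {"COUNTY_NAME"})]
LOTS1AY = {"PROPERTY_ID#", "AREA", "LOT#"}
lots1ay_fds = [({"PROPERTY_ID#"}, {"AREA", "LOT#"})]
print(bcnf_violations(LOTS1AX, lots1ax_fds), bcnf_violations(LOTS1AY, lots1ay_fds))  # [] []

# No single relation contains all attributes of FD2's left-hand side.
fd2_in_one_relation = any({"COUNTY_NAME", "LOT#"} <= r for r in (LOTS1AX, LOTS1AY))
print(fd2_in_one_relation)  # False
```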
➢ Consider the relation EMP(Ename, Pname, Dname). A tuple in EMP represents the fact that an employee whose name is Ename works on the
project whose name is Pname and has a dependent whose name is Dname.
➢ An employee may work on several projects and may have several dependents and the employee’s projects
and dependents are independent of one another.
➢ To keep the relation state consistent and to avoid any spurious relationship between the two independent
attributes, we must have a separate tuple to represent every combination of an employee’s dependent and
an employee’s project.
➢ In the relation state shown in EMP the employee with Ename Smith works on two projects ‘X’ and ‘Y’
and has two dependents ‘John’ and ‘Anna’, and therefore there are four tuples to represent these facts
together.
➢ The relation EMP is an all-key relation (its key is made up of all its attributes) and therefore has no nontrivial FDs;
as such, it qualifies as a BCNF relation.
➢ There is obvious redundancy in the relation EMP: the dependent information is repeated for every project and
the project information is repeated for every dependent.
➢ To address this situation, the concept of multivalued dependency (MVD) was proposed and, based on this
dependency, the fourth normal form was defined.
➢ Multivalued dependencies are a consequence of first normal form (1NF), which disallows an attribute in a tuple to have a set
of values, and of the accompanying process of converting an unnormalized relation into 1NF.
➢ Informally, whenever two independent 1:N relationships are mixed in the same relation, R(A, B, C), an
MVD may arise.
➢ We now present the definition of fourth normal form (4NF), which is violated when a relation has
undesirable multivalued dependencies and hence can be used to identify and decompose such relations.
Definition. A relation schema R is in 4NF with respect to a set of dependencies F (that includes functional
dependencies and multivalued dependencies) if, for every nontrivial multivalued dependency X →→ Y in F+, X
is a superkey for R.
The process of normalizing a relation involving the nontrivial MVDs that is not in 4NF consists of decomposing it
so that each MVD is represented by a separate relation where it becomes a trivial MVD.
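The Smith example can be checked in Python, with EMP holding the four tuples described above. The sketch tests the MVD Ename →→ Pname directly from its standard definition (whenever two tuples agree on Ename, the tuple that takes Pname from the first and Dname from the second must also be present) and then performs the 4NF decomposition. The relation names EMP_PROJECTS and EMP_DEPENDENTS are illustrative.

```python
# EMP(Ename, Pname, Dname) in the state described above.
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

def mvd_holds(rows):
    """Ename ->> Pname: for t1, t2 with equal Ename, (Ename, t1.Pname, t2.Dname) is in rows."""
    return all((e1, p1, d2) in rows
               for (e1, p1, _d1) in rows
               for (e2, _p2, d2) in rows
               if e1 == e2)

# 4NF decomposition: each MVD gets its own (all-key) relation.
EMP_PROJECTS = {(e, p) for e, p, _ in EMP}
EMP_DEPENDENTS = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}

print(mvd_holds(EMP), rejoined == EMP)  # True True
```

Removing any one of the four tuples breaks the MVD, which is why the original relation is forced to store every project/dependent combination.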
CHAPTER 6: SQL
CREATE command
An SQL schema is identified by a schema name, and includes an authorization identifier to indicate the user
or account who owns the schema, as well as descriptors for each element in
the schema.
Schema elements include tables, constraints, views, domains, and other constructs that describe the schema.
A schema is created via the CREATE SCHEMA statement, which can include all the schema elements
definitions.
For example, the following statement creates a schema called COMPANY, owned by the user with
authorization identifier MKUMAR.
CREATE SCHEMA COMPANY AUTHORIZATION MKUMAR;
CREATE TABLE Command:
• The CREATE TABLE command is used to specify a new relation by giving it a name and specifying its
attributes and initial constraints. The attributes are specified first, and each attribute is given a name, a data
type to specify its domain of values, and any attribute constraints, such as NOT NULL.
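• The SQL schema in which the relations are declared is usually implicit in the environment in which the CREATE TABLE statements are executed. Alternatively, we can explicitly attach the schema name to the relation name, separated by a period. For
example, we can write
CREATE TABLE COMPANY.EMPLOYEE ...
rather than
CREATE TABLE EMPLOYEE ...
• The complete definition of the EMPLOYEE relation is:
CREATE TABLE EMPLOYEE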
( Fname VARCHAR(15) NOT NULL,
Minit CHAR,
Lname VARCHAR(15) NOT NULL,
Ssn CHAR(9) NOT NULL,
Bdate DATE,
Address VARCHAR(30),
Sex CHAR,
Salary DECIMAL(10,2),
Super_ssn CHAR(9),
Dno INT NOT NULL,
PRIMARY KEY (Ssn),
FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn),
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber) );
The relations declared through CREATE TABLE statements are called base tables (or base relations); this
means that the relation and its tuples are actually created and stored as a file by the DBMS.
➢ A constraint can be given a constraint name by preceding it with the keyword CONSTRAINT followed by the name. The names of all constraints
within a particular schema must be unique.
➢ Example:
CREATE TABLE EMPLOYEE
( Ssn CHAR(9) NOT NULL,
Dno INT NOT NULL DEFAULT 1,
-- ... other attributes of EMPLOYEE ...
CONSTRAINT EMPPK PRIMARY KEY (Ssn) );
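The DEFAULT clause and the named constraint can be tried out in SQLite through Python's sqlite3 module. This sketch keeps only three attributes of EMPLOYEE and checks only that the duplicate key is rejected, not the wording of SQLite's error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE EMPLOYEE (
    Ssn   CHAR(9) NOT NULL,
    Fname VARCHAR(15),
    Dno   INT NOT NULL DEFAULT 1,
    CONSTRAINT EMPPK PRIMARY KEY (Ssn))""")

# Dno is omitted from the INSERT, so the DEFAULT clause supplies 1.
cur.execute("INSERT INTO EMPLOYEE (Ssn, Fname) VALUES ('111', 'John')")
dno = cur.execute("SELECT Dno FROM EMPLOYEE WHERE Ssn = '111'").fetchone()[0]
print(dno)  # 1

# A second tuple with the same Ssn violates the EMPPK constraint.
try:
    cur.execute("INSERT INTO EMPLOYEE (Ssn, Fname) VALUES ('111', 'Ram')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```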
4. Specifying Constraints on Tuples Using CHECK
➢ Table constraints can be specified through additional CHECK clauses at the end of a CREATE TABLE statement.
These can be called tuple-based constraints because they apply to each tuple individually and are checked whenever
a tuple is inserted or modified.
➢ For example, suppose that the DEPARTMENT table had an additional attribute Dept_create_date, which stores the
date when the department was created. Then we could add the following CHECK clause at the end of the CREATE
TABLE statement for the DEPARTMENT table to make sure that a manager's start date is not earlier than the department
creation date.
CHECK (Dept_create_date <= Mgr_start_date);
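A sketch of this tuple-based CHECK in SQLite. Dates are stored as ISO 8601 strings ('YYYY-MM-DD'), which compare correctly as text; the department rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE DEPARTMENT (
    Dnumber INT PRIMARY KEY,
    Dname VARCHAR(15),
    Dept_create_date DATE,
    Mgr_start_date DATE,
    CHECK (Dept_create_date <= Mgr_start_date))""")

def try_insert(row):
    """Insert one DEPARTMENT tuple; return False if the CHECK constraint rejects it."""
    try:
        cur.execute("INSERT INTO DEPARTMENT VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

good = try_insert((1, "Research", "2010-01-01", "2012-05-22"))  # manager starts later
bad = try_insert((2, "Admin", "2015-01-01", "2014-06-01"))      # manager starts before creation
print(good, bad)  # True False
```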
Explain different schema change statements in SQL with examples?
The schema change statements in SQL are:
1. The DROP command
2. The ALTER command
1. DROP command: The DROP command can be used to drop named schema elements, such as tables, domains, or constraints. One can also
drop a whole schema.
➢ There are two drop behaviour options: CASCADE and RESTRICT.
CASCADE: For example, to remove the COMPANY database schema and all its tables, domains, and other
elements, the CASCADE option is used as follows:
DROP SCHEMA COMPANY CASCADE;
RESTRICT- If the RESTRICT option is chosen in place of CASCADE, the schema is dropped only if it has no
elements in it.
➢ If a base relation within a schema is no longer needed, the relation and its definition can be deleted by using the
DROP TABLE command.
For example, if we no longer wish to keep track of dependents of employees in the COMPANY database, we
can get rid of the DEPENDENT relation by using the command:
DROP TABLE DEPENDENT CASCADE;
➢ If the RESTRICT option is chosen instead of CASCADE, a table is dropped only if it is not referenced in any constraints
(for example, by foreign key definitions in another relation) or views.
➢ With the CASCADE option, all constraints and views that reference the table are dropped automatically from the
schema, along with the table itself.
2. ALTER command: The definition of a base table or of other named schema elements can be changed by using the
ALTER command.
➢ ALTER table actions include
• Adding or dropping a column
• Changing a column definition
• Adding or dropping table constraints
➢ Adding or dropping a column: To add an attribute Job to the EMPLOYEE relation in the COMPANY schema, we can
use the command
ALTER TABLE COMPANY.EMPLOYEE ADD COLUMN JOB VARCHAR(12);
• To drop a column (attribute), we must choose either CASCADE or RESTRICT for drop behavior.
• If CASCADE is chosen, all constraints and views that reference the column are dropped automatically from the
schema, along with the column.
• If RESTRICT is chosen, the command is successful only if no views or constraints reference the column.
• For example, the following command removes the attribute Address from the EMPLOYEE base table:
ALTER TABLE COMPANY.EMPLOYEE DROP COLUMN Address CASCADE;
➢ Changing a column definition:
• It is also possible to alter a column definition by dropping an existing default clause or by defining a new default
clause. The following example drops the default clause of Mgr_ssn:
ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn DROP DEFAULT;
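SQLite has no CREATE SCHEMA, and its ALTER TABLE supports neither ALTER COLUMN ... DROP DEFAULT nor the CASCADE/RESTRICT options, so this sketch only demonstrates ADD COLUMN on an unqualified EMPLOYEE table. It also shows that, when no DEFAULT clause is given, the new attribute is NULL in the tuples that already exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Fname VARCHAR(15))")
cur.execute("INSERT INTO EMPLOYEE VALUES ('111', 'John')")

# Same ADD COLUMN clause as in the notes, minus the COMPANY. qualifier.
cur.execute("ALTER TABLE EMPLOYEE ADD COLUMN JOB VARCHAR(12)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in cur.execute("PRAGMA table_info(EMPLOYEE)").fetchall()]
job = cur.execute("SELECT JOB FROM EMPLOYEE WHERE Ssn = '111'").fetchone()[0]
print(columns, job)  # ['Ssn', 'Fname', 'JOB'] None
```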
➢ DISTINCT: To eliminate duplicate tuples from the result of an SQL query, we use the keyword DISTINCT in the
SELECT clause, meaning that only distinct tuples should remain in the result.
➢ Example : Retrieve all distinct salary values
Query: SELECT DISTINCT Salary
FROM EMPLOYEE;
➢ SQL has directly incorporated some of the set operations of relational algebra: set union (UNION), set
difference (EXCEPT), and set intersection (INTERSECT).
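A sqlite3 sketch of DISTINCT and the three set operations on hypothetical salary data. UNION, EXCEPT and INTERSECT treat their results as sets, so duplicates are removed; the sketch sorts the results in Python to make them easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER, Dno INTEGER)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("111", 30000, 1), ("222", 30000, 1), ("333", 40000, 1),
    ("444", 30000, 2), ("555", 25000, 2),
])

all_salaries = sorted(r[0] for r in cur.execute("SELECT Salary FROM EMPLOYEE"))
distinct = sorted(r[0] for r in cur.execute("SELECT DISTINCT Salary FROM EMPLOYEE"))

def salaries(op):
    """Combine department 1 and department 2 salaries with UNION, EXCEPT or INTERSECT."""
    sql = (f"SELECT Salary FROM EMPLOYEE WHERE Dno = 1 {op} "
           "SELECT Salary FROM EMPLOYEE WHERE Dno = 2")
    return sorted(r[0] for r in cur.execute(sql))

print(all_salaries)           # [25000, 30000, 30000, 30000, 40000]
print(distinct)               # [25000, 30000, 40000]
print(salaries("UNION"))      # [25000, 30000, 40000]
print(salaries("EXCEPT"))     # [40000]
print(salaries("INTERSECT"))  # [30000]
```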
4. Substring Pattern Matching and Arithmetic Operators
➢ LIKE operator: This can be used for string pattern matching. Partial strings are specified using two reserved
characters:
• % replaces an arbitrary number of zero or more characters,
• underscore (_) replaces a single character.
➢ Example: Retrieve all employees whose address is in Mangalore.
Query: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Address LIKE '%Mangalore%';
➢ In an SQL query, the standard arithmetic operators for addition (+), subtraction (-), multiplication (*), and division (/)
can be applied to numeric values or attributes with numeric domains.
➢ Example: List the names of all employees and the salaries they would receive if a 10% raise were given to all.
Query: SELECT Fname, Lname, 1.1 * Salary AS "NEW SALARY"
FROM EMPLOYEE;
➢ BETWEEN operator: selects values within a given range, including both end values. The values can be numbers, text, or dates.
➢ Syntax: SELECT <column_name(s)>
FROM <table_name>
WHERE column_name BETWEEN value1 AND value2;
➢ Example: Retrieve all employees in department 5 whose salary is between 30000 and 40000
Query: SELECT fname,lname
FROM EMPLOYEE
WHERE (salary BETWEEN 30000 AND 40000) AND Dno=5;
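The LIKE, arithmetic and BETWEEN examples can be run against a few hypothetical EMPLOYEE rows in SQLite. Two hedges apply: SQLite's LIKE is case-insensitive for ASCII letters by default, whereas many other DBMSs compare case-sensitively, and BETWEEN includes both end values, as the 30000 and 40000 salaries show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Address TEXT, Salary INTEGER, Dno INTEGER)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    ("John", "Rao", "12 MG Road, Mangalore", 30000, 5),
    ("Sita", "Shetty", "4 Lake View, Udupi", 40000, 5),
    ("Ram", "Kumar", "9 Hill Street, Mangalore", 45000, 4),
])

# % matches any run of characters, so 'Mangalore' may appear anywhere in Address.
like = cur.execute("SELECT Fname, Lname FROM EMPLOYEE "
                   "WHERE Address LIKE '%Mangalore%' ORDER BY Fname").fetchall()

# Arithmetic in the SELECT list; the result is a floating-point value.
raised = cur.execute('SELECT Fname, Lname, 1.1 * Salary AS "NEW SALARY" '
                     "FROM EMPLOYEE ORDER BY Fname").fetchall()

# BETWEEN is inclusive: 30000 and 40000 themselves qualify.
between = cur.execute("SELECT Fname, Lname FROM EMPLOYEE "
                      "WHERE (Salary BETWEEN 30000 AND 40000) AND Dno = 5 "
                      "ORDER BY Fname").fetchall()

print(like)     # [('John', 'Rao'), ('Ram', 'Kumar')]
print(between)  # [('John', 'Rao'), ('Sita', 'Shetty')]
```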
5. Ordering of Query Results:
➢ SQL allows the user to order the tuples in the result of a query by the values of one or more of the attributes that
appear in the query result, by using the ORDER BY clause.
➢ Example: Display name of employees in ascending order on Fname.
Query: SELECT fname,lname
FROM EMPLOYEE
ORDER BY Fname;
➢ The default order is in ascending order. We can specify keyword DESC if we want to see the result in a descending
order of values. The keyword ASC can be used to specify ascending order explicitly.
➢ For example, if we want descending alphabetical order on Dname and ascending order on Lname, Fname, then the
ORDER BY clause for retrieving a list of employees and the projects they are working on can be written as:
SELECT Dname, Lname, Fname, Pname
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE Dnumber = Dno AND Ssn = Essn AND Pno = Pnumber
ORDER BY Dname DESC, Lname ASC, Fname ASC;
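The multi-key ORDER BY query can be run unchanged in SQLite on a tiny hypothetical COMPANY database that keeps only the attributes the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY, Dno INTEGER);
CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER PRIMARY KEY);
INSERT INTO DEPARTMENT VALUES ('Research', 1), ('Admin', 2);
INSERT INTO EMPLOYEE VALUES ('John', 'Rao', '111', 1), ('Ram', 'Bhat', '222', 1),
                            ('Sita', 'Bhat', '333', 2), ('Anil', 'Bhat', '444', 1);
INSERT INTO WORKS_ON VALUES ('111', 10), ('222', 10), ('333', 20), ('444', 10);
INSERT INTO PROJECT VALUES ('ProductX', 10), ('Payroll', 20);
""")

# The query from the notes: Dname descending, then Lname and Fname ascending.
result = cur.execute("""
    SELECT Dname, Lname, Fname, Pname
    FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
    WHERE Dnumber = Dno AND Ssn = Essn AND Pno = Pnumber
    ORDER BY Dname DESC, Lname ASC, Fname ASC""").fetchall()
for row in result:
    print(row)
```

Research sorts before Admin because of DESC; within Research the two Bhats come first and are then ordered by first name.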
❖ In its simplest form, the INSERT command adds a single tuple to a relation. We specify the relation name and a list of values for the tuple. The values should be listed in the
same order in which the corresponding attributes were specified in the CREATE TABLE
command.
For example, to add a new tuple to the STUDENT relation:
INSERT INTO STUDENT VALUES (101, 'RAM', 600);
❖ A second form of the INSERT statement allows the user to specify explicit attribute names that
correspond to the values provided in the INSERT command. This is useful if a relation has many
attributes but only a few of those attributes are assigned values in the new tuple. However, the
values must include all attributes with NOT NULL specification and no default value. Attributes
with NULL allowed or DEFAULT values are the ones that can be left out.
For example, to enter a tuple for a new STUDENT for whom we know only the ROLLNO and NAME:
INSERT INTO STUDENT (ROLLNO, NAME) VALUES (103, 'Marini');
It is also possible to insert into a relation multiple tuples separated by commas in a single
INSERT command. The attribute values forming each tuple are enclosed in parentheses.
❖ A DBMS that fully implements SQL should support and enforce all the integrity constraints that
can be specified in the DDL. For example, if we issue a command (U2) that inserts an EMPLOYEE tuple with
Dno = 2 while no DEPARTMENT tuple with Dnumber = 2 exists in the database, the DBMS should reject the operation
because it would violate the foreign key constraint on Dno.
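A sqlite3 sketch of the INSERT forms above and of referential integrity enforcement. The third STUDENT attribute is not named in the notes, so MARKS is a hypothetical name, and the multi-row values are invented. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
cur = conn.cursor()

# Hypothetical STUDENT schema; MARKS is an assumed name for the third attribute.
cur.execute("CREATE TABLE STUDENT (ROLLNO INTEGER PRIMARY KEY, NAME TEXT NOT NULL, MARKS INTEGER)")
cur.execute("INSERT INTO STUDENT VALUES (101, 'RAM', 600)")              # all values, in CREATE TABLE order
cur.execute("INSERT INTO STUDENT (ROLLNO, NAME) VALUES (103, 'Marini')")  # MARKS is left NULL
cur.execute("INSERT INTO STUDENT VALUES (104, 'Sita', 550), (105, 'Kishan', 575)")  # several tuples
students = cur.execute("SELECT * FROM STUDENT ORDER BY ROLLNO").fetchall()
print(students)

# Referential integrity: Dno must match an existing DEPARTMENT.Dnumber.
cur.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, "
            "Dno INTEGER REFERENCES DEPARTMENT(Dnumber))")
cur.execute("INSERT INTO DEPARTMENT VALUES (1, 'Research')")
try:
    cur.execute("INSERT INTO EMPLOYEE VALUES ('666', 2)")  # there is no department 2
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```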
QUESTIONS
1. What is a Normalization? Explain the 1NF, 2NF & 3NF with examples
2. Explain informal design guidelines for relational schema design.
3. Explain the types of update anomalies in SQL with an example
4. Write the syntax of INSERT, DELETE and UPDATE statements in SQL and explain with suitable
examples.
5. Illustrate the following with suitable examples
a. Datatypes in SQL
b. Substring Pattern matching in SQL.(REFER THE NOTES FOR EXAMPLE)
6. Explain the basic datatypes available for attributes in SQL.
7. Demonstrate the following constraints in SQL with suitable examples.
a. NOT NULL b. Primary key c. Foreign key d. DEFAULT e. CHECK
8. Explain different schema change statements in SQL with examples?