0% found this document useful (0 votes)
14 views53 pages

Understanding the Relational Model

Uploaded by

grew.extent1j
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views53 pages

Understanding the Relational Model

Uploaded by

grew.extent1j
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Relational Model

CSC 343 H5S: Winter 2024


Michael Liut ([Link]@[Link]) – Course Coordinator
Arnold Rosenbloom (arnold@[Link])
Ahmed Ashraf ([Link]@[Link])

D E PA RT M E N T O F M AT H E M AT I C A L A N D CO M P U TAT I O N A L S C I E N C ES
U N I V E RS I T Y O F TO RO N TO M I S S I S S AU G A
Describing Data: Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data
model.

The relational model of data is the most widely used today.


◦ Main concept: relation, basically a table with rows and columns.
◦ Use tables to represent data and relationships.
◦ Every relation has a schema, which describes the columns, or attributes.

2
Relational Model
Proposed by Edgar. F. Codd in 1970 as a data model which strongly supports
data independence.
Made available in commercial DBMSs in 1981.
◦ It is not easy to implement data independence efficiently and reliably!

It is based on (a variant of) the mathematical notation of relation.


Relations are represented as tables.

3
Relational Database: Definitions
Relational Database: a set of relations. # of Rows = cardinality
# of Columns = degree/arity
Relation: made of two parts
1. Instance à a table with rows and columns.
2. Schema à specifies name of a relation, and, name and type of each column.

To make it easier, think of a relation as a set of rows and tuples


◦ i.e. all rows are distinct!

rows, not e.g. Students(sid: integer,


columns!!! name: string, gpa: real)

4
A Relation is a Table
Attribute Names
Relation Name
STUDENTS

SID Name GPA

5551234 John Smith 3.4


Tuples
(Records)
7771234 Jessica Jones 3.7

The set of the permitted values for an attribute is called the attribute of domain.
e.g. domain(SID) = {5551234, 7771234}.

5
A Relation is a Table
Attribute Names
Relation Name
STUDENTS

SID Name GPA

5551234 John Smith 3.4


Tuples
(Records)
7771234 Jessica Jones 3.7

Cardinality = 2 Degree = 3 Note: all rows are distinct!

6
Question?
Do all entries in a column, in a relation instance, have to be distinct?

Pause and think about this for a


moment. Discuss it with a peer.
NO! Unless we rule the column to be “unique”
columns do not need to be distinct.
E.g., multiple students can have the same first
name or the same birth date.

7
Relational Data Model
Relation Schema à relation name and attribute list.
◦ Optionally: types of attributes. For example:
◦ Students (id, name)
◦ Students (id: string, name: string)

Relation à set of tuples conforming to a schema.


◦ e.g. {(5551234, John Smith), (7771234, Jessica Jones)…}

Database à set of relations.

Database Schema à set of all relation schemas in the database.

8
Clarification of Terminology
1. Relation is a Table.

2. Attribute is a column.

3. Tuple is a row.

4. Arity/Degree of a relation is the number of attributes (columns).

5. Cardinality of a relation is the number of tuples (rows).

9
Why Relations?
Very simple model.

Often matches how we think about data

Abstract model that underlies SQL, the most important


database language!

10
Relations are Unordered
The order of tuples is irrelevant. Tuples may be stored in any arbitrary order.
e.g. instructor relation with unordered tuples.

ID name dept_name salary


12345 Einstein Physics 95000
45411 Turing Comp Sci 72500
35521 Mozart Music 45000
98002 Dexter Biology 67200
45556 Aristotle Physics 86000
72331 Khan History 55500
42399 Taylor Mathematics 78800
53440 Bond Chemistry 110000

11
Database
Information about enterprise is broken up into parts: The need for NULL values.
instructor e.g. represent a student with no
student instructor
advisor

Bad design:
univ (instructor_ID, dept_name, salary, student_ID, …)

Normalization Theory deals with how to design “good” relational schemas.

Repetition of information is bad!


e.g. two students have the same instructor

12
Relational Query Languages
A major strength of the relational model: supports simple, powerful querying of data.

Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.

◦ The key: precise semantics for relational queries.

◦ Allows the optimizer to extensively re-order operations, and still ensure that the
answer does not change.

13
The SQL Query Language
Developed by IBM (system R) in the 1970s.

There was a need for a standard since it was used by many vendors.
Standards:
◦ SQL–86
◦ SQL–89 (minor revisions)
◦ SQL–92 (major revisions)
◦ SQL–99 (major extensions)
◦ …
◦ SQL-2011 (major extensions, specifically in temporal databases)
◦ SQL:2016 (8th revision of the standard, a lot of JSON functionality)

14
Temporal Database
Efficiently and effectively store data relating to instances of time.

Timestamps are stored in RDBMSs, but are inefficient.

Two major methods (attributes) of tracking time:


1. Transaction Time
2. Valid Time

15
Temporal Database
1. Transaction Time
◦ The time period during which a row is committed to or recorded in the database.

2. Valid Time
◦ The time period during which a row is regarded as correctly reflecting reality by
the user of the database.

The combination of the attributes ‘Transaction


Time’ and ‘Valid Time’ forms Bitemporal Time

16
Transaction Time Example

ID name dept_name transaction_from transaction_to


12345 Einstein Physics January 2000 ∞
45411 Turing Comp Sci September 1986 August 2000
35521 Mozart Music October 1999 ∞
98002 Dexter Biology May 2018 ∞
45556 Aristotle Physics January 2018 ∞

17
Valid Time Example

ID name dept_name valid-to


12345 Einstein Physics ∞
45411 Turing Comp Sci August 2000
35521 Mozart Music ∞
98002 Dexter Biology August 2018
45556 Aristotle Physics December 2018

18
Database Schemas in SQL
SQL is primarily a query language.
◦ Used for retrieving information from a database.

SQL also includes data-definition component for describing schemas.

19
Levels of Abstraction
View 1 View 2 View 3
Many views, single conceptual (logical)
schema and physical schema.
◦ Views describe how users see the data. Conceptual Schema

◦ Conceptual schema defines logical structure.


◦ Physical schema describes the files and Physical Schema
indexes used.

v Schemas are defined using DDL (data definition language); DB


v Data is modified/queried using DML (data manipulation language).

20
Example: University Database
Conceptual Data:
◦ Students(sid:string, name:string, login:string, age:integer, gpa:real)
◦ Courses(cid:string, cname:string, credits:integer)
◦ Enrolled (sid:string, cid:string, grade:string)
Physical Schema:
◦ Relations stored as unordered files.
◦ Index on first column of Students.

External Schema (i.e. View):


◦ Course_info(cid:string, enrollment:integer)

21
Integrity Constraints
An integrity constraint is a property that must be satisfied by all
meaningful database instances.
A constraint can be seen as a predicate.
◦ A database is legal if it satisfies all integrity constraints.
Types of constraints:
1. Intra-relational constraints
◦ e.g. domain constraints and tuple constraints.

2. Inter-relational constraints
◦ The most common is referential constraint.

22
Rationale For Integrity Constraints
Useful for describing the application in greater detail.
Contributes to data quality (avoids data entry errors).
An element in the design process; normal forms (discussed later).
Used by the system in choosing a strategy for query processing

Note: It is not the case that all desirable properties of the data in a
database can be described by means of integrity constraints.
e.g. “data in the relation Instructor must be correct”

23
Tuple and Domain Constraints
A tuple constraint expresses conditions on the values of each
tuple, independently of other tuples.
e.g., (GPA > 3.0) OR Net = Amount – Deduction OR …

A domain constraint is a tuple constraint that involves a single


attribute.
e.g., (GPA ≤ 4.0) AND (GPA ≥ 0.0)

24
Unique Values For Tuples
RegNum Surname FirstName BirthDate Program
22335 Jones Jenny 04/12/89 History
22567 Jones Robby 17/02/91 Engineering
43754 Jones Joe 28/03/91 Computing
38267 Smith Melissa 13/08/90 Computing
17382 Smith Melissa 08/09/90 English

Registration number identifies students


i.e. there is no pair of tuples with the same value for RegNum.
Personal data could identify students as well
i.e. there is no pair of tuples with the same values for all of
Surname, FirstName, BirthDate.

25
Keys
A key is a set of attributes that uniquely identifies tuples in a relation.

More Precisely:
• A set of attributes K is a superkey for a relation R if R cannot
contain two distinct tuples t1 and t2 s.t. t1[K]= t2[K];
• K is a candidate key for R is K is a minimal superkey.
• There exists no other superkey K’ of R that is contained in K as
a proper subset.
• i.e. K’ ⊂ K

26
Example
RegNum Surname FirstName BirthDate Program
22335 Jones Jenny 04/12/89 History
22567 Jones Robby 17/02/91 Engineering
43754 Jones Joe 28/03/91 Computing
38267 Smith Melissa 13/08/90 Computing
17382 Smith Melissa 08/09/90 English

RegNum is a key!
i.e. RegNum is a superkey and contains a sole attribute, thus, it is minimal.
{Surname, FirstName, BirthDate} is another key!

27
Question?
RegNum Surname FirstName BirthDate Program
22335 Jones Jenny 04/12/89 History
22567 Jones Robby 17/02/91 Engineering
43754 Jones Joe 28/03/91 Computing
38267 Smith Melissa 13/08/90 Computing
17382 Smith Melissa 08/09/90 English

Can {Surname, Program} be a key?

NO! It may not be apparent with this example, however, there could
be combinations of Surnames and Programs that are the same.
i.e. You could potentially have students with the same Surname in
the same Program.
28
Another Question? Foreshadowing!
RegNum Surname FirstName BirthDate Program
22335 Jones Jenny 04/12/89 History
22567 Jones Robby 17/02/91 Engineering
43754 Jones Joe 28/03/91 Computing
38267 Smith Melissa 13/08/90 Computing
17382 Smith Melissa 08/09/90 English

Any ideas on how we would address this issue?


i.e. How do we ensure that only a certain set of keys can be used to obtain
the data sets that we want?

Normalization and Database Design (we will discuss in future!)

29
Existence of Keys
Relations are sets;
◦ each relation is composed of distinct tuples.

The whole set of attributes for a relation defines a superkey.

Each relation has a key, which is a set of all its attributes, or a


subset thereof.

30
Existence of Keys

Keys guarantee that each piece of data in the database can be accessed.

Keys are a major feature of the Relational Model


◦ Allow us to say that it is “value-based”.

31
Keys and Null Values
The presence of NULL values negatively impacts keys:

◦ unique identification is no longer guaranteed.

◦ no assistance in establishing correspondence between data in


different relations.

32
Keys and Null Values
RegNum Surname FirstName BirthDate Program

NULL Jones Robby NULL Computing

43754 Jones Melissa 28/03/91 Engineering

38267 Smith Melissa NULL NULL

NULL Smith Melissa 08/09/90 Engineering

Questions to think about?


1. How do we access the 1st tuple?
2. Are the 3rd and 4th tuples the same?

33
Keys and Null Values
The presence of NULL values negatively impacts keys:
Maps to
◦ unique identification is no longer guaranteed.
Q1

Maps to ◦ no assistance in establishing correspondence between data in


Q2 different relations.

34
Primary Keys
In general, the presence of NULL values in key has to be limited.
◦ In most cases (and preferably) non-existent!
Each relations must have a primary key, where NULL values are not allowed to
occur in any attribute.
References between relations are realized through primary keys.

Notation:
• The attribute of the primary key are underlined!

35
Primary Keys
Notation:
• The attribute of the primary key are underlined!

RegNum Surname FirstName BirthDate Program

22567 Jones Robby NULL Computing

43754 Jones Melissa 28/03/91 Engineering

38267 Smith Melissa NULL NULL

17383 Smith Melissa 08/09/90 Engineering

36
DO We Always Have Primary Keys?
Ordinarily, we have reasonable primary keys.
e.g. Client/Account Number, Student Number, SIN, etc...

There potentially could be multiple keys, however, one will be designated as


the primary.

37
Recap
A set of fields is a key for a relation if:
1. No two distinct tuples can have the same value in all key fields, and
2. This is not true for any subset of the key.
◦ If false, then a superkey.
If there are more than one key for a relation, one is selected to be the primary key.
STUDENTS
SID Name GPA
5551234 John Smith 3.4
7771234 Jessica Jones 3.7
e.g. SID is a key for Students. The tuple {SID, GPA} is a superkey.
38
Primary and Candidate Keys
“For a given student and course, there is a single
grade.” vs. “Students can take only one course,
and receive a single grade for that course;
further, no two students in a course receive the Enrolled(SID, CID, grade)
same grade.”
Enrolled(SID, CID, grade)
Be careful to define Integrity Constraints (ICs)
correctly at design time. Enrolled(SID, CID, grade)
• Key {CID, grade}
ICs are checked when data is updated.

39
Primary and Candidate Keys

Be Careful!
• If an IC is used carelessly it can prevent the storage of database
instances that arise in practice!

40
Foreign Keys
Pieces of data in different relations are correlated by means of values of
primary keys.

Referential integrity constraints are imposed in order to guarantee that the


values refer to existing tuples in the referenced relation.

A foreign key requires that the values on a set X of attributes of a relation R1


must appear as values for the primary key of another relation R2.
◦ i.e. set of attributes in one relation that is used to ‘refer’ to a tuple in another relation.
These must correspond to primary key of the second relation! Like a ‘logical pointer’!

41
Referential Integrity
e.g. SID is a foreign key referring to Students:
Enrolled(SID: string, CID: string, grade: string)

◦ If all foreign key constraints are enforced, referential integrity is achieved.


i.e. No dangling references exist!

42
Referential Integrity
Only students listed in the Student relation should be allowed to enroll for courses.

Enrolled
Students
SID CID Grade
SID LastName Login Age GPA
22555 Biology101 A
22555 White white@sci 21 3.4
23555 History407 B
23555 Jones jones@his 20 3.2
23455 English201 A
23455 Taylor taylor@eng 20 3.6
22555 Genetics304 C

43
Enforcing Referential Integrity
Consider the relations: Students and Enrolled
◦ SID in Enrolled is a foreign key that references Students.

QUESTION?!
What should be done if an Enrolled tuple with a non-existent SID is inserted?

REJECT IT!

44
Enforcing Referential Integrity
ANOTHER QUESTION?!
If a Students tuple is deleted, what should occur?

1. Delete all Enrolled tuples that reference it.


2. Prevent deletion of a Students tuple that it is being
referenced.
3. Set SID in Enrolled tuples that refer to it to a default SID.
4. Set SID in Enrolled tuples that refer to it to NULL.

Similar if the primary key of Students tuple is updated. This is called


“Cascading”

45
Integrity Constraints (a.k.a. ICs)
RECALL
ICs are based upon the semantics of the real-world enterprise. Semantics is the meaning
or relationship of meaning
◦ This is being described in the database relations. of a sign or set of signs.
We CAN check a database instance to see if an IC is violated.
HOWEVER
We CANNOT infer that an IC is true by looking at an instance.
◦ An IC is a statement about all possible instances.
Key and Foreign Key ICs are the most common; more general ICs are also
supported.

46
Where do ICs Come From?
Let’s work through an example!
Information:
3 relations: Offences, Officers, and Cars.
1. Offences: {Code, Date, Officer, Dept, Registration}
2. Officers: {RegNum, Surname, FirstName}
3. Cars: {Registration, Dept, Owner}

Now let’s add some data!

47
Code Date Officer Dept Registration
83746 25/05/2016 567 45 416 YYZ
29374 26/05/2016 984 45 747 LHR
Offences
39374 28/05/2016 984 45 416 YYZ
27483 28/05/2016 567 45 747 LHR
95848 28/05/2016 363 82 333 VCE

RegNum Surname FirstName


567 Hopkins John
Officers
984 Walker Johnny
363 Ross George

Registration Dept Owner


416 YYZ 45 Michael Liut
747 LHR 45 Jessica Jones
Cars
333 VCE 82 Giovanni Saputo
647 YUL 45 Michael Liut
48
Code Date Officer Dept Registration
83746 25/05/2016 567 45 416 YYZ
29374 26/05/2016 984 45 747 LHR
Offences
39374 28/05/2016 984 45 416 YYZ
27483 28/05/2016 567 45 747 LHR
95848 28/05/2016 363 82 333 VCE

RegNum Surname FirstName


567 Hopkins John
Officers 984 Walker Johnny NOTE:
363 Ross George Offences[Officer] ⊆ Officers[RegNum]
Offences[Registration, Dept] ⊆
Registration Dept Owner
Cars[Registration,Dept]
416 YYZ 45 Michael Liut

Cars 747 LHR 45 Jessica Jones


333 VCE 82 Giovanni Saputo
647 YUL 45 Michael Liut
49
Violation of Foreign Keys
Code Date Officer Dept Registration
Offences 83746 25/05/2016 912 37 416 YYZ
29374 26/05/2016 912 45 747 LHR

RegNum Surname FirstName


567 Hopkins John
Officers
984 Walker Johnny

Registration Dept Owner


416 YYZ 45 Michael Liut
Cars
747 LHR 45 Jessica Jones
333 VCE 82 Giovanni Saputo

50
Referential Constraints
General comments that should be noted:

Referential Constraints play an important role in making the Relational Model


value-based.

Care is needed in case of referential constraints that involve two or more


attributes.

51
Examples
Code Dept1 Registration1 Dept2 Registration2
Accidents 6002 45 416 YYZ 82 333 VCE
6437 45 747 LHR 82 333 VCE

Registration Dept Owner …


416 YYZ 45 Michael Liut …
747 LHR 45 Jessica Jones …
Cars
333 VCE 82 Giovanni Saputo …
647 YUL 45 Michael Liut …

Here we have two Referential Constraints for Accidents:


{Registration1, Dept1} to Cars and {Registration2, Dept2} to Cars.

52
Citations, Images and Resources
Database Management Systems (3rd Ed.), Ramakrishnan & Gehrke
Some content is based off the slides of Dr. Fei Chiang - [Link]
[Link]
[Link]
[Link]
[Link]
fca33c1fc217/1428388432254_null_pointer_2_large
[Link]

53

You might also like