
Understanding Relational Schema Basics

The document outlines the course CIS 5450 for Spring 2025, focusing on relational schemas, ER diagrams, and normal forms. It includes course logistics, upcoming deadlines, and a discussion on the importance of schemas in ensuring data integrity and ease of computation. Additionally, it covers functional dependencies and their implications in database design.


Relational schema

CIS 5450, Spring 2025

Ryan Marcus
Outline

Course logistics

Brief review

Relational schema

Understanding schema with ER diagrams

Understanding schema with normal forms
Upcoming Deadlines

HW1 due Feb 15th @ 10pm (11 days)

Next time: guest lecture from Jeff Tao
Combining Data

Combining data that was intended to be combined: Joining
- Pure computation

Course Classroom
CIS 5450 Meyerson B1
CIS 5500 Towne 100

Course Instructor
CIS 5450 Mar & Gar
CIS 5500 Dav & Nai

Combining data that was not intended to be combined: Data Integration
- More complex
- Requires knowledge

Company Founded
Apple 1976
Microsoft 1975

Stock Value
AAPL 188.64
MSFT 410.37
Single Source of Truth

Joins may seem inconvenient, but can greatly
increase data quality.
Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy

Redundant data!

What happens if the CIS admin changes?
→ Have to update multiple cells
- Bad for performance
- Error-prone
Joins in Pandas

How many movies starring Angelina Jolie were produced by Sony?

Entity tables:
company_name: id, name
title: id, title
name: id, name

Relationship tables:
movie_companies: company_id, movie_id (links companies to movies)
cast_info: movie_id, person_id (links actors to movies)
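
The question above can be answered by chaining joins from the actor, through the two relationship tables, to the company. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than pandas, so that it runs without extra packages. Table and column names follow the slide; the rows are made-up placeholders, not real movie data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE company_name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie_companies (company_id INTEGER, movie_id INTEGER);
CREATE TABLE cast_info (movie_id INTEGER, person_id INTEGER);

-- Placeholder rows for illustration only.
INSERT INTO company_name VALUES (1, 'Sony'), (2, 'Other Studio');
INSERT INTO title VALUES (10, 'Movie A'), (11, 'Movie B'), (12, 'Movie C');
INSERT INTO name VALUES (100, 'Angelina Jolie'), (101, 'Someone Else');
INSERT INTO movie_companies VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO cast_info VALUES (10, 100), (11, 101), (12, 100);
""")

# Walk name -> cast_info -> title -> movie_companies -> company_name.
query = """
SELECT COUNT(DISTINCT t.id)
FROM name AS n
JOIN cast_info AS ci ON ci.person_id = n.id
JOIN title AS t ON t.id = ci.movie_id
JOIN movie_companies AS mc ON mc.movie_id = t.id
JOIN company_name AS cn ON cn.id = mc.company_id
WHERE n.name = 'Angelina Jolie' AND cn.name = 'Sony'
"""
(movie_count,) = con.execute(query).fetchone()
print(movie_count)
```

In the placeholder data only 'Movie A' is both cast with the actor and linked to Sony, so the count is 1.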
Relational schema

“Relational” = tabular = tables = dataframes (some exceptions apply)

Relational schema is what the tables and columns represent.

Employees: E_id, E_join_date, E_name, D_id
Department: D_id, D_name, D_boss_id
Payroll: E_id, P_salary


Relational schema in SQL
Relation

Employees

e_id
e_join_date
e_name
d_id

Columns / attributes
Data types
Datatypes

SQLite

DuckDB
Primary and foreign keys
Primary key: a unique identifier for each row of the table.
Foreign key: a non-unique reference to a unique column of another table.

Employees

e_id
e_join_date
e_name
d_id
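
As a rough illustration of primary and foreign keys, the sketch below declares the Employees relation in SQLite through Python's sqlite3 module. The column types are assumptions (the slide does not state them), the Department table is taken from the earlier schema slide, and the inserted rows are placeholders.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript("""
CREATE TABLE Department (
    d_id      INTEGER PRIMARY KEY,
    d_name    TEXT,
    d_boss_id INTEGER
);
CREATE TABLE Employees (
    e_id        INTEGER PRIMARY KEY,                 -- primary key: unique per row
    e_join_date TEXT,                                -- assumed type
    e_name      TEXT,                                -- assumed type
    d_id        INTEGER REFERENCES Department(d_id)  -- foreign key, may repeat
);
INSERT INTO Department VALUES (1, 'CIS', NULL);
INSERT INTO Employees VALUES (1, '2025-01-15', 'Example Employee', 1);
""")

def try_insert(sql):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        with con:
            con.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_key = try_insert("INSERT INTO Employees VALUES (1, '2025-02-01', 'Another', 1)")
dangling_ref = try_insert("INSERT INTO Employees VALUES (2, '2025-02-01', 'Another', 99)")
shared_dept = try_insert("INSERT INTO Employees VALUES (3, '2025-02-01', 'Another', 1)")
print(duplicate_key, dangling_ref, shared_dept)
```

The first insert reuses an existing e_id and the second points at a department that does not exist, so both are rejected; the third is accepted because a foreign key value does not have to be unique.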
Relational schema in SQL

Unique key groups

Checked constraints

Custom data types

External tables
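
The first two items on this list can be sketched in SQLite. This example is only illustrative: the Payroll table comes from the earlier schema slide, while the pay_period column and the specific constraints are hypothetical additions chosen to show the syntax.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Payroll (
    e_id       INTEGER,
    pay_period TEXT,                         -- hypothetical column
    p_salary   REAL CHECK (p_salary >= 0),   -- checked constraint
    UNIQUE (e_id, pay_period)                -- unique key group
)
""")

def accepted(row):
    """Return True if the row is inserted, False if a constraint rejects it."""
    try:
        with con:
            con.execute("INSERT INTO Payroll VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted((1, "2025-01", 5000.0)),  # fine
    accepted((1, "2025-02", 5000.0)),  # same employee, new period: fine
    accepted((1, "2025-01", 5200.0)),  # repeats (e_id, pay_period): rejected
    accepted((2, "2025-01", -10.0)),   # negative salary: rejected
]
print(results)
```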
Why do we have schemas?

Schemas ensure all of our data follows the
same rules
– Benefit 1: helps keep data clean (integrity)
– Benefit 2: makes computation easier
– Benefit 3: provides a “map” of the data that is
guaranteed to be true
ER Diagrams

ER (entity-relationship) diagrams are a common
tool to design and document databases.

Visual representation of the “nouns” and “verbs”
inside of a schema.
ER: Entities

Entities are the objects of the schema (rectangles)

Actor Movie
ER: Entities

Attributes are things that describe entities (ovals)

[Diagram: Actor and Movie, with attribute ovals name, note, title, year, region, language]
ER: Entities

Compound attributes are made up of other attributes (double oval)

[Diagram: Actor and Movie with their attributes; dob is a compound attribute made up of year, month, day]
ER: Entities

Computed attributes are attributes that can be
computed from other attributes (dashed oval)
[Diagram: Actor and Movie with their attributes; age is a computed attribute]
ER: Entities

Relationships are the verbs of the schema (diamonds)

[Diagram: the Credit relationship connects Actor and Movie]
ER: Entities

Relationships can have attributes too!

[Diagram: the Credit relationship between Actor and Movie carries the attribute role]
ER: Entities

Relationships have a degree

[Diagram: Credit relates Actor (n) to Movie (m). Later builds add a Born relationship from Actor (m) to Country, which has attributes name and continent, and an id attribute on both Actor and Movie.]
ER Diagrams
Actor (rectangle): Entities; the objects or nouns of the schema

Credit (diamond): Relationships; the verbs or linkages of the schema

year (oval): Attributes; descriptors of entities or relationships

x with parts x1, x2 (double oval): Compound attributes are composed of other attributes

age (dashed oval): Computed attributes can be determined from other attributes


From ER to Schema

Given an ER diagram, how to get a schema?

Each entity becomes a table, each attribute a
column
– Compound and computed attributes are special cases

Each relationship becomes either:
– A table with two foreign keys (many-to-many)
– A foreign key in another table (one-to-many)
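
The translation rules above can be sketched for the Actor/Movie example. This is one possible reading, not the only valid schema: column types are assumptions, the compound attribute dob is stored as a single date string, the computed attribute age is left out, and Born is assumed to be many-to-one from Actor to Country.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Each entity becomes a table, each attribute a column.
CREATE TABLE Country (
    name      TEXT PRIMARY KEY,
    continent TEXT
);
CREATE TABLE Actor (
    id           INTEGER PRIMARY KEY,
    name         TEXT,
    dob          TEXT,                          -- compound attribute kept as one value (assumption)
    born_country TEXT REFERENCES Country(name)  -- Born: one-to-many, so a foreign key
);
CREATE TABLE Movie (
    id    INTEGER PRIMARY KEY,
    title TEXT,
    year  INTEGER
);
-- Credit is many-to-many, so it becomes a table with two foreign keys.
CREATE TABLE Credit (
    actor_id INTEGER REFERENCES Actor(id),
    movie_id INTEGER REFERENCES Movie(id),
    role     TEXT                               -- attribute of the relationship
);
""")

tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Column 2 of PRAGMA foreign_key_list is the referenced table.
credit_targets = sorted(row[2] for row in con.execute("PRAGMA foreign_key_list(Credit)"))
print(tables, credit_targets)
```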
[Diagram: the full ER diagram with Actor, Credit, Movie, Born, and Country]
ER Diagrams

Advantages:
– Visual
– Easy to understand
– Clear correspondence to the real world

Disadvantages:
– Imprecise: how do we know we have the right # of tables?
– Subjective: difficult to decide if an ER diagram is “right” or
“wrong”

Is Country related to Continent or is Continent an attribute?
First Normal Form - 1NF

Today we will learn 1NF, 2NF,
3NF.
– Each applies to a table, such
that a table is either “in” a
normal form or “violating” a
normal form.
– Each is defined in terms of the
previous form.

1NF is the simplest: all values are themselves not tables

[Photo: Edgar F. Codd]
First Normal Form - 1NF

Cust CustID T_ID Date Amt
Abraham 1 12890 2003-10-14 -87
Abraham 1 12904 2003-10-15 -50
Isaac 2 12898 2003-10-14 -21
Jacob 3 12907 2003-10-15 -18
Jacob 3 14920 2003-11-20 -70
Jacob 3 15003 2003-11-27 -60
Functional Dependencies

Recall: the “vertical line test”

[Two plots of y against x: one labeled “Not a function”, the other “A function”]
Functional Dependencies

More precisely:
– Say f(x) is a function iff for all x, f(x) has only 1 value
– Or:
Given f(x1) = y1 and f(x2) = y2,
And given y1 != y2,
Then: x1 must not equal x2

Ex:
If f(x1) = 5 and f(x2) = 7
Then EITHER: x1 != x2 OR f(x) is not a function

We say a table has a functional dependency
between attributes A and B, written A → B, iff B is a
function of A. A → B read aloud is “A determines B”
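
The definition can be checked mechanically on a concrete table. Below is a minimal sketch using the Professor/Dept/Admin table from the Single Source of Truth slide; the helper name `determines` is hypothetical.

```python
def determines(rows, a, b):
    """Return True if column a functionally determines column b in these rows."""
    seen = {}
    for row in rows:
        # Remember the first b seen for each a; a different b later breaks a -> b.
        if seen.setdefault(row[a], row[b]) != row[b]:
            return False
    return True

rows = [
    {"Professor": "Marcus", "Dept": "CIS", "Admin": "Mari"},
    {"Professor": "Gardner", "Dept": "CIS", "Admin": "Mari"},
    {"Professor": "Gerbner", "Dept": "COM", "Admin": "Andy"},
]

dept_to_prof = determines(rows, "Dept", "Professor")
dept_to_admin = determines(rows, "Dept", "Admin")
prof_to_dept = determines(rows, "Professor", "Dept")
admin_to_prof = determines(rows, "Admin", "Professor")
print(dept_to_prof, dept_to_admin, prof_to_dept, admin_to_prof)
```

A check like this can only refute a dependency. If it returns True, the dependency holds for the rows at hand, which does not by itself show it is a rule of the schema.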
Functional Dependencies
if f(x1) = y1 and f(x2) = y2, then if y1 != y2, x1 must not equal x2

Does Dept → Professor?
In other words: is there a function f(department) → professor?

No. Proof:
X1 = CIS, Y1 = Marcus
X2 = CIS, Y2 = Gardner
Y1 != Y2, but X1 = X2: no function!

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy
Functional Dependencies
if f(x1) = y1 and f(x2) = y2, then if y1 != y2, x1 must not equal x2

Does Dept → Admin?

Yes. Proof:
Pair 1: X1 = CIS, Y1 = Mari; X2 = CIS, Y2 = Mari
Pair 2: X1 = CIS, Y1 = Mari; X2 = COM, Y2 = Andy
Check every pair of rows: no violation.

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy
Functional Dependencies
if f(x1) = y1 and f(x2) = y2, then if y1 != y2, x1 must not equal x2

Does Prof → Dept?

Yes. Proof:
Pair 1: X1 = Marcus, Y1 = CIS; X2 = Gardner, Y2 = CIS
Pair 2: X1 = Marcus, Y1 = CIS; X2 = Gerbner, Y2 = COM
Pair 3: X1 = Gardner, Y1 = CIS; X2 = Gerbner, Y2 = COM
Check every pair of rows: no violation.

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy
Functional Dependencies

Intuitive way to check for FDs:
– Say that A → B if, given A, you can always say the exact value of B.

Given a professor, can you say what dept they are in?
Prof → Dept: Yes!

Given a department, can you tell me the department’s admin?
Dept → Admin: Yes!

Given an admin, can you tell me the professor?
Admin → Prof: No!
Algebra of FDs

Armstrong’s axioms

Reflexivity: if Y is a subset of X, then X → Y
– Ex: the prof and the dept determine the dept

Augmentation: if X → Y, then for all Z, XZ → YZ
– Ex: prof → dept, so (prof, admin) → (dept, admin)

Transitivity: if X → Y and Y → Z, then X → Z
– Ex: prof → dept and dept → admin, so prof → admin

These axioms are complete and sound


Proving more properties

Decomposition:
– if X → YZ, then X → Y

Reflex: Y in X, X → Y
Augm: X → Y, for any Z, XZ → YZ
Trans: X → Y and Y → Z, X → Z

Proof:
Step 1: X → YZ (given)
Step 2: YZ → Y (reflex)
Step 3: X → Y (trans of 1 + 2)
Proving more properties

Union:
– if X→Y and X→Z, then X→YZ

Reflex: Y in X, X → Y
Augm: X → Y, for any Z, XZ → YZ
Trans: X → Y and Y → Z, X → Z

Proof:
Step 1: X → Y (given)
Step 2: X → Z (given)
Step 3: X → XZ (augm 2 with X)
Step 4: XZ → YZ (augm 1 with Z)
Step 5: X → YZ (trans 3 + 4)
Proving more properties

Prove: if X→Y and YZ→W, then XZ→W

Reflex: Y in X, X → Y
Augm: X → Y, for any Z, XZ → YZ
Trans: X → Y and Y → Z, X → Z

Proof:
Step 1: X → Y (given)
Step 2: YZ → W (given)
Step 3: XZ → YZ (augm 1 with Z)
Step 4: XZ → W (trans 3 and 2)
Lossless Joins

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy

[Diagram: the table above decomposed two ways. Splitting it into (Professor, Dept) and (Dept, Admin) is marked “Good!”; splitting it into (Professor, Admin) and (Dept, Admin) is marked “… bad!”]
Lossless Joins

When can we turn one relation R into R1 and R2?
– When the join between them is lossless (reconstructs the original table)

Theorem (lossless join decomposition):
– (R1, R2) is a lossless decomposition of R iff
– R1 ∩ R2 → R2 (or, symmetrically, R1 ∩ R2 → R1)
Lossless Joins
Theorem (lossless join decomposition):
– Given R1 ∪ R2 = R, when does R1 ⋈ R2 = R?
– (R1, R2) is a lossless decomposition of R iff
– R1 ∩ R2 → R2
R = (Prof, Dept, Admin)
R1 = (Prof, Dept)
R2 = (Dept, Admin)
R1 ∩ R2 = (Dept)
Does Dept → Dept, Admin?
Yes: Dept → Admin holds, and augmenting it with Dept gives Dept → Dept, Admin
Lossless Joins
Theorem (lossless join decomposition):
– Given R1 ∪ R2 = R, when does R1 ⋈ R2 = R?
– (R1, R2) is a lossless decomposition of R iff
– R1 ∩ R2 → R2
R = (Prof, Prof_Sal, Dept, Admin)
R1 = (Prof, Dept)
R2 = (Dept, Admin, Prof_Sal)
R1 ∩ R2 = (Dept)
Does Dept → Dept, Admin, Prof_Sal?
No.
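
The theorem can be tested by computing everything the shared attributes determine and checking whether that covers one side of the split. The sketch below assumes FDs are given as (LHS, RHS) pairs of sets; `closure` and `is_lossless` are hypothetical helper names, and the FD Prof → Prof_Sal in the second example is an assumption, since the slide does not list the FDs for that relation.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if the shared attributes determine all of R1 or all of R2."""
    shared_closure = closure(r1 & r2, fds)
    return r1 <= shared_closure or r2 <= shared_closure

fds = [({"Prof"}, {"Dept"}), ({"Dept"}, {"Admin"})]

# R1 = (Prof, Dept), R2 = (Dept, Admin): Dept determines all of R2.
first = is_lossless({"Prof", "Dept"}, {"Dept", "Admin"}, fds)

# R2 = (Dept, Admin, Prof_Sal): Dept does not determine Prof_Sal.
fds_sal = fds + [({"Prof"}, {"Prof_Sal"})]  # assumed FD
second = is_lossless({"Prof", "Dept"}, {"Dept", "Admin", "Prof_Sal"}, fds_sal)

# (Prof, Admin) and (Dept, Admin): Admin alone determines neither side.
third = is_lossless({"Prof", "Admin"}, {"Dept", "Admin"}, fds)
print(first, second, third)
```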
Keys

Now that we know when a decomposition is lossless, we need to figure out which decomposition to pick!

Next, we need to understand keys.
Keys

Suppose R = {A, B, C, D, E}
– AB → CE, CD → BE

Any set of attributes that determines R is a superkey
– {ABCDE} and {ABCD} are superkeys

If a superkey is minimal, we call it a candidate key
– {ABD} and {ACD} are candidate keys

An attribute that is part of any candidate key is a prime attribute

We pick one candidate key to be our primary key
– {ABD} can be picked as primary key
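
These definitions can be checked by brute force for the example relation. The sketch below tries attribute sets from smallest to largest and keeps the minimal ones whose closure is all of R. The helper names are hypothetical, and the enumeration is exponential in the number of attributes, so it only suits small examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal superkeys, found by trying attribute sets smallest-first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= relation:
                keys.append(attrs)
    return keys

R = set("ABCDE")
fds = [(set("AB"), set("CE")), (set("CD"), set("BE"))]

keys = candidate_keys(R, fds)
key_names = sorted("".join(sorted(key)) for key in keys)
prime = sorted(set().union(*keys))
is_abcd_superkey = closure(set("ABCD"), fds) >= R
print(key_names, prime, is_abcd_superkey)
```

For this R the candidate keys come out as ABD and ACD, so A, B, C, and D are prime attributes and E is the only non-prime one.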
Second Normal Form

R is in 2NF iff: R is in 1NF and no non-prime attribute is
determined by any proper subset of a candidate key.

R = {A, B, C, D, E}
AB → CE, CD → BE
{ABD} and {ACD} are candidate keys
Violates 2NF! AB → E

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy
P → D, D → A
Candidate keys: {P}
All good!
Second Normal Form

R = {A, B, C, D, E}
AB → CE, CD → BE
{ABD} and {ACD} are candidate keys
Violates 2NF! AB → E

R1 = R – RHS of FD
R1 = {A, B, C, D}
FDs: AB → C, CD → B
CKs: {ABD} {ACD}
All good!

R2 = LHS + RHS of FD
R2 = {A, B, E}
FDs: AB → E
CKs: {AB}
All good!

R is in 2NF iff: no non-prime attribute is determined by any proper subset of a candidate key.
Second Normal Form

If I always use this criterion to pick a decomposition, will I always get a lossless join?

Recall that a join is lossless if:
R1 ∩ R2 → R2

R1 = R – RHS of FD
R1 = {A, B, C, D}
FDs: AB → C, CD → B
CKs: {ABD} {ACD}

R2 = LHS + RHS of FD
R2 = {A, B, E}
FDs: AB → E
CKs: {AB}
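
One way to see the answer is to compute the split and look at what the two halves share. The sketch below assumes the RHS of the violating FD does not overlap its LHS; `decompose` is a hypothetical helper name.

```python
def decompose(relation, lhs, rhs):
    """Split a relation on the FD lhs -> rhs, following the slide's rule."""
    r1 = relation - rhs  # R1 = R - RHS of FD
    r2 = lhs | rhs       # R2 = LHS + RHS of FD
    return r1, r2

R = set("ABCDE")
lhs, rhs = set("AB"), set("E")  # the violating FD AB -> E

r1, r2 = decompose(R, lhs, rhs)
shared = r1 & r2
print(sorted(r1), sorted(r2), sorted(shared))
```

The shared attributes are exactly the LHS of the FD. Since LHS → RHS is given and LHS → LHS is trivial, the LHS determines all of R2, which is the lossless condition recalled above.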
Third Normal Form

R is in 1NF if no value is itself a table

R is in 2NF if:
– R is in 1NF AND
– No non-prime attribute is determined by any proper subset of a candidate key.

R is in 3NF if:
– R is in 2NF AND
– No non-prime attribute determines a different non-prime
attribute
Third Normal Form

No non-prime attribute determines a different non-prime attribute

Professor Dept Admin
Marcus CIS Mari
Gardner CIS Mari
Gerbner COM Andy

P → D, D → A
Candidate keys: {P}

R1 = R – RHS of FD
R1 = {Prof, Dept}

R2 = LHS + RHS of FD
R2 = {Dept, Admin}
Third Normal Form
Every non-prime attribute is determined by…

The keys (by definition)

The whole keys (2NF)

… and nothing but the keys (3NF)

Edgar F. Codd
The 3NF Algorithm
Repeat until all relations are in 3NF:
1) Pick a relation we haven’t checked yet.
2) Ensure the relation is in 1NF.
3) Write down the FDs
4) Identify the CKs
5) Check for 2NF violations
If any, decompose on a violating FD, go to (1)
6) Check for 3NF violations
If any, decompose on a violating FD, go to (1)
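
The checks in steps 4 to 6 can be sketched in code. This is only the violation-finding part, using the slide wording of 2NF and 3NF; it does not run the full loop or work out which FDs carry over to each decomposed relation. All helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Minimal superkeys, found by trying attribute sets smallest-first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            if not any(key <= attrs for key in keys) and closure(attrs, fds) >= relation:
                keys.append(attrs)
    return keys

def find_violation(relation, fds):
    """Return (form, lhs, rhs) for the first violation found, or None."""
    keys = candidate_keys(relation, fds)
    non_prime = relation - set().union(*keys)
    # 2NF: a proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                part = set(combo)
                extra = (closure(part, fds) - part) & non_prime
                if extra:
                    return ("2NF", part, extra)
    # 3NF: a non-prime attribute determines a different non-prime attribute.
    for lhs, rhs in fds:
        extra = (rhs - lhs) & non_prime
        if lhs <= non_prime and extra:
            return ("3NF", lhs, extra)
    return None

abstract = find_violation(set("ABCDE"), [(set("AB"), set("CE")), (set("CD"), set("BE"))])
professor = find_violation({"P", "D", "A"}, [({"P"}, {"D"}), ({"D"}, {"A"})])
clean = find_violation({"D", "A"}, [({"D"}, {"A"})])
print(abstract, professor, clean)
```

On the two running examples this reports AB → E as a 2NF violation and D → A as a 3NF violation, matching the slides; the relation {Dept, Admin} on its own has no violation.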
ActorID | ActorName | ActorDOB | CountryName | CountryContinent | Movies
1 | Jolie, Angeli | June 4th, 1975 | US | NA | [{“role”: “Lara Croft”, “movieID”: 1, “movieName”: “Tomb Raider”}, …]
2 | Li, Jet | April 26th, 1963 | CN | AS | [{“role”: “Huo”, “movieID”: 2, “movieName”: “Fearless”}, {“role”: “Liu”, “movieID”: 3, “movieName”: “Kiss of the Dragon”}, …]

Step 1:
Not in 1NF! Table-in-a-table.
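
Getting to 1NF here means emitting one flat row per nested movie record and repeating the actor's columns on each. A minimal sketch follows, using a subset of the slide's columns and only the movie entries that are fully visible on the slide (the elided “…” entries are left out); `to_1nf` is a hypothetical helper name.

```python
nested = [
    {"ActorID": 1, "ActorName": "Jolie, Angeli",
     "Movies": [{"role": "Lara Croft", "movieID": 1, "movieName": "Tomb Raider"}]},
    {"ActorID": 2, "ActorName": "Li, Jet",
     "Movies": [{"role": "Huo", "movieID": 2, "movieName": "Fearless"},
                {"role": "Liu", "movieID": 3, "movieName": "Kiss of the Dragon"}]},
]

def to_1nf(rows, nested_col):
    """Emit one flat row per nested record, repeating the outer columns."""
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != nested_col}
        for inner in row[nested_col]:
            flat.append({**outer, **inner})
    return flat

flat = to_1nf(nested, "Movies")
# After flattening, no cell holds a list or a dict.
all_atomic = all(not isinstance(v, (list, dict)) for row in flat for v in row.values())
for row in flat:
    print(row)
```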
ActorID | ActorName | ActorDOB | CountryName | CountryCont | Role | MovieID | MovieName
1 | Jolie, Angeli | June 4th, 1975 | US | NA | Lara | 1 | Tomb Raider
1 | Jolie, Angeli | June 4th, 1975 | US | NA | Margaret | 4 | The Good Shepherd
2 | Li, Jet | April 26th, 1963 | CN | AS | Huo | 2 | Fearless
2 | Li, Jet | April 26th, 1963 | CN | AS | Liu | 3 | Kiss of the Dragon

R = {ActorID, ActorName, ActorDOB, CountryName, CountryCont, Role, MovieID, MovieName}

Step 2: Write down our FDs:
ActorID → ActorName, ActorDOB, CountryName, CountryCont
CountryName → CountryCont
MovieID → MovieName

Step 3: Find candidate keys (minimal keys)
CKs: {ActorID, MovieID, Role}

Step 4: Check for 2NF
Are any non-prime attributes determined by a subset of a candidate key?
Violation!

Step 5: Decompose with the violating FD

R1 = R – RHS of FD
R1 = {ActorID, Role, MovieID, MovieName}

R2 = LHS + RHS of FD
R2 = {ActorID, ActorName, ActorDOB, CountryName, CountryCont}
R1 = {ActorID, Role, MovieID, MovieName}
R2 = {ActorID, ActorName, ActorDOB, CountryName, CountryCont}

Checking R1:

Step 1: Check for 1NF
… generally always good

Step 2: Write down our FDs:
MovieID → MovieName

Step 3: Find candidate keys (minimal keys)
CKs: {ActorID, Role, MovieID}

Step 4: Check for 2NF
Are any non-prime attributes determined by a subset of a candidate key?
Violation!

Step 5: Decompose with the violating FD

R3 = R1 – RHS of FD
R3 = {ActorID, Role, MovieID}

R4 = LHS + RHS of FD
R4 = {MovieID, MovieName}
R2 = {ActorID, ActorName, ActorDOB, CountryName, CountryCont}
R3 = {ActorID, Role, MovieID}
R4 = {MovieID, MovieName}

Checking R2:

Step 1: 1NF
… generally good

Step 2: Write down our FDs:
ActorID → ActorName, ActorDOB, CountryName, CountryCont
CountryName → CountryCont

Step 3: Find candidate keys (minimal keys)
CKs: {ActorID}

Step 4: Check for 2NF
Are any non-prime attributes determined by a subset of a candidate key?
None found!

Step 5: Decompose with the violating FD (none)!

Step 6: Check for 3NF
Are any non-prime attributes determined by another non-prime attribute?
Violation!

Step 7: Decompose

R5 = R2 – RHS of FD
R5 = {ActorID, ActorName, ActorDOB, CountryName}

R6 = LHS + RHS of FD
R6 = {CountryName, CountryCont}
R3 = {ActorID, Role, MovieID}
1NF + no non-trivial FDs, must be in 3NF already

R4 = {MovieID, MovieName}
R5 = {ActorID, ActorName, ActorDOB, CountryName}
R6 = {CountryName, CountryCont}
1NF + singleton CK with no other non-trivial FDs, must be in 3NF already

Select a CK from each relation as the primary key. Underline it.
[Diagram: the earlier ER diagram (Actor, Credit, Movie, Born, Country) shown next to the final relations]
{ActorID, Role, MovieID}
{MovieID, MovieName}
{ActorID, ActorName, ActorDOB, CountryName}
{CountryName, CountryCont}
Textbook chapters with
alternative treatments of FDs
and ER diagrams

Zack’s lecture on ER
diagrams (extended)
