Overview of Database Management Systems

A Database Management System (DBMS) is software that efficiently stores, manages, and retrieves data, acting as an interface between users and databases. It addresses issues like data redundancy, inconsistency, and security, while providing features such as data integrity, multi-user access, and powerful query capabilities. The document also covers database applications, design processes, data models, and the relational model, emphasizing the importance of constraints and ER diagrams in ensuring data integrity.


DATABASE MANAGEMENT SYSTEMS

Introduction to DBMS
A Database Management System (DBMS) is software designed to store, manage, and
retrieve data in an organized and efficient manner. It acts as an interface between end-users or
application programs and the physical database.

What is a Database?

A database is a structured collection of related data stored electronically. Unlike simple file
storage, a database is designed to handle large amounts of data and support multiple users and
applications simultaneously.

Why Use a DBMS?

Managing data through regular files (like spreadsheets or text files) is inefficient and prone to
problems such as:

 Data redundancy (duplicate data).


 Data inconsistency.
 Difficulty in data sharing.
 Poor data security.
 No standard way to access or update data.

A DBMS solves these problems by providing:

 Efficient data storage and retrieval.


 Support for multiple concurrent users.
 Data integrity and security enforcement.
 Data abstraction and independence.
 Backup and recovery mechanisms.
 Powerful query capabilities using languages like SQL.

Key Features of a DBMS

 Data Abstraction: Hides the complexities of data storage from users.


 Data Integrity: Maintains correctness and accuracy of data.
 Data Security: Protects data from unauthorized access.
 Multi-user Access: Allows many users to access the database concurrently without
conflicts.
 Transaction Management: Ensures reliable processing of operations.
 Backup and Recovery: Protects against data loss.
 Data Independence: Physical or logical changes in data storage do not affect application
programs.

Examples of Popular DBMS

 Relational DBMS: Oracle, MySQL, Microsoft SQL Server, PostgreSQL.


 NoSQL DBMS: MongoDB, Cassandra, Couchbase.
 Object-oriented DBMS: db4o, ObjectDB.

1. Database Applications
Databases are essential in many fields where organized storage, efficient retrieval, and
management of data are required. Some common database applications include:

 Banking Systems: Manage customer accounts, transactions, loans, and ATM operations
securely and reliably.
 Airline Reservation Systems: Handle flight schedules, seat bookings, ticketing,
cancellations, and passenger information.
 E-commerce Websites: Store product details, customer information, orders, payment
data, and shipping information.
 Telecommunication Systems: Maintain subscriber records, call logs, billing data, and
service plans.
 Universities: Manage student records, course registrations, grades, faculty information,
and schedules.
 Healthcare: Keep patient records, appointment schedules, prescriptions, and billing
details.
 Inventory Management: Track stock levels, supplier data, sales, and purchase orders.
 Social Media Platforms: Store user profiles, posts, messages, and connections.

These applications demand fast, reliable, and secure access to data for multiple users
simultaneously, which is enabled by DBMS.

2. Purpose of Database Systems


The primary purposes of using a Database Management System (DBMS) are:

 Efficient Data Management: DBMS provides a systematic way to store, retrieve, and
manage large amounts of data efficiently.
 Data Sharing: Multiple users and applications can access and share data concurrently
without interference.
 Data Integrity: DBMS enforces rules and constraints to ensure that data is accurate,
consistent, and reliable.
 Data Security: It provides mechanisms for user authentication, authorization, and access
control to protect sensitive information.
 Reduced Data Redundancy: DBMS minimizes duplication by centralizing data storage,
which reduces storage costs and inconsistency.
 Backup and Recovery: Automatic backup and recovery processes protect data against
loss due to hardware failures or other disasters.
 Data Independence: It allows changes to data structure without affecting applications,
facilitating flexibility and scalability.
 Support for Complex Queries: DBMS supports query languages like SQL, enabling
complex data retrieval and manipulation.
 Enforcement of Standards: Helps maintain standards for data format, integrity, and
access across the organization.

1. Components of DBMS
A DBMS consists of several key components that work together to store, manage, and retrieve
data efficiently:

 Hardware: Physical devices like servers, storage drives, and network devices that host
the database.
 Software: The DBMS software itself, which manages the database and provides
interfaces for users and applications.
 Data: The actual information stored in the database.
 Users: People who interact with the database, such as:
o Database Administrators (DBA): Manage and maintain the database system.
o Application Programmers: Develop applications that use the database.
o End Users: Use applications or query tools to access data.
 Procedures: Instructions and rules for designing, using, and managing the database.

2. DBMS Architecture
DBMS uses a three-level architecture for data abstraction and independence:

 Internal Level (Physical Level): How data is physically stored on hardware (e.g., file
structures, indexing).
 Conceptual Level (Logical Level): The overall logical structure of the entire database,
describing what data is stored and relationships.
 External Level (View Level): Different user views of the database tailored to specific
needs.

This architecture helps isolate users and applications from physical data storage details,
improving flexibility and security.

3. Different Data Models


Data models define how data is organized and represented in the database:

 Hierarchical Model: Organizes data in a tree structure with parent-child relationships.


 Network Model: Uses graph structures allowing many-to-many relationships.
 Relational Model: Represents data in tables (relations) with rows and columns; widely
used in modern DBMS.
 Object-oriented Model: Stores data as objects, similar to object-oriented programming.
 Entity-Relationship (ER) Model: Used mainly for database design, representing
entities, attributes, and relationships.

4. Data Independence
Data independence means that changes in one level of the database architecture do not affect
other levels:

 Physical Data Independence: Changes in the physical storage (like changing file
structure) do not affect the conceptual schema or user applications.
 Logical Data Independence: Changes in the conceptual schema (like adding new fields
or tables) do not affect user views or applications.

Data independence makes database systems easier to maintain and evolve.

5. Various Types of Constraints


Constraints ensure data integrity and correctness in the database:

 Entity Integrity Constraint: Primary keys must be unique and not null.
 Referential Integrity Constraint: Foreign keys must match primary keys in referenced
tables or be null.
 Domain Constraints: Values must be within a specified domain or type.
 Unique Constraint: Ensures all values in a column are unique.
 Not Null Constraint: Specifies that a column cannot have null values.
 Check Constraint: Enforces conditions on data values (e.g., salary > 0).
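These constraints can be seen in action with a short sketch using Python's built-in sqlite3 module; the Employee table and its columns are invented for illustration, and SQLite serves only as a convenient engine:

```python
import sqlite3

# Hypothetical Employee table illustrating entity-integrity, not-null,
# and check constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        EmpID  INTEGER PRIMARY KEY,      -- entity integrity: unique, not null
        Name   TEXT NOT NULL,            -- not-null constraint
        Salary REAL CHECK (Salary > 0)   -- check constraint: salary > 0
    )
""")
conn.execute("INSERT INTO Employee VALUES (1, 'Alice', 50000)")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'Bob', -10)")  # violates CHECK
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses the row, keeping the data valid

print("negative salary rejected:", rejected)
```

The DBMS, not the application, enforces the rule: the bad row never reaches the table.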

Database Design and the Entity-Relationship (ER) Model

This section covers the database design process and the basics of the ER model.

1. Overview of the Database Design Process


Database design is a crucial step to ensure that the database effectively supports the requirements
of an organization. The process typically involves the following stages:

1.1 Requirement Analysis

 Understand and gather the data requirements from stakeholders, users, and business
processes.
 Identify what data needs to be stored and how it will be used.

1.2 Conceptual Design

 Create a high-level data model that represents the data requirements without worrying
about physical details.
 The Entity-Relationship (ER) Model is commonly used here to visually represent
entities, relationships, and constraints.

1.3 Logical Design

 Transform the conceptual model into a logical model suitable for the chosen database
model (usually the relational model).
 Define tables, attributes, keys, and relationships formally.

1.4 Physical Design

 Decide how the logical design will be implemented on physical storage.


 Choose file organizations, indexing methods, and optimize for performance.
1.5 Implementation and Maintenance

 Build the database using a DBMS.


 Continuously maintain and update the database as requirements evolve.

2. Entity-Relationship (ER) Model


The ER model is a graphical approach to database design that helps represent real-world entities
and their relationships clearly.

2.1 Key Concepts of ER Model

 Entity: A real-world object or thing with a distinct existence (e.g., Student, Employee,
Book).
o Represented by a rectangle.
 Attributes: Properties or details that describe an entity (e.g., Student has attributes:
StudentID, Name, Age).
o Represented by ovals connected to entities.
 Entity Set: A collection of similar entities (e.g., all students in a university).
 Relationship: An association between two or more entities (e.g., a Student enrolls in a
Course).
o Represented by a diamond shape.
 Relationship Set: A collection of similar relationships.

2.2 Types of Attributes

 Simple (Atomic) Attribute: Cannot be divided further (e.g., Age, Salary).


 Composite Attribute: Can be divided into smaller parts (e.g., Name → First Name, Last
Name).
 Derived Attribute: Can be derived from other attributes (e.g., Age from Date of Birth).
 Multivalued Attribute: Can have multiple values (e.g., Phone Numbers).

2.3 Key Attribute

 An attribute (or a set) that uniquely identifies an entity in an entity set (e.g., StudentID).

2.4 Cardinality Constraints

Defines the number of entities that can participate in a relationship:

 One-to-One (1:1): One entity from A relates to one entity from B.


 One-to-Many (1:N): One entity from A relates to many entities from B.
 Many-to-Many (M:N): Many entities from A relate to many entities from B.

2.5 Participation Constraints

 Total Participation: Every entity must participate in the relationship (depicted by a


double line).
 Partial Participation: Some entities may not participate (depicted by a single line).

Example

Consider a university database:

 Entities: Student, Course


 Relationship: Enrolls
 Attributes:
o Student: StudentID (key), Name, Age
o Course: CourseID (key), Title, Credits
 Relationship Attributes: Grade

The ER diagram would visually show these entities and the enrolls relationship between students
and courses.

Summary
Step Description
Database Design Process Requirement analysis → Conceptual → Logical → Physical design
Entity Real-world object represented by rectangle
Attribute Property of entity represented by oval
Relationship Association between entities represented by diamond
Key Attribute Unique identifier for entities
Cardinality Defines relationship participation (1:1, 1:N, M:N)
Participation Total or partial involvement in relationships

1. Constraints in ER Model
Constraints restrict the possible data values and relationships to ensure data integrity:
 Key Constraints: Ensure that each entity in an entity set is uniquely identifiable by a key
attribute.
 Participation Constraints:
o Total Participation: Every entity in the set must participate in at least one
relationship (double line).
o Partial Participation: Some entities may not participate (single line).
 Cardinality Constraints: Define the number of entities that can be associated in a
relationship:
o One-to-One (1:1)
o One-to-Many (1:N)
o Many-to-Many (M:N)
 Domain Constraints: Attributes must have values from a specific domain or type (e.g.,
age must be a positive integer).

2. ER Diagrams
An ER Diagram is a graphical representation of the ER model elements:

 Entities: Rectangles (e.g., Student, Course)


 Attributes: Ovals connected to entities (e.g., Name, ID)
 Relationships: Diamonds connecting entities (e.g., Enrolls)
 Keys: Underlined attribute names denote key attributes
 Participation: Single or double lines denote partial or total participation
 Cardinality: Numbers or symbols near relationships show cardinality (1, N, etc.)

ER diagrams help visualize the database structure clearly.

3. ER Design Issues
When designing an ER model, consider:

 Naming Conventions: Use meaningful and consistent names for entities, attributes, and
relationships.
 Redundancy Avoidance: Avoid storing duplicate information.
 Choosing Keys: Identify suitable key attributes to uniquely identify entities.
 Designing Relationships: Determine the correct cardinality and participation constraints.
 Handling Multi-valued Attributes: Decide whether to represent them as separate
entities or attributes.
 Normalization: Make sure the design avoids update anomalies and maintains integrity.
4. Weak Entity Sets
A Weak Entity Set:

 Does not have a key attribute of its own.


 Its existence depends on a related strong entity set.
 Identified uniquely by a combination of:
o A partial key (discriminator) within the weak entity set.
o The key of the related strong entity.
 Connected to the strong entity by an identifying relationship, depicted by a double
diamond.
 Represented with a double rectangle for the weak entity.

Example: In a database, a Dependent entity may be weak because it depends on an Employee


entity. Dependent has no unique key alone but can be identified by the employee’s ID plus
dependent’s name.

5. Extended ER Features
Extended ER (EER) model adds advanced concepts to capture more complex real-world
scenarios:

 Specialization: Process of defining a set of subclasses from a superclass based on some


distinguishing characteristic. (Top-down approach)
 Generalization: Abstracting common features from multiple entity sets into a
generalized superclass. (Bottom-up approach)
 Categorization (Union Types): An entity set that is a subset of the union of multiple
entity sets.
 Aggregation: Treating a relationship set as an abstract entity to relate it with other
entities.
 Inheritance: Subclasses inherit attributes and relationships of their superclasses.

These features make ER models more expressive and flexible.

Summary Table
Topic                 Description
Constraints           Rules like key, participation, cardinality to ensure data integrity
ER Diagrams           Visual representation of entities, attributes, relationships
ER Design Issues      Naming, redundancy, keys, relationships, normalization
Weak Entity Sets      Entities without keys, dependent on strong entities; double rectangles/diamonds
Extended ER Features  Specialization, Generalization, Aggregation, Categorization

1. Relational Model
The Relational Model is the most widely used database model. It represents data in the form of
relations (tables), which are collections of tuples (rows) having attributes (columns).

 A relation is a two-dimensional table with rows and columns.


 Each row (tuple) represents a single record.
 Each column (attribute) represents a data field.
 Each relation has a schema that defines the relation’s name, attributes, and their domains.
 Relations must have a primary key: an attribute or set of attributes that uniquely
identifies each tuple.

2. Structure of Relational Databases


A relational database consists of a collection of relations (tables). Each relation has:

 Relation name: Unique identifier for the table.


 Attributes: Columns with names and associated data types/domains.
 Tuples: Rows containing the actual data.
 Keys:
o Primary Key: Uniquely identifies each tuple.
o Foreign Key: Attribute(s) that refer to the primary key of another relation,
establishing relationships.

Example table for a Student relation:

StudentID (PK)  Name   Age  Major

101             Alice  20   CS
102             Bob    22   Physics

3. Relational Algebra Operations


Relational algebra is a procedural query language that works on relations to produce new
relations. It consists of fundamental, additional, and extended operations.

3.1 Fundamental Operations

1. Selection (σ): Selects rows that satisfy a condition.


Example: σ_Age > 20(Student) — selects students older than 20.
2. Projection (π): Selects certain columns from a relation.
Example: π_Name, Major(Student) — extracts only Name and Major columns.
3. Union (∪): Combines tuples from two relations with the same schema.
Example: Students ∪ Graduates — students who are either current or graduated.
4. Set Difference (-): Tuples in one relation but not in the other.
Example: Students - Graduates — current students who have not graduated.
5. Cartesian Product (×): Combines every tuple of one relation with every tuple of
another.
Example: Students × Courses — all possible student-course pairs.
6. Rename (ρ): Renames the relation or attributes for clarity.
Example: ρ_NewName(Student) — renames relation Student to NewName.

3.2 Additional Operations

1. Intersection (∩): Tuples common to both relations.


Example: Students ∩ Graduates — students who have graduated.
2. Join (⨝): Combines related tuples from two relations based on a common attribute.
Types of join:
o Natural Join: Automatically joins on all common attributes.
o Equi-Join: Join on equality of specified attributes.

Example: Student ⨝ Enrollment (on StudentID) — gives student details with enrollment info.

3.3 Extended Operations

1. Division (÷): Returns tuples in one relation associated with all tuples in another.
Example: Find students who have enrolled in all courses offered.
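These operations can be mimicked with ordinary Python sets, which makes their meaning concrete; the Student, Enrolled, and Courses relations below are invented for this sketch:

```python
# Relations as sets of tuples; the schemas (Name, Age) for Student and
# (Name, Course) for Enrolled are hypothetical.
Student = {("Alice", 22), ("Bob", 19), ("Carol", 25)}
Enrolled = {("Alice", "DB"), ("Alice", "OS"), ("Bob", "DB")}
Courses = {"DB", "OS"}

# Selection sigma_{Age > 20}(Student): keep tuples satisfying the condition
older = {t for t in Student if t[1] > 20}

# Projection pi_{Name}(Student): keep only the Name column
names = {t[0] for t in Student}

# Cartesian product Student x Courses: every pairing of tuples
product = {s + (c,) for s in Student for c in Courses}

# Division: students associated with ALL courses
enrolled_in_all = {n for n in {e[0] for e in Enrolled}
                   if all((n, c) in Enrolled for c in Courses)}

print(enrolled_in_all)  # only Alice is enrolled in both DB and OS
```

Each operation takes relations in and produces a new relation, which is exactly why they can be composed into larger queries.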
Summary Table
Operation Symbol Description

Selection σ Select rows based on condition

Projection π Select specific columns

Union ∪ Combine tuples from two relations

Set Difference - Tuples in one relation but not other

Cartesian Product × Combine tuples from two relations

Rename ρ Rename relations/attributes

Intersection ∩ Common tuples between relations

Join ⨝ Combine related tuples

Division ÷ Tuples related to all tuples in another

1. Views in SQL
A View is a virtual table based on the result-set of an SQL query. It does not store data
physically but presents data from one or more tables.

 Views simplify complex queries.


 Enhance security by restricting access to specific data.
 Can be used like a regular table in SELECT queries.

Example:

CREATE VIEW Student_Majors AS
SELECT StudentID, Name, Major
FROM Students
WHERE Major = 'CS';

You can then query:

SELECT * FROM Student_Majors;
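The view above can be exercised end to end with Python's sqlite3 module; the sample rows are invented. The key point to observe is that the view stores no data of its own:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INT, Name TEXT, Major TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(101, "Alice", "CS"), (102, "Bob", "Physics")])
conn.execute("""
    CREATE VIEW Student_Majors AS
    SELECT StudentID, Name, Major FROM Students WHERE Major = 'CS'
""")

before = conn.execute("SELECT Name FROM Student_Majors").fetchall()
print(before)  # [('Alice',)] -- Bob's row is filtered out

# Because a view is virtual, new base-table rows appear in it immediately:
conn.execute("INSERT INTO Students VALUES (103, 'Carol', 'CS')")
after = conn.execute("SELECT COUNT(*) FROM Student_Majors").fetchone()[0]
print(after)  # 2
```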


2. DDL (Data Definition Language) Statements in SQL
DDL statements define and modify database schema and structure.

 CREATE: Creates database objects like tables, views, indexes.

   CREATE TABLE Students (
       StudentID INT PRIMARY KEY,
       Name VARCHAR(50),
       Age INT,
       Major VARCHAR(30)
   );

 ALTER: Modifies an existing database object.

   ALTER TABLE Students ADD COLUMN Email VARCHAR(50);

 DROP: Deletes database objects.

   DROP TABLE Students;

 TRUNCATE: Removes all records from a table but keeps the structure.

   TRUNCATE TABLE Students;

3. DML (Data Manipulation Language) Statements in SQL


DML statements manage data within tables.

 SELECT: Retrieves data from tables.

   SELECT * FROM Students WHERE Age > 20;

 INSERT: Adds new rows.

   INSERT INTO Students (StudentID, Name, Age, Major)
   VALUES (101, 'Alice', 21, 'CS');

 UPDATE: Modifies existing data.

   UPDATE Students SET Age = 22 WHERE StudentID = 101;

 DELETE: Removes rows.

   DELETE FROM Students WHERE StudentID = 101;

4. JOINS in SQL
JOINS combine rows from two or more tables based on related columns.

Types of JOINS:

 INNER JOIN: Returns rows with matching values in both tables.

   SELECT Students.Name, Enrollment.CourseID
   FROM Students
   INNER JOIN Enrollment ON Students.StudentID = Enrollment.StudentID;

 LEFT (OUTER) JOIN: Returns all rows from the left table and matched rows from the
   right table; NULL if no match.

   SELECT Students.Name, Enrollment.CourseID
   FROM Students
   LEFT JOIN Enrollment ON Students.StudentID = Enrollment.StudentID;

 RIGHT (OUTER) JOIN: Returns all rows from the right table and matched rows from
   the left table.

   SELECT Students.Name, Enrollment.CourseID
   FROM Students
   RIGHT JOIN Enrollment ON Students.StudentID = Enrollment.StudentID;

 FULL (OUTER) JOIN: Returns rows when there is a match in either table.

   SELECT Students.Name, Enrollment.CourseID
   FROM Students
   FULL OUTER JOIN Enrollment ON Students.StudentID = Enrollment.StudentID;

 CROSS JOIN: Returns the Cartesian product of the tables.

   SELECT Students.Name, Courses.Title
   FROM Students
   CROSS JOIN Courses;
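The difference between INNER and LEFT JOIN is easiest to see with a row that has no match; this sketch uses sqlite3 with invented tables:

```python
import sqlite3

# Toy tables: Bob has no enrollment, so the two join types treat him differently.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (StudentID INT, Name TEXT)")
conn.execute("CREATE TABLE Enrollment (StudentID INT, CourseID TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO Enrollment VALUES (1, 'DB101')")

inner = conn.execute("""
    SELECT s.Name, e.CourseID FROM Students s
    INNER JOIN Enrollment e ON s.StudentID = e.StudentID
    ORDER BY s.StudentID
""").fetchall()

left = conn.execute("""
    SELECT s.Name, e.CourseID FROM Students s
    LEFT JOIN Enrollment e ON s.StudentID = e.StudentID
    ORDER BY s.StudentID
""").fetchall()

print(inner)  # [('Alice', 'DB101')] -- unmatched Bob is dropped
print(left)   # [('Alice', 'DB101'), ('Bob', None)] -- Bob kept, padded with NULL
```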

🔹 1. CREATE Statement
Used to create database objects like tables, views, databases, indexes, etc.

Syntax:
CREATE TABLE table_name (
column1 datatype [constraints],
column2 datatype [constraints],
...
);

Example:
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50),
Age INT,
Major VARCHAR(30)
);

🔹 2. ALTER Statement
Used to modify the structure of an existing table by adding, deleting, or modifying columns or
constraints.

Syntax Examples:

 Add a column:

ALTER TABLE table_name ADD column_name datatype;

 Modify a column:

ALTER TABLE table_name MODIFY column_name new_datatype;

 Drop a column:

ALTER TABLE table_name DROP COLUMN column_name;

Example:
ALTER TABLE Students ADD Email VARCHAR(100);

🔹 3. DROP Statement
Used to delete database objects like tables, views, or databases permanently.

Syntax:
DROP TABLE table_name;

Example:
DROP TABLE Students;

⚠️ Once dropped, the data and structure are lost and cannot be recovered.

🔹 4. RENAME Statement
Used to change the name of a table or column (support varies slightly by RDBMS).

Syntax:
 Rename table (MySQL / PostgreSQL):

RENAME TABLE old_table_name TO new_table_name;

 Rename column (MySQL):

ALTER TABLE table_name RENAME COLUMN old_name TO new_name;

Example:
RENAME TABLE Students TO CollegeStudents;

🔹 5. TRUNCATE Statement
Used to remove all rows from a table quickly, but keep the structure of the table intact.

Syntax:
TRUNCATE TABLE table_name;

Example:
TRUNCATE TABLE Students;

⚠️ Faster than DELETE, but you cannot roll it back in some databases.

✅ Summary Table
Command Purpose Affects Data? Can Be Rolled Back?

CREATE Create new database objects No N/A

ALTER Modify existing objects Maybe Sometimes

DROP Permanently delete objects Yes ❌ No

RENAME Rename tables or columns No ✅ Yes (depends)

TRUNCATE Quickly remove all rows Yes ❌ No


🔹 SQL (DML): Data Manipulation Language
DML commands are used to insert, retrieve, modify, and delete data in the database. These
operations are performed on existing tables without affecting their structure.

1. SELECT – Retrieving Data


The SELECT command is used to retrieve data from one or more tables.

Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example:
SELECT Name, Major
FROM Students
WHERE Age > 20;

2. INSERT – Inserting Data


The INSERT command is used to add new rows to a table.

Syntax:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example:
INSERT INTO Students (StudentID, Name, Age, Major)
VALUES (101, 'Alice', 20, 'Computer Science');

3. DELETE – Deleting Data


The DELETE command is used to remove rows from a table based on a condition.
Syntax:
DELETE FROM table_name
WHERE condition;

Example:
DELETE FROM Students
WHERE StudentID = 101;

⚠️ Omitting the WHERE clause will delete all rows.

4. UPDATE – Modifying Data


The UPDATE command is used to change existing data in a table.

Syntax:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:
UPDATE Students
SET Age = 21
WHERE StudentID = 101;

5. Implementation of Constraints in SQL


Constraints are rules applied to table columns to enforce data integrity. They can be defined at
the time of table creation or altered later.

Common Constraints:

Constraint   Purpose
PRIMARY KEY  Uniquely identifies each record in a table
FOREIGN KEY  Maintains referential integrity between tables
NOT NULL     Ensures a column cannot have NULL values
UNIQUE       Ensures all values in a column are different
CHECK        Ensures values satisfy a specific condition
DEFAULT      Sets a default value for a column

Example: Creating Constraints


CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(50) NOT NULL,
Age INT CHECK (Age > 0),
Major VARCHAR(30),
Email VARCHAR(100) UNIQUE
);

Example: Adding a FOREIGN KEY


CREATE TABLE Enrollments (
EnrollmentID INT PRIMARY KEY,
StudentID INT,
CourseID INT,
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
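Referential integrity can be watched rejecting a dangling reference; this sketch uses sqlite3 (where foreign-key checking must first be switched on) with the same hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Enrollments (
        EnrollmentID INTEGER PRIMARY KEY,
        StudentID INTEGER,
        FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
    )
""")
conn.execute("INSERT INTO Students VALUES (101, 'Alice')")
conn.execute("INSERT INTO Enrollments VALUES (1, 101)")  # OK: student 101 exists

dangling_rejected = False
try:
    conn.execute("INSERT INTO Enrollments VALUES (2, 999)")  # no student 999
except sqlite3.IntegrityError:
    dangling_rejected = True  # foreign key must match a primary key (or be NULL)

print("dangling reference rejected:", dangling_rejected)
```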

Summary
Command Purpose Example
SELECT Retrieve data SELECT * FROM Students;
INSERT Add new data INSERT INTO Students ... VALUES ...;
DELETE Remove existing data DELETE FROM Students WHERE ID = 1;
UPDATE Modify existing data UPDATE Students SET Age = 22;

Constraints ensure data remains valid, accurate, and consistent.

1. Implementation of JOINS
JOINS are used to combine rows from two or more tables based on a related column (usually a
key).

a. INNER JOIN (Most common)


Returns only the rows with matching values in both tables.

SELECT Students.Name, Courses.Title
FROM Students
INNER JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
INNER JOIN Courses ON Enrollments.CourseID = Courses.CourseID;

b. LEFT JOIN (or LEFT OUTER JOIN)

Returns all rows from the left table, and matched rows from the right. Fills NULL if no match.

SELECT Students.Name, Enrollments.CourseID
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;

c. RIGHT JOIN

Returns all rows from the right table and matched rows from the left.

d. FULL OUTER JOIN (Not supported in all DBMSs)

Returns all rows when there is a match in one of the tables.

2. Nested Subqueries
A subquery is a query within another SQL query.

a. Simple Subquery in WHERE clause


SELECT Name
FROM Students
WHERE StudentID IN (
SELECT StudentID
FROM Enrollments
WHERE CourseID = 'CSE101'
);

b. Subquery in SELECT clause


SELECT Name,
       (SELECT COUNT(*) FROM Enrollments
        WHERE Enrollments.StudentID = Students.StudentID) AS EnrolledCourses
FROM Students;

c. Subquery with EXISTS


SELECT Name
FROM Students S
WHERE EXISTS (
SELECT * FROM Enrollments E
WHERE E.StudentID = S.StudentID AND E.CourseID = 'CSE101'
);

3. Complex Queries
Complex queries involve multiple clauses, joins, subqueries, groupings, and conditions.

Example:
SELECT Students.Name, COUNT(Enrollments.CourseID) AS TotalCourses
FROM Students
JOIN Enrollments ON Students.StudentID = Enrollments.StudentID
WHERE Enrollments.Semester = 'Fall2024'
GROUP BY Students.Name
HAVING COUNT(Enrollments.CourseID) > 2
ORDER BY TotalCourses DESC;

This query:

 Joins students and their enrollments


 Filters by semester
 Groups results by student name
 Shows only students enrolled in more than 2 courses
 Orders results by number of courses

4. Views
A view is a virtual table created by a query, used for simplification and security.

Creating a View:
CREATE VIEW CS_Students AS
SELECT StudentID, Name
FROM Students
WHERE Major = 'Computer Science';

Using the View:


SELECT * FROM CS_Students;
You can also use views in joins and subqueries.

5. Joined Relations
Joined relations are temporary relations formed by joining multiple tables.

Example:
SELECT S.Name, C.Title, E.Grade
FROM Students S
JOIN Enrollments E ON S.StudentID = E.StudentID
JOIN Courses C ON E.CourseID = C.CourseID;

Here, Students, Enrollments, and Courses are joined into one result — a joined relation.

Summary
Feature          Description                            Example
JOIN             Combines rows from two or more tables  INNER JOIN, LEFT JOIN, etc.
Subquery         A query inside another query           WHERE IN (SELECT ...)
Complex Query    Filters & groups data                  Multiple operations combined (JOIN + GROUP + HAVING)
VIEW             Virtual table based on a SELECT query  CREATE VIEW ... AS SELECT ...
Joined Relation  Temporary relation formed by JOINs     Used in reporting/analytics

1. Tuple Relational Calculus (TRC)

 Definition: A non-procedural query language where queries are expressed using tuples
(rows) as variables.
 Format: { t | P(t) }
This denotes the set of all tuples t for which predicate P(t) is true.
 Example:
Get all students with age > 18:
{ t | t ∈ Student ∧ t.Age > 18 }
 Key Concepts:
o Variables represent tuples.
o Predicates filter the tuples based on conditions.
o Similar to first-order logic.

2. Domain Relational Calculus (DRC)

 Definition: A non-procedural query language like TRC, but variables represent values
(domains) rather than tuples.
 Format: { <x₁, x₂, ..., xₙ> | P(x₁, x₂, ..., xₙ) }
 Example:
Get names of students older than 18:
{ <name> | ∃ age (Student(name, age) ∧ age > 18) }
 Key Concepts:
o Uses logical connectives (∧, ∨, ¬), quantifiers (∃, ∀).
o More atomic than TRC, as it works with field values.
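The set-builder notation of both calculi maps directly onto Python comprehensions, which may help fix the difference between tuple and domain variables; the Student rows below are invented:

```python
# Student relation as a list of dictionaries (rows are hypothetical).
Student = [
    {"name": "Alice", "age": 22},
    {"name": "Bob",   "age": 17},
]

# TRC: { t | t in Student AND t.age > 18 } -- the variable t is a whole tuple
trc_result = [t for t in Student if t["age"] > 18]

# DRC: { <name> | EXISTS age (Student(name, age) AND age > 18) }
#      -- variables range over individual attribute values
drc_result = {t["name"] for t in Student if t["age"] > 18}

print(trc_result)  # [{'name': 'Alice', 'age': 22}] -- whole tuples
print(drc_result)  # {'Alice'} -- just the requested domain values
```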

3. Query by Example (QBE)

 Definition: A graphical query language where users fill out a table-like structure to
specify a query. It's user-friendly and mainly used in database interfaces.
 Developed by: IBM in the 1970s.
 How it works:
o Tables are shown with blank fields.
o Users enter example values, conditions, or variables.
o The system translates this into SQL or a similar query language.
 Example:
To find students aged over 18, you might enter this in a visual interface:

Name Age

>18

 Key Concepts:
o Intuitive and easier for non-programmers.
o Widely used in tools like Microsoft Access.
Summary Table:

Feature Tuple Relational Calculus Domain Relational Calculus Query by Example

Query Type Non-procedural Non-procedural Visual, Example-based

Variables Tuples Domain values Implied by examples

Expressiveness High High Medium

Ease of Use Medium Medium High

Based On First-order logic First-order logic Table-oriented interface

1. Datalog

 Definition: A declarative logic programming language based on Prolog, used for


querying relational databases.
 Syntax:
o Uses facts, rules, and queries.
o No updates or deletions—purely for querying.
 Structure:
o Fact: parent(alice, bob).
o Rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
o Query: ?- grandparent(alice, Z).
 Key Features:
o Rule-based, uses recursion.
o Useful in deductive databases.
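The grandparent rule above can be evaluated by hand over the parent facts; a real Datalog engine generalizes this to a fixpoint iteration so that recursive rules (e.g., ancestor) also terminate. A minimal one-step sketch:

```python
# Facts: parent(alice, bob). parent(bob, carol).
parent = {("alice", "bob"), ("bob", "carol")}

# Rule: grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
# The rule body is a self-join of parent on the shared variable Y:
grandparent = {(x, z)
               for (x, y1) in parent
               for (y2, z) in parent
               if y1 == y2}

print(grandparent)  # {('alice', 'carol')}
```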

2. Set Operations

Used in SQL to combine results from two or more SELECT queries. They require compatibility
in number and type of columns.

a. UNION

 Definition: Combines results of two queries, removing duplicates.


 Syntax:

   SELECT name FROM Students
   UNION
   SELECT name FROM Teachers;

 Note: Use UNION ALL to include duplicates.

b. INTERSECT

 Definition: Returns rows common to both queries.


 Syntax:

   SELECT name FROM Students
   INTERSECT
   SELECT name FROM Teachers;
c. EXCEPT (or MINUS in some DBMS like Oracle)

 Definition: Returns rows in the first query but not in the second.
 Syntax:
 SELECT name FROM Students
 EXCEPT
 SELECT name FROM Teachers;
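All three operators can be tried end to end in SQLite, which supports UNION, INTERSECT, and EXCEPT. A self-contained sketch via Python's sqlite3 (tables and names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (name TEXT);
    CREATE TABLE Teachers (name TEXT);
    INSERT INTO Students VALUES ('Alice'), ('Bob'), ('Carol');
    INSERT INTO Teachers VALUES ('Bob'), ('Dan');
""")

# UNION removes duplicates across the two result sets.
union = [r[0] for r in conn.execute(
    "SELECT name FROM Students UNION SELECT name FROM Teachers ORDER BY name")]
# INTERSECT keeps only names present in both tables.
both = [r[0] for r in conn.execute(
    "SELECT name FROM Students INTERSECT SELECT name FROM Teachers")]
# EXCEPT keeps students who are not also teachers.
only_students = [r[0] for r in conn.execute(
    "SELECT name FROM Students EXCEPT SELECT name FROM Teachers ORDER BY name")]

print(union)          # ['Alice', 'Bob', 'Carol', 'Dan']
print(both)           # ['Bob']
print(only_students)  # ['Alice', 'Carol']
```

Note that both SELECTs feeding an operator must have the same number and types of columns, as stated above.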

3. Aggregate Functions

Used to perform calculations on groups of rows and return a single value.

Function Description

COUNT() Number of rows

SUM() Total sum of a numeric column

AVG() Average value

MAX() Maximum value

MIN() Minimum value

 Example:
 SELECT AVG(age) FROM Students;
 Often used with GROUP BY:
 SELECT department, COUNT(*) FROM Employees GROUP BY department;

4. NULL Values
 Definition: Represents missing or unknown data in a database.
 Key Facts:
o NULL ≠ 0, NULL ≠ '', NULL ≠ NULL
o You cannot use = to compare with NULL.
 Proper Handling:
o Use IS NULL or IS NOT NULL:
o SELECT * FROM Students WHERE email IS NULL;
 In Aggregates:
o Most aggregate functions ignore NULLs (e.g., AVG() skips NULL values).
o COUNT(*) counts all rows, but COUNT(column) ignores NULLs.
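The COUNT(*) versus COUNT(column) distinction, and the failure of = against NULL, are easy to verify. A small sketch using SQLite (table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (name TEXT, email TEXT);
    INSERT INTO Students VALUES ('Alice', 'a@x.com'), ('Bob', NULL), ('Carol', NULL);
""")

all_rows = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
non_null = conn.execute("SELECT COUNT(email) FROM Students").fetchone()[0]
no_email = conn.execute(
    "SELECT name FROM Students WHERE email IS NULL ORDER BY name").fetchall()
# '= NULL' never matches: comparing with NULL yields UNKNOWN, not TRUE.
eq_null = conn.execute("SELECT name FROM Students WHERE email = NULL").fetchall()

print(all_rows, non_null)        # 3 1 -> COUNT(*) counts rows, COUNT(email) skips NULLs
print([n for (n,) in no_email])  # ['Bob', 'Carol']
print(eq_null)                   # []
```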

1. Features of Relational Database Design


Relational database design ensures that a database is:

 Efficient (minimizes redundancy and storage space)


 Consistent (avoids anomalies in insert/update/delete)
 Scalable and maintainable

Key Features:

Feature Description

Data Integrity Enforces constraints to maintain accurate and valid data

Normalization Organizes data to reduce redundancy and dependency

Data Independence Logical and physical data separation

Flexibility Easy to modify schema without affecting applications

Ease of Querying Supports powerful query languages (e.g., SQL)

2. Atomic Domains and First Normal Form (1NF)


Atomic Domain:

 A domain is the set of valid values for an attribute.


 An atomic domain means the values are indivisible units.
o E.g., phone_number = '9876543210' is atomic, but phone_number =
'9876543210, 9123456789' is not.

First Normal Form (1NF):

 A relation is in 1NF if:


o All attributes contain only atomic values.
o Each record (row) is unique.
o The order of rows and columns is irrelevant.

Example (violating 1NF):


StudentID Name PhoneNumbers

101 Alice 9876543210, 9123456789

→ Fix for 1NF:

StudentID Name PhoneNumber

101 Alice 9876543210

101 Alice 9123456789
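The 1NF fix above, splitting a multi-valued attribute into one row per atomic value, can be expressed as a small transformation. A sketch in Python (the helper name is illustrative):

```python
# Rows violating 1NF: PhoneNumbers holds a comma-separated list.
unnormalized = [(101, "Alice", "9876543210, 9123456789")]

def to_1nf(rows):
    """Emit one (StudentID, Name, PhoneNumber) row per atomic phone value."""
    return [(sid, name, phone.strip())
            for sid, name, phones in rows
            for phone in phones.split(",")]

print(to_1nf(unnormalized))
# [(101, 'Alice', '9876543210'), (101, 'Alice', '9123456789')]
```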

3. Functional Dependency (FD) Theory


Definition:

A functional dependency (FD) is a constraint between two sets of attributes in a relation.

 Notation: X → Y
o Means: If two tuples have the same value for X, they must have the same value
for Y.

Examples:

 student_id → name (student ID determines name)


 roll_no, course_code → marks

Types of Functional Dependencies:


Type Description

Trivial A → A or A, B → A

Non-Trivial A → B where B is not a subset of A

Transitive If A → B and B → C, then A → C

4. Decomposition Using Functional Dependencies


Decomposition is the process of breaking a relation into smaller relations to eliminate
redundancy and anomalies, while preserving data and dependencies.

Goals of Decomposition:

 Eliminate redundancy
 Avoid anomalies (insertion, update, deletion)
 Preserve information
 Preserve dependencies (if possible)

Properties of Good Decomposition:

Property Meaning

Lossless Join Decomposed tables can be joined to recreate the original table

Dependency Preservation All FDs can be enforced without joining tables

No Redundancy Eliminate repeated data

Example:

Given a relation:

R(StudentID, Name, Dept, HOD)


FDs:
StudentID → Name, Dept
Dept → HOD

This relation has redundancy (e.g., HOD repeated for every student in a department).

Decompose into:
1. R1(StudentID, Name, Dept)
2. R2(Dept, HOD)

 This decomposition is:


o Lossless (you can join R1 and R2 on Dept to get the original table)
o Dependency Preserving (all FDs are preserved across R1 and R2)
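The lossless-join property of this decomposition can be checked directly: natural-joining R1 and R2 on Dept must reproduce R exactly. A sketch in Python over sets of tuples (sample rows are invented):

```python
# Original relation R(StudentID, Name, Dept, HOD) with redundant HOD values.
R = {(1, "Alice", "CS", "Dr. Rao"), (2, "Bob", "CS", "Dr. Rao"),
     (3, "Carol", "EE", "Dr. Sen")}

# Decomposition guided by the FDs StudentID -> Name, Dept and Dept -> HOD.
R1 = {(sid, name, dept) for sid, name, dept, hod in R}   # R1(StudentID, Name, Dept)
R2 = {(dept, hod) for sid, name, dept, hod in R}         # R2(Dept, HOD)

# Natural join of R1 and R2 on the common attribute Dept.
joined = {(sid, name, d1, hod)
          for sid, name, d1 in R1
          for d2, hod in R2 if d1 == d2}

print(joined == R)  # True -> the decomposition is lossless
print(sorted(R2))   # each HOD is now stored once per department
```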

Summary Table

Concept Key Point

Atomic Domains Attributes should contain indivisible values

First Normal Form (1NF) No repeating groups or multi-valued attributes

Functional Dependency Describes relationship between attributes in a relation

Decomposition Breaks a relation into smaller ones to eliminate redundancy

Lossless Join No data lost during decomposition

Dependency Preservation All original FDs are retained in the decomposed schema

1. Multivalued Dependencies (MVDs) & 4NF


1. Definition
o X →→ Y (read "X multidetermines Y") holds when, for a given X value, the set of
Y values is independent of the values of the remaining attributes Z = R − X − Y.
2. Illustrative example
o Attributes {Course, Book, Lecturer}: a course independently determines its set of
books and its set of lecturers. Tuples must combine all book–lecturer pairs, causing redundancy.
3. Formal condition
o For any tuples t₁, t₂ with t₁[X] = t₂[X], there must exist tuples t₃, t₄ that combine
t₁[Y]/t₂[Y] with the opposite Z portions.
4. 4NF condition
o A relation is in 4NF if it is in BCNF and every non-trivial MVD X →→ Y has
X as a superkey.
5. Decomposition process
o Identify a non-trivial MVD X →→ Y with X not a superkey.
o Decompose R into R₁(X, Y) and R₂(X, R−Y); this ensures a lossless join, though it may
not preserve all dependencies.
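The decomposition step can be illustrated with the Course/Book/Lecturer example: R is split into R₁(Course, Book) and R₂(Course, Lecturer), and rejoining them on Course reproduces every original tuple. A sketch in Python with invented values:

```python
# R(Course, Book, Lecturer): books and lecturers vary independently per course,
# so every book/lecturer pair appears (the MVD Course ->> Book).
R = {("DB", "Silberschatz", "Dr. Rao"), ("DB", "Silberschatz", "Dr. Sen"),
     ("DB", "Elmasri", "Dr. Rao"), ("DB", "Elmasri", "Dr. Sen")}

# 4NF decomposition: R1(Course, Book) and R2(Course, Lecturer).
R1 = {(c, b) for c, b, l in R}
R2 = {(c, l) for c, b, l in R}

# Lossless join: recombining on Course rebuilds exactly the original tuples.
joined = {(c1, b, l) for c1, b in R1 for c2, l in R2 if c1 == c2}

print(len(R), len(R1) + len(R2))  # 4 tuples vs 4 stored pairs here; the gap
                                  # grows as books x lecturers multiply
print(joined == R)                # True
```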

2. Higher Normal Forms: 5NF, ETNF, 6NF


1. Fifth Normal Form (5NF / PJNF)
o Handles join dependencies: every non-trivial join dependency must follow from
candidate keys.
o Rarely needed, but addresses redundancy in complex relationships.
2. Essential Tuple Normal Form (ETNF)
o Positioned between 4NF and 5NF. Eliminates redundancy like 5NF but with
fewer decompositions.
3. Sixth Normal Form (6NF)
o No non-trivial join dependencies remain.
o Useful in temporal databases and anchor modeling; supports interval-based data
decomposition.

3. Database Design Process


1. Requirements & Conceptual Modeling
o Gather entities, relationships, attributes, and business rules.
2. Logical Modeling & Normalization
o Convert ER diagrams into relations.
o Apply 1NF, 2NF, 3NF, and BCNF to eliminate functional-dependency-based
redundancy.
3. Advanced Normalization
o Detect MVDs → decompose to 4NF.
o If necessary, identify join dependencies → decompose to 5NF or ETNF.
o For temporal/incremental modeling, use 6NF.
4. Physical Design & Denormalization
o Optimize via indexes, partitioning, storage strategies.
o Denormalize selectively based on performance needs.
5. Validation & Iteration
o Ensure lossless join and dependency preservation.
o Test design with realistic data workloads and refine.
6. Implementation & Maintenance
o Translate schema into DDL, set constraints, monitor performance.
o Update schema as requirements evolve; manage technical debt from deferred
normalization.
Summary of Normal Forms

Normal Form   Eliminates   Condition

BCNF   Non-key → non-key FDs   Every FD X → Y has X as a superkey

4NF   Independent multivalued attributes (MVDs)   Every non-trivial MVD X →→ Y has X as a superkey

5NF   Join dependency redundancy   All non-trivial JDs are implied by keys

ETNF   Redundancy between 4NF and 5NF   Targets redundancy elimination with minimal decompositions

6NF   All non-trivial join dependencies   No non-trivial JDs remain; ideal for temporal/interval-based modeling

1. Concept of a Transaction
1. A transaction is a sequence of database operations treated as a single logical unit.
2. It must satisfy ACID properties:
o Atomicity
o Consistency
o Isolation
o Durability
3. Common examples include bank transfers, flight bookings, and distributed transactions.

2. Transaction States
A transaction progresses through distinct states:

1. Active – Transaction starts and operations begin execution.


2. Partially Committed – Final statement executed, waiting to commit.
3. Committed – All changes are permanent.
4. Failed – Error occurred; transaction must roll back.
5. Aborted – Changes undone and transaction terminated.
6. Terminated – Final state: either committed or aborted.
3. Implementation of Atomicity
Atomicity ensures either all operations succeed or none do. Implementation techniques include:

1. Undo/Redo Logs
o Undo log records old values; used to roll back on failure.
o Redo log captures new values; used for recovery after crashes.
2. Write-Ahead Logging (WAL)
o Changes are first written to durable logs before database updates.
o Enables rollback of uncommitted transactions and redoing of committed ones
post-crash.
3. Shadow Paging (Copy-on-Write)
o Operations are applied to a shadow copy; only on commit is the pointer switch
made.

4. Implementation of Durability
Durability guarantees that once committed, changes persist permanently. Key mechanisms
include:

1. Write-Ahead Log (WAL)
o Logs are flushed to disk before commit; actual data pages can be written later.
2. Checkpointing
o Periodic snapshots reduce recovery time by recording database state on stable
storage.
3. Stable & Redundant Storage
o Use of non-volatile disk storage, RAID, and backups ensures data survives both
system and media failures.

5. Summary Table
Property Goal Common Mechanisms

Atomicity Ensure all-or-nothing execution WAL, undo/redo logs, shadow paging

Durability Ensure committed changes survive failures WAL, flush logs, checkpointing, backups
✅ Final Thoughts

 Atomicity is achieved through structured logging and shadow mechanisms.


 Durability relies on stable storage and timely flushing of logs.
 Together, they form the backbone of reliable transaction management and ensure
database consistency even in adversarial conditions.

1. Concurrent Execution & Its Challenges


 Concurrent execution allows multiple transactions to run simultaneously, improving
resource use and throughput, but it risks issues like lost updates, dirty reads, and
inconsistent summary results.
 Concurrency control protocols, such as lock-based and timestamp-based schemes, ensure
isolation, atomicity, consistency, durability, and serializability.

2. Serializability
A schedule is serializable if its effect is equivalent to some serial schedule (where transactions
run one after another).

 Conflict serializability: Achieved if the precedence graph (nodes = transactions, edges
= conflict order) is acyclic.
 View serializability: Weaker; requires only that the initial reads, final writes, and
intermediate reads match some serial schedule.
 Conflict → View: All conflict-serializable schedules are view-serializable, but not vice
versa.

3. Recoverability
Schedules must be recoverable to avoid inconsistency after crashes or failures:

 A recoverable schedule ensures: if Tj reads data written by Ti, then Ti must commit before Tj.
 Cascadeless schedules avoid cascading rollbacks by preventing reads from uncommitted
changes (enforced by strict 2PL or timestamp ordering).
 Strict schedules forbid any read or write on uncommitted data, ensuring both
recoverability and cascadeless behavior.

4. Implementation of Isolation
Isolation is enforced via several mechanisms:

1. Lock-based (2PL)
o Two-Phase Locking (2PL): Growing phase (acquire locks), then shrinking phase
(release locks). Guarantees conflict-serializability.
o Strict 2PL holds exclusive locks until commit/abort, preventing cascading
rollbacks.
2. Timestamp Ordering
o Assigns each transaction a timestamp and enforces execution order based on
timestamps.
3. Optimistic Concurrency Control (OCC)
o Transactions run without locks and validate at commit time; conflicts cause
rollback and restart.
4. Multiversion Concurrency Control (MVCC) & Snapshot Isolation
o Maintains multiple versions; readers see a snapshot as of transaction start.
Snapshot Isolation avoids many anomalies but is not fully serializable.

5. Testing for Serializability


 Precedence (Serialization) Graph Method:
1. Create nodes for each transaction.
2. Add directed edges for conflicting operations (WR, RW, WW).
3. Check if the graph is acyclic → then the schedule is conflict-serializable.
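The three steps above amount to building a graph and running a cycle check. A sketch in Python (the schedule encoding, a list of (transaction, op, item) triples, and the function name are invented for illustration):

```python
def conflict_serializable(schedule):
    """schedule: list of (txn, op, item) with op in {'R', 'W'}, in execution order."""
    # Step 1 & 2: build precedence edges Ti -> Tj whenever an operation of Ti
    # conflicts with a later operation of Tj on the same item (WR, RW, WW).
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and 'W' in (op_i, op_j):
                edges.add((ti, tj))
    # Step 3: cycle detection via depth-first search.
    nodes = {t for t, _, _ in schedule}
    adj = {n: [b for a, b in edges if a == n] for n in nodes}
    state = {n: 0 for n in nodes}  # 0 = unvisited, 1 = on stack, 2 = done

    def dfs(n):
        state[n] = 1
        for m in adj[n]:
            if state[m] == 1 or (state[m] == 0 and dfs(m)):
                return True        # back edge found -> cycle
        state[n] = 2
        return False

    return not any(state[n] == 0 and dfs(n) for n in nodes)

# T1 finishes with A before T2 touches it: equivalent to the serial order T1, T2.
ok = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
# Conflicts point T1 -> T2 on A but T2 -> T1 on B: the graph has a cycle.
bad = [("T1", "W", "A"), ("T2", "W", "A"), ("T2", "W", "B"), ("T1", "W", "B")]
print(conflict_serializable(ok), conflict_serializable(bad))  # True False
```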

✅ Quick Table

Aspect   Definition/Goal   Implementation/Test Method

Concurrency Risks   Lost updates, dirty reads, incorrect summaries   Concurrency control protocols

Conflict Serializability   Equivalent to serial schedule via non-conflicting swaps   Precedence graph with no cycles

View Serializability   Equivalent final state, not necessarily conflict-equivalent   Compare reads/writes from initial/view perspective

Recoverability   Avoid reading uncommitted changes   Enforce via strict protocols (e.g., strict 2PL)

Isolation Mechanisms   Transaction-level locks, timestamps, snapshots, validation   Locking, timestamps, MVCC, OCC

Testing Serializability   Graph cycle detection   Build conflict graph and ensure acyclicity

1. Lock-Based Protocols (Pessimistic Control)


1. Lock types & compatibility
o Shared (S) locks allow multiple readers; Exclusive (X) locks are held by a single
writer. Compatibility matrix: S–S = ✔, S–X = ✖, X–X = ✖.
2. Key protocols
o Simplistic: locks acquired per operation, released after use. Easy, but prone to
deadlocks.
o Pre-claiming: transaction requests all locks upfront. If any is unavailable, it waits or
rolls back.
o Two-Phase Locking (2PL):
 Growing phase: acquire locks only.
 Shrinking phase: release locks only.
o Strict 2PL: holds exclusive locks until commit/abort; prevents dirty reads and
ensures recoverability.
3. Pros & cons
o ✅ Guarantees conflict-serializability and strict recoverability.
o ⚠️ Risk of deadlocks, reduced concurrency, and potential starvation.

2. Timestamp-Based Protocols (Pessimistic but Non-locking)


1. Mechanism
o Each transaction gets a unique timestamp (start time or counter).
o Each data item tracks RTS (last read TS) and WTS (last write TS).
2. Basic timestamp ordering
o A read/write by transaction T is allowed only if the timestamp rules are not violated:
 Read: allowed if TS(T) ≥ WTS(X).
 Write: allowed if TS(T) ≥ RTS(X) and TS(T) ≥ WTS(X); otherwise abort,
or skip the obsolete write (Thomas's Write Rule).
3. Variants
o Strict timestamp ordering ensures recoverability.
o Thomas's Write Rule skips obsolete writes instead of aborting.
4. Advantages & limitations
o ✅ No locks or deadlocks.
o ⚠️ Transactions can be rejected or aborted if timestamp constraints break.
o The basic form may not guarantee recoverability; it requires enhancements.
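The read/write rules above can be sketched as a small checker. This illustrates basic timestamp ordering with Thomas's Write Rule only; it is not a full scheduler, and all names are invented:

```python
class Item:
    def __init__(self):
        self.rts = 0  # largest timestamp that has read this item
        self.wts = 0  # largest timestamp that has written this item

def read(item, ts):
    if ts < item.wts:
        return "abort"   # T would read a value written "in its future"
    item.rts = max(item.rts, ts)
    return "ok"

def write(item, ts):
    if ts < item.rts:
        return "abort"   # a younger transaction already read the old value
    if ts < item.wts:
        return "skip"    # Thomas's Write Rule: obsolete write, ignore it
    item.wts = ts
    return "ok"

x = Item()
print(read(x, 5))    # ok    (rts becomes 5)
print(write(x, 10))  # ok    (wts becomes 10)
print(read(x, 7))    # abort (7 < wts = 10)
print(write(x, 8))   # skip  (8 >= rts = 5, but 8 < wts = 10 -> obsolete)
```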

3. Validation-Based / Optimistic Protocols (OCC)


1. Workflow phases:
o Read phase: transaction performs reads/writes in private buffers.
o Validation phase: before commit, ensure no conflicts with concurrent
transactions.
o Write phase: apply updates if validated; otherwise, abort.
2. Best scenario
o Ideal when contention is low: minimal overhead, high concurrency.
3. Challenges
o At high contention, frequent aborts degrade performance.

4. Multi-Version Concurrency Control (MVCC)


1. Approach
o Maintains multiple versions of each data item with write timestamps.
2. Behavior
o Readers always access a snapshot consistent with their start time and never wait.
o Writers create new versions; garbage collection cleans up old ones.
3. Common usage
o Widely used in PostgreSQL, Oracle, and MySQL; supports snapshot isolation with
less locking.
📊 Summary Comparison

Protocol Type Locking Deadlock Risk Serializability Recoverable? Best For

2PL / Strict 2PL Yes Yes ✅ Conflict SR ✅ (strict) High contention systems

Timestamp-based No No ✅ if strict ⚠️ Needs strict Moderate contention

OCC (Validation) No No ✅ if validated ⚠️ Possible aborts Low contention systems

MVCC No No Snapshot / SSI* ✅ Mixed reads/writes

* Snapshot Isolation ≠ serializable but many systems enhance it (e.g., PostgreSQL’s SSI).

1. Deadlock Handling
1. Definition:
o A deadlock occurs when two or more transactions are in a state where each is
waiting for the other to release resources, causing a cycle of dependencies that
prevents any of them from proceeding.
2. Conditions for Deadlock:
o Mutual Exclusion: At least one resource is held in a non-shareable mode.
o Hold and Wait: A transaction is holding at least one resource and is waiting to
acquire additional resources held by other transactions.
o No Preemption: Resources cannot be forcibly taken from transactions holding
them; they must be released voluntarily.
o Circular Wait: A set of waiting transactions exists such that each transaction is
waiting for a resource held by the next transaction in the set.
3. Handling Methods:
o Prevention: Design the system in a way that one of the necessary conditions for
deadlock is eliminated.
o Avoidance: Dynamically examine the resource-allocation state to ensure that a
circular wait condition can never exist.
o Detection and Recovery: Allow the system to enter a deadlock state, detect it,
and take action to recover.
o Ignorance (Ostrich Algorithm): Assume that deadlocks will not occur and
ignore the problem.
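Detection-based handling typically builds a wait-for graph (edge Ti → Tj when Ti waits for a resource held by Tj) and looks for a cycle, i.e., a circular wait. A sketch using Kahn's topological sort: if not every transaction can be peeled off the graph, a deadlock exists. The transaction names and the dict encoding are invented:

```python
from collections import deque

def has_deadlock(waits_for):
    """waits_for: dict mapping Ti -> set of Tj that Ti is waiting on."""
    nodes = set(waits_for) | {t for s in waits_for.values() for t in s}
    indeg = {n: 0 for n in nodes}
    for targets in waits_for.values():
        for t in targets:
            indeg[t] += 1
    # Kahn's algorithm: repeatedly remove transactions nothing points at;
    # any leftovers after the sweep form the circular wait.
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for t in waits_for.get(n, ()):
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    return removed < len(nodes)

# T1 waits for T2 and T2 waits for T1: the classic circular wait.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
# T1 waits for T2 only: no cycle, so T2 can finish and release its locks.
print(has_deadlock({"T1": {"T2"}}))                # False
```

On detection, the system recovers by aborting a victim transaction, which breaks the cycle.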

2. Insert and Delete Operations


1. Concurrency Issues:
o Insert Operations: Concurrent inserts can lead to issues like duplicate entries if
constraints are not properly enforced.
o Delete Operations: Deleting records that are referenced by other records can lead
to referential integrity violations.
2. Locking Mechanisms:
o Row-level Locking: Allows multiple transactions to insert or delete rows
concurrently without interfering with each other.
o Table-level Locking: Prevents other transactions from accessing the table during
an insert or delete operation, ensuring data consistency but reducing concurrency.
3. Best Practices:
o Use of Constraints: Implement primary keys, foreign keys, and unique
constraints to maintain data integrity.
o Transaction Management: Ensure that insert and delete operations are atomic,
consistent, isolated, and durable (ACID properties).
o Optimistic Concurrency Control: Assume that multiple transactions can
frequently complete without interfering with each other and only check for
conflicts at the end.

3. Weak Levels of Consistency


1. Definition:
o Weak consistency models allow for temporary discrepancies between different
replicas of data, prioritizing availability and partition tolerance over immediate
consistency.
2. Types of Weak Consistency Models:
o Eventual Consistency: Guarantees that if no new updates are made to a given
data item, eventually all accesses to that item will return the last updated value.
o Causal Consistency: Ensures that operations that are causally related are seen by
all nodes in the same order.
o Session Consistency: Guarantees that within a single session, a user will always
see their own writes.
3. Advantages:
o High Availability: Systems can continue to operate even if some nodes are
unavailable.
o Scalability: Easier to scale out across multiple nodes or data centers.
o Performance: Can lead to lower latency and higher throughput.
4. Disadvantages:
o Stale Reads: Users may read outdated data.
o Complexity: Developers must handle potential inconsistencies in application
logic.
o Conflict Resolution: Requires mechanisms to resolve conflicts when
discrepancies occur.

1. Data Control Language (DCL)


1.1 GRANT Command

 Purpose: Assigns specific privileges to a user or role, enabling them to perform certain
operations on database objects.
 Syntax:

GRANT privilege_type
ON object_name
TO user_or_role;

 Example:

GRANT SELECT, INSERT


ON Employees
TO JohnDoe;

1.2 REVOKE Command

 Purpose: Removes previously granted privileges from a user or role, restricting their
access to certain operations.
 Syntax:

REVOKE privilege_type
ON object_name
FROM user_or_role;

 Example:

REVOKE INSERT
ON Employees
FROM JohnDoe;

2. Transaction Control Language (TCL)


2.1 COMMIT Command

 Purpose: Permanently saves all changes made during the current transaction to the
database.
 Syntax:
COMMIT;

 Example:

BEGIN TRANSACTION;
UPDATE Employees
SET Salary = Salary * 1.10
WHERE Department = 'Sales';
COMMIT;

2.2 ROLLBACK Command

 Purpose: Undoes all changes made during the current transaction, reverting the database
to its last committed state.
 Syntax:

ROLLBACK;

 Example:

BEGIN TRANSACTION;
DELETE FROM Employees
WHERE EmployeeID = 101;
ROLLBACK;

2.3 SAVEPOINT Command

 Purpose: Sets a point within a transaction to which you can later roll back without
affecting the entire transaction.
 Syntax:

SAVEPOINT savepoint_name;

 Example:

BEGIN TRANSACTION;
UPDATE Employees
SET Salary = Salary * 1.05
WHERE Department = 'HR';
SAVEPOINT BeforeBonus;
UPDATE Employees
SET Salary = Salary * 1.10
WHERE Department = 'IT';
ROLLBACK TO BeforeBonus;
COMMIT;
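SQLite supports the same SAVEPOINT / ROLLBACK TO pattern, so the example above can be reproduced end to end. A sketch via Python's sqlite3 (the table and figures are invented; isolation_level=None puts the driver in autocommit mode so the BEGIN/COMMIT statements are ours to control):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Employees (name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO Employees VALUES ('Ann', 'HR', 1000), ('Bea', 'IT', 1000)")

conn.execute("BEGIN")
conn.execute("UPDATE Employees SET salary = salary * 1.05 WHERE dept = 'HR'")
conn.execute("SAVEPOINT BeforeBonus")
conn.execute("UPDATE Employees SET salary = salary * 1.10 WHERE dept = 'IT'")
conn.execute("ROLLBACK TO BeforeBonus")  # undo only the IT raise
conn.execute("COMMIT")                   # keep the HR raise

rows = conn.execute("SELECT name, salary FROM Employees ORDER BY name").fetchall()
print(rows)  # [('Ann', 1050.0), ('Bea', 1000.0)]
```

The HR update (before the savepoint) survives the partial rollback; the IT update (after it) is undone.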
1. SQL Aggregate Functions
Aggregate functions perform calculations on a set of values and return a single value. They are
often used with the GROUP BY clause to group rows that have the same values in specified
columns.

1.1 COUNT()

 Purpose: Counts the number of rows in a set.


 Syntax:

SELECT COUNT(column_name) FROM table_name WHERE condition;

 Example:

SELECT COUNT(*) AS TotalEmployees FROM Employee;

This query returns the total number of employees in the Employee table.

1.2 SUM()

 Purpose: Calculates the total sum of a numeric column.


 Syntax:

SELECT SUM(column_name) FROM table_name WHERE condition;

 Example:

SELECT SUM(Salary) AS TotalSalary FROM Employee;

This query calculates the total salary of all employees.

1.3 AVG()

 Purpose: Returns the average value of a numeric column.


 Syntax:

SELECT AVG(column_name) FROM table_name WHERE condition;

 Example:

SELECT AVG(Salary) AS AverageSalary FROM Employee;

This query calculates the average salary of all employees.


1.4 MIN() and MAX()

 Purpose: MIN() returns the smallest value, and MAX() returns the largest value in a set.
 Syntax:

SELECT MIN(column_name), MAX(column_name) FROM table_name WHERE condition;

 Example:

SELECT MIN(Salary) AS MinSalary, MAX(Salary) AS MaxSalary FROM Employee;

This query finds the minimum and maximum salary among all employees.

1.5 GROUP_CONCAT() or STRING_AGG()

 Purpose: Concatenates values from multiple rows into a single string.


 Syntax:

SELECT GROUP_CONCAT(column_name) FROM table_name WHERE condition;

 Example:

SELECT GROUP_CONCAT(Name) AS EmployeeNames FROM Employee;

This query concatenates all employee names into a single string.

2. SQL Character Functions


Character functions perform operations on string data types.

2.1 LENGTH() or CHAR_LENGTH()

 Purpose: Returns the number of characters in a string.


 Syntax:

SELECT LENGTH(column_name) FROM table_name WHERE condition;

 Example:

SELECT LENGTH(Name) AS NameLength FROM Employee;

This query returns the length of each employee's name.


2.2 CONCAT()

 Purpose: Concatenates two or more strings into one.


 Syntax:

SELECT CONCAT(string1, string2, ...) FROM table_name WHERE condition;

 Example:

SELECT CONCAT(FirstName, ' ', LastName) AS FullName FROM Employee;

This query combines the first and last names of employees into a full name.

2.3 UPPER() and LOWER()

 Purpose: Converts all characters in a string to uppercase (UPPER) or lowercase


(LOWER).
 Syntax:

SELECT UPPER(column_name) FROM table_name WHERE condition;


SELECT LOWER(column_name) FROM table_name WHERE condition;

 Example:

SELECT UPPER(Name) AS UpperCaseName FROM Employee;


SELECT LOWER(Name) AS LowerCaseName FROM Employee;

These queries convert the employee names to uppercase and lowercase, respectively.

2.4 TRIM()

 Purpose: Removes leading and trailing spaces from a string.


 Syntax:

SELECT TRIM(column_name) FROM table_name WHERE condition;

 Example:

SELECT TRIM(Name) AS TrimmedName FROM Employee;

This query removes any leading or trailing spaces from employee names.

2.5 REPLACE()

 Purpose: Replaces all occurrences of a substring within a string with another substring.
 Syntax:

SELECT REPLACE(column_name, 'old_substring', 'new_substring') FROM


table_name WHERE condition;

 Example:

SELECT REPLACE(Address, 'Street', 'St.') AS ShortenedAddress FROM Employee;


1. SQL Numeric Functions


Numeric functions in SQL are used to perform operations on numeric data types such as INT,
FLOAT, DECIMAL, and DOUBLE. They help in manipulating numbers for various calculations,
rounding, and formatting.

1.1 ABS()

 Purpose: Returns the absolute value of a number, removing any negative sign.
 Syntax:

SELECT ABS(number);

 Example:

SELECT ABS(-25); -- Output: 25

1.2 CEIL() or CEILING()

 Purpose: Rounds a number up to the nearest integer, regardless of the decimal part.
 Syntax:

SELECT CEIL(number);

 Example:

SELECT CEIL(12.34); -- Output: 13

1.3 FLOOR()

 Purpose: Rounds a number down to the nearest integer, ignoring the decimal part.
 Syntax:

SELECT FLOOR(number);

 Example:

SELECT FLOOR(12.98); -- Output: 12

1.4 ROUND()

 Purpose: Rounds a number to a specified number of decimal places.


 Syntax:

SELECT ROUND(number, decimal_places);

 Example:

SELECT ROUND(15.6789, 2); -- Output: 15.68

1.5 TRUNCATE()

 Purpose: Removes the decimal portion of a number without rounding it.


 Syntax:

SELECT TRUNCATE(number, decimal_places);

 Example:

SELECT TRUNCATE(12.98765, 2); -- Output: 12.98

1.6 MOD()

 Purpose: Returns the remainder of a division operation (i.e., computes the modulus).
 Syntax:

SELECT MOD(dividend, divisor);

 Example:

SELECT MOD(10, 3); -- Output: 1

1.7 POWER()

 Purpose: Raises a number to the power of another number.


 Syntax:
SELECT POWER(base, exponent);

 Example:

SELECT POWER(2, 3); -- Output: 8

1.8 SQRT()

 Purpose: Returns the square root of a number.


 Syntax:

SELECT SQRT(number);

 Example:

SELECT SQRT(16); -- Output: 4

1.9 EXP()

 Purpose: Returns the value of e raised to the power of a specified number, where e is the
base of the natural logarithm (approximately 2.71828).
 Syntax:

SELECT EXP(number);

 Example:

SELECT EXP(1); -- Output: 2.718281828459045

1.10 LOG()

 Purpose: Returns the natural logarithm (base e) of a number. You can also use
LOG(base, number) to calculate the logarithm of a number with a custom base.
 Syntax:

SELECT LOG(number);
SELECT LOG(base, number);

 Example:

SELECT LOG(100); -- Output: 4.605170186

1.11 RAND()

 Purpose: Generates a random floating-point number between 0 and 1.


 Syntax:
SELECT RAND();

 Example:

SELECT RAND(); -- Output: 0.287372

2. SQL Date & Time Functions


Date and time functions in SQL are used to perform operations on date and time values. These
functions help in extracting parts of dates, formatting, and performing calculations.

2.1 NOW()

 Purpose: Returns the current date and time based on the server’s time zone.
 Syntax:

SELECT NOW();

 Example:

SELECT NOW(); -- Output: 2024-08-12 hh:mm:ss (current server date and time)

2.2 CURDATE()

 Purpose: Returns the current date in the YYYY-MM-DD format.


 Syntax:

SELECT CURDATE();

 Example:

SELECT CURDATE(); -- Output: 2024-08-12

2.3 CURTIME()

 Purpose: Returns the current time in the HH:MM:SS format.


 Syntax:

SELECT CURTIME();

 Example:

SELECT CURTIME(); -- Output: hh:mm:ss (current server time)


2.4 DATE_FORMAT()

 Purpose: Formats a date according to a specified format string.
 Syntax:

SELECT DATE_FORMAT(date, format);

 Example:

SELECT DATE_FORMAT('2024-08-12', '%d/%m/%Y'); -- Output: 12/08/2024


1. Failure Classification
 Transaction Failures:
o Logical Errors: Occurs when a transaction cannot complete due to internal
errors, such as invalid operations or constraints violations.
o System Errors: Happens when the DBMS must terminate an active transaction
due to errors like deadlocks or resource unavailability.
 System Crashes:
o Caused by hardware or software failures, leading to abrupt termination of the
DBMS.
o Assumed that non-volatile storage contents are not corrupted during a system
crash.
 Disk Failures:
o Involves physical damage to storage media, potentially destroying data.
o Such failures are detectable using checksums and other integrity checks.

2. Storage Structure
 Volatile Storage:
o Does not survive system crashes.
o Examples include main memory and cache memory.
 Non-Volatile Storage:
o Survives system crashes.
o Examples include disk drives, tapes, and flash memory.
 Stable Storage:
o A theoretical concept that survives all failures.
o Implemented by maintaining multiple copies of data on separate non-volatile
media.

3. Recovery and Atomicity


 Atomicity:
o Ensures that a transaction is either fully completed or not executed at all.
o Implemented through logging mechanisms and recovery protocols.
 Recovery Techniques:
o Immediate Update: Updates are applied to the database immediately, with logs
used to undo changes if necessary.
o Deferred Update: Updates are applied to the database only after a transaction
commits, reducing the need for undo operations.

4. Log-Based Recovery
 Transaction Log:
o Records all operations performed by transactions, including start, commit, and
abort events.
o Entries include:
 <Tn, Start>: Transaction Tn starts.
 <Tn, X, V1, V2>: Transaction Tn changes data item X from value V1 to
V2.
 <Tn, Commit>: Transaction Tn commits.
 <Tn, Abort>: Transaction Tn aborts.
 Recovery Process:
o Upon system crash, the DBMS uses the log to:
 Undo: Revert changes made by transactions that did not commit.
 Redo: Reapply changes made by transactions that committed.
 Checkpointing:
o A mechanism to reduce recovery time by saving the state of the database and
active transactions at a specific point.
o Helps in limiting the amount of log data to be processed during recovery.
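The undo/redo passes over the log can be sketched directly. The write records follow the <Tn, X, V1, V2> format from the notes; everything else (the in-memory "database", the tuple encoding, the function name) is illustrative, and real recovery managers such as ARIES are considerably more involved:

```python
def recover(log):
    """Replay a crash-interrupted log: redo committed txns, undo the rest."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    # Redo pass (forward): reapply new values written by committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, t, x, v_old, v_new = rec
            db[x] = v_new
    # Undo pass (backward): restore old values written by uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, t, x, v_old, v_new = rec
            db[x] = v_old
    return db

log = [
    ("start", "T1"),
    ("write", "T1", "A", 100, 150),  # <T1, A, 100, 150>
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "B", 200, 300),  # T2 never committed before the crash
]
print(recover(log))  # {'A': 150, 'B': 200}
```

T1's update to A is redone because T1 committed; T2's update to B is undone back to the old value 200.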

5. Recovery with Concurrent Transactions


 Challenges:
o Ensuring consistency and isolation when multiple transactions execute
concurrently.
o Managing interleaved operations and potential conflicts.
 Techniques:
o Locking Protocols: Prevent conflicts by ensuring that transactions acquire
appropriate locks on data items.
o Timestamp Ordering: Assigns timestamps to transactions to determine the
serializability order.
o Optimistic Concurrency Control: Allows transactions to execute without
restrictions and checks for conflicts before commit.

6. Buffer Management
 Buffer Pool:
o A portion of memory where data pages are temporarily stored before being
written to disk.
o Improves performance by reducing disk I/O operations.
 Dirty Pages:
o Pages that have been modified in memory but not yet written to disk.
o Must be managed carefully to ensure data integrity during recovery.
 Replacement Policies:
o Determine which pages to evict from the buffer pool when space is needed.
o Common policies include Least Recently Used (LRU) and First-In-First-Out
(FIFO).

7. Failure with Loss of Non-Volatile Storage


 Scenario:
o Involves catastrophic failures where the entire storage system becomes
inaccessible or corrupted.
 Recovery Strategies:
o Remote Backup Systems: Maintain copies of the database at remote locations to
protect against data loss.
o ARIES Recovery Algorithm: A sophisticated recovery algorithm that supports
high concurrency and efficient recovery.
o Shadow Paging: Uses a copy of the database to ensure atomicity and durability.

1. Distributed Databases
 Definition: A distributed database is a collection of multiple, logically interrelated
databases distributed over a computer network. Each site in the network can function
independently but cooperatively.
 Characteristics:
o Data is stored across multiple physical locations.
o Sites can be geographically dispersed.
o Each site has its own local DBMS.
o Users access data transparently, as if it were a single database.

2. Data Fragmentation
 Purpose: To improve performance and manageability by dividing a large database into
smaller, manageable pieces.
 Types:
o Horizontal Fragmentation: Dividing a table into rows based on certain
conditions.
o Vertical Fragmentation: Dividing a table into columns, typically to separate
frequently accessed attributes.
o Hybrid Fragmentation: A combination of horizontal and vertical fragmentation.
 Advantages:
o Improved query performance by localizing data.
o Enhanced security and privacy.
o Efficient use of storage resources.

3. Replication and Allocation Techniques


 Replication:
o Involves creating copies of data and storing them at multiple sites.
o Types:
 Full Replication: Every site stores a complete copy of the database.
 Partial Replication: Only certain data items are replicated at specific
sites.
o Advantages:
 Increased data availability and fault tolerance.
 Reduced access time for frequently accessed data.
o Challenges:
 Maintaining consistency across replicas.
 Increased storage and update overhead.
 Data Allocation:
o Determines where data fragments and replicas should be stored.
o Strategies:
 Centralized Allocation: Data is stored at a central location.
 Decentralized Allocation: Data is distributed across multiple sites based
on access patterns.
o Objective: Optimize performance, reduce latency, and balance load across the
system.

4. Semi Join
 Definition: A semi join is a type of join operation where only the matching rows from
one table are returned, without duplicating data from the other table.
 Usage in Distributed Databases:
o Reduces the amount of data transferred between sites.
o Enhances performance in distributed query processing.
 Example:
o Given two tables, A and B, a semi join returns rows from A that have matching
rows in B, but only the columns from A.
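Standard SQL has no SEMI JOIN keyword; the operation is usually expressed with EXISTS or IN. A minimal sketch, assuming tables A and B share a join column named id (an assumed name, for illustration):

```sql
-- Semi join of A with B: rows of A that have at least one match in B,
-- each returned at most once, with only A's columns
SELECT *
FROM   A
WHERE  EXISTS (SELECT 1 FROM B WHERE B.id = A.id);
```

In a distributed setting, the optimizer can first ship only the id values of one table to the other site, then transfer just the matching rows, rather than moving a whole table across the network.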

5. Homogeneous and Heterogeneous Databases


 Homogeneous Databases:
o All sites use the same DBMS and operating system.
o Easier to manage and maintain due to uniformity.
o Simplifies query processing and transaction management.
 Heterogeneous Databases:
o Sites may use different DBMS, operating systems, or data models.
o Requires middleware or translation mechanisms for interoperability.
o Offers flexibility but introduces complexity in integration and management.

6. Distributed Data Storage


 Methods:
o Replication: Storing copies of data at multiple sites.
o Fragmentation: Dividing data into fragments and storing them at different sites.
o Hybrid: Combining replication and fragmentation strategies.
 Objectives:
o Enhance data availability and reliability.
o Improve query performance by localizing data.
o Ensure fault tolerance and disaster recovery.
7. Distributed Transactions
 Definition: A distributed transaction involves multiple operations that may span across
different sites in a distributed database system.
 ACID Properties:
o Atomicity: Either all operations in the transaction complete, or none of
o Consistency: The database transitions from one valid state to another.
o Isolation: Transactions are executed independently without interference.
o Durability: Once a transaction is committed, it persists even in the event of a
system failure.
 Challenges:
o Ensuring atomicity and consistency across distributed sites.
o Handling network failures and communication delays.
o Implementing distributed concurrency control and recovery mechanisms.

1. Evolution from Collaborative to Cloud Computing


 Collaborative Computing: Focused on enabling users to work together over a network,
sharing resources and information.
 Client-Server Computing: Introduced a model where clients request services and
resources from centralized servers, improving resource management and scalability.
 Peer-to-Peer (P2P) Computing: Allowed direct sharing of resources between systems
without a central server, enhancing redundancy and fault tolerance.
 Distributed Computing: Involved multiple computers working together to solve a
problem, sharing tasks across a network to improve performance and reliability.
 Grid Computing: Extended distributed computing by linking geographically dispersed
resources to work on large-scale problems, often in scientific research.
 Cloud Computing: Evolved from these models to provide on-demand access to
computing resources over the internet, offering scalability, flexibility, and
cost-efficiency.

2. Introduction to Computing Models


2.1 Client-Server Computing

 Definition: A network architecture where clients request services and resources from
centralized servers.
 Characteristics:
o Centralized management of resources.
o Clients initiate requests; servers provide responses.
o Common in web applications, email systems, and databases.

2.2 Peer-to-Peer (P2P) Computing

 Definition: A decentralized network model where each node (peer) can act as both a
client and a server.
 Characteristics:
o Direct sharing of resources between peers.
o No central server; each node is equal.
o Used in file sharing, messaging apps, and distributed applications.

2.3 Distributed Computing

 Definition: A model where multiple computers work together over a network to achieve
a common goal.
 Characteristics:
o Tasks are divided among multiple machines.
o Requires coordination and communication between systems.
o Enhances performance and fault tolerance.

2.4 Grid Computing

 Definition: A form of distributed computing that connects geographically
dispersed resources to work on large-scale tasks.
 Characteristics:
o Utilizes idle resources across multiple locations.
o Common in scientific research and simulations.
o Requires specialized middleware for resource management.

2.5 Cloud Computing

 Definition: Provides on-demand access to computing resources over the
internet, allowing users to scale services as needed.
 Characteristics:
o Offers services like IaaS, PaaS, and SaaS.
o Pay-as-you-go pricing model.
o Enables remote access and collaboration.

3. Functioning of Cloud Computing


 Virtualization: Enables the creation of virtual instances of servers, storage, and
networks, allowing efficient resource utilization.
 Resource Pooling: Aggregates computing resources to serve multiple clients, providing
scalability and flexibility.
 On-Demand Self-Service: Users can provision and manage resources as needed, without
human intervention.
 Broad Network Access: Services are accessible over the network, promoting remote
access and collaboration.
 Measured Service: Resources are metered, and users are billed based on usage,
optimizing cost efficiency.

4. Differences Between Distributed Computing and Cloud Computing

Feature             | Distributed Computing                          | Cloud Computing
--------------------|------------------------------------------------|--------------------------------------------------
Resource Management | Resources are managed by individual systems.   | Resources are managed by cloud service providers.
Scalability         | Scaling requires manual addition of resources. | Scales automatically based on demand.
Cost Model          | Often involves upfront investment in hardware. | Operates on a pay-as-you-go model.
Fault Tolerance     | Depends on system design and redundancy.       | Built-in redundancy and failover mechanisms.
Accessibility       | Access is typically limited to local networks. | Accessible over the internet from anywhere.

1. PL/SQL Blocks
PL/SQL programs are organized into blocks, which are the fundamental units of execution. Each
block can be anonymous (unnamed) or named (such as procedures or functions). A typical
PL/SQL block structure includes:

 DECLARE: Optional section where variables, constants, and cursors are declared.
 BEGIN: Mandatory section containing executable statements.
 EXCEPTION: Optional section for handling runtime errors.
 END: Marks the end of the block.

This structure allows for modular and reusable code within Oracle databases.
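A minimal anonymous block showing all four sections. The employees table is an assumed example name, not from the source; SERVER OUTPUT must be enabled to see the printed lines.

```sql
DECLARE
   v_count NUMBER;                              -- declaration section
BEGIN
   SELECT COUNT(*) INTO v_count FROM employees; -- executable section
   DBMS_OUTPUT.PUT_LINE('Row count: ' || v_count);
EXCEPTION
   WHEN OTHERS THEN                             -- exception section
      DBMS_OUTPUT.PUT_LINE('Error: ' || SQLERRM);
END;                                            -- end of the block
/
```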

2. Conditional Statements
PL/SQL provides several conditional constructs to control the flow of execution:

 IF-THEN: Executes a block of code if a specified condition is true.


 IF-THEN-ELSE: Executes one block of code if the condition is true, and another if
false.
 IF-THEN-ELSIF-ELSE: Allows multiple conditions to be evaluated sequentially.

These statements enable decision-making capabilities within PL/SQL programs.
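A sketch of the IF-THEN-ELSIF-ELSE form; the variable and thresholds are assumed sample values for illustration.

```sql
DECLARE
   v_marks NUMBER := 72;   -- assumed sample value
BEGIN
   IF v_marks >= 80 THEN
      DBMS_OUTPUT.PUT_LINE('Distinction');
   ELSIF v_marks >= 50 THEN
      DBMS_OUTPUT.PUT_LINE('Pass');          -- taken here: 72 >= 50
   ELSE
      DBMS_OUTPUT.PUT_LINE('Fail');
   END IF;
END;
/
```

Conditions are tested top to bottom; the first true branch runs and the rest are skipped.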

3. Loops
PL/SQL supports various looping mechanisms:

 LOOP: Repeats a block of code indefinitely until an EXIT condition is met.


 WHILE LOOP: Repeats a block of code as long as a specified condition is true.
 FOR LOOP: Iterates over a range of values, executing a block of code for each value.

These loops facilitate repetitive tasks and iterations within PL/SQL programs.
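The three loop forms side by side in one sketch; each variant prints three lines via DBMS_OUTPUT.

```sql
BEGIN
   -- FOR loop: the counter i is implicitly declared and iterates a range
   FOR i IN 1..3 LOOP
      DBMS_OUTPUT.PUT_LINE('FOR iteration ' || i);
   END LOOP;

   -- WHILE loop: repeats as long as the condition is true
   DECLARE
      v_n NUMBER := 1;
   BEGIN
      WHILE v_n <= 3 LOOP
         DBMS_OUTPUT.PUT_LINE('WHILE iteration ' || v_n);
         v_n := v_n + 1;
      END LOOP;
   END;

   -- Basic LOOP: repeats until the EXIT WHEN condition fires
   DECLARE
      v_k NUMBER := 1;
   BEGIN
      LOOP
         EXIT WHEN v_k > 3;
         DBMS_OUTPUT.PUT_LINE('Basic iteration ' || v_k);
         v_k := v_k + 1;
      END LOOP;
   END;
END;
/
```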

4. Cursors
Cursors are pointers to context areas that store the result set of a query.
They allow for row-by-row processing of SQL queries. There are two types:

 Implicit Cursors: Automatically created by Oracle for DML statements and single-row SELECT INTO queries.


 Explicit Cursors: Defined by the programmer for multi-row queries, providing more
control over the context area.

Using cursors, developers can fetch and process individual rows from a result set efficiently.
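A sketch of the explicit-cursor lifecycle (OPEN, FETCH, CLOSE); the employees table and its columns are assumed names for illustration.

```sql
DECLARE
   CURSOR c_emp IS
      SELECT emp_id, name FROM employees;   -- assumed table/columns
   v_id   employees.emp_id%TYPE;            -- anchor types to the columns
   v_name employees.name%TYPE;
BEGIN
   OPEN c_emp;                              -- execute the query
   LOOP
      FETCH c_emp INTO v_id, v_name;        -- advance one row at a time
      EXIT WHEN c_emp%NOTFOUND;             -- stop when no row was fetched
      DBMS_OUTPUT.PUT_LINE(v_id || ': ' || v_name);
   END LOOP;
   CLOSE c_emp;                             -- release the context area
END;
/
```

The cursor attributes %NOTFOUND, %FOUND, %ROWCOUNT, and %ISOPEN give the programmer the extra control over the context area mentioned above.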
5. Triggers
Triggers are stored procedures that automatically execute (or "fire") in response to certain events
on a particular table or view. They are used for enforcing business rules, auditing data changes,
and maintaining referential integrity.

Triggers can be set to fire before or after events such as INSERT, UPDATE, or DELETE
operations. They can also be defined to execute for each row affected or once per statement.
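A sketch of a row-level AFTER UPDATE trigger for auditing; the employees and salary_audit tables are assumed for illustration and would need to exist first.

```sql
-- Fires once per affected row, after salary is updated on employees
CREATE OR REPLACE TRIGGER trg_salary_audit
AFTER UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
   -- :OLD and :NEW expose the row's values before and after the change
   INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_on)
   VALUES (:OLD.emp_id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/
```

Replacing AFTER with BEFORE, or dropping FOR EACH ROW, yields the other timing and granularity combinations described above.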
