Key Characteristics of DBMS Explained

A Database Management System (DBMS) is essential for managing data efficiently, featuring characteristics like data storage, abstraction, minimized redundancy, consistency, and security. It differs from traditional file systems in structure, data handling, and support for multi-user access. The document also discusses approaches to building databases, the importance of data models, and DBMS architecture, highlighting the need for organized, secure, and easily accessible data management.


1. Write the characteristics of DBMS


Characteristics of DBMS
A Database Management System (DBMS) is powerful software that helps in managing data
systematically and efficiently. Unlike traditional file systems, DBMS has several unique
characteristics that make it an essential component for handling large volumes of
information in modern businesses and applications.
Let's explore the main characteristics of DBMS in detail:

1. Data Storage and Management


A DBMS stores data digitally inside structured repositories known as databases.
This data is organized using tables, schemas, and views, making storage more systematic
compared to scattered files in a file system.
• Example: Student records stored in tables, not random text files.

2. Data Abstraction and Independence


DBMS provides a clear separation between how data is stored and how it is accessed by
users.
It offers three levels of abstraction:
• Physical Level: How data is stored (hardware level).
• Logical Level: What data is stored and its relationships.
• View Level: How users interact with the data.
This separation gives data independence, meaning changes in the storage structure
don’t affect how users access the data.

3. Minimized Data Redundancy


Data redundancy means having duplicate copies of the same data. DBMS helps in reducing
redundancy through normalization techniques.
• In a file system, the same student’s address might be stored in many files.
• In DBMS, the address is stored only once and referenced wherever needed.
This ensures efficient storage and avoids confusion due to multiple copies.
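The idea above can be sketched with SQLite (the table and column names here are invented for illustration): the address lives in one table, and other tables store only a reference to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The address is stored once in its own table...
cur.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, city TEXT)")
# ...and students store only a reference (foreign key) to it.
cur.execute("""CREATE TABLE students (
    roll_no    INTEGER PRIMARY KEY,
    name       TEXT,
    address_id INTEGER REFERENCES addresses(id))""")
cur.execute("INSERT INTO addresses VALUES (1, '12 Main St', 'Pune')")
cur.execute("INSERT INTO students VALUES (101, 'John', 1)")
cur.execute("INSERT INTO students VALUES (102, 'Emma', 1)")  # same address, no duplicate copy

# Updating the address once updates it for every student that references it.
cur.execute("UPDATE addresses SET street = '14 Main St' WHERE id = 1")
rows = cur.execute("""SELECT s.name, a.street FROM students s
                      JOIN addresses a ON a.id = s.address_id
                      ORDER BY s.roll_no""").fetchall()
print(rows)  # both students see the updated street
```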

4. Data Consistency
Since data redundancy is minimized, data consistency is automatically improved.
This means that whenever data is updated in one place, all references to that data show the
updated value.
• Example: If a student's phone number changes, it needs to be updated only once.

5. Data Security
DBMS provides strong security mechanisms to protect sensitive data.
• Only authorized users can access or modify the database.
• Security is maintained using authentication, authorization, roles, and permissions.
• Sensitive data like salaries or health records can be protected using encryption.

6. Multi-User Access and Concurrency Control


A DBMS supports multiple users accessing the database at the same time without any data
loss or confusion.
It manages concurrency control to ensure:
• No two users update the same data in a conflicting way.
• Data remains accurate even during simultaneous operations.

7. Backup and Recovery


DBMS provides built-in features for automatic backup of data and recovery in case of
hardware/software failure.
• Regular snapshots of the database are taken.
• In case of a crash, the DBMS can restore the data to its last consistent state.
This protects against data loss due to disasters like system crashes, power failures, etc.

8. ACID Properties
DBMS follows four critical properties to ensure reliable transaction processing, known as
ACID:
• Atomicity – A transaction is either fully completed or fully failed.
• Consistency – Transactions take the database from one valid state to another.
• Isolation – Transactions are independent of each other.
• Durability – Once a transaction is committed, it remains so, even after a crash.
These properties ensure safe and correct database operations.
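Atomicity can be demonstrated with a small sketch (account names and amounts are invented): if a transfer fails partway through, the DBMS rolls the half-finished work back automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 100)")
conn.commit()

# Atomicity: the transfer either fully completes or fully fails.
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'A'")
        raise RuntimeError("simulated crash mid-transaction")
        # never reached: the matching credit to 'B' would go here
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # the half-done debit was rolled back
```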

9. Ease of Data Retrieval


DBMS provides easy and powerful querying languages like SQL (Structured Query
Language).
• Users can quickly search, filter, and retrieve data based on specific conditions without
needing to know technical storage details.

10. Reduced Application Development Time


By managing data handling internally, DBMS reduces the burden on application developers.
Developers can focus on business logic, without worrying about storage, security, or
concurrency.

11. Support for Data Integrity


DBMS automatically enforces integrity constraints to maintain accuracy and validity of data.
For example:
• Age of a person cannot be negative.
• Email ID must be unique.
Integrity constraints protect against incorrect or illogical data being stored.
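A minimal sketch of both constraints mentioned above, using hypothetical table and column names: the DBMS itself rejects the illogical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE persons (
    id    INTEGER PRIMARY KEY,
    email TEXT UNIQUE,              -- email ID must be unique
    age   INTEGER CHECK (age >= 0)  -- age cannot be negative
)""")
conn.execute("INSERT INTO persons VALUES (1, 'a@x.com', 20)")

rejected = []
for row in [(2, 'a@x.com', 30),   # duplicate email -> rejected
            (3, 'b@x.com', -5)]:  # negative age -> rejected
    try:
        conn.execute("INSERT INTO persons VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

print(rejected)  # both invalid rows were refused by the DBMS
```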

12. Provides Different Views to Different Users


DBMS can create different views of the database for different users based on their needs.
• Example:
o An HR manager sees employee salaries and designations.
o A finance manager sees project budgets and expenses.
Thus, every user gets a customized experience without affecting the overall database.
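A hedged sketch of per-user views (all table, column, and view names are invented): each view exposes only the columns that role needs, over the same underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    name TEXT, designation TEXT, salary INTEGER, project_budget INTEGER)""")
conn.execute("INSERT INTO employees VALUES ('John', 'Engineer', 50000, 200000)")

# Each department gets a view showing only what it needs.
conn.execute("CREATE VIEW hr_view AS SELECT name, designation, salary FROM employees")
conn.execute("CREATE VIEW finance_view AS SELECT name, project_budget FROM employees")

hr = conn.execute("SELECT * FROM hr_view").fetchall()
finance = conn.execute("SELECT * FROM finance_view").fetchall()
print(hr, finance)  # two customized views of the same data
```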

2. Explain File system vs Database system


File System vs Database System
Managing data has always been a major task in computing. In earlier days, before the
concept of databases, file systems were used to store and organize data. As the demand for
better efficiency, security, and scalability grew, database systems were developed.
Both File Systems and Database Management Systems (DBMS) are methods of storing
information on storage devices like hard disks — but they are very different in structure,
features, and how they handle data.
Let’s dive deep into understanding the differences.

What is a File System?


A File System is simply a method of storing and organizing files on storage media.
Each file holds data in a structured or unstructured way and is managed directly by the
operating system.
• Files are individual units.
• There’s no connection or linking between files unless manually handled by the programmer.
• Common examples of file systems include NTFS, FAT32, ext3, and ext4.
Example:
In a school, each student’s data (marks, attendance, fee details) is stored in different files like
[Link], [Link], etc. If you want to find a student’s overall performance, you have
to open and manually link these files.

What is a Database System (DBMS)?


A Database Management System (DBMS) is a software application that helps in storing,
retrieving, manipulating, and managing data in a structured way.
• Data is stored in tables, organized into rows and columns.
• Tables are interconnected using relationships.
• SQL queries are used to retrieve and manipulate data efficiently.
• Examples: MySQL, Oracle, PostgreSQL, MongoDB.
Example:
In the same school, a database would have a single system where all student information
(marks, attendance, fee) is stored in interconnected tables. A single SQL query can fetch a
complete report card.
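A runnable sketch of that idea (tables and data invented): one query joins the interconnected tables into a report card.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students   (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE marks      (roll_no INTEGER REFERENCES students, subject TEXT, score INTEGER);
CREATE TABLE attendance (roll_no INTEGER REFERENCES students, percent INTEGER);
INSERT INTO students   VALUES (101, 'John');
INSERT INTO marks      VALUES (101, 'Maths', 88), (101, 'Physics', 91);
INSERT INTO attendance VALUES (101, 95);
""")

# A single SQL query fetches the complete report card.
report = conn.execute("""
    SELECT s.name, m.subject, m.score, a.percent
    FROM students s
    JOIN marks m      ON m.roll_no = s.roll_no
    JOIN attendance a ON a.roll_no = s.roll_no
    WHERE s.roll_no = 101
    ORDER BY m.subject
""").fetchall()
print(report)
```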

Key Differences Between File System and Database System


Feature | File System | Database System (DBMS)
Data Storage | Stores data in individual files. | Stores data in structured tables.
Data Redundancy | High (same data might exist in many files). | Low (normalization minimizes duplication).
Data Consistency | Difficult to maintain across files. | High; automatic consistency through DBMS.
Data Integrity | Manual checking required. | Enforced by DBMS using rules and constraints.
Data Security | Limited, mostly handled by OS permissions. | Strong access control and encryption.
Multi-user Access | Difficult; leads to conflicts. | Managed through concurrency control.
Query Processing | Manual coding needed for data retrieval. | Easy queries using SQL.
Backup and Recovery | Manual backup processes. | Automated backup and recovery features.
Cost | Less costly initially. | Higher setup cost but better long-term value.
Complex Transactions | Difficult to implement. | Easily managed by DBMS.

3. What are the Approaches to building a database


Approaches to Building a Database
Building a database is a structured and careful process that involves designing, organizing,
programming, and managing the data efficiently.
There are multiple approaches that guide how a database should be created to ensure data
quality, security, consistency, and ease of access.
Here are the main approaches to building a database:

1. Database Management System (DBMS)


A Database Management System (DBMS) is a software application that plays a central role
in database building.
It manages databases by providing tools for:
• Storing large volumes of data,
• Retrieving information easily,
• Manipulating and updating data securely.
A DBMS is important because it controls data redundancy, ensures data consistency,
and organizes all the data in a logical structure.

Functions of a DBMS:
Function | Description
Data Organization | Allows structuring of data logically into tables, views, schemas, etc.
Data Integrity | Ensures that rules and constraints (such as valid data ranges and unique keys) are followed.
Data Security | Controls who can access or modify data through authentication, authorization, and encryption.
Concurrent Access | Handles multiple users accessing data at the same time without conflicts (concurrency control).
Backup and Recovery | Provides tools to create data backups and recover from system failures.
Types of DBMS:
• Relational DBMS (RDBMS) — Data in tables (e.g., MySQL, PostgreSQL).
• Hierarchical DBMS — Data in tree structure.
• Network DBMS — Data in graph structure.
• Object-Oriented DBMS (OODBMS) — Data as objects.
Example: MySQL, Oracle, SQL Server, MongoDB.

2. Data Modeling
Data Modeling is the design phase of building a database.
It is the process of developing a model that describes:
• What data will be stored,
• How data elements are related,
• What rules and standards (naming, data types) will be followed.
Good data modeling ensures high data quality by maintaining:
• Consistency in naming conventions,
• Default values for fields,
• Clear semantics (clear meaning behind data),
• Strong security rules.
Data models can be visualized using diagrams like Entity-Relationship Diagrams (ERD),
which show entities, attributes, and relationships.

Types of Data Models:


Type | Purpose
Conceptual Model | High-level view (entities and relationships only).
Logical Model | Detailed structure (tables, fields, data types).
Physical Model | How data is stored physically in memory and on disk.

Example:
Modeling a University Database with Entities like Student, Courses, Professors, and their
Relationships.

3. Database Programming
Once data is modeled and the DBMS is set up, database programming is required to interact
with the database efficiently.
There are three major approaches to programming with databases:

➔ a) Navigational Approach
• In this approach, programmers navigate through records using pointers.
• It provides maximum control over how data is accessed and processed.
• Complex and difficult to program.
Example: Used in Hierarchical and Network databases where you manually follow links
between records.

➔ b) SQL-Based Approach (Structured Query Language)


• SQL is the most popular and easy-to-use method.
• SQL provides a high-level language to query, insert, update, and delete data.
• It is widely supported by almost all relational database systems.
• Performance issues may occur with very complex queries or very large datasets, but it is
manageable.
Example:
SELECT * FROM Students WHERE Age > 20;
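The same query can be tried in a small runnable sketch (the table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("John", 22), ("Emma", 19), ("Ravi", 25)])

# The SQL from above: declarative, no knowledge of storage details needed.
adults = conn.execute("SELECT * FROM Students WHERE Age > 20").fetchall()
print(adults)  # only students older than 20
```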

➔ c) Object-Oriented Approach
• In this method, data is treated as objects, just like in object-oriented programming
languages (like Java or C++).
• Objects contain both data and methods.
• Very useful for handling complex, real-world data like images, audio, videos.
Example:
An object Student with attributes like Name and Age and methods like enrollCourse() and
updateProfile().
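A minimal Python sketch of this idea (the class and method names follow the example above but are otherwise illustrative): the object bundles data and behaviour together.

```python
class Student:
    """An object holding both data (attributes) and methods, as in the
    object-oriented approach; names here are illustrative only."""

    def __init__(self, name, age):
        self.name = name       # data
        self.age = age
        self.courses = []

    def enrollCourse(self, course):  # behaviour
        self.courses.append(course)

    def updateProfile(self, age):
        self.age = age

s = Student("John", 20)
s.enrollCourse("CSE101")
s.updateProfile(21)
print(s.courses, s.age)
```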

4. Explain about Data Model

What is a Data Model?


In simple words, a Data Model is a framework that defines how data is connected, stored,
and processed inside a database.
A data model provides a logical structure to the data before it is actually stored.
It helps database designers, developers, and even business people to understand and
organize data properly.

Importance of Data Models


• Organizes the data efficiently.
• Defines relationships between data items.
• Avoids data redundancy.
• Helps in proper database design and management.
• Acts as a blueprint for creating a real database.

Key Elements of a Data Model


• Entity: Any object, person, place, or concept (e.g., Student, Employee).
• Attribute: Properties that describe the entity (e.g., Student Name, Age).
• Relationship: Connection between two or more entities (e.g., Student Enrolls in Course).
• Constraints: Rules to ensure data accuracy and integrity (e.g., Age must be positive).


Levels of Data Modeling


Level | Description
Conceptual Data Model | High-level design showing entities and relationships.
Logical Data Model | Shows detailed structure (tables, columns, keys) but not storage details.
Physical Data Model | Shows exactly how data will be stored in memory and on disk (files, indexes).

Types of Data Models (ALL 5 Models)


Now, let's discuss all five major data models in detail:

1. Hierarchical Model
➔ Structure:
• Data is organized in a tree-like structure.
• Parent-Child relationship.
• Each child has only one parent.
➔ Example: A file system's folder tree, where each folder has exactly one parent folder but may contain many subfolders.
➔ Features:
• Fast data retrieval when hierarchy is simple.
• Difficult to handle many-to-many relationships.
• Poor flexibility.

2. Network Model
➔ Structure:
• Data is organized like a graph.
• Many-to-many relationships are allowed.
• Records are connected through pointers.
➔ Example: Students and courses linked both ways — a student enrolls in many courses and a course has many students, with records connected through pointers.

➔ Features:
• More flexible than the hierarchical model.
• Complex to maintain.
• Faster access for connected data.

3. Relational Model (Most Popular)


➔ Structure:
• Data is stored in tables (rows and columns).
• Tables are linked using keys (Primary Key, Foreign Key).
➔ Example (Table):
Student_ID Name Course_ID

101 John CSE101

102 Emma CSE102


➔ Features:
• Easy to use and understand.
• Powerful query language (SQL).
• Data independence and flexibility.
Most modern systems use this model — like MySQL, PostgreSQL, Oracle.

4. Object-Oriented Model
➔ Structure:
• Data stored as objects.
• Objects contain both data (fields) and methods (functions).
➔ Example: A Student object that stores its own data (name, age) along with methods such as enrollCourse().

➔ Features:
• Good for complex systems (CAD, multimedia).
• Supports inheritance, encapsulation, polymorphism.

5. Entity-Relationship (ER) Model


➔ Structure:
• Focuses on entities, their attributes, and relationships.
• Represented using ER diagrams.
• High-level model used during early design stages.
➔ Example:
• Entities: Student, Course
• Relationship: Student ENROLLS in Course
➔ Features:
• Easy to understand.
• Helps in database planning before building tables.
• Acts as a bridge between business needs and technical design.

5. Discuss about DBMS system architecture


Database Architecture: An Overview
The design of a Database Management System (DBMS) is fundamentally linked to its
architecture. In contemporary systems, a common approach is to use a client/server
architecture.
• This architecture is designed to support numerous interconnected components, including
PCs, web servers, and database servers, all communicating over networks.
• At its core, the client/server model involves multiple client machines (like PCs or
workstations) that connect to a central server via a network.
• Ultimately, the specific DBMS architecture is determined by how users connect to the
database to submit and process their requests.
Database Architecture Tiers
Database architecture can be categorized by the number of tiers involved. While it can be
viewed as single-tier or multi-tier, the two primary logical architectures are:

• 2-Tier Architecture
• 3-Tier Architecture
1-Tier Architecture
• In a 1-tier architecture, the database is directly accessible to the user.
• This means the user interacts with the DBMS directly on the same machine.
• Any modifications made by the user are applied directly to the database itself.
• This architecture isn't considered user-friendly for typical end-users.
• The 1-tier model is primarily used for local application development, where developers need
direct and immediate database access for quick development and testing.

2-Tier Architecture
• The 2-tier architecture closely resembles the basic client-server model.
• In this setup, client-side applications communicate directly with the database residing on the
server.
• This communication is facilitated by APIs (Application Programming Interfaces) such as ODBC
(Open Database Connectivity) and JDBC (Java Database Connectivity).
• The client side handles the user interface and application programs.
• The server is responsible for core database functionalities, including query processing and
transaction management.
• To enable communication, the client-side application establishes a connection with the
server.
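As a hedged sketch of this pattern, with Python's DB-API standing in for ODBC/JDBC and an in-memory SQLite database standing in for the remote server (a real 2-tier deployment would use a networked driver with host, port, and credentials):

```python
import sqlite3

# Client side: establish a connection through the driver API.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Server side handles query processing and transaction management.
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (1)")
conn.commit()

result = cur.execute("SELECT x FROM t").fetchone()[0]
conn.close()
print(result)
```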

3-Tier Architecture
• The 3-tier architecture introduces an intermediate layer between the client and the server.
• In this architecture, the client does not directly communicate with the database server.
• Instead, the client-side application interacts with an application server, which then
communicates with the database system.
• This design provides a level of abstraction, where the end-user is unaware of the database's
existence beyond the application server.
• Similarly, the database has no direct knowledge of users beyond the application server.
• The 3-tier architecture is commonly employed in large-scale web applications.

6. Discuss about Data Independence


Data independence is a property of a database management system by which the database schema at one level of the system can be changed without changing the database schema at the next higher level. Below we look at what data independence means and at its two types.
What is Data Independence in DBMS?
Data independence is the feature of a DBMS that allows the schema at one layer of the database system to be changed without any impact on the schema at the next higher level. It creates an environment in which application programs are insulated from the details of data representation and storage; the three-schema architecture makes this separation concrete. Applications should not be forced to deal with storage specifics, because that reduces quality and flexibility, so the DBMS lets them see data at a generalized level instead. In short, the ability to change the structure of a lower-level schema without rewriting the upper-level schema is called data

independence.
Types of Data Independence
There are two types of data independence.
• logical data independence
• Physical data independence
Logical Data Independence
• Changing the logical schema (conceptual level) without changing the external schema (view
level) is called logical data independence.
• It is used to keep the external schema separate from the logical schema.
• If we make any changes at the conceptual level of data, it does not affect the view level.
• This happens at the user interface level.
• For example, entities or attributes can be added to or deleted from the conceptual schema without making any changes to the external schema.
Physical Data Independence
• Making changes to the physical schema without changing the logical schema is called
physical data independence.
• If we change the storage size of the database system server, it will not affect the conceptual
structure of the database.
• It is used to keep the conceptual level separate from the internal level.
• This happens at the logical interface level.
• Example – Changing the location of the database from C drive to D drive.
Difference Between Physical and Logical Data Independence
Physical Data Independence | Logical Data Independence
Concerns how the data is stored in the system. | Concerns changes to the logical structure or data definition.
Easier to achieve. | More difficult to achieve than physical independence.
Changes at the physical level generally do not require changes at the application program level. | Changes at the logical level may require changes at the application level.
Relates to the internal schema. | Relates to the conceptual schema.
There may or may not be a need for changes at the internal level to improve the structure. | Whenever the logical structure of the database has to be changed, the changes made at the logical level are important.
Example: change in compression technique, hashing algorithm, storage device, etc. | Example: adding, modifying, or deleting an attribute.

Conclusion
The data independence property of the database is an expected property that relies on
separating the logical and physical aspects of storing and accessing data. This means that it is
easy to make structural modifications to the database without affecting the applications that
use it. This is a situation that impacts the capacity of the organization to remain adaptable in
the dynamic business environment, as well as making sure that the technological
advancements within the organization are interoperable over a long period of time.

7. What is an Entity? Explain different type of Entities.


Database management systems (DBMS) are large, integrated collections of data. They play an important role in modern data management, helping organizations store, retrieve, and manage data effectively. At the core of any DBMS is the concept of an entity: a basic concept that refers to a real-world object or idea represented in the database. This section explores entities in a DBMS, providing an in-depth understanding of this fundamental concept and its significance in database design.
Entity
An entity is a "thing" or "object" in the real world. An entity contains attributes, which
describe that entity. So anything about which we store information is called an entity. Entities
are recorded in the database and must be distinguishable, i.e., easily recognized from the
group.
For example: a student, an employee, or a bank account are all entities.

Entity Set
An entity set is a collection of similar types of entities that share the same attributes.
For example: All students of a school form an entity set of Student entities.
Key Terminologies used in Entity Set:
• Attributes: Attributes are the properties or characteristics of an entity. They describe the data that can be associated with an entity.
• Entity Type: A category or class of entities that share the same attributes.
• Entity Instance: A specific occurrence or individual entity within an entity type. Each entity instance has a unique identity, usually given by the primary key.
• Primary Key: A unique identifier for every entity instance within an entity type.
It can be classified into two types:
Strong Entity Set
Strong entity sets exist independently and each instance of a strong entity set has a unique
primary key.
Example of a strong entity: a Car, identified uniquely by its Registration Number, with attributes such as Model and Name.

Weak Entity Set


A weak entity cannot exist on its own; it is dependent on a strong entity to identify it. A weak
entity does not have a single primary key that uniquely identifies it; instead, it has a partial
key.
Example of a weak entity: a Laptop (with attributes such as Color and RAM) that is identified only through the Employee who owns it, together with a partial key such as a serial number.
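A minimal sketch of how a strong and a weak entity might be declared (the names and the serial-number partial key are invented): the weak entity's primary key is composite, built from the owning entity's key plus its partial key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Strong entity: has its own primary key.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
# Weak entity: identified by the owning employee's key plus a partial key.
conn.execute("""CREATE TABLE laptop (
    emp_id    INTEGER REFERENCES employee(emp_id),
    serial_no INTEGER,               -- partial key, unique only per employee
    color     TEXT,
    ram_gb    INTEGER,
    PRIMARY KEY (emp_id, serial_no))""")
conn.execute("INSERT INTO employee VALUES (1, 'John')")
conn.execute("INSERT INTO laptop VALUES (1, 1, 'Silver', 16)")
ok = conn.execute("SELECT COUNT(*) FROM laptop").fetchone()[0]
print(ok)
```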
Kinds of Entities
There are two types of Entities:
Tangible Entity
• A tangible entity is a physical object or a physical thing that can be physically touched, seen
or measured.
• It has a physical existence or can be seen directly.
• Examples of tangible entities are physical goods or physical products (for example, "inventory
items" in an inventory database) or people (for example, customers or employees).
Intangible Entity
• Intangible entities are abstract or conceptual objects that are not physically present but have
meaning in the database.
• They are typically defined by attributes or properties that are not directly visible.
• Examples of intangible entities include concepts or categories (such as “Product Categories”
or “Service Types”) and events or occurrences (such as appointments or transactions).
Entity Types in DBMS
• Strong Entity Types: Entities that exist independently and have a unique identifier.
• Weak Entity Types: Entities that depend on another entity for their existence and do not have a unique identifier of their own.
• Associative Entity Types: Entities that represent relationships between two or more entities and may have attributes of their own.
• Derived Entity Types: Entities that are derived from other entities through a process or calculation.
• Multi-Valued Entity Types: Entities that can have more than one value for an attribute.
Conclusion
In a database management system (DBMS), entities are the fundamental components that represent objects or concepts from the real world. They are described by attributes, identified by a primary key, and can be either strong or weak. Together with relationships, entities play an important role in structured data management and database design.

8. What is an Attribute? Explain different types of Attributes


In a Database Management System (DBMS), an attribute is a property or characteristic of an
entity that is used to describe an entity. Essentially, it is a column in a table that holds data
values. An entity may contain any number of attributes. One of the attributes is considered
as the primary key. In an Entity-Relationship model, attributes are represented by an elliptical shape.
Example: Student has attributes like name, age, roll number, and many more. To uniquely
identify the student, we use the primary key as a roll number as it is not repeated. Attributes
can also be subdivided into another set of attributes. Attributes help define and organize the
data, making it easier to retrieve and manipulate information within the database. In this
article, we are going to discuss about different types of attributes in detail.
Types of Attributes
There are different types of attributes as discussed below-
• Simple Attribute
• Composite Attribute
• Single-Valued Attribute
• Multi-Valued Attribute
• Derived Attribute
• Complex Attribute
• Stored Attribute
• Key Attribute
• Null Attribute
• Descriptive Attribute
Attributes define the properties of entities in an ER model, and understanding their types is essential for database design.
Let’s discuss each one by one:
1. Simple Attribute
An attribute that cannot be further subdivided into components is a simple attribute.
Example: The roll number of a student, the ID number of an employee, gender, and many
more.

2. Composite Attribute
An attribute that can be split into components is a composite attribute.
Example: The address can be further split into house number, street number, city, state,
country, and pin code, the name can also be split into first name middle name, and last
name.

3. Single-Valued Attribute
The attribute which takes up only a single value for each entity instance is a single-valued
attribute.
Example: The age of a student, Aadhar card number.

4. Multi-Valued Attribute
The attribute which takes up more than a single value for each entity instance is a multi-valued attribute. It is represented by a double oval shape.
Example: Phone number of a student: Landline and mobile.

5. Stored Attribute
A stored attribute is an attribute whose value is stored directly in the database and does not need to be computed from other attributes.
Example: DOB (date of birth) is a stored attribute.
6. Derived Attribute
An attribute whose value can be derived from other attributes is a derived attribute. It is represented by a dotted oval shape.
Example: Total and average marks of a student, age of an employee that is derived from date
of birth.

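A quick sketch of derived attributes in a query (data invented): the total and average marks are computed on demand from stored scores, never stored themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (roll_no INTEGER, subject TEXT, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?)",
                 [(101, 'Maths', 80), (101, 'Physics', 90)])

# Derived attributes: computed from the stored scores at query time.
total, avg = conn.execute(
    "SELECT SUM(score), AVG(score) FROM marks WHERE roll_no = 101").fetchone()
print(total, avg)
```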
7. Complex Attribute
Attributes formed by the nesting of composite and multi-valued attributes are called "Complex Attributes". They are rarely used in DBMS practice.
Example: Address, because an address contains composite values like street, city, state, and PIN code, and is also multi-valued because a person can have more than one house address.

Representation
Complex attributes are the nesting of two or more composite and multi-valued attributes.
Therefore, these multi-valued and composite attributes are called ‘Components’ of complex
attributes.
These components are grouped between parentheses '( )', multi-valued attributes between curly braces '{ }', and components are separated by commas ','.
For example: let us consider a person having multiple phone numbers, emails, and an
address.
Here, phone number and email are examples of multi-valued attributes and address is an
example of the composite attribute, because it can be divided into house number, street,
city, and state.

Components
Email, Phone number, Address(All are separated by commas and multi-valued components
are represented between curly braces).
Complex Attribute: Address_EmPhone(You can choose any name).
8. Key attribute
Key attributes are those attributes that can uniquely identify the entity in the entity set.
Example: Roll-No is the key attribute because it can uniquely identify the student.
9. Null Attribute
An attribute can take a NULL value when the entity has no value for it.
Example: the 'Net Banking Active Bin' attribute records whether a particular customer has
the net-banking facility activated. For a bank that does not offer net banking, this attribute
in the customer table stays NULL as long as the facility is never activated; the attribute
thereby also indicates whether the bank offers the net-banking facility at all.
10. Descriptive Attribute
A descriptive attribute gives information about a relationship set rather than about an
entity. In the example below, Start Date is a descriptive attribute of the Manages
relationship.

Descriptive-Attribute

9. What is Relationship? Explain different type of Relationships.


A relationship in DBMS is an association between two or more entities. Entities are the
real-world objects that hold data and participate in these relationships. Connections
between entities are represented by diamond shapes in ER (entity-relationship) diagrams.
Relationships enable data to be separated and organized across multiple tables, which
facilitates data management and retrieval.
Primary Terminologies
• Entity: Entity is a real world instance or object that can store data. Each entity is represented
by a table in the database.
• Relationships: It is relation or links between two or more entities. Relationships define how
data is related and structured across multiple tables in a database.
• ER (Entity-Relationship) Diagram: It is a graphical representation of entities and their
relationship to each other. In ER diagrams, entities are represented by rectangles and the
relation between them is represented by diamond.
Types of Relationships in DBMS
There are three primary types of relationships in DBMS. Each type of relationship plays a
unique role in database design.
• One-to-One relationship
• One-to-Many or Many-to-One relationship
• Many-to-Many relationship
One-to-One Relationship
In a one-to-one relationship, a single record in one table is related to a single record in
another table, and vice versa. This type of relationship is relatively rare and is commonly
used for security or organizational reasons.
Example:
Consider two entities "Person" and "Aadhar card". Each person can have only one Aadhar
card and each Aadhar card is assigned to only one person.

One-to-Many or Many-to-One Relationship


In one-to-many or many-to-one relationship, a single record in one table can be associated
with multiple records in another table and this is the most common type of relationship in
DBMS.
Example:
Consider two entities "customer" and "order". Each customer can place multiple orders but
each order is placed by only one customer.

Many-to-Many Relationship
A many-to-many relationship is one in which multiple records in one table are associated
with multiple records in another table. It is usually implemented using a junction table.
Example:
Consider two entities "Student" and "Course" where each student can enroll in multiple
courses and each course can have multiple students enrolled in it.
Self-Referencing Relationships
A self-referencing relationship, also known as a recursive relationship, is useful in cases
where a table has a relationship with itself. It is used for representing hierarchical data.
Example:
An "Employee" entity where each employee can have manager who is also an employee.
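The relationship types above can be sketched with SQLite; the table and column names below are illustrative assumptions, not from the text. The one-to-many link uses a foreign key on the "many" side, and the many-to-many link goes through a junction (enrollment) table:

```python
import sqlite3

# Illustrative sketch of relationship types using an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One-to-many: one customer places many orders (FK on the "many" side).
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id))""")

# Many-to-many: students and courses linked through a junction table.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("""CREATE TABLE enrollment (          -- junction table
    student_id INTEGER REFERENCES student(id),
    course_id  INTEGER REFERENCES course(id),
    PRIMARY KEY (student_id, course_id))""")

cur.execute("INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi')")
cur.execute("INSERT INTO course VALUES (10, 'DBMS'), (20, 'Networks')")
cur.executemany("INSERT INTO enrollment VALUES (?, ?)",
                [(1, 10), (1, 20), (2, 10)])  # Asha takes 2 courses; DBMS has 2 students

rows = cur.execute("""SELECT s.name, c.title
                      FROM enrollment e
                      JOIN student s ON s.id = e.student_id
                      JOIN course  c ON c.id = e.course_id
                      ORDER BY s.name, c.title""").fetchall()
print(rows)   # [('Asha', 'DBMS'), ('Asha', 'Networks'), ('Ravi', 'DBMS')]
```

The junction table is what lets both sides be "many" without duplicating student or course data.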

10. Construct ER-model for one real time application

11. Write about The Relational Model Concepts

What is the Relational Model?


The relational model represents how data is stored in Relational Databases. A relational
database consists of a collection of tables each of which is assigned a unique name. Consider
a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE shown in
the table.
Table STUDENT

Key Terms
• Attribute: Attributes are the properties that define an entity. e.g. ROLL_NO, NAME,
ADDRESS.
• Relation Schema: A relation schema defines the structure of the relation and represents the
name of the relation with its attributes. e.g. STUDENT (ROLL_NO, NAME, ADDRESS, PHONE,
and AGE) is the relation schema for STUDENT. If a schema has more than 1 relation it is called
Relational Schema.
• Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples one
of which is shown as:

1 RAM DELHI 9455123451 18


• Relation Instance: The set of tuples of a relation at a particular instance of time is called
a relation instance. It can change whenever there is an insertion, deletion or update in the
database.
• Degree: The number of attributes in the relation is known as the degree of the relation. The
STUDENT relation defined above has degree 5.
• Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation
defined above has cardinality 4.
• Column: The column represents the set of values for a particular attribute. The column
ROLL_NO is extracted from the relation STUDENT.
• NULL Values: The value which is not known or unavailable is called a NULL value. It is
represented by NULL. e.g. PHONE of STUDENT having ROLL_NO 4 is NULL.
• Relation Key: These are basically the keys that are used to identify the rows uniquely or also
help in identifying tables. These are of the following types:
o Primary Key
o Candidate Key
o Super Key
o Foreign Key
o Alternate Key
o Composite Key

12. Define and explain about Domains, Attributes, Tuples, and Relations
Domain :
• Data is modeled using atomic values as the basis for the domain; in the relational model,
atomic means that the values in a domain are indivisible. For example, First Name has as its
domain the set of character strings that represent people's names.
• In a database, a domain is associated with a column through its data type. Data types can
be built-in (such as integers or strings) or custom types that define constraints on the data
themselves.
• A SQL domain is a named set of valid values; its definition specifies the data type the
values must belong to and, for character string types, can name the domain's default
collation.
Example :
In a table, a domain is the set of values that an attribute can take. The domain of a Month
attribute can accept January, February, ..., December. A domain of integers can accept
whole numbers that are negative, positive, or zero.
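A domain can be approximated in SQL by a column's data type plus a CHECK constraint. A minimal SQLite sketch (the event table and its columns are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The "month" column's domain is restricted to the twelve month names,
# and "day" to the integers 1..31, via CHECK constraints.
cur.execute("""CREATE TABLE event (
    month TEXT CHECK (month IN ('January','February','March','April',
                                'May','June','July','August','September',
                                'October','November','December')),
    day   INTEGER CHECK (day BETWEEN 1 AND 31))""")

cur.execute("INSERT INTO event VALUES ('January', 26)")     # inside the domain: accepted
try:
    cur.execute("INSERT INTO event VALUES ('Janury', 26)")  # misspelt: outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Only the row whose values lie inside both domains survives; the misspelt month raises an integrity error.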
Tuple :
Tuples are one of the most used items in Database Management Systems (or DBMS). A Tuple
in DBMS is just a row having inter-related data about a particular entity(it can be any object).
• This data is spread across some columns having various attributes such as name, age, gender,
marks, etc. It should be noted that Tuples are mostly seen in Relational Databases
Management Systems(RDBMS) as RDBMS works on the relational model (Tabular format).
What Is Tuple In DBMS?
In a database management system (DBMS), data most often needs to be stored in tabular
format. This storage model is called the relational model, and a system that uses it is called
a relational database management system (RDBMS). These relations (or tables) consist of
rows and columns; in DBMS, a row is called a "tuple".
Let us see Tuple in DBMS in detail. Let us understand this with the help of a real-life example.
Example Of Single Record Or Tuple
Consider the table given below. We have data of some students like their id, name, age, etc.
here, each row has almost all the information of the respective student. Like the first row has
all the information about a student named “Sufiyan”, similarly, all other rows contain
information about other students. Hence, a single row is also termed a “record” as it
contains all the information of a student. This row or record is termed as Tuple in DBMS.
Hence Tuple in DBMS is just a row representing some inter-related data of a particular entity
such as student, employee, user, etc.
Table for reference:

A Tuple from the above-given table


In the above-given image, you can see that a Tuple is just a row having attributes of a
particular entity like name, age, marks, etc.
Attributes :
• Any real-world object is considered to be an entity that has self-existence and these entities
in DBMS have their own characteristics and properties known as attributes. Attributes give us
additional information about entities and help us to study their relationship within the
specified system.
• Attributes in an ER (Entity Relationship) model are always represented in an elliptical shape.
There are different types of attributes in DBMS: Simple, Composite, Single Valued, Multi-
Valued, Stored, Derived, Key, and Complex attributes.
• An entity may contain any number of attributes while one of the attributes is considered to
be a primary key attribute.
• An attribute can take its values from a set of possible values for each entity instance in an ER
model in DBMS.
We always represent attributes in DBMS in an elliptical shape. We can refer to the above
image where we have an ER model diagram and the student represented in rectangle shape
is our entity object. Student entity has different attributes: Roll_No, Name, DOB, Phone_No,
Age, Address, Country, State, City, and Street.
Degree :
• The degree of a relationship is the number of entity types that participate(associate) in a
relationship.
• From an E-R diagram we can read off the degree of a relationship directly: the number of
entity types connected to the relationship is its degree.
Example
If we have two entity types, 'Customer' and 'Account', linked using a primary key and
foreign key, the degree of the relationship is 2, because two entity types take part in the
relationship.

Based on the number of entity types that are connected we have the following degree of
relationships:
• Unary
• Binary
• Ternary
• N-ary
Unary (degree 1)
A unary relationship exists when both participants are of the same entity type. When such
a relationship is present, we say that the degree of the relationship is 1.

Binary (degree 2)
A binary relationship exists when exactly two entity types participate. When such a
relationship is present, we say that the degree is 2. This is the most common degree of
relationship, and it is easy to work with because binary relationships convert readily into
relational tables.

Ternary(degree 3)
A ternary relationship exists when exactly three entity types participate. When such a
relationship is present, we say that the degree is 3. As the number of entity types in a
relationship increases, converting it into relational tables becomes more complex.

N-ary (n degree)
An N-ary relationship exists when 'n' entity types participate; in principle, any number of
entities can take part, with no upper limit. In practice, relationships of higher degree are
uncommon, because converting them into relational tables becomes complex. One reason
for building an E-R model is that it can easily be converted into other models for
implementing the database, and that benefit is lost with higher-degree relationships.
Binary relationships are therefore the most popular and widely used.
We represent an N-ary relationship as follows:

Cardinality :
In the context of databases, cardinality refers to the uniqueness of the data values
contained in a column. High cardinality means the column contains a large percentage of
unique values; low cardinality means the column has many repeated values in its data
range.
Cardinality between the tables can be of type one-to-one, many-to-one or many-to-many.
Mapping Cardinality
It is expressed as the number of entities to which another entity can be associated via a
relationship set.
For binary relationship set there are entity set A and B then the mapping cardinality can be
one of the following −
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
One-to-one relationship
One entity of A is associated with one entity of B.
Example
Given below is an example of the one-to-one relationship in the mapping cardinality. Here,
one department has one head of the department (HOD).

One-to-many relationship
An entity in A is associated with any number of entities in B (possibly zero), while an entity
in B is associated with at most one entity in A.

Example
Given below is an example of the one-to-many relationship in the mapping cardinality. Here,
one department has many faculties.

Many-to-one relationship
An entity in A is associated with at most one entity in B, while an entity in B can be
associated with any number of entities in A (possibly zero).
Example
Given below is an example of the many-to-one relationship in the mapping cardinality. Here,
many faculties work in one department.

Many-to-many relationship
Many entities of A are associated with many entities of B.
An entity in A is associated with many entities of B and an entity in B is associated with many
entities of A.
A many-to-many relationship can be viewed as a many-to-one combined with a one-to-many.

Example
Given below is an example of the many-to-many relationship in the mapping cardinality.
Here, many employees work on many projects.

13. Explain about Schema-instance distinction


“Schema” and “Instance” are key ideas in a database management system (DBMS) that help
organize and manage data. A schema can be referred to as the blueprint of the database
while an instance is the actual contents of the database at a given point of time. This article
will look at these ideas in detail to understand their importance, their difference, and how
they relate to each other in a DBMS.
In a Database Management System (DBMS), the schema refers to the overall design or
blueprint of the database, describing its structure (like tables, columns, and relationships). It
remains relatively stable over time. On the other hand, an instance represents the actual
data within the database at any particular moment, which can change frequently as the
database is updated.
What is Schema?
Schema is the overall description of the database. The basic structure of how the data will be
stored in the database is called schema. In DBMS, the term schema refers to the architecture
of the database which describes how it will appear or will be constructed. It describes the
organization of data such as tables, relationships as well as constraints. A schema is a
template that dictates how data items in a database will be stored, arranged, and accessed.

Schema
Schema is of three types: Logical Schema, Physical Schema and view Schema.
• Logical Schema – It describes the database designed at a logical level.
• Physical Schema – It describes the database designed at the physical level.
• View Schema – It defines the design of the database at the view level.
Example:
Let’s say a table teacher in our database named school, the teacher table requires the name,
dob, and doj in their table so we design a structure as:
Teacher table
name: String
doj: date
dob: date
Advantages of Schema
• Consistency: Guarantees proper storage of data in order to allow easy access and
expandability.
• Structure: Helps in easy arrangement of the data base in an organized manner and hence
makes it easy to comprehend.
• Data Integrity: Puts in place restrictions that ensure the data’s maintaining of its accuracy
and subsequent reliability.
Disadvantages of Schema
• Rigidity: once defined, a schema can be difficult to alter, and changing it may take a large
amount of effort.
• Complexity: Developing a schema may be difficult or time consuming in case of large
databases.
What is Instance?
An instance of a database refers to the actual data it contains at a particular point in time
— the content that populates the structure defined by the schema at that moment.
Example
Let’s say a table teacher in our database whose name is School, suppose the table has 50
records so the instance of the database has 50 records for now and tomorrow we are going
to add another fifty records so tomorrow the instance has a total of 100 records. This is
called an instance.
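The distinction can be demonstrated in SQLite: the schema (the CREATE TABLE statement) stays fixed while the instance (the set of rows) changes with every insert. A small sketch using the teacher example above (the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema: the fixed blueprint of the teacher table.
cur.execute("CREATE TABLE teacher (name TEXT, dob TEXT, doj TEXT)")

def instance_size():
    # The instance is whatever rows exist right now.
    return cur.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]

# Instance today: 2 records.
cur.executemany("INSERT INTO teacher VALUES (?, ?, ?)",
                [("Asha", "1980-05-01", "2005-06-15"),
                 ("Ravi", "1975-11-20", "2000-01-10")])
print(instance_size())   # 2

# Tomorrow another row is added: the instance changes, the schema does not.
cur.execute("INSERT INTO teacher VALUES ('Meena', '1990-03-12', '2015-07-01')")
print(instance_size())   # 3
```

The CREATE TABLE statement never changed between the two counts; only the instance did.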
Advantages of Instance
• Real-Time Representation: an instance shows the data in the database at a certain point in
time, as may be required for analysis or for performing operations.
• Flexibility: while the schema remains fixed over time, instances are volatile, changing as
data is written, updated, or deleted.
Disadvantages of Instance
• Volatility: instances change over time, which can make them difficult to track without
proper controls.
• Data Integrity Issues: if not well regulated, the data in an instance can become
inconsistent or even incorrect.
Difference Between Schema and Instance

Schema | Instance
It is the overall description of the database. | It is the collection of information stored in a database at a particular moment.
The schema is the same for the whole database. | Data in instances can be changed using insertion, deletion, and updation.
Does not change frequently. | Changes frequently.
Defines the basic structure of the database, i.e. how the data will be stored. | It is the set of information stored at a particular time.
Affects the entire database structure. | Affects only the current state of the data.
Requires significant effort and planning to change. | Easily altered by performing CRUD (Create, Read, Update, Delete) operations.
Example: table structures, relationships, constraints. | Example: data entries, records in tables.

14. Explain different Types of Keys


Super Key
The set of one or more attributes (columns) that can uniquely identify a tuple (record) is
known as Super Key. For Example, STUD_NO, (STUD_NO, STUD_NAME), etc.
• A super key is a group of single or multiple keys that uniquely identifies rows in a table. It
supports NULL values in rows.
• A super key can contain extra attributes that aren’t necessary for uniqueness. For example, if
the “STUD_NO” column can uniquely identify a student, adding “SNAME” to it will still form a
valid super key, though it’s unnecessary.
Example:
Table STUDENT

STUD_NO   SNAME    ADDRESS   PHONE
1         Shyam    Delhi     123456789
2         Rakesh   Kolkata   223365796
3         Suraj    Delhi     175468965

Consider the table shown above: STUD_NO + PHONE is a super key.

Relation between Primary Key, Candidate Key, and Super Key


Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key.
For Example, STUD_NO in STUDENT relation.
• A candidate key is a minimal super key, meaning it can uniquely identify a record but
contains no extra attributes.
• A super key with no redundant attributes is called a candidate key.
• The minimal set of attributes that can uniquely identify a record.
• A candidate key must contain unique values, ensuring that no two rows have the same value
in the candidate key’s columns.
• Every table must have at least a single candidate key.
• A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
Table STUDENT

STUD_NO   SNAME    ADDRESS   PHONE
1         Shyam    Delhi     123456789
2         Rakesh   Kolkata   223365796
3         Suraj    Delhi     175468965

• The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite candidate key for the relation STUDENT_COURSE.
Table STUDENT_COURSE

STUD_NO   TEACHER_NO   COURSE_NO
1         001          C001
2         056          C005

Primary Key
There can be more than one candidate key in relation out of which one can be chosen as the
primary key. For Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation
STUDENT but STUD_NO can be chosen as the primary key (only one out of many candidate
keys).
• A primary key is a unique key, meaning it can uniquely identify each record (tuple) in a table.
• It must have unique values and cannot contain any duplicate values.
• A primary key cannot be NULL, as it needs to provide a valid, unique identifier for every
record.
• A primary key does not have to consist of a single column. In some cases, a composite
primary key (made of multiple columns) can be used to uniquely identify records in a table.
• Databases typically store rows ordered by the primary key (for example, in a clustered
index), allowing fast access to records via the primary key.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE) , STUD_NO is a primary key
Table STUDENT

STUD_NO   SNAME    ADDRESS   PHONE
1         Shyam    Delhi     123456789
2         Rakesh   Kolkata   223365796
3         Suraj    Delhi     175468965
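Both defining properties of a primary key — uniqueness and non-nullability — can be verified with a short SQLite sketch of the STUDENT table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE student (
    stud_no INT PRIMARY KEY NOT NULL,
    sname   TEXT, address TEXT, phone TEXT)""")

cur.execute("INSERT INTO student VALUES (1, 'Shyam', 'Delhi', '123456789')")

errors = []
try:   # duplicate primary key value: rejected (uniqueness)
    cur.execute("INSERT INTO student VALUES (1, 'Rakesh', 'Kolkata', '223365796')")
except sqlite3.IntegrityError as e:
    errors.append(str(e))
try:   # NULL primary key: rejected (non-nullability)
    cur.execute("INSERT INTO student VALUES (NULL, 'Suraj', 'Delhi', '175468965')")
except sqlite3.IntegrityError as e:
    errors.append(str(e))

print(len(errors))   # 2
```

Both offending inserts raise integrity errors, leaving only the original row in the table.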

Alternate Key
An alternate key is any candidate key in a table that is not chosen as the primary key. In
other words, all the keys that are not selected as the primary key are considered alternate
keys.
• An alternate key is also referred to as a secondary key because it can uniquely identify
records in a table, just like the primary key.
• An alternate key can consist of one or more columns (fields) that uniquely identify a
record, but it is not chosen as the primary key.
• E.g., if STUD_NO is the primary key of STUDENT, PHONE is an alternate key.
Example:
Consider the STUDENT table shown above. STUD_NO and PHONE are both candidate keys
for the relation STUDENT, but since only one can be chosen as the primary key, PHONE
becomes an alternate key.

Primary Key, Candidate Key, and Alternate Key


Foreign Key
A foreign key is an attribute in one table that refers to the primary key in another table. The
table that contains the foreign key is called the referencing table, and the table that is
referenced is called the referenced table.
• A foreign key in one table points to the primary key in another table, establishing a
relationship between them.
• It helps connect two or more tables, enabling you to create relationships between them.
This is essential for maintaining data integrity and preventing data redundancy.
• They act as a cross-reference between the tables.
• For example, DNO is a primary key in the DEPT table and a foreign key in the EMP table.
Example:
Refer to the STUDENT table shown above. STUD_NO in STUDENT_COURSE is a foreign key
referencing STUD_NO in the STUDENT relation.
Table STUDENT_COURSE

STUD_NO   TEACHER_NO   COURSE_NO
1         005          C001
2         056          C005

It is worth noting that, unlike a primary key, a foreign key can be NULL and may contain
duplicate values, i.e., it need not satisfy a uniqueness constraint. For example, STUD_NO in
the STUDENT_COURSE relation is not unique: the same student can appear in several
tuples. The STUD_NO in the STUDENT relation, however, is a primary key and must always
be unique and non-null.
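The behaviour just described — a foreign key may repeat, may be NULL, but must otherwise reference an existing parent row — can be sketched in SQLite (note that SQLite enforces foreign keys only after a PRAGMA is set; the sample data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
cur = conn.cursor()

cur.execute("CREATE TABLE student (stud_no INT PRIMARY KEY NOT NULL, sname TEXT)")
cur.execute("""CREATE TABLE student_course (
    stud_no   INT REFERENCES student(stud_no),
    course_no TEXT)""")

cur.execute("INSERT INTO student VALUES (1, 'Shyam')")

# A foreign key value may repeat (student 1 takes two courses)...
cur.execute("INSERT INTO student_course VALUES (1, 'C001')")
cur.execute("INSERT INTO student_course VALUES (1, 'C005')")
# ...and may be NULL...
cur.execute("INSERT INTO student_course VALUES (NULL, 'C009')")

# ...but any non-NULL value must reference an existing student.
try:
    cur.execute("INSERT INTO student_course VALUES (99, 'C001')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Three rows are accepted; the dangling reference to student 99 is rejected.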

Relation between Primary Key and Foreign Key


Composite Key
Sometimes, a table might not have a single column/attribute that uniquely identifies all the
records of a table. To uniquely identify rows of a table, a combination of two or more
columns/attributes can be used. It still can give duplicate values in rare cases. So, we need to
find the optimal set of attributes that can uniquely identify rows in a table.
• It acts as a primary key if there is no primary key in a table
• Two or more attributes are used together to make a composite key .
• Different combinations of attributes may give different accuracy in terms of identifying the
rows uniquely.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.

Different Types of Keys

15. Write about Relational Algebra Operations from Set Theory used in SQL
Relational algebra is a formal system for manipulating and querying relations (tables) in a
relational database. It operates on sets and uses set theory principles to define operations
on these relations. The operations in relational algebra are typically used to form complex
queries and are foundational to SQL. Here’s a breakdown of the common relational algebra
operations derived from set theory:
1. Selection (σ)
• Operation: Selects rows from a relation that satisfy a given predicate (condition).
• SQL Equivalent: SELECT ... WHERE ...
• Set Theory Equivalent: selection forms a subset of the relation, retaining only those
elements (tuples) that meet the specified condition.
• Example:
o Relational Algebra: σ_{age > 30}(Employees)
o SQL: SELECT * FROM Employees WHERE age > 30;
o This operation retrieves all employees who are older than 30.
2. Projection (π)
• Operation: Extracts certain columns (attributes) from a relation. In relational algebra the
result is a set, so duplicates are removed; in SQL, DISTINCT is required to get that effect.
• SQL Equivalent: SELECT ...
• Set Theory Equivalent: The projection operation is akin to a set’s projection in mathematics,
where only the specified dimensions of the tuples are retained.
• Example:
o Relational Algebra: π_{name, age}(Employees)
o SQL: SELECT name, age FROM Employees;
o This retrieves only the name and age columns from the Employees relation.
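Selection and projection can be tried together in SQLite; the Employees table below is a hypothetical example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (name TEXT, age INT, city TEXT)")
cur.executemany("INSERT INTO Employees VALUES (?, ?, ?)",
                [("Asha", 35, "Delhi"), ("Ravi", 28, "Pune"), ("Meena", 41, "Delhi")])

# Selection sigma_{age > 30}: the WHERE clause filters rows.
# Projection pi_{name, age}: the SELECT list keeps only those columns.
rows = cur.execute("SELECT name, age FROM Employees WHERE age > 30").fetchall()
print(rows)   # Asha and Meena, with name and age only
```

The WHERE clause plays the role of σ and the column list plays the role of π, composed in a single query.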
3. Union (∪)
• Operation: Combines the results of two relations, eliminating duplicates.
• SQL Equivalent: UNION
• Set Theory Equivalent: The union operation in set theory combines two sets to produce a
new set that contains all distinct elements from both sets.
• Example:
o Relational Algebra: Students ∪ Faculty
o SQL: SELECT * FROM Students UNION SELECT * FROM Faculty;
o This combines the Students and Faculty tables, removing any duplicates.
4. Difference (−)
• Operation: Returns the set of tuples that are in one relation but not in another.
• SQL Equivalent: EXCEPT or NOT IN
• Set Theory Equivalent: The difference operation is similar to subtracting one set from
another in set theory.
• Example:
o Relational Algebra: Employees − Managers
o SQL: SELECT * FROM Employees WHERE id NOT IN (SELECT id FROM Managers);
o This retrieves all employees who are not managers.
5. Cartesian Product (×)
• Operation: Combines every tuple of one relation with every tuple of another relation,
producing a new relation with all possible pairs of tuples.
• SQL Equivalent: JOIN (without a condition, or using CROSS JOIN)
• Set Theory Equivalent: The Cartesian product of two sets contains all ordered pairs from
both sets.
• Example:
o Relational Algebra: Employees × Departments
o SQL: SELECT * FROM Employees CROSS JOIN Departments;
o This produces a combination of every employee with every department.
6. Rename (ρ)
• Operation: Changes the name of a relation or the names of its attributes.
• SQL Equivalent: AS
• Set Theory Equivalent: This operation is akin to changing the labels (or identifiers) of sets or
their elements.
• Example:
o Relational Algebra: ρ_{E}(Employees)
o SQL: SELECT * FROM Employees AS E;
o This renames the Employees table to E for use in further operations.
7. Join (⨝)
• Operation: Combines tuples from two relations based on a common attribute or condition.
• SQL Equivalent: JOIN (typically INNER JOIN, LEFT JOIN, etc.)
• Set Theory Equivalent: Join is a special case of the Cartesian product, but with a condition
that pairs only matching tuples.
• Example:
o Relational Algebra: Employees ⨝ Departments
o SQL: SELECT * FROM Employees JOIN Departments ON Employees.department_id =
Departments.department_id;
o This performs an inner join between Employees and Departments on the
department_id field.
8. Intersection (∩)
• Operation: Returns the set of tuples that are present in both relations.
• SQL Equivalent: INTERSECT
• Set Theory Equivalent: This is equivalent to the set intersection in mathematics, which
returns common elements from both sets.
• Example:
o Relational Algebra: Employees ∩ Managers
o SQL: SELECT * FROM Employees INTERSECT SELECT * FROM Managers;
o This retrieves employees who are also managers.
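The set operators covered so far (UNION, EXCEPT, INTERSECT) can be exercised together in one SQLite sketch with invented sample tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT)")
cur.execute("CREATE TABLE managers  (name TEXT)")
cur.executemany("INSERT INTO employees VALUES (?)", [("Asha",), ("Ravi",), ("Meena",)])
cur.executemany("INSERT INTO managers  VALUES (?)", [("Ravi",), ("Zara",)])

q = lambda sql: [r[0] for r in cur.execute(sql).fetchall()]

# Union: all distinct names from both tables.
print(q("SELECT name FROM employees UNION     SELECT name FROM managers"))
# Difference: employees who are not managers.
print(q("SELECT name FROM employees EXCEPT    SELECT name FROM managers"))
# Intersection: employees who are also managers.
print(q("SELECT name FROM employees INTERSECT SELECT name FROM managers"))
```

All three operators treat the results as sets, silently dropping duplicates, which matches their relational-algebra definitions.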
9. Division (÷)
• Operation: Used when you want to find tuples in one relation that match every tuple in
another relation. This operation is often used in queries like "find employees who work in all
departments."
• SQL Equivalent: There is no direct SQL equivalent, but it can be done using GROUP BY and
HAVING clauses.
• Set Theory Equivalent: division yields the elements of one set that are related to every
element of another set.
• Example:
o Relational Algebra: Employees ÷ Departments
o This operation would return employees who work in all departments.
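Since SQL has no division operator, the usual workaround mentioned above is GROUP BY with a HAVING COUNT comparison. A sketch of "employees who work in all departments" (table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE departments (dept TEXT)")
cur.execute("CREATE TABLE works_in (emp TEXT, dept TEXT)")
cur.executemany("INSERT INTO departments VALUES (?)", [("Sales",), ("HR",)])
cur.executemany("INSERT INTO works_in VALUES (?, ?)",
                [("Asha", "Sales"), ("Asha", "HR"),   # Asha works in ALL departments
                 ("Ravi", "Sales")])                  # Ravi works in only one

# Division: keep employees whose distinct department count equals the
# total number of departments.
rows = cur.execute("""
    SELECT emp FROM works_in
    GROUP  BY emp
    HAVING COUNT(DISTINCT dept) = (SELECT COUNT(*) FROM departments)
""").fetchall()
print(rows)   # only Asha qualifies
```

The HAVING clause performs the "matches every tuple" test that defines division.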

16. Explain Types of Integrity Constraints


In relational databases, integrity constraints are rules that ensure the accuracy, consistency,
and reliability of the data stored in the database. These constraints prevent invalid data from
being entered into the database and maintain relationships between different tables. There
are several types of integrity constraints, each serving a specific purpose to maintain the
integrity of the data.
Here’s a detailed explanation of the main types of integrity constraints in relational
databases:
1. Domain Integrity
• Definition: Domain integrity ensures that the data entered into a database field (attribute) is
valid based on the domain or range of possible values for that field.
• Purpose: To ensure that values in a column are of the correct data type, meet certain
conditions, and are within a valid range.
• Examples:
o A column of age should only contain positive integers within a valid range (e.g., 0 to
120).
o A price column should not accept negative values.
• SQL Implementation: This is implemented by setting data types for columns (e.g., INTEGER,
VARCHAR, DATE), using constraints like CHECK, and defining default values.
Example SQL:
CREATE TABLE Products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2) CHECK (price >= 0)
);
2. Entity Integrity
• Definition: Entity integrity ensures that each row (tuple) in a table is uniquely identifiable. It
guarantees that every record in a table has a unique primary key that is not null.
• Purpose: To ensure that no two rows in a table are identical and that each row can be
uniquely referenced.
• Examples:
o The id column in a Customers table is often set as the primary key to ensure each
customer has a unique identifier.
• SQL Implementation: This is enforced using the PRIMARY KEY constraint, which ensures that
a column (or set of columns) is both unique and non-null.
Example SQL:
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100) UNIQUE
);
3. Referential Integrity
• Definition: Referential integrity ensures that relationships between tables are maintained,
meaning that foreign keys in a child table must match primary keys in the parent table or be
null.
• Purpose: To ensure that the foreign key in a referencing table corresponds to a valid primary
key in the referenced table.
• Examples:
o In an Orders table, the customer_id must correspond to a valid customer_id in the
Customers table.
• SQL Implementation: This is enforced using the FOREIGN KEY constraint, which establishes a
link between the foreign key in a child table and the primary key in a parent table.
Example SQL:
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
4. User-Defined Integrity
• Definition: User-defined integrity constraints are custom rules defined by the database user
to meet the specific needs of the application or business logic. These constraints can include
business rules or complex conditions that are not covered by domain, entity, or referential
integrity.
• Purpose: To enforce additional business rules, such as the requirement that an employee’s
salary cannot exceed a certain amount or that a product’s stock level cannot be negative.
• Examples:
o An employee's salary must always be greater than 20000.
o A product's stock quantity cannot be negative.
• SQL Implementation: This is implemented using the CHECK constraint and sometimes with
triggers to enforce complex conditions.
Example SQL:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
salary DECIMAL(10, 2),
CHECK (salary > 20000)
);
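For conditions a plain CHECK cannot express, a trigger can be used, as mentioned above. The sketch below uses MySQL-style syntax; the trigger name and the rule it enforces are illustrative, and trigger syntax differs across DBMSs:

```sql
-- Reject any update that lowers an existing employee's salary.
DELIMITER //
CREATE TRIGGER no_salary_cut
BEFORE UPDATE ON Employees
FOR EACH ROW
BEGIN
  IF NEW.salary < OLD.salary THEN
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Salary cannot be reduced';
  END IF;
END //
DELIMITER ;
```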
5. Null Integrity
• Definition: Null integrity governs whether a column may contain null values; it ensures that a value is actually provided (non-null) wherever the data is required.
• Purpose: To ensure that certain columns do not contain null values when they are critical to
data integrity.
• Examples:
o A user's email address cannot be null, as it is a necessary piece of information for
contacting the user.
o The order_date in the Orders table cannot be null since every order needs to have a
date.
• SQL Implementation: This is enforced using the NOT NULL constraint, which ensures that a
column cannot have null values.
Example SQL:
CREATE TABLE Users (
user_id INT PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL
);
6. Check Constraints
• Definition: The CHECK constraint ensures that the values in a column meet a specified
condition or rule.
• Purpose: To enforce domain-specific rules on data values for individual columns in a table.
• Examples:
o A status column might only allow values like 'Active', 'Inactive', or 'Pending'.
o The age column might restrict values to be greater than or equal to 18.
• SQL Implementation: The CHECK constraint is applied directly to the column or table
definition to enforce the rules.
Example SQL:
CREATE TABLE Employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
status VARCHAR(20) CHECK (status IN ('Active', 'Inactive', 'Pending'))
);
17. Explain the selection, projection, and cross product operators used in SQL
1. Selection (σ)
• Relational Algebra: The selection operator (σ) selects rows (tuples) from a relation (table)
that satisfy a given condition or predicate. It is used to filter rows based on specific criteria.
• SQL Equivalent: The WHERE clause in SQL performs the selection operation. It filters rows
based on a condition, similar to how selection works in relational algebra.
• Purpose: To retrieve specific rows from a table that meet a particular condition or criteria.
• Example:
o Relational Algebra: σ_{age > 30}(Employees)
▪ This retrieves all employees who are older than 30.
o SQL:
SELECT * FROM Employees WHERE age > 30;
▪ This SQL query returns all rows from the Employees table where the age is
greater than 30.
Key Points:
• The selection operator applies a condition (filter) to rows.
• It doesn't alter the columns of the result, only the rows.
2. Projection (π)
• Relational Algebra: The projection operator (π) is used to retrieve specific columns
(attributes) from a relation. It eliminates duplicate values in the selected columns, meaning
the result will only show distinct values for the selected columns.
• SQL Equivalent: The SELECT clause in SQL performs the projection operation. In SQL, you
specify the columns you want to retrieve.
• Purpose: To select specific columns (attributes) from a table and optionally eliminate
duplicate values.
• Example:
o Relational Algebra: π_{name, age}(Employees)
▪ This retrieves only the name and age columns from the Employees table.
o SQL:
SELECT name, age FROM Employees;
▪ This SQL query retrieves the name and age columns for all employees.
Key Points:
• The projection operation selects only certain columns (attributes) from a table.
• It doesn't affect the rows returned, only the columns.
3. Cross Product (×)
• Relational Algebra: The cross product (or Cartesian product) operator (×) combines every
tuple (row) of one relation with every tuple (row) of another relation. This results in a new
relation where every row from the first table is paired with every row from the second table,
resulting in all possible combinations.
• SQL Equivalent: The CROSS JOIN operator in SQL performs the cross product operation. It
returns the Cartesian product of two tables.
• Purpose: To combine all rows from two tables, typically producing a large number of results.
• Example:
o Relational Algebra: Employees × Departments
▪ This combines every employee with every department.
o SQL:
SELECT * FROM Employees CROSS JOIN Departments;
▪ This SQL query retrieves a Cartesian product of all rows from the Employees
table and all rows from the Departments table.
Key Points:
• The cross product produces a result set where the number of rows is the product of the
number of rows in the first table and the number of rows in the second table.
• It does not require any condition to combine rows.
• The result contains every possible pair of rows from both tables.
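The three operators compose: filtering a cross product on matching key values and then projecting the wanted columns reproduces an inner join. The table and column names below are illustrative:

```sql
-- π_{name, dept_name}( σ_{Employees.dept_id = Departments.dept_id}( Employees × Departments ) )
SELECT Employees.name, Departments.dept_name
FROM Employees CROSS JOIN Departments
WHERE Employees.dept_id = Departments.dept_id;

-- The same result, written as an explicit inner join:
SELECT Employees.name, Departments.dept_name
FROM Employees INNER JOIN Departments
ON Employees.dept_id = Departments.dept_id;
```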
Comparison of Selection, Projection, and Cross Product

Operation     | Algebra Symbol | SQL Equivalent | Purpose                                                   | Effect
Selection     | σ (sigma)      | WHERE clause   | Filters rows based on a condition                         | Filters rows (tuples) from a table based on specified criteria
Projection    | π (pi)         | SELECT clause  | Retrieves specific columns from a table                   | Returns selected columns (attributes) from a table
Cross Product | × (cross)      | CROSS JOIN     | Combines every row of one table with every row of another | Returns the Cartesian product of two tables, combining all rows
18. What is a Join? Discuss various joins used in SQL.
An SQL JOIN clause is used to query and access data from multiple tables by
establishing logical relationships between them. It can retrieve data from several tables
simultaneously using common key values shared across those tables. SQL JOIN can be used
with more than two tables, and it can also be paired with other clauses; most commonly,
JOIN is combined with a WHERE clause to filter the retrieved rows.
Before diving into the specifics, let’s visualize how each join type operates:
• INNER JOIN: Returns only the rows where there is a match in both tables.
• LEFT JOIN (LEFT OUTER JOIN): Returns all rows from the left table, and the matched rows
from the right table. If there’s no match, NULL values are returned for columns from the right
table.
• RIGHT JOIN (RIGHT OUTER JOIN): Returns all rows from the right table, and the matched
rows from the left table. If there’s no match, NULL values are returned for columns from the
left table.
• FULL JOIN (FULL OUTER JOIN): Returns all rows when there is a match in one of the tables. If
there’s no match, NULL values are returned for columns from the table without a match.
1. SQL INNER JOIN
The INNER JOIN keyword selects rows from both tables as long as the join condition is
satisfied. It creates the result set by combining the rows from both tables where the
condition holds, i.e. where the value of the common field is the same in both tables.
Syntax
SELECT table1.column1,table1.column2,table2.column1,…. FROM table1 INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also write JOIN instead of INNER JOIN; JOIN is the same as INNER JOIN.
Example of INNER JOIN
Consider the two tables, Student and StudentCourse, which share a common
column ROLL_NO. Using SQL JOINS, we can combine data from these tables based on
their relationship, allowing us to retrieve meaningful information like student details along
with their enrolled courses.
Student Table
StudentCourse Table
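The original tables appear as images; the definitions below are an assumed minimal version consistent with the query outputs discussed in this section (the AGE values are made up):

```sql
CREATE TABLE Student (
  ROLL_NO INT PRIMARY KEY,
  NAME    VARCHAR(50),
  AGE     INT
);

CREATE TABLE StudentCourse (
  COURSE_ID INT,
  ROLL_NO   INT
);

INSERT INTO Student VALUES
  (1, 'HARSH', 18),  (2, 'PRATIK', 19),   (3, 'RIYANKA', 20),
  (4, 'DEEP', 18),   (5, 'SAPTARHI', 19), (6, 'DHANRAJ', 20),
  (7, 'ROHIT', 18),  (8, 'NIRAJ', 19);

INSERT INTO StudentCourse VALUES
  (1, 1), (2, 2), (2, 3), (3, 4), (1, 5), (4, 9), (5, 10), (4, 11);
```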
Let’s look at an example of the INNER JOIN clause and understand its working. This query will
show the names and ages of students enrolled in different courses.
Query:
SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE FROM Student
INNER JOIN StudentCourse
ON Student.ROLL_NO = StudentCourse.ROLL_NO;
Output
2. SQL LEFT JOIN
A LEFT JOIN returns all rows from the left table, along with matching rows from the right
table. If there is no match, NULL values are returned for columns from the right table. LEFT
JOIN is also known as LEFT OUTER JOIN.
Syntax
SELECT table1.column1,table1.column2,table2.column1,….
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.
LEFT JOIN Example
In this example, the LEFT JOIN retrieves all rows from the Student table and the matching
rows from the StudentCourse table based on the ROLL_NO column.
Query:
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
3. SQL RIGHT JOIN
RIGHT JOIN returns all the rows of the table on the right side of the join and the matching
rows from the table on the left side. It is the mirror image of LEFT JOIN: for rows that have
no matching row on the left side, the result set will contain NULL. RIGHT JOIN is also
known as RIGHT OUTER JOIN.
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
• table1: First table.
• table2: Second table
• matching_column: Column common to both the tables.
Note: We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the same.
RIGHT JOIN Example
In this example, the RIGHT JOIN retrieves all rows from the StudentCourse table and the
matching rows from the Student table based on the ROLL_NO column.
Query:
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
4. SQL FULL JOIN
FULL JOIN creates the result set by combining the results of both LEFT JOIN and RIGHT JOIN.
The result set contains all rows from both tables; for rows that have no match in the other
table, the result set contains NULL values.
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
• table1: First table.
• table2: Second table
• matching_column: Column common to both the tables.
FULL JOIN Example
This example demonstrates the use of a FULL JOIN, which combines the results of both LEFT
JOIN and RIGHT JOIN. The query retrieves all rows from
the Student and StudentCourse tables. If a record in one table does not have a matching
record in the other table, the result set will include that record with NULL values for the
missing fields
Query:
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
NAME      COURSE_ID
HARSH     1
PRATIK    2
RIYANKA   2
DEEP      3
SAPTARHI  1
DHANRAJ   NULL
ROHIT     NULL
NIRAJ     NULL
NULL      4
NULL      5
NULL      4
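MySQL does not support the FULL JOIN keyword directly; the same result can be emulated by combining a LEFT JOIN and a RIGHT JOIN with UNION:

```sql
-- UNION removes duplicate rows, mimicking FULL OUTER JOIN semantics here.
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse ON StudentCourse.ROLL_NO = Student.ROLL_NO
UNION
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse ON StudentCourse.ROLL_NO = Student.ROLL_NO;
```

Note that UNION also collapses genuinely duplicate matched rows; when exact FULL JOIN semantics are needed, UNION ALL combined with an anti-join filter on one side is the usual workaround.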
5. SQL Natural Join (⋈)
Natural join joins tables based on the common columns of the tables being joined. It
returns all rows obtained by matching values in common columns that have the same name
and data type; such a column must be present in both tables.
• Both tables must have at least one common column with the same column name and the
same data type.
• Conceptually, the two tables are first combined with a cross join.
• The DBMS then looks for common columns with the same name and data type; only tuples
having exactly the same values in those common columns are kept in the result, and each
common column appears once.
Natural join Example
Look at the two tables below: Employee and Department.

Employee

Emp_id  Emp_name  Dept_id
1       Ram       10
2       Jon       30
3       Bob       50

Department

Dept_id  Dept_name
10       IT
30       HR
40       TIS

Problem: Find all Employees and their respective departments.

Solution Query: (Employee) ⋈ (Department)

Emp_id  Emp_name  Dept_id  Dept_name
1       Ram       10       IT
2       Jon       30       HR
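In SQL, the same query can be written with the NATURAL JOIN keyword; the common column Dept_id appears only once in the result:

```sql
SELECT * FROM Employee NATURAL JOIN Department;

-- Equivalent explicit form:
SELECT Employee.Emp_id, Employee.Emp_name, Employee.Dept_id, Department.Dept_name
FROM Employee INNER JOIN Department
ON Employee.Dept_id = Department.Dept_id;
```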

19. List and explain SQL Relational Set Operators.
1. UNION
• Definition: The UNION operator combines the result sets of two or more SQL queries,
returning only the unique (distinct) rows from all the queries.
• Purpose: To combine results from multiple queries while removing duplicates. It is
particularly useful when you want to merge the results from different tables or conditions,
ensuring that the final result set contains only distinct rows.
• Requirements:
o The queries must have the same number of columns.
o The corresponding columns must have compatible data types.
• Example: Suppose you have two tables: Employees and Managers. You want to retrieve all
unique employee and manager names:
SELECT name FROM Employees
UNION
SELECT name FROM Managers;
o This query returns a list of distinct names from both the Employees and Managers
tables.
Key Points:
• Removes duplicates automatically.
• Use UNION ALL if you want to include all rows, including duplicates.
2. UNION ALL
• Definition: The UNION ALL operator combines the result sets of two or more SQL queries,
but unlike UNION, it does not remove duplicates.
• Purpose: To combine results from multiple queries while keeping all rows, including
duplicates.
• Requirements:
o The queries must have the same number of columns.
o The corresponding columns must have compatible data types.
• Example: If you want to combine all names from Employees and Managers tables without
removing duplicates:
SELECT name FROM Employees
UNION ALL
SELECT name FROM Managers;
o This query will return all names, including duplicates, from both the Employees and
Managers tables.
Key Points:
• Does not remove duplicates.
• Returns all rows, even if some are identical across the queries.
3. INTERSECT
• Definition: The INTERSECT operator returns only the rows that are common to the result
sets of two SQL queries. In other words, it finds the intersection of the results.
• Purpose: To retrieve the common rows from two or more result sets.
• Requirements:
o The queries must have the same number of columns.
o The corresponding columns must have compatible data types.
• Example: Suppose you want to find employees who are also managers:
SELECT name FROM Employees
INTERSECT
SELECT name FROM Managers;
o This query returns only the names that are present in both the Employees and
Managers tables.
Key Points:
• Returns common rows (intersection) between queries.
• Automatically removes duplicates.
4. EXCEPT
• Definition: The EXCEPT operator returns the rows from the first query that are not present in
the second query's result set. It is similar to the set difference in mathematics.
• Purpose: To find the rows that exist in one result set but not in another.
• Requirements:
o The queries must have the same number of columns.
o The corresponding columns must have compatible data types.
• Example: Suppose you want to find employees who are not managers:
SELECT name FROM Employees
EXCEPT
SELECT name FROM Managers;
o This query returns the names of employees who are not managers (i.e., employees
who do not appear in the Managers table).
Key Points:
• Returns rows from the first query that do not appear in the second query.
• Removes duplicates automatically.
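Some engines (for example, MySQL before version 8.0.31) do not support INTERSECT and EXCEPT; equivalent results can be obtained with subqueries, reusing the Employees/Managers example:

```sql
-- INTERSECT equivalent:
SELECT DISTINCT name FROM Employees
WHERE name IN (SELECT name FROM Managers);

-- EXCEPT equivalent (beware: NOT IN behaves unexpectedly if the
-- subquery can return NULL values):
SELECT DISTINCT name FROM Employees
WHERE name NOT IN (SELECT name FROM Managers);
```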
Summary of Relational Set Operators

Operator  | Purpose                                                | Duplicates  | Example
UNION     | Combines results from two queries, removes duplicates  | Removed     | SELECT name FROM Employees UNION SELECT name FROM Managers;
UNION ALL | Combines results from two queries, includes duplicates | Not removed | SELECT name FROM Employees UNION ALL SELECT name FROM Managers;
INTERSECT | Returns rows common to both queries                    | Removed     | SELECT name FROM Employees INTERSECT SELECT name FROM Managers;
EXCEPT    | Returns rows in the first query but not in the second  | Removed     | SELECT name FROM Employees EXCEPT SELECT name FROM Managers;

20. Explain how to convert a database specification in E/R notation to a relational schema
21. What is the importance of a good schema design?
A good schema design is crucial for the success of a relational database and its efficient
performance. The schema serves as the structure for the data stored in a database,
organizing it into tables, columns, and relationships. A well-designed schema can help ensure
that the database is flexible, efficient, easy to manage, and scalable. Here's why a good
schema design is important:
1. Data Integrity and Consistency
• Ensures Valid Data: A good schema enforces data integrity through constraints such as
primary keys, foreign keys, unique constraints, and check constraints. These constraints
ensure that the data is accurate, valid, and consistent across the database.
• Prevents Data Redundancy: By using techniques like normalization, a well-designed schema
reduces the chances of data duplication, which ensures that the data is consistent and avoids
unnecessary redundancy.
2. Improved Query Performance
• Efficient Data Retrieval: A good schema design can significantly improve the performance of
SQL queries. By structuring data in a way that makes logical sense and supports indexing and
joins, the schema allows for quicker and more efficient queries.
• Indexing and Search Optimization: Schema design enables the creation of indexes on
frequently queried columns, which can dramatically speed up data retrieval. For example,
indexing primary keys, foreign keys, or frequently queried fields helps the database engine
find data faster.
3. Scalability
• Handles Growth: A well-designed schema can scale efficiently as the volume of data grows.
Proper normalization and thoughtful organization of tables ensure that the database can
handle larger datasets without compromising performance.
• Supports Additional Features: As the database grows and more features are added, a well-
designed schema can be easily modified or extended to accommodate new data
requirements. For example, adding new tables or columns can be done without causing
major disruptions.
4. Flexibility and Maintainability
• Easy to Maintain: A good schema design is clear, logical, and easy to maintain. When data is
organized in a structured way with clear relationships between tables, it becomes easier for
developers and database administrators to manage and troubleshoot.
• Adaptability to Changes: A well-thought-out schema is flexible enough to accommodate
changes in business requirements or new types of data without requiring extensive changes
or causing data integrity issues.
5. Data Redundancy and Anomalies Prevention
• Normalization: One of the key aspects of a good schema is normalization, which organizes
data in such a way that it avoids unnecessary duplication (redundancy). This prevents issues
like:
o Update Anomalies: Where changes to a single piece of data might need to be made
in multiple places.
o Insert Anomalies: Where certain data cannot be inserted unless other irrelevant
data is also inserted.
o Delete Anomalies: Where deleting data in one place might unintentionally remove
necessary information.
6. Security and Access Control
• Fine-Grained Access Control: A good schema design can help implement security measures
by controlling access at the table or column level. Sensitive information (e.g., passwords,
credit card numbers) can be isolated in separate tables, and access permissions can be
restricted accordingly.
• Role-Based Permissions: A well-designed schema allows for the use of role-based access
control (RBAC), where different users or applications can be given specific permissions to
read, write, or modify data.
7. Ease of Data Integration
• Integrating with External Systems: A good schema design makes it easier to integrate the
database with other systems or third-party services. For instance, a standardized schema
that follows naming conventions and contains structured relationships allows easier data
export, import, and synchronization with external systems.
• Consistency in Data Models: A clear and well-organized schema facilitates the integration of
data from multiple sources, ensuring that data from different systems aligns and can be
merged seamlessly.
8. Reduced Data Duplication and Storage Overhead
• Minimizing Redundancy: When a schema is properly normalized, it minimizes the amount of
redundant data, leading to efficient storage use. Reducing redundancy means that the
database consumes less disk space, and the risk of inconsistent data is also reduced.
• Optimized Storage Management: A well-organized schema helps in optimizing storage
because it ensures that data is stored in the most efficient structure, avoiding wasted space
caused by redundant or unnecessary information.
9. Better Reporting and Data Analysis
• Supports BI Tools: A good schema facilitates data analysis and reporting. When the database
is well-structured, it’s easier for Business Intelligence (BI) tools or reporting software to
query and extract meaningful insights from the data.
• Clear Relationships for Reporting: Data that is organized with clear relationships (such as
using foreign keys and normalized tables) enables more accurate and efficient reporting. You
can more easily generate complex reports and insights by performing joins or aggregations
on well-structured data.
10. Avoiding Future Problems
• Fewer Future Modifications: A good schema anticipates future requirements, which means
that developers won't need to make constant changes to the schema. This reduces the risk of
creating structural problems as the database evolves.
• Reduced Risk of Data Corruption: Poorly designed schemas can lead to issues like data
corruption or inconsistent data when changes are made or when data is inserted incorrectly.
Best Practices for Good Schema Design
1. Normalization: Break down the data into related tables to eliminate redundancy and ensure
that the database is scalable and efficient.
2. Use of Primary and Foreign Keys: Ensure that primary keys uniquely identify records, and
foreign keys maintain relationships between tables.
3. Clear Naming Conventions: Use meaningful names for tables, columns, and constraints so
that the schema is easy to understand.
4. Consider Future Growth: Design the schema to accommodate future data growth and
changing business requirements.
5. Indexes for Performance: Use indexes on frequently queried columns to speed up search
and retrieval operations.
6. Data Types and Constraints: Choose appropriate data types for columns, and use constraints
like NOT NULL, UNIQUE, and CHECK to enforce data integrity.
22. What are the problems encountered with bad schema designs
A bad schema design can lead to a wide range of problems, some of which can have severe
consequences for both the performance and maintainability of the database. Here are some
of the key problems encountered with poor schema designs:
1. Data Redundancy
• Problem: Bad schema design often leads to data duplication. This redundancy occurs when
the same piece of data is stored in multiple places within the database.
• Consequences:
o Inconsistent data: When one copy of the data is updated and others are not, it can
lead to inconsistent or outdated information.
o Wasted storage: Duplicate data unnecessarily increases the size of the database and
consumes valuable storage space.
o Increased risk of errors: More data means more chances for mistakes during updates
or deletions, as multiple places need to be updated.
• Solution: Apply normalization techniques to break down large, redundant tables into smaller
ones and ensure data is stored only once.
2. Data Anomalies
• Problem: Bad schema design can lead to anomalies like update anomalies, insert
anomalies, and delete anomalies.
• Consequences:
o Update Anomalies: If data is repeated across tables, a change needs to be made in
multiple places, which can lead to inconsistency.
o Insert Anomalies: Certain data cannot be inserted unless other irrelevant data is also
inserted.
o Delete Anomalies: Deleting a record might unintentionally remove necessary data
due to improper relationships or missing foreign keys.
• Solution: Normalization and proper use of foreign keys can prevent these anomalies by
ensuring each piece of data is stored in the correct place.
3. Poor Query Performance
• Problem: A poorly designed schema can severely degrade query performance, especially
when the schema is not optimized for querying.
• Consequences:
o Slow queries: Lack of proper indexing and inefficient table structures can lead to
long-running queries that negatively impact application performance.
o Complex joins: Poorly designed schemas might require multiple, complex joins to
retrieve relevant data, increasing the computational load.
o Inefficient data retrieval: Data retrieval can become slow and inefficient if the
schema does not group related data together logically.
• Solution: Use indexes on frequently queried columns and organize the schema to minimize
the number of joins needed for common queries.
4. Difficult Maintenance and Scalability
• Problem: Bad schema designs can make maintaining and scaling the database difficult as the
application grows.
• Consequences:
o Difficult to modify: If a schema isn’t designed with future growth in mind, adding
new features or fields may require a complete overhaul of the database.
o Hard to scale: If the schema is not optimized for large volumes of data, performance
can degrade as the database grows. Queries may become slower, and the system
might not handle the increased load effectively.
o Complex modifications: As business requirements evolve, making changes to the
schema can become error-prone and time-consuming.
• Solution: Design the schema with scalability in mind, and ensure it’s flexible enough to
accommodate future changes without requiring major redesigns.
5. Inconsistent Data Integrity
• Problem: Without proper constraints (e.g., primary keys, foreign keys), a poorly designed
schema can result in data integrity issues.
• Consequences:
o Invalid data: If data integrity constraints aren’t enforced, invalid or incomplete data
can be inserted into the database, leading to data corruption.
o Broken relationships: If foreign keys aren’t properly defined, you can end up with
orphan records or invalid references.
• Solution: Implement primary keys, foreign keys, and other constraints (such as NOT NULL
and CHECK) to enforce data integrity.
6. Inflexibility for Reporting and Data Analysis
• Problem: A bad schema design can make it difficult to generate reports and perform data
analysis effectively.
• Consequences:
o Complex reporting: If the schema does not follow logical patterns, generating reports
may require complex queries, which can be time-consuming and error-prone.
o Difficulty in aggregating data: If data is not structured appropriately, it can be difficult
to aggregate and summarize data in a meaningful way.
• Solution: Use clear relationships between tables and ensure the schema is optimized for
data analysis, with proper normalization and indexing.
7. Security Issues
• Problem: A bad schema design can lead to security vulnerabilities.
• Consequences:
o Lack of access control: Without a good design, it may be difficult to implement role-
based access control or restrict access to sensitive data at the column or table level.
o Exposed sensitive data: If sensitive data (e.g., passwords, financial details) is not
stored properly (e.g., without encryption), it can be exposed to unauthorized users.
• Solution: Implement data encryption, access control, and proper data segmentation to
ensure that sensitive information is protected.
8. Overly Complex Schema
• Problem: A bad schema may be overly complex with too many tables, unnecessary
relationships, or too much normalization, making it hard to navigate and understand.
• Consequences:
o Difficult to understand: Developers and DBAs may find it hard to understand the
schema, leading to mistakes and inefficiencies.
o Increased development time: When the schema is too complex, it takes more time
to develop and maintain the application.
o Performance bottlenecks: Complex schemas with unnecessary relationships and
excessive joins can create performance issues.
• Solution: Keep the schema simple and intuitive, and apply normalization only to the level
required. Avoid overcomplicating the design with unnecessary relationships.
9. Data Duplication Across Tables
• Problem: Without careful design, the same data may be stored in multiple tables, creating
dependencies between unrelated tables.
• Consequences:
o Inconsistent updates: When data changes in one table but not in others, it creates
inconsistencies across the database.
o Increased storage requirements: Storing the same data in multiple places increases
the overall size of the database, leading to higher storage costs.
• Solution: Use foreign keys to relate tables and avoid data duplication, and make sure
normalization rules are followed to store data only once.
10. Poor User Experience
• Problem: If the database schema is inefficient or difficult to navigate, it can lead to poor
performance, slow applications, and a subpar user experience.
• Consequences:
o Slow-loading web pages or applications.
o Unresponsive queries that affect real-time interactions with the application.
• Solution: Ensure that the schema is well-optimized and can handle the load placed upon it
by the application, including using appropriate indexing and reducing unnecessary complex
queries.
23. Discuss functional dependencies
What is Functional Dependency?
A functional dependency occurs when one attribute uniquely determines another attribute
within a relation. It is a constraint that describes how attributes in a table relate to each
other. If attribute A functionally determines attribute B, we write this as A → B.
Functional dependencies are used to mathematically express relations among database
entities and are very important to understanding advanced concepts in Relational Database
Systems.
Example:

roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2

From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building }→ Here, roll_no can determine values of
fields name, dept_name and dept_building, hence a valid Functional dependency
• roll_no → dept_name , Since, roll_no can determine whole set of {name, dept_name,
dept_building}, it can determine its subset dept_name also.
• dept_name → dept_building , dept_name can accurately determine dept_building, since
each department is located in exactly one building.
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name Students with the same name can have different dept_name, hence
this is not a valid functional dependency.
• dept_building → dept_name There can be multiple departments in the same building.
Example, in the above table departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
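A dependency X → Y holds in a given table exactly when no X value is paired with more than one Y value, and this can be checked directly in SQL. The query below tests dept_name → dept_building for the example data (the table name Students is assumed); an empty result means the dependency holds:

```sql
SELECT dept_name
FROM Students                               -- assumed name for the example table
GROUP BY dept_name
HAVING COUNT(DISTINCT dept_building) > 1;   -- rows returned = violations
```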
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-trivial functional dependency
3. Semi non-trivial functional dependency
4. Multivalued functional dependency
5. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X
→ Y and Y is the subset of X, then it is called trivial functional dependency.
Symbolically: A→B is trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A→A & B→B
Example 1 :
• ABC -> AB
• ABC -> A
• ABC -> ABC
Example 2:

roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name
is a subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example
of trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional
dependency.
Example 1 :
• Id -> Name
• Name -> DOB
Example 2:
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is
not a subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial
functional dependency, since age is not a subset of {roll_no, name}
3. Semi Non-Trivial Functional Dependency
A semi non-trivial functional dependency occurs when part of the dependent (right-hand side) is included in the determinant (left-hand side), but not all of it. This is a middle ground between trivial and non-trivial functional dependencies: X → Y is called semi non-trivial when X ∩ Y is non-empty but Y is not a subset of X.
Example:
Consider the following table:
Student_ID   Course_ID   Course_Name
101          CSE101      Computer Science
102          CSE102      Data Structures
103          CSE101      Computer Science
Functional Dependency:
{Student_ID, Course_ID} → {Course_ID, Course_Name}
This is semi non-trivial because:
• Part of the dependent set (Course_ID) already appears in the determinant {Student_ID, Course_ID}, so the dependency is not completely non-trivial.
• The other part (Course_Name) does not appear in the determinant, so the dependency is not trivial either.
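The trivial / semi non-trivial / non-trivial distinction is a pure set test on the two sides of the dependency. Here is a short Python sketch (the helper name is ours, and the labels follow this section's terminology):

```python
def fd_kind(determinant, dependent):
    """Classify the FD determinant -> dependent by the subset test:

    trivial:          dependent is a subset of determinant
    semi non-trivial: the two sides overlap, but dependent is not a subset
    non-trivial:      the two sides share no attributes
    """
    x, y = set(determinant), set(dependent)
    if y <= x:
        return "trivial"
    if x & y:
        return "semi non-trivial"
    return "non-trivial"

print(fd_kind({"roll_no", "name"}, {"name"}))    # trivial
print(fd_kind({"roll_no"}, {"name"}))            # non-trivial
print(fd_kind({"Student_ID", "Course_ID"},
              {"Course_ID", "Course_Name"}))     # semi non-trivial
```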
4. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on
each other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then
it is called a multivalued functional dependency.
Example:
bike_model   manuf_year   color
tu1001       2007         Black
tu1001       2007         Red
tu2012       2008         Black
tu2012       2008         Red
tu2222       2009         Black
tu2222       2009         Red
In this table:
• X: bike_model
• Y: color
• Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year does not
depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case these two columns are said to be multivalued dependent on bike_model, written bike_model ↠ color and bike_model ↠ manuf_year.
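A multivalued dependency can also be checked mechanically: bike_model ↠ color holds exactly when, for each bike_model, every combination of its colors with its manufacturing years appears as a row. A small Python sketch over the table above (the helper name is ours):

```python
from collections import defaultdict
from itertools import product

def mvd_holds(rows, x_attrs, y_attrs, z_attrs):
    """Check the MVD X ->> Y in a relation with attributes X, Y, Z:
    for every X-value, each combination of its Y-values with its
    Z-values must itself occur as a row of the relation."""
    ys, zs, seen = defaultdict(set), defaultdict(set), set()
    for r in rows:
        x = tuple(r[a] for a in x_attrs)
        y = tuple(r[a] for a in y_attrs)
        z = tuple(r[a] for a in z_attrs)
        ys[x].add(y)
        zs[x].add(z)
        seen.add((x, y, z))
    return all((x, y, z) in seen
               for x in ys
               for y, z in product(ys[x], zs[x]))

bikes = [
    {"bike_model": "tu1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "tu1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "tu2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "tu2012", "manuf_year": 2008, "color": "Red"},
    {"bike_model": "tu2222", "manuf_year": 2009, "color": "Black"},
    {"bike_model": "tu2222", "manuf_year": 2009, "color": "Red"},
]

# bike_model ->> color (and, symmetrically, bike_model ->> manuf_year)
print(mvd_holds(bikes, ["bike_model"], ["color"], ["manuf_year"]))  # True
```

Adding a row such as (tu1001, 2008, Green) would break the dependency, because the combination (Black, 2008) would then be required but absent.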
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e.
If a → b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
Example:
enrol_no   name   dept   building_no
42         abc    CO     4
43         pqr    EC     2
44         xyz    IT     1
45         abc    EC     2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
6. Fully Functional Dependency
In a fully functional dependency, an attribute (or set of attributes) depends on the whole of the determinant and not on any proper subset of it. That is, X → Y is a full functional dependency if Y depends on X but does not depend on any proper subset of X.
7. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on only a part of a composite key rather than on the whole key. If a relation R has attributes X, Y, Z where {X, Y} is the composite key and Z is a non-key attribute, and X alone determines Z, then X → Z is a partial functional dependency in the RDBMS.

24. What is Normalization and explain various Normal forms


25. Explain about 1NF, 2NF, 3NF and BCNF
26. Write about Decomposition
When we divide a table into multiple tables or divide a relation into multiple relations, then
this process is termed Decomposition in DBMS. We perform decomposition in DBMS when
we want to process a particular data set. It is performed in a database management system
when we need to ensure consistency and remove anomalies and duplicate data present in
the database. When we perform decomposition in DBMS, we must try to ensure that no
information or data is lost.
Decomposition in DBMS
Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition

Lossless Decomposition
A decomposition is lossless when we can regain the original relation R by joining the multiple relations formed after decomposition. It is used to remove redundant data from the database while retaining the useful information. A lossless decomposition ensures the following:
• While regaining the original relation, no information should be lost.
• If we perform join operation on the sub-divided relations, we must get the original relation.
Example:
There is a relation called R(A, B, C)

A    B    C
55   16   27
48   52   89
Now we decompose this relation into two sub relations R1 and R2


R1(A, B)

A    B
55   16
48   52

R2(B, C)

B    C
16   27
52   89

After performing the Join operation we get the same original relation

A    B    C
55   16   27
48   52   89

Lossy Decomposition
As the name suggests, in a lossy decomposition joining the sub-relations does not reproduce the relation that was decomposed. After the join operation, we find some extraneous (spurious) tuples. These extra tuples make it difficult for the user to identify the original tuples.
Example:
We have a relation R(A, B, C)

A   B   C
1   2   1
2   5   3
3   3   3

Now, we decompose it into sub-relations R1 and R2. (Note that the common attribute of the two sub-relations is C, which is not a key of either of them; that is what makes this decomposition lossy.)

R1(A, C)

A   C
1   1
2   3
3   3

R2(B, C)

B   C
2   1
5   3
3   3

Now, after performing the join operation (on the common attribute C), we get:

A   B   C
1   2   1
2   5   3
2   3   3
3   5   3
3   3   3

The tuples (2, 3, 3) and (3, 5, 3) are spurious: they were not present in the original relation.

Properties of Decomposition
• Lossless: Every decomposition we perform in a database management system should be lossless: no information should be lost when the sub-relations are joined to get back the original relation. This helps remove redundant data from the database.
• Dependency Preservation: Dependency preservation is an important property in a database management system. It ensures that the functional dependencies between attributes are maintained while performing decomposition, which helps improve database efficiency and maintain consistency and integrity.
• Lack of Data Redundancy: Data redundancy means duplicate or repeated data. This property states that the decomposition should not introduce redundant data, helping us get rid of unwanted data and keep only the useful information.
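These properties can be illustrated with a toy projection and natural-join implementation in Python (a sketch, not a real relational engine). For the relation R(A, B, C) with tuples (1, 2, 1), (2, 5, 3), (3, 3, 3), splitting on the key A is lossless, while splitting on the non-key attribute C produces spurious tuples:

```python
def project(rows, attrs):
    """Projection with duplicate elimination (a relation is a set of tuples)."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]

def natural_join(r1, r2):
    """Natural join: merge every pair of rows that agree on all shared
    attributes (assumes both inputs are non-empty)."""
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[c] == b[c] for c in common)]

R = [{"A": 1, "B": 2, "C": 1},
     {"A": 2, "B": 5, "C": 3},
     {"A": 3, "B": 3, "C": 3}]

# Lossless: A is a key, so joining the two projections on A restores R.
lossless = natural_join(project(R, ["A", "B"]), project(R, ["A", "C"]))

# Lossy: joining on the non-key attribute C produces spurious tuples.
lossy = natural_join(project(R, ["A", "C"]), project(R, ["B", "C"]))

print(len(lossless), len(lossy))  # 3 5
```

The lossy result contains all three original tuples plus the spurious (2, 3, 3) and (3, 5, 3), matching the hand-worked example.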

27. Write about multi-valued dependencies and 4NF


28. Explain about Denormalization
• Denormalization is a database optimization technique in which we add redundant data to
one or more tables. This can help us avoid costly joins in a relational database. Note that
denormalization does not mean ‘reversing normalization’ or ‘not to normalize’. It is an
optimization technique that is applied after normalization.
• Basically, the process of taking a normalized schema and making it non-normalized is called denormalization, and designers use it to tune the performance of systems to support time-critical operations.
In a traditional normalized database, we store data in separate logical tables and attempt to
minimize redundant data. We may strive to have only one copy of each piece of data in a
database.
• For example, in a normalized database, we might have a Courses table and a Teachers table.
Each entry in Courses would store the teacherID for a Course but not the teacherName.
When we need to retrieve a list of all Courses with the Teacher’s name, we would do a join
between these two tables.
• In some ways, this is great; if a teacher changes his or her name, we only have to update the
name in one place. The drawback is that if tables are large, we may spend an unnecessarily
long time doing joins on tables. Denormalization, then, strikes a different compromise.
Under denormalization, we decide that we’re okay with some redundancy and some extra
effort to update the database in order to get the efficiency advantages of fewer joins.

Step 1: Unnormalized Table


This is the starting point where all the data is stored in a single table.
What’s wrong with it?
• Redundancy: For example, “Alice” and “Math” are repeated multiple times. Similarly, “Mr.
Smith” is stored twice for the same class.
• Update Anomalies: If “Mr. Smith” changes to “Mr. Brown,” we have to update multiple rows.
Missing one row could lead to inconsistencies.
• Inefficient Storage: Repeated information takes up unnecessary space.
Step 2: Normalized Structure
To eliminate redundancy and avoid anomalies, we split the data into smaller, related tables.
This process is called normalization. Each table now focuses on a specific aspect, such as
students, classes, or subjects.
Why is this better?
• No Redundancy: “Mr. Smith” appears only once in the Classes Table, even if multiple
subjects are associated with the class.
• Easier Updates: If “Mr. Smith” changes to “Mr. Brown,” you only update the Classes Table,
and it automatically reflects everywhere.
• Efficient Storage: Repeated data is eliminated, saving space.
Step 3: Denormalized Table
In some cases, normalization can make querying complex and slow because you need to join
multiple tables to get the required information. To optimize performance, we can
denormalize the data by combining related tables into a single table.
What’s happening here?
• All related information (student name, class name, teacher, and subject) is stored in a single
table.
• This simplifies querying because you don’t need to join multiple tables.
How is Denormalization Different From Normalization?
Normalization and denormalization are both database design techniques, but they work in opposite directions. Normalization reduces or removes redundancy, so there are no duplicate entries in the same table, and it optimizes for data integrity and efficient storage. Denormalization deliberately adds redundancy back into normalized tables in order to minimize the running time of database queries (such as join operations), optimizing for performance and query simplicity.
In a system that demands scalability, like that of any major tech company, we almost always
use elements of both normalized and denormalized databases.
Advantages of Denormalization
• Improved Query Performance: Denormalization can improve query performance by
reducing the number of joins required to retrieve data.
• Reduced Complexity: By combining related data into fewer tables, denormalization can
simplify the database schema and make it easier to manage.
• Easier Maintenance and Updates: Denormalization can make it easier to update and
maintain the database by reducing the number of tables.
• Improved Read Performance: Denormalization can improve read performance by making it
easier to access data.
• Better Scalability: Denormalization can improve the scalability of a database system by
reducing the number of tables and improving the overall performance.
Disadvantages of Denormalization
• Reduced Data Integrity: By adding redundant data, denormalization can reduce data
integrity and increase the risk of inconsistencies.
• Increased Complexity: While denormalization can simplify the database schema in some
cases, it can also increase complexity by introducing redundant data.
• Increased Storage Requirements: By adding redundant data, denormalization can increase
storage requirements and increase the cost of maintaining the database.
• Increased Update and Maintenance Complexity: Denormalization can increase the
complexity of updating and maintaining the database by introducing redundant data.
• Limited Flexibility: Denormalization can reduce the flexibility of a database system by
introducing redundant data and making it harder to modify the schema.

29. Write about Armstrong's axioms for Functional dependencies


Armstrong’s Axioms refer to a set of inference rules, introduced by William W. Armstrong,
that are used to test the logical implication of functional dependencies. Given a set of
functional dependencies F, the closure of F (denoted as F+) is the set of all functional
dependencies logically implied by F. Armstrong’s Axioms, when applied repeatedly, help
generate the closure of functional dependencies.
These axioms are fundamental in determining functional dependencies in databases and are
used to derive conclusions about the relationships between attributes.
Axioms
• Axiom of Reflexivity: If A is a set of attributes and B is a subset of A, then A → B holds. If B ⊆ A then A → B. Such dependencies are trivial.
• Axiom of Augmentation: If A → B holds and C is any set of attributes, then AC → BC also holds. That is, adding the same attributes to both sides does not change the basic dependency. If A → B, then AC → BC for any C.
• Axiom of Transitivity: As with the transitive rule in algebra, if A → B holds and B → C holds, then A → C also holds. (A → B is read as "A functionally determines B".) If X → Y and Y → Z, then X → Z.
Example:
Let’s assume the following functional dependencies:
{A} → {B}
{B} → {C}
{A, C} → {D}
1. Reflexivity: Since any set of attributes determines its subset, we can immediately infer the
following:
• {A} → {A} (A set always determines itself).
• {B} → {B}.
• {A, C} → {A}.
2. Augmentation: If we know that {A} → {B}, we can add the same attribute (or set of
attributes) to both sides:
• From {A} → {B}, we can augment both sides with {C}: {A, C} → {B, C}.
• From {B} → {C}, we can augment both sides with {A}: {A, B} → {C, B}.
3. Transitivity: If we know {A} → {B} and {B} → {C}, we can infer that:
• {A} → {C} (Using transitivity: {A} → {B} and {B} → {C}).
Although Armstrong’s axioms are sound and complete, there are additional rules for
functional dependencies that are derived from them. These rules are introduced to simplify
operations and make the process easier.
Secondary Rules
These rules can be derived from the above axioms.
• Union: If A→B holds and A→C holds, then A→BC holds. If X→Y and X→Z then X→YZ.
• Composition: If A→B and X→Y hold, then AX→BY holds.
• Decomposition: If A→BC holds then A→B and A→C hold. If X→YZ then X→Y and X→Z.
• Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds.
If X→Y and YZ→W then XZ→W.
Example:
Let’s assume we have the following functional dependencies in a relation schema:
{A} → {B}
{A} → {C}
{X} → {Y}
{Y, Z} → {W}
Now, let’s apply the Secondary Rules to derive new functional dependencies.
1. Union Rule: If A → B and A → C, then by the Union Rule, we can infer:
• A → BC This means if A determines both B and C, it also determines their combination, BC.
2. Composition Rule: If A → B and X → Y hold, then by the Composition Rule, we can infer:
• AX → BY
3. Decomposition Rule: If A → BC holds, then by the Decomposition Rule, we can infer:
• A → B and A → C
4. Pseudo Transitivity Rule: If A → B and BC → D hold, then by the Pseudo Transitivity Rule,
we can infer:
• AC → D
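Repeatedly applying these rules until nothing new can be derived is exactly how the attribute closure X+ is computed in practice. A minimal Python sketch using the dependencies {A} → {B}, {B} → {C}, {A, C} → {D} from the first example:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under a list of (lhs, rhs) FD pairs.

    Start with attrs itself (reflexivity) and keep adding rhs whenever
    lhs is already contained (augmentation/transitivity), until a
    fixed point is reached.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "C"}, {"D"})]
print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C', 'D']
```

Since {A}+ contains every attribute, A is a candidate key of a relation holding exactly these dependencies.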

30. Write about join dependencies


A Join Dependency (JD) holds on a relation R when R is always equal to the join of its sub-relations R1, R2, ..., Rn. A join dependency arises when the attributes in one relation are dependent on attributes in another relation, meaning certain rows must exist in one table if matching rows exist in another. Multiple tables sharing a common attribute are joined to recreate a single table.
Join dependency is closely related to the 5th Normal Form (5NF). A join dependency is trivial if one of the relational schemas in the join dependency is equivalent to the original relation R.
Join dependency on a database is denoted by:
R1 ⨝ R2 ⨝ R3 ⨝ ..... ⨝ Rn ;
where R1 , R2, ... , Rn are the relations and ⨝ represents the natural join operator.
Types of Join Dependency
There are two types of Join Dependencies:
• Lossless Join Dependency: It means that whenever the join occurs between the tables, then
no information should be lost, the new table must have all the content in the original table.
• Lossy Join Dependency: In this type of join dependency, data loss may occur at some point in
time which includes the absence of a tuple from the original table or duplicate tuples within
the database.
Example of Join Dependency
Suppose you have a table having stats of the company, this can be decomposed into sub-
tables to check for the join dependency among them. Below is the depiction of a table
Company_Stats having attributes Company, Product, Agent. Then we created sub-tables R1
with attributes Company & Product and R2 with attributes Product & Agent. When we join
them we should get the exact same attributes as the original table.
Table: Company_Stats

Company   Product        Agent
C1        TV             Aman
C1        AC             Aman
C2        Refrigerator   Mohan
C2        TV             Mohit

Table: R1

Company   Product
C1        TV
C1        AC
C2        Refrigerator
C2        TV

Table: R2
Product        Agent
TV             Aman
AC             Aman
Refrigerator   Mohan
TV             Mohit

On performing the join operation between R1 & R2:

R1 ⨝ R2

Company   Product        Agent
C1        TV             Aman
C1        TV             Mohit
C1        AC             Aman
C2        Refrigerator   Mohan
C2        TV             Aman
C2        TV             Mohit

Here, we can see that we got two additional tuples after performing the join, i.e. (C1, TV, Mohit) and (C2, TV, Aman). These are known as spurious tuples, so R1 ⨝ R2 alone does not satisfy the join dependency. Therefore, we will create another relation R3 and perform its natural join with (R1 ⨝ R2). So, here it is:
Table: R3

Company   Agent
C1        Aman
C2        Mohan
C2        Mohit

Now on doing the natural join (R1 ⨝ R2) ⨝ R3, we get:

Company   Product        Agent
C1        TV             Aman
C1        AC             Aman
C2        Refrigerator   Mohan
C2        TV             Mohit

Now, we got our original relation, that we had earlier decomposed, in this way you can
decompose the original relation and check for the join dependency among them.
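The whole walk-through above can be reproduced with a few lines of Python (a toy projection/natural-join sketch, not a real engine): the two-way join yields six tuples including spurious ones, and joining that result with R3 brings back exactly the original four rows.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]

def natural_join(r1, r2):
    """Natural join on all shared attributes (assumes non-empty inputs)."""
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2
            if all(a[c] == b[c] for c in common)]

company_stats = [
    {"Company": "C1", "Product": "TV", "Agent": "Aman"},
    {"Company": "C1", "Product": "AC", "Agent": "Aman"},
    {"Company": "C2", "Product": "Refrigerator", "Agent": "Mohan"},
    {"Company": "C2", "Product": "TV", "Agent": "Mohit"},
]

r1 = project(company_stats, ["Company", "Product"])
r2 = project(company_stats, ["Product", "Agent"])
r3 = project(company_stats, ["Company", "Agent"])

two_way = natural_join(r1, r2)          # 6 tuples: 2 of them spurious
three_way = natural_join(two_way, r3)   # the original 4 tuples return

print(len(two_way), len(three_way))  # 6 4
```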
Importance of Join Dependencies
Join dependencies are important for several reasons: they help maintain data integrity, support normalization (up to 5NF), and aid query optimization within a database. Let us see each point in detail:
• Data Integrity: Join Dependency helps maintain data integrity in a database. Database
designers can make sure that the queries are consistent after checking for the dependencies.
Like in lossless join dependency no information is lost, which means data is accurate. This will
remove the data that is not accurate. Similarly, a join dependency is a constraint that
maintains data integrity.
• Query Optimization: Query optimization leads to improving the performance of the
database system. The database designers can choose the best join order to execute the
queries which in turn reduces the computational costs, memory utilization and i/o
operations to get the queries executed quickly.

31. Discuss about Transaction Management


Transactions are a set of operations used to perform a logical set of work. A transaction
usually means that the data in the database has changed. One of the major uses of DBMS is
to protect the user data from system failures. It is done by ensuring that all the data is
restored to a consistent state when the computer is restarted after a crash. The transaction is
any one execution of the user program in a DBMS. One of the important properties of the
transaction is that it contains a finite number of steps. Executing the same program multiple
times will generate multiple transactions.
Example: Consider the following example of transaction operations to be performed to withdraw cash from an ATM.
Steps for ATM Transaction
1. Transaction Start.
2. Insert your ATM card.
3. Select a language for your transaction.
4. Select the Savings Account option.
5. Enter the amount you want to withdraw.
6. Enter your secret pin.
7. Wait for some time for processing.
8. Collect your Cash.
9. Transaction Completed.
A transaction can include the following basic database access operation.
• Read/Access data (R): Accessing the database item from disk (where the database stored
data) to memory variable.
• Write/Change data (W): Write the data item from the memory variable to the disk.
• Commit: Commit is a transaction control language that is used to permanently save the
changes done in a transaction
Example: Transfer of ₹50 from Account A to Account B. Initially A = ₹500, B = ₹800. This data is brought into RAM from the hard disk.
R(A)        // A = 500, read into a memory variable
A = A - 50  // deduct ₹50 from A
W(A)        // A = 450, updated in RAM
R(B)        // B = 800, read into a memory variable
B = B + 50  // ₹50 is added to B's account
W(B)        // B = 850, updated in RAM
commit      // the changes in RAM are written back to the hard disk
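The transfer above can be made concrete with Python's built-in sqlite3 module, whose connection context manager wraps a block in BEGIN ... COMMIT (or ROLLBACK on error). The table layout and the insufficient-funds check are our own illustration, not part of the original example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 800)])
conn.commit()

def transfer(src, dst, amount):
    """Run both updates inside one transaction: they commit together
    or the whole transaction is rolled back."""
    try:
        with conn:  # BEGIN ... COMMIT on success, ROLLBACK on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if conn.execute("SELECT balance FROM account WHERE name = ?",
                            (src,)).fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces a rollback
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # the context manager already rolled back the partial update

transfer("A", "B", 50)     # succeeds: A = 450, B = 850
transfer("A", "B", 9999)   # fails mid-way and rolls back: balances unchanged
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'A': 450, 'B': 850}
```

The failed second call demonstrates atomicity and the rollback described above: the debit that had already been applied is undone, so the database returns to its last consistent state.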

Stages of Transaction
Note: The updated value of Account A = 450₹ and Account B = 850₹.
All instructions before committing come under a partially committed state and are stored in
RAM. When the commit is read the data is fully accepted and is stored on a Hard Disk.
If the transaction is failed anywhere before committing we have to go back and start from
the beginning. We can’t continue from the same state. This is known as Roll Back.
Desirable Properties of Transaction (ACID Properties)
Transaction management in a Database Management System (DBMS) ensures that database
transactions are executed reliably and follow ACID properties: Atomicity, Consistency,
Isolation, and Durability. These principles help maintain data integrity, even during failures
or concurrent user interactions, ensuring that all transactions are either fully completed or
rolled back if errors occur.
For a transaction to be performed in DBMS, it must possess several properties often
called ACID properties.
• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
Transaction States
Transactions can be implemented using SQL queries and Servers. In the diagram, you can see
how transaction states work.

Transaction States
The transaction has four properties. These are used to maintain consistency in a database,
before and after the transaction.
Property of Transaction:
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity
• States that all operations of the transaction take place at once if not, the transactions are
aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as
one unit and either run to completion or is not executed at all.
• Atomicity involves the following two operations:
• Abort: If a transaction stops or fails, none of the changes it made will be saved or visible.
• Commit: If a transaction completes successfully, all the changes it made will be saved and
visible.
Consistency
• The rules (integrity constraint) that keep the database accurate and consistent are followed
before and after a transaction.
• When a transaction is completed, it leaves the database either as it was before or in a new
stable state.
• This property means every transaction works with a reliable and consistent version of the
database.
• The transaction is used to transform the database from one consistent state to another
consistent state. A transaction changes the database from one consistent state to another
consistent state.
Isolation
• It shows that the data which is used at the time of execution of a transaction cannot be used
by the second transaction until the first one is completed.
• In isolation, if transaction T1 is being executed and is using the data item X, then that data item cannot be accessed by any other transaction T2 until T1 ends.
• The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
• The durability property ensures that once a transaction commits, its changes are permanent.
• They cannot be lost by the erroneous operation of a faulty transaction or by system failure. When a transaction completes, the database reaches a consistent state, and that state cannot be lost even in the event of a system failure.
• The recovery subsystem of the DBMS is responsible for the durability property.
Implementing of Atomicity and Durability
The recovery-management component of a database system can support atomicity and
durability by a variety of schemes. E.g. the shadow-database scheme:
Shadow copy
• In the shadow-copy scheme, a transaction that wants to update the database first creates a
complete copy of the database.
• All updates are done on the new database copy, leaving the original copy, the shadow copy,
untouched. If at any point the transaction has to be aborted, the system merely deletes the
new copy. The old copy of the database has not been affected.
• This scheme is based on making copies of the database, called shadow copies, assumes that
only one transaction is active at a time.
• The scheme also assumes that the database is simply a file on disk. A pointer called db
pointer is maintained on disk, It points to the current copy of the database.
Transaction Isolation Levels in DBMS
If a transaction fails, other transactions may already have used values it produced, so those transactions have to be rolled back as well (cascading rollback). How much of other transactions' in-progress work a transaction may see is controlled by its isolation level. The SQL standard defines four isolation levels:
• Read Uncommitted: Read Uncommitted is the lowest isolation level. At this level, one transaction may read not-yet-committed changes made by other transactions, thereby allowing dirty reads. Transactions are not isolated from each other.
• Read Committed: This isolation level guarantees that any data read was committed at the moment it is read, so it does not allow dirty reads. The transaction holds a read or write lock on the current row, preventing other transactions from reading, updating or deleting it.
• Repeatable Read: A more restrictive level. The transaction holds read locks on all rows it references and write locks on all rows it inserts, updates or deletes. Since other transactions cannot read, update or delete these rows, it avoids non-repeatable reads.
• Serializable: This is the highest (most restrictive) isolation level. A serializable execution is one in which concurrently executing transactions produce the same result as some serial execution of those same transactions.
Failure Classification
To find that where the problem has occurred, we generalize a failure into the following
categories:
• Transaction failure
• System crash
• Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from which it cannot go any further. When only a few transactions or processes fail, this is called a transaction failure.
Reasons for a transaction failure could be:
1. Logical errors: A logical error occurs when a transaction cannot complete because of a code error or some internal error condition.
2. System errors: These occur when the DBMS itself terminates an active transaction because the database system is not able to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or software failure.
Example: Operating system error.
• Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be
corrupted.
3. Disk Failure
• Disk failure occurs when hard-disk drives or storage drives fail; this was a common problem in the early days of technology evolution.
• It happens due to the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure that destroys all or part of disk storage.
Serializability
Serializability is an important aspect of transactions. Simply put, it is a way to check whether transactions working concurrently on a database maintain database consistency.
It is of two types:
1. Conflict Serializability
2. View Serializability
Schedule
A schedule, as the name suggests, is a process of lining up transactions and executing them one by one. When multiple transactions run concurrently and the order of operations must be set so that the operations do not overlap each other, scheduling is brought into play and the transactions are timed accordingly.
It is of two types:
1. Serial Schedule
2. Non-Serial Schedule
Uses of Transaction Management
• The DBMS is used to schedule concurrent access to data, so that multiple users can access the database without interfering with each other. Transactions are used to manage this concurrency.
• It is also used to satisfy the ACID properties.
• It is used to solve read/write conflicts.
• It is used to implement recoverability, serializability, and cascading rollback.
• Transaction management is also used for concurrency control protocols and the locking of data.
Advantages of using a Transaction
• Maintains a consistent and valid database after each transaction.
• Makes certain that updates to the database don’t affect its dependability or accuracy.
• Enables simultaneous use of numerous users without sacrificing data consistency.
Disadvantages of using a Transaction
• It may be difficult to change the information within the transaction database by end-users.
• We need to always roll back and start from the beginning rather than continue from the
previous state.

32. Write the properties of transaction


Properties of transactions
Transactions provide the ACID properties:
• Atomicity. The changes in a transaction are atomic: either all operations that are part of the
transaction occur or none occurs.
• Consistency. A transaction moves data between consistent states.
• Isolation. Even though transactions can be executed concurrently, no transaction sees
another transaction's work in progress. The transactions seem to run serially.
• Durability. After a transaction completes successfully, its changes survive subsequent
failures.
For example, consider a transaction that transfers money from one account to another. In
such a transfer, money is removed from one account and put into the other. These actions
are two parts of an atomic transaction; that is, if both cannot be completed, neither must
happen. If multiple requests are processed against an account at the same time, they must
be isolated so that only a single transaction can affect the account at one time. If the central
computer of the bank fails just after the transfer, the correct balance must still be shown
when the system becomes available again; that is, the change must be durable. Note
that consistency is a function of the application; if money is to be transferred from one
account to another, the application must subtract the same amount of money from one
account that it adds to the other account.
Transactions can be completed in one of two ways: they can commit or abort. A successful
transaction is said to commit. An unsuccessful transaction is said to abort. Any data
modifications that are made by an aborted transaction must be completely undone (rolled
back). In the above example, if money is removed from one account but a failure prevents
the money from being put into the other account, any changes that are made to the first
account must be completely undone. The next time any source queries the account balance,
the correct balance must be shown.

33. Explain various States of Transaction


What is a Transaction State?
A transaction is a set of operations or tasks performed to complete a logical process, which
may or may not change the data in a database. To handle different situations, like system
failures, a transaction is divided into different states.
A transaction state refers to the current phase or condition of a transaction during its
execution in a database. It represents the progress of the transaction and determines
whether it will successfully complete (commit) or fail (abort).
A transaction involves two main operations:
1. Read Operation: Reads data from the database, stores it temporarily in memory (buffer), and
uses it as needed.
2. Write Operation: Updates the database with the changed data using the buffer.
From the start of executing instructions to the end, these operations are treated as a single
transaction. This ensures the database remains consistent and reliable throughout the
process.
Different Types of Transaction States in DBMS

Transaction States
These are different types of Transaction States :
1. Active State – This is the first stage of a transaction, when the transaction’s instructions
are being executed.
• It is the first stage of any transaction when it has begun to execute. The execution of the
transaction takes place in this state.
• Operations such as insertion, deletion, or updation are performed during this state.
• During this state, the data records are under manipulation and they are not saved to the
database, rather they remain somewhere in a buffer in the main memory.
2. Partially Committed –
• The transaction has finished its final operation, but the changes are still not saved to the
database.
• After completing all read and write operations, the modifications are initially stored in main
memory or a local buffer. If the changes are made permanent on the DataBase then the state
will change to “committed state” and in case of failure it will go to the “failed state”.
3. Failed State – If any of the transaction-related operations cause an error during the active
or partially committed state, further execution of the transaction is stopped and it is brought
into a failed state. Here, the database recovery system makes sure that the database is in a
consistent state.
4. Aborted State – If a transaction reaches the failed state due to a failed check, the database
recovery system will attempt to restore it to a consistent state. If recovery is not possible, the
transaction is rolled back (cancelled) to ensure the database remains consistent.
In the aborted state, the DBMS recovery system performs one of two actions:
• Kill the transaction: The system terminates the transaction to prevent it from affecting other
operations.
• Restart the transaction: After making necessary adjustments, the system reverts the
transaction to an active state and attempts to continue its execution.
5. Committed – This state of transaction is achieved when all the transaction-related
operations have been executed successfully along with the commit operation, i.e. data is
saved into the database after the required manipulations in this state. This marks the
successful completion of a transaction.
6. Terminated State – If the transaction does not require a rollback, or it arrives from the
"committed state", then the system is consistent and ready for a new transaction, and the old
transaction is terminated.
Example of Transaction States
Imagine a bank transaction where a user wants to transfer $500 from Account A to Account
B.
Transaction States:
1. Active State:
The transaction begins. It reads the balance of Account A and checks if it has enough funds.
• Example: Read balance of Account A = $1000.
2. Partially Committed State:
The transaction performs all its operations but hasn’t yet saved (committed) the changes to
the database.
• Example: Deduct $500 from Account A’s balance ($1000 – $500 = $500) and
temporarily update Account B’s balance (add $500).
3. Committed State:
The transaction successfully completes, and the changes are saved permanently in the
database.
• Example: Account A’s new balance = $500; Account B’s new balance = $1500.
Changes are written to the database.
4. Failed State:
If something goes wrong during the transaction (e.g., power failure, system crash), the
transaction moves to this state.
• Example: System crashes after deducting $500 from Account A but before adding it
to Account B.
5. Aborted State:
The failed transaction is rolled back, and the database is restored to its original state.
• Example: Account A’s balance is restored to $1000, and no changes are made to
Account B.
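The state transitions in the bank example above can be sketched in plain Python. This is only an illustration, not an actual DBMS implementation; the `Transfer` class, its `history` list, and the buffer-then-flush behaviour are all hypothetical names chosen for the sketch.

```python
from enum import Enum, auto

class TxState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    TERMINATED = auto()

class Transfer:
    """Runs one bank transfer through the transaction states described above."""
    def __init__(self, accounts):
        self.db = accounts                  # the "database"
        self.buffer = dict(accounts)        # working copy in main memory
        self.history = [TxState.ACTIVE]

    def _move(self, state):
        self.history.append(state)

    def run(self, src, dst, amount):
        try:
            if self.buffer[src] < amount:
                raise ValueError("insufficient funds")
            self.buffer[src] -= amount      # changes live only in the buffer
            self.buffer[dst] += amount
            self._move(TxState.PARTIALLY_COMMITTED)
            self.db.update(self.buffer)     # flush buffer to the database
            self._move(TxState.COMMITTED)
        except ValueError:
            self._move(TxState.FAILED)
            self.buffer = dict(self.db)     # roll back: discard buffered changes
            self._move(TxState.ABORTED)
        self._move(TxState.TERMINATED)
```

A successful transfer passes through Active, Partially Committed, Committed, Terminated; a failed one passes through Failed and Aborted, leaving the database unchanged.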

34. Discuss about Concurrency Control in DBMS


In a database management system (DBMS), allowing transactions to run concurrently has
significant advantages, such as better system resource utilization and higher throughput.
However, it is crucial that these transactions do not conflict with each other. The ultimate
goal is to ensure that the database remains consistent and accurate. For instance, if two
users try to book the last available seat on a flight at the same time, the system must ensure
that only one booking succeeds. Concurrency control is a critical mechanism in DBMS that
ensures the consistency and integrity of data when multiple operations are performed at the
same time.
• Concurrency control is a concept in Database Management Systems (DBMS) that ensures
multiple transactions can simultaneously access or modify data without causing errors or
inconsistencies. It provides mechanisms to handle the concurrent execution in a way that
maintains ACID properties.
• By implementing concurrency control, a DBMS allows transactions to execute concurrently
while avoiding issues such as deadlocks, race conditions, and conflicts between operations.
• The main goal of concurrency control is to ensure that simultaneous transactions do not lead
to data conflicts or violate the consistency of the database. The concept of serializability is
often used to achieve this goal.
In this article, we will explore the various concurrency control techniques in DBMS,
understand their importance, and learn how they enable reliable and efficient database
operations.
Concurrent Execution and Related Challenges in DBMS
In a multi-user system, several users can access and work on the same database at the same
time. This is known as concurrent execution, where the database is used simultaneously by
different users for various operations. For instance, one user might be updating data while
another is retrieving it.
When multiple transactions are performed on the database simultaneously, it is important
that these operations are executed in an interleaved manner. This means that the actions of
one user should not interfere with or affect the actions of another. This helps in maintaining
the consistency of the database. However, managing such simultaneous operations can be
challenging, and certain problems may arise if not handled properly. These challenges need
to be addressed to ensure smooth and error-free concurrent execution.
Concurrent Execution can lead to various challenges:
• Dirty Reads: One transaction reads uncommitted data from another transaction, leading to
potential inconsistencies if the changes are later rolled back.
• Lost Updates: When two or more transactions update the same data simultaneously, one
update may overwrite the other, causing data loss.
• Inconsistent Reads: A transaction may read the same data multiple times during its
execution, and the data might change between reads due to another transaction, leading to
inconsistency.
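The lost-update problem above can be demonstrated deterministically with a small simulation (plain Python, hypothetical names). Two transactions each read a balance, add a deposit, and write back; interleaving their steps loses one deposit.

```python
# Deterministic simulation of a lost update. Each transaction reads the
# shared value into its own local variable, then writes local + deposit back.
def run(schedule, db):
    local = {}
    for txn, op, amount in schedule:
        if op == "read":
            local[txn] = db["X"]
        else:  # "write"
            db["X"] = local[txn] + amount

db = {"X": 100}
interleaved = [("T1", "read", 0), ("T2", "read", 0),
               ("T1", "write", 50), ("T2", "write", 30)]
run(interleaved, db)
# T2's write overwrites T1's update: final value is 130, not 180
```

Running the same four steps serially (T1 completely, then T2) yields 180, which is why a concurrency control protocol must force interleavings to be equivalent to some serial order.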
Why is Concurrency Control Needed?
Consider the following example:
• Without Concurrency Control: Transactions interfere with each other, causing issues like lost
updates, dirty reads or inconsistent results.
• With Concurrency Control: Transactions are properly managed (e.g., using locks or
timestamps) to ensure they execute in a consistent, isolated manner, preserving data
accuracy.
Concurrency control is critical to maintaining the accuracy and reliability of databases in
multi-user environments. By preventing conflicts and inconsistencies during concurrent
transactions, it ensures the database remains consistent and correct, even under high levels
of simultaneous activity.
Concurrency Control Protocols
Concurrency control protocols are the sets of rules maintained in order to solve the
concurrency control problems in the database. They ensure that concurrent transactions
can execute properly while maintaining database consistency. The concurrent execution
of transactions is provided with atomicity, consistency, isolation, durability, and
serializability via the concurrency control protocols.

• Locked based concurrency control protocol


• Timestamp based concurrency control protocol
Cascadeless and Recoverable Schedules in Concurrency Control
1. Recoverable Schedules
• A recoverable schedule ensures that a transaction commits only if all the transactions it
depends on have committed. This avoids situations where a committed transaction depends
on an uncommitted transaction that later fails, leading to inconsistencies.
o Concurrency control ensures recoverable schedules by keeping track of which
transactions depend on others. It makes sure a transaction can only commit if all the
transactions it relies on have already committed successfully. This prevents issues
where a committed transaction depends on one that later fails.
o Techniques like strict two-phase locking (2PL) enforce recoverability by delaying the
commit of dependent transactions until the parent transactions have safely
committed.
2. Cascadeless Schedules
• A cascadeless schedule avoids cascading rollbacks, which occur when the failure of one
transaction causes multiple dependent transactions to fail.
o Concurrency control techniques such as strict 2PL or timestamp ordering ensure
cascadeless schedules by ensuring dependent transactions only access committed
data.
o By delaying read or write operations until the transaction they depend on has
committed, cascading rollbacks are avoided.
Advantages of Concurrency
In general, concurrency means that more than one transaction can work on a system. The
advantages of a concurrent system are:
• Waiting Time: The time a transaction spends in the ready state before it gets the system
resources to execute is called waiting time. Concurrency leads to less waiting time.
• Response Time: The time taken to get the first response from the CPU is called response
time. Concurrency leads to less response time.
• Resource Utilization: The amount of Resource utilization in a particular system is called
Resource Utilization. Multiple transactions can run parallel in a system. So, concurrency leads
to more Resource Utilization.
• Efficiency: The amount of output produced in comparison to given input is called efficiency.
So, Concurrency leads to more Efficiency.
Disadvantages of Concurrency
• Overhead: Implementing concurrency control requires additional overhead, such as
acquiring and releasing locks on database objects. This overhead can lead to slower
performance and increased resource consumption, particularly in systems with high levels of
concurrency.
• Deadlocks: Deadlocks can occur when two or more transactions are waiting for each other to
release resources, causing a circular dependency that can prevent any of the transactions
from completing. Deadlocks can be difficult to detect and resolve, and can result in reduced
throughput and increased latency.
• Reduced concurrency: Concurrency control can limit the number of users or applications
that can access the database simultaneously. This can lead to reduced concurrency and
slower performance in systems with high levels of concurrency.
• Complexity: Implementing concurrency control can be complex, particularly in distributed
systems or in systems with complex transactional logic. This complexity can lead to increased
development and maintenance costs.
• Inconsistency: In some cases, concurrency control can lead to inconsistencies in the
database. For example, a transaction that is rolled back may leave the database in an
inconsistent state, or a long-running transaction may cause other transactions to wait for
extended periods, leading to data staleness and reduced accuracy.

35. What are Concurrency Control Protocols


Concurrency Control Protocols are mechanisms used in database management systems
(DBMS) to manage simultaneous operations on the database in a way that ensures data
consistency, isolation, and integrity, even when multiple transactions are executing
concurrently. These protocols are essential in multi-user environments where more than one
transaction may be attempting to access or modify the same data at the same time.
The goal of Concurrency Control is to ensure that the database remains in a consistent state
while allowing multiple transactions to proceed concurrently, thus improving system
performance and throughput. However, concurrency can lead to issues like lost updates,
temporary inconsistency, uncommitted data, and deadlocks, so it's crucial to use effective
concurrency control mechanisms.
Here are the main Concurrency Control Protocols:
1. Lock-Based Protocols
Lock-based protocols are the most common approach to concurrency control. They work by
assigning locks to data items before transactions can read or write to them. The key idea is to
ensure that only one transaction can modify a particular piece of data at a time.
Types of Locks:
• Shared Lock (S-lock): Allows a transaction to read the data but prevents other transactions
from modifying it. Multiple transactions can hold shared locks on the same data item
simultaneously.
• Exclusive Lock (X-lock): Allows a transaction to both read and modify the data, and prevents
other transactions from accessing the data in any way (either for reading or writing).
Locking Protocols:
• Two-Phase Locking (2PL): A popular protocol where each transaction follows two phases:
o Growing Phase: The transaction can acquire locks but cannot release any locks.
o Shrinking Phase: The transaction can release locks but cannot acquire any new locks.
o Guarantee: Two-Phase Locking ensures serializability, meaning the final outcome of
the transactions will be equivalent to some serial order.
• Strict Two-Phase Locking (Strict 2PL): A variation of 2PL where a transaction holds all of its
locks until it commits or aborts. This guarantees recoverability, ensuring that no transaction
can read uncommitted data (also known as dirty reads).
• Deadlock Prevention and Detection: Since locking mechanisms can lead to deadlocks
(where two or more transactions are waiting for each other to release locks), techniques like
deadlock detection (periodically checking for deadlocks and aborting a transaction) and
deadlock prevention (preventing the situation where deadlocks could occur by careful
scheduling of lock requests) are used.

2. Timestamp-Based Protocols
Timestamp-based protocols use a global timestamp to order transactions and determine the
order in which they can access data. Each transaction is given a unique timestamp when it
starts. The idea is to prevent conflicts by enforcing an order in which transactions can access
the data based on their timestamps.
Types of Timestamp Protocols:
• Basic Timestamp Ordering (TO): Each transaction is assigned a timestamp when it begins.
The protocol ensures that the operations (reads and writes) are performed in the order of
their timestamps. Specifically:
o A read operation by a transaction is allowed only if no later transaction has written
to the same data item.
o A write operation by a transaction is allowed only if no later transaction has read or
written to the same data item.
• Thomas’ Write Rule: A variation of the basic timestamp ordering protocol that allows certain
writes to be ignored if they are no longer relevant (i.e., the data has already been
overwritten by a transaction with a later timestamp). This helps improve efficiency without
sacrificing consistency.
Advantage:
• No Locks: Timestamp-based protocols don't require locks, so they avoid the possibility of
deadlocks.
Disadvantage:
• Rollback Overhead: Transactions might need to be rolled back and restarted if they violate
the timestamp order, leading to additional overhead.

3. Optimistic Concurrency Control (OCC)


Optimistic Concurrency Control is based on the assumption that conflicts between
transactions are rare. Rather than locking data, transactions are allowed to execute without
restrictions and only check for conflicts at the end of their execution (during the validation
phase).
Phases of OCC:
1. Read Phase: A transaction reads the data it needs and performs its operations without
acquiring any locks.
2. Validation Phase: Before committing, the transaction checks whether any of the data it
modified has been changed by other transactions in the meantime. If there is a conflict, the
transaction is rolled back.
3. Write Phase: If no conflict is detected, the transaction commits and writes its changes to the
database.
Advantage:
• No Blocking: Transactions don’t block each other, allowing high throughput in low-
contention environments.
Disadvantage:
• Rollback Overhead: In high-contention environments, many transactions may need to be
rolled back due to conflicts, causing performance issues.
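The three OCC phases above can be sketched with a version counter per data item (plain Python; the class, the shared `versions` dict, and the validation rule are hypothetical simplifications of a real validator):

```python
class OptimisticTx:
    """OCC sketch: read freely, validate against shared version counters, then write."""
    def __init__(self, db, versions):
        self.db, self.versions = db, versions   # versions: item -> change counter
        self.read_set = {}    # item -> version number seen at read time
        self.write_set = {}   # item -> buffered new value

    def read(self, item):                        # Read phase
        self.read_set[item] = self.versions.get(item, 0)
        return self.db[item]

    def write(self, item, value):                # buffered, not applied yet
        self.write_set[item] = value

    def commit(self):
        # Validation phase: everything we read must be unchanged since we read it.
        for item, seen in self.read_set.items():
            if self.versions.get(item, 0) != seen:
                return False                     # conflict -> roll back (discard buffers)
        # Write phase: apply buffered writes and bump version counters.
        for item, value in self.write_set.items():
            self.db[item] = value
            self.versions[item] = self.versions.get(item, 0) + 1
        return True
```

If two transactions read the same item and one commits first, the other fails validation and must restart, which is exactly the rollback overhead mentioned above.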

4. Multi-Version Concurrency Control (MVCC)


Multi-Version Concurrency Control allows multiple versions of a data item to exist
concurrently. Instead of blocking transactions from accessing a data item, MVCC ensures that
each transaction operates on its own snapshot of the database, allowing for greater
concurrency.
Key Concepts:
• Versioning: Every time a data item is modified, a new version is created, and the old version
is preserved. Each transaction accesses the version of the data that was current when it
began (its snapshot).
• Transaction Timestamps: Each transaction has a timestamp, and each version of data is
marked with a timestamp that indicates the transaction that created it.
Advantage:
• High Concurrency: Since transactions work on different versions, there’s no need for locks,
which significantly reduces contention.
• No Deadlocks: MVCC avoids deadlocks because transactions do not block each other.
Disadvantage:
• Storage Overhead: Storing multiple versions of data can increase storage requirements, and
there might be a need for periodic cleanup of obsolete versions.
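The versioning idea behind MVCC can be shown in a few lines (plain Python, hypothetical names; a real engine would also handle write conflicts and garbage-collect old versions):

```python
import itertools

class MVCCStore:
    """Each write appends a new version; each transaction reads the newest
    version no later than its own start timestamp (its snapshot)."""
    def __init__(self):
        self.versions = {}                       # item -> [(ts, value), ...]
        self.clock = itertools.count(1)

    def begin(self):
        return next(self.clock)                  # transaction timestamp

    def write(self, ts, item, value):
        self.versions.setdefault(item, []).append((ts, value))

    def read(self, ts, item):
        visible = [(t, v) for t, v in self.versions[item] if t <= ts]
        return max(visible)[1]                   # newest visible version
```

A transaction that began before a later write keeps seeing the old value, so readers never block writers, illustrating the "no locks, no deadlocks" advantage above.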

5. Serializability and Its Guarantees


The ultimate goal of any concurrency control protocol is to ensure that the final state of the
database is serializable, meaning that the result of executing concurrent transactions should
be the same as if the transactions were executed in some serial order (one after the other),
without any overlap.
Guarantees Provided by Concurrency Control Protocols:
• Serializability: Ensures that the outcome of concurrent transactions is the same as if they
were executed one by one in some sequence.
• Recoverability: Ensures that the database can recover to a consistent state in case of a
transaction failure (e.g., through undo or redo operations).
• Conflict Serializability: Ensures that transactions can be reordered to form a serial schedule
without causing conflicts (e.g., violating data integrity).

Comparison of Concurrency Control Protocols


| Protocol | Locking | Deadlock | Throughput | Rollback Overhead | Complexity |
| --- | --- | --- | --- | --- | --- |
| Lock-Based Protocols | Yes (locks data) | Yes (need detection/prevention) | Moderate | Low | Moderate |
| Timestamp-Based Protocols | No (uses timestamps) | No (no blocking) | High | High (due to rollbacks) | Moderate |
| Optimistic Concurrency Control | No (no locks) | No (no blocking) | High (low contention) | High (due to rollbacks) | Low |
| Multi-Version Concurrency Control (MVCC) | No (uses versions) | No (no blocking) | Very High | Low | High |

36. Explain about Lock-Based Protocols


A lock is a variable associated with a data item that indicates whether it is currently in use or
available for other operations. Locks are essential for managing access to data during
concurrent transactions. When one transaction is accessing or modifying a data item, a lock
ensures that other transactions cannot interfere with it, maintaining data integrity and
preventing conflicts. This process, known as locking, is a widely used method to ensure
smooth and consistent operation in database systems.
Lock Based Protocols
Lock-Based Protocols in DBMS ensure that a transaction cannot read or write data until it
gets the necessary lock. Here’s how they work:
• These protocols prevent concurrency issues by allowing only one transaction to access a
specific data item at a time.
• Locks help multiple transactions work together smoothly by managing access to the
database items.
• Locking is a common method used to maintain the serializability of transactions.
• A transaction must acquire a read lock or write lock on a data item before performing any
read or write operations on it.
Types of Lock
1. Shared Lock (S): Shared Lock is also known as Read-only lock. As the name suggests it can be
shared between transactions because while holding this lock the transaction does not have
the permission to update data on the data item. S-lock is requested using lock-S instruction.
2. Exclusive Lock (X): Data item can be both read as well as written. This is Exclusive and cannot
be held simultaneously on the same data item. X-lock is requested using lock-X instruction.
Rules of Locking
The basic rules for Locking are given below :
Read Lock (or) Shared Lock(S)
❖ If a Transaction has a Read lock on a data item, it can read the item but not update it.
❖ If a transaction has a Read lock on the data item, other transaction can obtain Read Lock
on the data item but no Write Locks.
❖ So, the Read Lock is also called a Shared Lock.
Write Lock (or) Exclusive Lock (X)
❖ If a transaction has a write Lock on a data item, it can both read and update the data item.
❖ If a transaction has a write Lock on the data item, then other transactions cannot obtain
either a Read lock or write lock on the data item.
❖ So, the Write Lock is also known as Exclusive Lock.
Lock Compatibility Matrix
• A transaction can acquire a lock on a data item only if the requested lock is compatible with
existing locks held by other transactions.
• Shared Locks (S): Multiple transactions can hold shared locks on the same data item
simultaneously.
• Exclusive Lock (X): If a transaction holds an exclusive lock on a data item, no other
transaction can hold any type of lock on that item.
• If a requested lock is not compatible, the requesting transaction must wait until all
incompatible locks are released by other transactions.
• Once the incompatible locks are released, the requested lock is granted.
Compatibility Matrix

| Requested \ Held | S | X |
| --- | --- | --- |
| S | Yes | No |
| X | No | No |
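The compatibility rules above reduce to a small lookup: an S request is granted only when every lock already held is S. A sketch (plain Python; the table and function names are hypothetical):

```python
# Lock compatibility as described above: S is compatible only with S.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_locks):
    """held_locks: lock modes currently held by *other* transactions on the item."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)
```

If `can_grant` returns False, the requesting transaction waits until the incompatible locks are released, as described above.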
Concurrency Control Protocols
Concurrency Control Protocols are the methods used to manage multiple transactions
happening at the same time. They ensure that transactions are executed safely without
interfering with each other, maintaining the accuracy and consistency of the database.
These protocols prevent issues like data conflicts, lost updates or inconsistent data by
controlling how transactions access and modify data.
Types of Lock-Based Protocols
1. Simplistic Lock Protocol
It is the simplest method for locking data during a transaction. Simplistic lock-based
protocols require every transaction to obtain a lock on a data item before inserting, deleting,
or updating it. The lock is released once the transaction is completed.
Example:
Consider a database with a single data item X = 10.
Transactions:
• T1: Wants to read and update X.
• T2: Wants to read X.
Steps:
1. T1 requests an exclusive lock on X to update its value. The lock is granted.
• T1 reads X = 10 and updates it to X = 20.
2. T2 requests a shared lock on X to read its value. Since T1 is holding an exclusive lock, T2 must
wait.
3. T1 completes its operation and releases the lock.
4. T2 now gets the shared lock and reads the updated value X = 20.
This example shows how simplistic lock protocols handle concurrency but do not prevent
problems like deadlocks and can limit concurrency.
2. Pre-Claiming Lock Protocol
The Pre-Claiming Lock Protocol evaluates a transaction to identify all the data items that
require locks. Before the transaction begins, it requests the database management system to
grant locks on all necessary data elements. If all the requested locks are successfully
acquired, the transaction proceeds. Once the transaction is completed, all locks are released.
However, if any of the locks are unavailable, the transaction rolls back and waits until all
required locks are granted before restarting.
Example:
Consider two transactions T1 and T2 and two data items, X and Y:
1. Transaction T1 declares that it needs:
• A write lock on X.
• A read lock on Y.
Since both locks are available, the system grants them. T1 starts execution:
• It updates X.
• It reads the value of Y.
2. While T1 is executing, Transaction T2 declares that it needs:
• A read lock on X.
However, since T1 already holds a write lock on X, T2’s request is denied. T2 must wait until
T1 completes its operations and releases the locks.
3. Once T1 finishes, it releases the locks on X and Y. The system now grants the read lock
on X to T2, allowing it to proceed.
This method is simple but may lead to inefficiency in systems with a high number of
transactions.
3. Two-phase locking (2PL)
A transaction is said to follow the Two-Phase Locking protocol if Locking and Unlocking can
be done in two phases :
• Growing Phase: New locks on data items may be acquired but none can be released.
• Shrinking Phase: Existing locks may be released but no new locks can be acquired.
4. Strict Two-Phase Locking Protocol
Strict Two-Phase Locking requires that, in addition to the 2PL rules, all Exclusive (X) locks
held by the transaction not be released until after the transaction commits.
Problem With Simple Locking
Consider the Partial Schedule:

| Time | T1 | T2 |
| --- | --- | --- |
| 1 | lock-X(B) | |
| 2 | read(B) | |
| 3 | B := B - 50 | |
| 4 | write(B) | |
| 5 | | lock-S(A) |
| 6 | | read(A) |
| 7 | | lock-S(B) |
| 8 | lock-X(A) | |
| 9 | …… | …… |
1. Deadlock
In the given execution scenario, T1 holds an exclusive lock on B, while T2 holds a shared lock
on A. At Statement 7, T2 requests a lock on B, and at Statement 8, T1 requests a lock on A.
This situation creates a deadlock, as both transactions are waiting for resources held by the
other, preventing either from proceeding with their execution.
2. Starvation
Starvation is also possible if the concurrency control manager is badly designed. For example,
a transaction may be waiting for an X-lock on an item while a sequence of other transactions
request and are granted an S-lock on the same item. This can be avoided if the concurrency
control manager is properly designed.
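The deadlock in the schedule above corresponds to a cycle in the wait-for graph (T1 waits for T2, T2 waits for T1). A DBMS can detect this by searching the graph for cycles; a depth-first-search sketch (plain Python, hypothetical function name):

```python
def has_deadlock(wait_for):
    """wait_for: txn -> set of txns it is waiting on.
    A deadlock exists iff the wait-for graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True                  # back edge -> cycle -> deadlock
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in list(color))
```

On detecting a cycle, the DBMS typically aborts one of the transactions in it (the "victim") to break the deadlock.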

37. Explain about Time-based Protocols


Timestamp-based concurrency control is a method used in database systems to ensure that
transactions are executed safely and consistently without conflicts, even when multiple
transactions are being processed simultaneously. This approach relies on timestamps to
manage and coordinate the execution order of transactions. Refer to the timestamp of a
transaction T as TS(T).
What is Timestamp Ordering Protocol?
The Timestamp Ordering Protocol is a method used in database systems to order
transactions based on their timestamps. A timestamp is a unique identifier assigned to each
transaction, typically determined using the system clock or a logical counter. Transactions are
executed in the ascending order of their timestamps, ensuring that older transactions get
higher priority.
For example:
• If Transaction T1 enters the system first, it gets a timestamp TS(T1) = 007 (assumption).
• If Transaction T2 enters after T1, it gets a timestamp TS(T2) = 009 (assumption).
This means T1 is “older” than T2 and T1 should execute before T2 to maintain consistency.
Key Features of Timestamp Ordering Protocol:
Transaction Priority:
• Older transactions (those with smaller timestamps) are given higher priority.
• For example, if transaction T1 has a timestamp of 007 and transaction T2 has a
timestamp of 009, T1 will execute first as it entered the system earlier.
Early Conflict Management:
• Unlike lock-based protocols, which manage conflicts during execution, timestamp-based
protocols start managing conflicts as soon as a transaction is created.
Ensuring Serializability:
• The protocol ensures that the schedule of transactions is serializable. This means the
transactions can be executed in an order that is logically equivalent to their timestamp order.
Basic Timestamp Ordering
Precedence Graph for TS ordering
The Basic Timestamp Ordering (TO) Protocol is a method in database systems that uses
timestamps to manage the order of transactions. Each transaction is assigned a unique
timestamp when it enters the system ensuring that all operations follow a specific order
making the schedule conflict-serializable and deadlock-free.
• Suppose, if an old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned
timestamp TS(Tj) such that TS(Ti) < TS(Tj).
• The protocol manages concurrent execution such that the timestamps determine the
serializability order.
• The timestamp ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order.
• Whenever some Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO
algorithm compares the timestamp of T with R_TS(X) & W_TS(X) to ensure that the
Timestamp order is not violated.
This describes the Basic TO protocol in the following two cases:
Whenever a Transaction T issues a W_item(X) operation, check the following conditions:
• If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation;
else,
• Execute the W_item(X) operation of T and set W_TS(X) to the larger of TS(T) and the
current W_TS(X).
Whenever a Transaction T issues a R_item(X) operation, check the following conditions:
• If W_TS(X) > TS(T), then abort and roll back T and reject the operation; else,
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the
larger of TS(T) and the current R_TS(X).
Whenever the Basic TO algorithm detects two conflicting operations that occur in an
incorrect order, it rejects the latter of the two operations by aborting the Transaction that
issued it.
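The two Basic TO rules above translate directly into code. A sketch (plain Python, hypothetical class name; R_TS and W_TS default to 0 for items never accessed):

```python
class BasicTO:
    """Basic timestamp ordering: each operation is allowed (True) or forces
    the issuing transaction to abort and roll back (False)."""
    def __init__(self):
        self.r_ts = {}   # item -> largest timestamp that read it
        self.w_ts = {}   # item -> largest timestamp that wrote it

    def read(self, ts, item):
        if self.w_ts.get(item, 0) > ts:        # a younger txn already wrote X
            return False                        # abort and roll back T
        self.r_ts[item] = max(self.r_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if self.r_ts.get(item, 0) > ts or self.w_ts.get(item, 0) > ts:
            return False                        # abort and roll back T
        self.w_ts[item] = max(self.w_ts.get(item, 0), ts)
        return True
```

For example, once a transaction with timestamp 7 has read X, a write to X by an older transaction with timestamp 6 is rejected, preserving the timestamp order.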
Advantages of Basic TO Protocol
• Conflict Serializable: Ensures all conflicting operations follow the timestamp order.
• Deadlock-Free: Transactions do not wait for resources, preventing deadlocks.
• Strict Ordering: Operations are executed in a predefined, conflict-free order based on
timestamps.
Drawbacks of Basic Timestamp Ordering (TO) Protocol
• Cascading Rollbacks : If a transaction is aborted, all dependent transactions must also be
aborted, leading to inefficiency.
• Starvation of Newer Transactions : Older transactions are prioritized, which can delay or
starve newer transactions.
• High Overhead: Maintaining and updating timestamps for every data item adds significant
system overhead.
• Inefficient for High Concurrency: The strict ordering can reduce throughput in systems with
many concurrent transactions.
Strict Timestamp Ordering
The Strict Timestamp Ordering Protocol is an enhanced version of the Basic Timestamp
Ordering Protocol. It ensures a stricter control over the execution of transactions to avoid
cascading rollbacks and maintain a more consistent schedule.
Key Features
• Strict Execution Order: Transactions must execute in the exact order of their timestamps.
Operations are delayed if executing them would violate the timestamp order, ensuring a
strict schedule.
• No Cascading Rollbacks: To avoid cascading aborts, a transaction must delay its operations
until all conflicting operations of older transactions are either committed or aborted.
• Consistency and Serializability: The protocol ensures conflict-serializable schedules by
following strict ordering rules based on transaction timestamps.
For Read Operations (R_item(X)):
• A transaction T can read a data item X only if: W_TS(X), the timestamp of the last transaction
that wrote to X, is less than or equal to TS(T), the timestamp of T and the transaction that
last wrote to X has committed.
• If these conditions are not met, T’s read operation is delayed until they are satisfied.
For Write Operations (W_item(X)):
• A transaction T can write to a data item X only if: R_TS(X), the timestamp of the last
transaction that read X, and W_TS(X), the timestamp of the last transaction that wrote to X,
are both less than or equal to TS(T) and all transactions that previously read or wrote X have
committed.
• If these conditions are not met, T’s write operation is delayed until all conflicting transactions
are resolved.
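The read and write rules above can be sketched in a few lines of Python. This is a minimal illustration, not DBMS internals: the representation of a data item as a dict with `r_ts`, `w_ts`, and a `writer_committed` flag, and the function names, are assumptions for demonstration, and only the last writer's commit status is tracked (a full scheduler would also track readers).

```python
def can_read(ts_t, item):
    """Strict TO read rule: returns 'ok', 'delay', or 'abort' for a read
    by a transaction with timestamp ts_t."""
    if item["w_ts"] > ts_t:
        return "abort"   # a younger transaction has already written X
    if not item["writer_committed"]:
        return "delay"   # wait until the last writer commits or aborts
    return "ok"

def can_write(ts_t, item):
    """Strict TO write rule for a transaction with timestamp ts_t."""
    if item["r_ts"] > ts_t or item["w_ts"] > ts_t:
        return "abort"   # timestamp order already violated
    if not item["writer_committed"]:
        return "delay"   # a conflicting older operation is not yet resolved
    return "ok"

x = {"r_ts": 5, "w_ts": 3, "writer_committed": True}
print(can_read(10, x))   # ok: W_TS(X)=3 <= 10 and the last writer committed
print(can_write(4, x))   # abort: R_TS(X)=5 > 4, so the order is violated
```

The `delay` outcome is what distinguishes strict TO from basic TO, where the same conflict would either proceed or abort immediately.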

38. Explain the concept of serializability


What is a serializable schedule, and what is it used for?
If a non-serial schedule can be transformed into an equivalent serial schedule, it is said to
be serializable. Simply put, a non-serial schedule is called serializable if it yields the same
results as some serial schedule.
Non-serial Schedule
A schedule in which the operations of different transactions are interleaved. Because
multiple transactions run concurrently to carry out actual database operations, they may be
working on the same data items. It is therefore crucial that non-serial schedules be
serializable, so that the database is consistent both before and after the transactions are
executed.
Example:

Transaction-1        Transaction-2
R(a)
W(a)
                     R(b)
                     W(b)
R(b)
                     R(a)
W(b)
                     W(a)

We can observe that Transaction-2 begins executing before Transaction-1 has finished, and
the two transactions work on the same data items, "a" and "b", interchangeably. Here "R" =
Read and "W" = Write.
Serializability testing
We can use a Serialization Graph, also called a Precedence Graph, to test a schedule's
serializability. A serialization graph organizes all the transactions of a schedule into a
directed graph.

Precedence Graph
It can be described as a graph G(V, E) with vertices V = {T1, T2, T3, ..., Tn} and directed edges
E = {E1, E2, E3, ..., En}. Each edge is induced by a pair of conflicting READ/WRITE operations:
an edge Ti -> Tj means transaction Ti performs a conflicting read or write before
transaction Tj.
Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.

Types of Serializability - Conflict & View
1. Conflict serializability
Conflict serializability is a subset of serializability that preserves the consistency of a
database by requiring conflicting operations on the same data item to execute in a fixed
order. For example, consider an order table and a customer table: each order is associated
with exactly one customer, even though a single customer may place many orders. Two
operations conflict, and are therefore subject to this ordering restriction, only when all of
the following hold:
1. The two operations belong to different transactions.
2. Both operations access the same data item.
3. At least one of the two operations is a write.
Example
Three transactions, t1, t2, and t3, are active at once on a schedule "S". Let's create its
precedence graph.

Transaction-1 (t1)   Transaction-2 (t2)   Transaction-3 (t3)
R(a)
R(b)
                                          R(b)
                                          W(b)
                                          W(a)
                     W(a)
                     R(a)
                     W(a)

It is a conflict serializable schedule because the precedence graph (a DAG) has no cycles.
Since the graph is acyclic, we can also derive an equivalent serial order of the transactions.
DAG of transactions
As Transaction 1 (t1) has no incoming edge, it is executed first. T3 runs second because it
depends only on T1. T2 is executed last because it depends on both T1 and T3.
Therefore, the serial schedule's equivalent order is: t1 --> t3 --> t2
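The precedence-graph test can be sketched in a few lines of Python. This is an illustrative sketch, not DBMS internals: the encoding of a schedule as (transaction, operation, item) tuples and the sample schedule, whose graph is the DAG t1 -> t3 -> t2 with an extra edge t1 -> t2, are assumptions for demonstration.

```python
def precedence_graph(schedule):
    """Build edges Ti -> Tj for pairs of conflicting operations:
    different transactions, same data item, at least one write."""
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """A schedule is conflict serializable iff its precedence graph is acyclic."""
    adj = {}
    for a, b in precedence_graph(schedule):
        adj.setdefault(a, set()).add(b)
    def reaches(src, dst, seen=()):
        return any(n == dst or (n not in seen and reaches(n, dst, seen + (n,)))
                   for n in adj.get(src, ()))
    return not any(reaches(t, t) for t, _, _ in schedule)

# A schedule whose precedence graph has edges t1->t3, t1->t2, and t3->t2
s = [("t1", "R", "a"), ("t1", "R", "b"),
     ("t3", "R", "b"), ("t3", "W", "b"), ("t3", "W", "a"),
     ("t2", "W", "a"), ("t2", "R", "a"), ("t2", "W", "a")]
print(is_conflict_serializable(s))  # True: the graph is a DAG
```

A cyclic precedence graph (for example, two transactions that each write a and b in opposite orders) would make `is_conflict_serializable` return False.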
Note: A conflict serializable schedule is guaranteed to be consistent. A schedule that is not
conflict serializable, on the other hand, may or may not be serializable. We use the idea of
View Serializability to examine such schedules further.
2. View Serializability
View serializability is a form of serializability in which each transaction must produce the
same results it would produce if the data items were accessed in some serial order. In
contrast to conflict serializability, view serializability is concerned only with the values that
transactions read and write, which makes it a broader condition for avoiding database
inconsistency.
To understand view serializability in DBMS, consider two schedules S1 and S2 built from the
same two transactions T1 and T2. For the schedules to be equivalent, each must satisfy the
three conditions listed below.
1. The first condition is that the same set of transactions appears in both schedules. If one
schedule commits a transaction that has no matching transaction in the other schedule, the
schedules are not equivalent.
2. The second condition is that both schedules contain the same write operations. If
schedule S1 has two write operations on a data item while schedule S2 has only one, the
schedules are not equivalent; the number of write operations must match, although the
number of read operations may differ.
3. The third condition is that the two schedules must not order operations on a single data
item differently. For instance, if transactions T1 and T2 both write data item A, the
schedules are equivalent only if the final write on A is the same in both.
What is view equivalency?
Schedules S1 and S2 must satisfy these three requirements to be view equivalent:
1. Initial read: the first read of each data item must be the same. For instance, if transaction
t1 reads "A" first in schedule S1, then t1 must also read A first in schedule S2.
2. Final write: the last write of each data item must be the same. If transaction t1 performed
the final write on A in S1, it must also perform the final write on A in S2.
3. Intermediate reads must match as well: if in S1 t1 reads A after t2 updates it, then in S2 t1
must likewise read the value of A written by t2.
View Serializability refers to the process of determining whether a schedule's views are
equivalent.
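These conditions can be checked mechanically. Below is a simplified sketch under stated assumptions: schedules are encoded as (transaction, operation, item) tuples, and the check compares the reads-from pairs (where None marks a read of the initial value, covering the initial-read condition) together with the final writes.

```python
def view_info(schedule):
    """Collect, per data item: which write each read sees (None = the
    initial value) and which transaction writes it last."""
    last_writer, reads_from, final_write = {}, set(), {}
    for t, op, x in schedule:
        if op == "R":
            reads_from.add((t, x, last_writer.get(x)))
        else:  # "W"
            last_writer[x] = t
            final_write[x] = t  # overwritten until the last write remains
    return reads_from, final_write

def view_equivalent(s1, s2):
    return view_info(s1) == view_info(s2)

# S interleaves t1 and t2 on items a and b; S' runs t1 fully, then t2
s = [("t1", "R", "a"), ("t1", "W", "a"), ("t2", "R", "a"), ("t2", "W", "a"),
     ("t1", "R", "b"), ("t1", "W", "b"), ("t2", "R", "b"), ("t2", "W", "b")]
s_serial = [("t1", "R", "a"), ("t1", "W", "a"), ("t1", "R", "b"), ("t1", "W", "b"),
            ("t2", "R", "a"), ("t2", "W", "a"), ("t2", "R", "b"), ("t2", "W", "b")]
print(view_equivalent(s, s_serial))  # True: S is view serializable
```

The check returns True here because both schedules have the same initial reads, the same reads-from relationships, and the same final writer (t2) for both items.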
Example
We have a schedule "S" with two concurrently running transactions, "t1" and "t2."
Schedule - S:

Transaction-1 (t1)   Transaction-2 (t2)
R(a)
W(a)
                     R(a)
                     W(a)
R(b)
W(b)
                     R(b)
                     W(b)

Let's create its view equivalent schedule (S') by swapping the two transactions' middle
read-write operations.
Schedule - S':

Transaction-1 (t1)   Transaction-2 (t2)
R(a)
W(a)
R(b)
W(b)
                     R(a)
                     W(a)
                     R(b)
                     W(b)

Since a view equivalent serial schedule exists, S is a view serializable schedule.


Note: A conflict serializable schedule is always view serializable, but the converse is not
always true.
Advantages of Serializability
1. Predictable execution: under serializability, concurrent transactions behave as if they ran
one at a time, so the DBMS holds no surprises: no data is lost or corrupted, and all variables
are updated as intended.
2. Easier debugging: because each transaction can be reasoned about independently,
understanding and troubleshooting an individual database thread becomes much simpler,
and the concurrent execution is no longer a concern.
3. Lower costs: the serializability property can reduce the hardware needed for the database
to operate correctly, and it may also lower the cost of developing the software.
4. Increased performance: serializable executions occasionally outperform non-serializable
equivalents, since they give developers the opportunity to optimize their code for
performance.

39. What are the Types of Serializability


Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.
Types of Serializability - Conflict & View
1. Conflict serializability
Conflict serializability is a subset of serializability that preserves the consistency of a
database by requiring conflicting operations on the same data item to execute in a fixed
order. For example, consider an order table and a customer table: each order is associated
with exactly one customer, even though a single customer may place many orders. Two
operations conflict, and are therefore subject to this ordering restriction, only when all of
the following hold:
1. The two operations belong to different transactions.
2. Both operations access the same data item.
3. At least one of the two operations is a write.
Example
Three transactions, t1, t2, and t3, are active at once on a schedule "S". Let's create its
precedence graph.

Transaction-1 (t1)   Transaction-2 (t2)   Transaction-3 (t3)
R(a)
R(b)
                                          R(b)
                                          W(b)
                                          W(a)
                     W(a)
                     R(a)
                     W(a)

It is a conflict serializable schedule because the precedence graph (a DAG) has no cycles.
Since the graph is acyclic, we can also derive an equivalent serial order of the transactions.

DAG of transactions
As Transaction 1 (t1) has no incoming edge, it is executed first. T3 runs second because it
depends only on T1. T2 is executed last because it depends on both T1 and T3.
Therefore, the serial schedule's equivalent order is: t1 --> t3 --> t2
Note: A conflict serializable schedule is guaranteed to be consistent. A schedule that is not
conflict serializable, on the other hand, may or may not be serializable. We use the idea of
View Serializability to examine such schedules further.
2. View Serializability
View serializability is a form of serializability in which each transaction must produce the
same results it would produce if the data items were accessed in some serial order. In
contrast to conflict serializability, view serializability is concerned only with the values that
transactions read and write, which makes it a broader condition for avoiding database
inconsistency.
To understand view serializability in DBMS, consider two schedules S1 and S2 built from the
same two transactions T1 and T2. For the schedules to be equivalent, each must satisfy the
three conditions listed below.
1. The first condition is that the same set of transactions appears in both schedules. If one
schedule commits a transaction that has no matching transaction in the other schedule, the
schedules are not equivalent.
2. The second condition is that both schedules contain the same write operations. If
schedule S1 has two write operations on a data item while schedule S2 has only one, the
schedules are not equivalent; the number of write operations must match, although the
number of read operations may differ.
3. The third condition is that the two schedules must not order operations on a single data
item differently. For instance, if transactions T1 and T2 both write data item A, the
schedules are equivalent only if the final write on A is the same in both.
What is view equivalency?
Schedules S1 and S2 must satisfy these three requirements to be view equivalent:
1. Initial read: the first read of each data item must be the same. For instance, if transaction
t1 reads "A" first in schedule S1, then t1 must also read A first in schedule S2.
2. Final write: the last write of each data item must be the same. If transaction t1 performed
the final write on A in S1, it must also perform the final write on A in S2.
3. Intermediate reads must match as well: if in S1 t1 reads A after t2 updates it, then in S2 t1
must likewise read the value of A written by t2.
View Serializability refers to the process of determining whether a schedule's views are
equivalent.
Example
We have a schedule "S" with two concurrently running transactions, "t1" and "t2."
Schedule - S:

Transaction-1 (t1)   Transaction-2 (t2)
R(a)
W(a)
                     R(a)
                     W(a)
R(b)
W(b)
                     R(b)
                     W(b)
Let's create its view equivalent schedule (S') by swapping the two transactions' middle
read-write operations.
Schedule - S':

Transaction-1 (t1)   Transaction-2 (t2)
R(a)
W(a)
R(b)
W(b)
                     R(a)
                     W(a)
                     R(b)
                     W(b)

Since a view equivalent serial schedule exists, S is a view serializable schedule.


Note: A conflict serializable schedule is always view serializable, but the converse is not
always true.
Advantages of Serializability
1. Predictable execution: under serializability, concurrent transactions behave as if they ran
one at a time, so the DBMS holds no surprises: no data is lost or corrupted, and all variables
are updated as intended.
2. Easier debugging: because each transaction can be reasoned about independently,
understanding and troubleshooting an individual database thread becomes much simpler,
and the concurrent execution is no longer a concern.
3. Lower costs: the serializability property can reduce the hardware needed for the database
to operate correctly, and it may also lower the cost of developing the software.
4. Increased performance: serializable executions occasionally outperform non-serializable
equivalents, since they give developers the opportunity to optimize their code for
performance.

40. Explain about Serializability testing


Serializability testing is the process of verifying whether a schedule (the order in which
database operations are executed in concurrent transactions) is serializable. In other words,
serializability testing ensures that the outcome of concurrent transactions is equivalent to
the outcome of some serial execution of those transactions. This is crucial because
serializability guarantees the correctness of a database system when it handles multiple
transactions simultaneously.
What is Serializability?
In the context of databases, serializability refers to the property that the final result of
executing a series of transactions concurrently is the same as if the transactions had been
executed one after the other (serial execution) without any overlap.
• A serial schedule means that transactions are executed one by one, with no interleaving.
This is straightforward and ensures consistency.
• A non-serial schedule allows transactions to interleave, meaning operations from different
transactions can be executed in parallel, but the final result should still be equivalent to
some serial execution.
Serializability is the highest level of transaction isolation in ACID properties (Atomicity,
Consistency, Isolation, Durability).
Types of Serializability
1. Conflict Serializability:
o A schedule is conflict-serializable if it can be transformed into a serial schedule by
swapping non-conflicting operations.
o Conflict: Two operations are in conflict if they access the same data item and at least
one of them is a write operation.
o If a schedule is conflict-serializable, it guarantees serializability, but not every
serializable schedule is conflict-serializable.
2. View Serializability:
o A schedule is view-serializable if the final result of executing the schedule is the
same as the final result of executing some serial schedule.
o View equivalence: This means that for each data item, the same transaction reads
the same value in both the schedule and the serial schedule. View serializability is a
broader concept than conflict serializability but is harder to enforce in practice.

Serializability Testing
Serializability testing involves checking whether a given schedule of transactions is
serializable, i.e., whether it can be rearranged into an equivalent serial schedule without
violating any data consistency or integrity rules.
Steps to Test Serializability
1. Conflict Graph Method (Precedence Graph Method): The most common method for testing
conflict serializability is to use a precedence graph (also called a conflict graph or
serializability graph). This method works as follows:
o Step 1: Build the Precedence Graph
▪ Create a directed graph where each node represents a transaction.
▪ Draw a directed edge from transaction T1 to transaction T2 if T1 conflicts
with T2. A conflict occurs if:
▪ Both transactions access the same data item.
▪ At least one of the transactions is a write operation.
▪ The direction of the edge indicates the order of execution. If T1 writes a data
item and T2 reads or writes the same data item later, draw an edge from T1
to T2.
o Step 2: Check for Cycles
▪ If the graph contains any cycles, the schedule is not conflict serializable
because cycles indicate conflicting transactions that cannot be reordered
into a serial schedule.
▪ If the graph is acyclic, the schedule is conflict serializable and can be
transformed into a serial schedule by following the topological order of the
transactions in the graph.
Example of a conflict graph:
o Suppose we have two transactions, T1 and T2, with the following operations:
▪ T1: Write(A)
▪ T2: Read(A)
▪ T1: Write(B)
▪ T2: Write(A)
o We would create a graph with nodes for T1 and T2, and draw directed edges based
on conflicts:
▪ There's a conflict between T1: Write(A) and T2: Read(A) (T1 → T2).
▪ There's a conflict between T1: Write(A) and T2: Write(A) (T1 → T2).
o Since there are no cycles in the graph, the schedule is conflict-serializable.
2. Serializable Schedule Definition via Transaction Graphs:
o Transaction graphs can also be used to model schedules. These graphs represent the
transaction dependencies (i.e., which transactions must wait for others).
o A schedule is serializable if there exists a serial execution that respects these
dependencies.
3. Lock-based Serializability Testing:
o Lock-based concurrency control methods, like Two-Phase Locking (2PL), can be used
to test serializability by observing whether a schedule conforms to the rules of
locking and whether the transactions can be reordered without violating isolation.
o In lock-based methods, the database ensures serializability by acquiring locks for
each operation (read or write). The success of the lock acquisition process can be
checked to verify if the transactions are serializable.
4. Serialization Graph Method (also known as a Serializable Precedence Graph):
o This method builds a directed graph to model the dependency between
transactions. The graph is built based on read and write operations, and edges are
added between transactions that have conflicts (e.g., one writes and another reads
or writes the same data).
o After constructing the graph, a topological sort is attempted. If the graph contains a
cycle, no topological order exists and the schedule is not serializable; otherwise, the
schedule is serializable.

Example of Serializability Testing: Precedence Graph


Consider a schedule of three transactions with the following operations:
Operation   Transaction   Data Item
Write       T1            A
Read        T2            A
Write       T1            B
Write       T2            B

1. Conflict analysis:
o T1 and T2 both access A: T1 writes it and T2 then reads it. Therefore, there is a
conflict, and we add a directed edge from T1 to T2 (T1 → T2).
o T1 and T2 both access B: T1 writes it and T2 then writes it. There is a conflict, and
we add another directed edge from T1 to T2 (T1 → T2).
2. Precedence graph:
o Nodes: T1, T2
o Directed edges: T1 → T2 (for both A and B).
3. Checking for cycles:
o There is no cycle in the graph. Therefore, the schedule is conflict serializable.
4. Result:
o The schedule is conflict serializable, and we can find an equivalent serial schedule by
following the topological order (in this case, T1 → T2).
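The topological-ordering step above can be demonstrated with Python's standard graphlib module. This is an illustrative sketch; the transaction names and the edge sets are the ones from the example, and a cyclic graph is included to show the non-serializable case.

```python
from graphlib import TopologicalSorter, CycleError

def serial_order(edges):
    """Return an equivalent serial order for a precedence graph given as
    a set of (a, b) edges meaning a -> b, or None if the graph is cyclic."""
    ts = TopologicalSorter()
    for a, b in edges:
        ts.add(b, a)            # b depends on a, i.e. edge a -> b
    try:
        return list(ts.static_order())
    except CycleError:
        return None             # a cycle means no serial order exists

print(serial_order({("T1", "T2")}))                 # ['T1', 'T2']
print(serial_order({("T1", "T2"), ("T2", "T1")}))   # None: not serializable
```

For the worked example, the single edge T1 → T2 yields the serial order T1, T2; adding the reverse edge creates a cycle and the function reports that no serial schedule exists.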

41. Write the objectives and Types of file organization


Objectives of File Organization
File organization refers to the way data is stored in files in a computer system. The primary
objective of file organization is to enable efficient storage, retrieval, and modification of data.
Good file organization ensures that data is stored in a way that optimizes access speed, data
integrity, and storage efficiency. The following are the main objectives of file organization:
1. Efficient Data Retrieval:
o Ensure that data can be retrieved quickly and with minimal processing time, whether
for reading or searching operations.
2. Optimized Storage Utilization:
o Organize files to make the most efficient use of available storage space, minimizing
waste and ensuring that files are not fragmented.
3. Data Integrity and Consistency:
o Organize files in such a way that they can maintain their integrity, preventing data
corruption and ensuring that information is stored correctly.
4. Minimize Access Time:
o Reduce the time needed to access data in files, especially when dealing with large
amounts of data or databases. This is particularly important for systems requiring
fast data retrieval.
5. Efficient Updating and Insertion:
o Make sure the system supports efficient data insertion and updating operations,
preventing the system from becoming slow when the file grows over time.
6. Concurrency Control:
o Organize files in a way that allows multiple users or processes to access or modify
files concurrently without conflicting or compromising data consistency.
7. Flexibility and Scalability:
o Design the file organization such that it can easily adapt to changing requirements
and data volume growth, allowing scalability without significant performance
degradation.
8. Security and Access Control:
o Ensure that the file organization includes mechanisms to protect data and allow only
authorized users to access or modify certain portions of the data.

Types of File Organization


There are various types of file organization techniques, each suited to different use cases
depending on the operations (read, write, search) that need to be performed efficiently. The
following are the most common types of file organization:
1. Sequential File Organization
• Description: In sequential file organization, records are stored in a sequential order (either
ascending or descending) based on a key field (e.g., alphabetically or numerically).
• Use Case: This is ideal for applications where records are mostly accessed in a sequential
manner.
• Advantages:
o Efficient for reading records in order.
o Simple to implement.
o Efficient for large volumes of data where access patterns are primarily sequential.
• Disadvantages:
o Slow for searching and updating records (since linear search is required).
o Insertion of new records can be inefficient, as records must be re-ordered to
maintain the sequence.
2. Direct (Hash) File Organization
• Description: Direct file organization uses a hash function to calculate the location (address)
of records based on a key field. This method maps each record to a specific location using a
hash algorithm.
• Use Case: Ideal for scenarios where fast retrieval of records based on specific keys is
required.
• Advantages:
o Very fast access times, particularly for search operations.
o Efficient for direct access to records.
• Disadvantages:
o Collisions can occur when multiple records hash to the same location, which needs
to be handled using techniques like chaining or open addressing.
o Inefficient for range-based queries or sequential access.
3. Indexed File Organization
• Description: Indexed file organization maintains an index that maps keys to the location of
records. The index helps to quickly locate records based on key values, making it faster than
sequential search.
• Use Case: Ideal for applications where fast retrieval of records is needed, and there are a lot
of searches based on a key.
• Advantages:
o Faster than sequential file organization for search operations.
o Supports efficient access to records through the index.
• Disadvantages:
o Requires extra storage space for the index.
o Insertions and deletions can be slower, as the index must also be updated.
4. Clustered File Organization
• Description: In clustered file organization, related records are stored together in the same
physical block or location. This is usually done based on a logical grouping of records that are
often accessed together.
• Use Case: Best suited for applications where related records are frequently accessed
together, reducing the I/O overhead when reading records.
• Advantages:
o Reduces disk I/O by storing related records together.
o Improves performance for queries that need to access related data.
• Disadvantages:
o Complex to manage and update the clusters.
o May lead to unused or wasted space in the storage system.
5. B-Tree (Balanced Tree) File Organization
• Description: B-trees are a type of self-balancing tree structure used for indexing, which
maintains sorted data and allows searches, insertions, deletions, and updates in logarithmic
time.
• Use Case: Ideal for systems where quick search, insertion, and deletion of records are
needed while maintaining a sorted order of records.
• Advantages:
o Efficient for large databases with frequent updates.
o Provides fast search and retrieval.
o Supports both range queries and equality queries efficiently.
• Disadvantages:
o More complex to implement and maintain compared to other file organization
methods.
o Requires additional disk I/O for balancing the tree structure.
6. Heap File Organization
• Description: In heap file organization, records are stored in random order, with no specific
sequence. New records are simply added to the next available space.
• Use Case: Suitable for applications with low-volume transactions and where records are
frequently added without much concern for their order.
• Advantages:
o Simple and easy to implement.
o Suitable for applications where records are added in an arbitrary order.
• Disadvantages:
o Slow retrieval times, as the entire file may need to be scanned to find a specific
record.
o Not efficient for queries requiring sorted data.
7. Multi-Level Index File Organization
• Description: This is an extension of the indexed file organization, where multiple levels of
indexes are maintained. The first index points to the second-level index, which points to the
actual records.
• Use Case: Used in large databases where the size of the index itself becomes large, requiring
multiple levels of indexing to manage efficiently.
• Advantages:
o Can handle large amounts of data and support fast lookups even with very large
indexes.
• Disadvantages:
o More complex to manage and requires additional resources for maintaining multiple
levels of indexes.
8. File Organization with Compression
• Description: In this type of file organization, data is compressed to save storage space, and
records are stored in a compressed format.
• Use Case: Best for scenarios where storage space is at a premium, such as archival systems
or cloud storage.
• Advantages:
o Saves storage space by compressing data.
o Can improve I/O performance if the disk system is slower than the CPU, as less data
needs to be read from disk.
• Disadvantages:
o Additional processing overhead for compression and decompression.
o Can slow down read operations if not implemented efficiently.

42. Explain about Sequential File Organization


Sequential File Organization
The simplest method of file organization is the sequential method. In this method, records
are stored in the file one after another in sequential order. There are two ways to
implement this method:
1. Pile File Method
This method is quite simple, in which we store the records in a sequence i.e. one after the
other in the order in which they are inserted into the tables.

Pile File Method


Insertion of a new record: Let R1, R3, R5, and R4 be four records in the sequence (a record
here is simply a row in a table). Suppose a new record R2 has to be inserted into the
sequence; it is simply placed at the end of the file.

New Record Insertion


2. Sorted File Method
In this method, As the name itself suggests whenever a new record has to be inserted, it is
always inserted in a sorted (ascending or descending) manner. The sorting of records may be
based on any primary key or any other key.

Sorted File Method


Insertion of a new record: Assume a pre-existing sorted sequence of records R1, R3, and so
on up to R7 and R8. Suppose a new record R2 has to be inserted into the sequence; it is first
appended at the end of the file, and the sequence is then re-sorted.

New Record Insertion


Advantages of Sequential File Organization
• Fast and efficient method for handling huge amounts of data sequentially.
• Simple design.
• Files can easily be stored on magnetic tapes, i.e., a cheaper storage mechanism.
Disadvantages of Sequential File Organization
• Time is wasted because we cannot jump directly to a required record; we must move
through the file sequentially, which takes time.
• The sorted file method is inefficient, as it takes extra time and space to keep the records
sorted.
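The two insertion strategies can be contrasted in a short sketch. This is an illustration only: in-memory lists stand in for disk blocks, the record names follow the example above, and record keys are assumed to be directly comparable.

```python
import bisect

def pile_insert(records, record):
    """Pile file method: always append the new record at the end of the file."""
    records.append(record)

def sorted_insert(records, record):
    """Sorted file method: keep the file ordered on the record key."""
    bisect.insort(records, record)   # binary-search the slot, then shift records

pile, srt = [], []
for r in ["R1", "R3", "R5", "R4"]:
    pile_insert(pile, r)
    sorted_insert(srt, r)
pile_insert(pile, "R2")
sorted_insert(srt, "R2")
print(pile)  # ['R1', 'R3', 'R5', 'R4', 'R2']  - R2 simply lands at the end
print(srt)   # ['R1', 'R2', 'R3', 'R4', 'R5']  - R2 is placed in sorted position
```

The contrast shows the trade-off stated above: the pile method makes insertion trivial but leaves the file unordered, while the sorted method pays shifting cost on every insert to keep ordered access cheap.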

43. Explain about Heap file organization


Heap File Organization works with data blocks. In this method, records are inserted at the
end of the file, into the data blocks. No Sorting or Ordering is required in this method. If a
data block is full, the new record is stored in some other block, Here the other data block
need not be the very next data block, but it can be any block in the memory. It is the
responsibility of DBMS to store and manage the new records.

Heap File Organization


Insertion of a new record: Suppose we have five records in the heap, R1, R5, R6, R4, and
R3, and a new record R2 has to be inserted. Since the last data block, i.e., data block 3, is
full, R2 will be inserted into a data block selected by the DBMS, let's say data block 1.

New Record Insertion


If we want to search, delete or update data in the heap file Organization we will traverse the
data from the beginning of the file till we get the requested record. Thus if the database is
very huge, searching, deleting, or updating the record will take a lot of time.
Advantages of Heap File Organization
• Fetching and retrieving records is faster than in sequential organization, but only for small
databases.
• When a huge number of records needs to be loaded into the database at one time, this
method of file organization is best suited.
Disadvantages of Heap File Organization
• The problem of unused memory blocks.
• Inefficient for larger databases.
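Heap organization can be illustrated with fixed-capacity blocks. This is a sketch under stated assumptions: the block size of 2 and the "first block with free space" placement policy are illustrative choices, since in a real DBMS the system itself selects the block.

```python
BLOCK_SIZE = 2  # records per data block (assumed for illustration)

def heap_insert(blocks, record):
    """Place the record in any block with free space; otherwise open a new block."""
    for block in blocks:
        if len(block) < BLOCK_SIZE:
            block.append(record)
            return
    blocks.append([record])

def heap_search(blocks, record):
    """Linear scan from the start of the file - the costly operation in heaps."""
    for i, block in enumerate(blocks):
        if record in block:
            return i          # index of the data block holding the record
    return None

blocks = [["R1"], ["R5", "R6"], ["R4", "R3"]]   # block 0 has free space
heap_insert(blocks, "R2")
print(blocks)                      # R2 lands in block 0, the first with space
print(heap_search(blocks, "R2"))   # 0
```

Note that `heap_search` must scan block by block from the beginning, which is exactly why searching, deleting, and updating are slow for large heap files.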

44. Explain about Hash File Organization


Hash File Organization
• Data bucket – Data buckets are the memory locations where the records are stored. These
buckets are also considered Units of Storage.
• Hash Function – The hash function is a mapping function that maps the set of search
keys to actual record addresses. Generally, the hash function uses the primary key to
generate the hash index – the address of the data block. The hash function can range from a
simple to a complex mathematical function.
• Hash Index – The prefix of an entire hash value is taken as the hash index. Every hash index has a
depth value signifying how many bits are used for computing the hash function. These bits can
address 2^n buckets. When all these bits are consumed, the depth value is increased and
twice the number of buckets are allocated.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes
the same address. For example, if we generate an address for STUDENT_ID = 104
using a mod (5) hash function, it always results in the same bucket address, 4. The bucket
address never changes here. Hence, the number of data buckets in memory remains
constant throughout in static hashing.
Operations:
• Insertion – When a new record is inserted into the table, the hash function h generates a
bucket address for the new record based on its hash key K: bucket address = h(K).
• Searching – When a record needs to be searched, the same hash function is used to retrieve
its bucket address. For example, if we want to retrieve the whole record for ID 104 and the
hash function is mod (5) on that ID, the bucket address generated is 4. We then go directly
to address 4 and retrieve the whole record for ID 104. Here the ID acts as the hash key.
• Deletion – To delete a record, we first fetch it using the hash function, and then remove the
record from that address in memory.
• Updation – The record to be updated is first located using the hash function, and then
updated in place.
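These operations can be sketched with a mod (5) hash function, matching the STUDENT_ID = 104 example above (bucket contents are illustrative):

```python
# Static hashing sketch: mod(5) always maps a key to the same bucket.

NUM_BUCKETS = 5
buckets = [[] for _ in range(NUM_BUCKETS)]

def h(key):
    return key % NUM_BUCKETS  # hash function: bucket address = h(K)

def insert(key, record):
    buckets[h(key)].append((key, record))

def search(key):
    # The same hash function retrieves the bucket address directly.
    for k, record in buckets[h(key)]:
        if k == key:
            return record
    return None

insert(104, "record for student 104")
print(h(104))       # -> 4 : mod(5) sends ID 104 to bucket address 4
print(search(104))  # -> record for student 104
```

Deletion and updation follow the same pattern: hash the key, go to the bucket, then remove or rewrite the record there.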
Now, suppose we want to insert a new record but the bucket address generated by the hash
function is not empty (data already exists at that address). This becomes a critical situation
to handle; in static hashing it is called bucket overflow. How do we insert data in this case?
Several methods are provided to overcome this situation.
Some commonly used methods are discussed below:
• Open Hashing – In the open hashing method, the next available data block is used to enter
the new record, instead of overwriting the older one. This method is also called linear
probing. For example, suppose D3 is a new record that needs to be inserted and the hash
function generates the address 105, which is already full. The system then searches for the
next available data bucket, 123, and assigns D3 to it.
Open Hashing
• Closed Hashing – In the closed hashing method, a new data bucket is allocated with the
same address and is linked after the full data bucket. This method is also known as
overflow chaining. For example, suppose we have to insert a new record D3 into the table.
The static hash function generates the data bucket address 105, but this bucket is already
full. In this case, a new data bucket is added at the end of bucket 105 and linked to it, and
the new record D3 is inserted into the new bucket.

Closed Hashing
• Quadratic probing: Quadratic probing is very similar to open hashing (linear probing). The
difference is that while linear probing moves to the next bucket at a fixed, linear distance,
quadratic probing uses a quadratic function of the probe number to determine the next
bucket address.
• Double Hashing: Double hashing is another method similar to linear probing. Here the step
size is not a fixed value as in linear probing; instead, it is calculated using a second hash
function. That is why the method is called double hashing.
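A compact sketch of two of these strategies, linear probing (open hashing) and overflow chaining (closed hashing), using a small illustrative table size and example keys:

```python
# Collision resolution sketch: linear probing vs. overflow chaining.

SIZE = 7

def h(key):
    return key % SIZE

# --- Open hashing / linear probing: use the next free slot. ---
slots = [None] * SIZE

def probe_insert(key):
    addr = h(key)
    while slots[addr] is not None:   # bucket occupied: try the next one
        addr = (addr + 1) % SIZE
    slots[addr] = key
    return addr

# --- Closed hashing / overflow chaining: link records at the same address. ---
chains = [[] for _ in range(SIZE)]

def chain_insert(key):
    chains[h(key)].append(key)       # overflow record linked to the address

print(probe_insert(10))  # -> 3
print(probe_insert(17))  # -> 4 : 17 % 7 == 3 is taken, so it lands in slot 4
for k in (10, 17):
    chain_insert(k)
print(chains[3])         # -> [10, 17]
```

The terminology here follows this document (open hashing = probing, closed hashing = chaining); some textbooks use the names the other way around.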
Dynamic Hashing
The drawback of static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. In dynamic hashing, data buckets grow or shrink (are added
or removed dynamically) as the number of records increases or decreases. Dynamic hashing
is also known as extended hashing. In dynamic hashing, the hash function is made to produce
a large number of bits. For example, suppose there are three data records D1, D2, and D3, and
the hash function generates the addresses 1001, 0101, and 1010 respectively. This method
considers only part of each address (here only the first bit) to store the data, so it tries to
place the three records at addresses 0 and 1.
Dynamic Hashing
But the problem is that no bucket space remains for D3: bucket 1 already holds D1. The
buckets have to grow dynamically to accommodate D3, so the address is expanded to use 2
bits rather than 1, the existing data is rehashed to 2-bit addresses, and then D3 is
accommodated.

Dynamic Hashing
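A simplified sketch of this directory-doubling behaviour, using the binary addresses from the example above. The bucket capacity of one record is an illustrative assumption; since 1001 and 1010 share their first two bits, this sketch grows the depth twice:

```python
# Simplified dynamic (extendible) hashing sketch: the directory uses the
# first `depth` bits of the hash value; on overflow the depth grows and
# all records are rehashed. Records with identical bit strings are not
# handled (illustration only).

CAPACITY = 1  # records per bucket (illustrative)

class DynamicHash:
    def __init__(self):
        self.depth = 1
        self.buckets = {0: [], 1: []}

    def _addr(self, bits):
        # Use the first `depth` bits of the binary hash value.
        return int(bits[: self.depth], 2)

    def insert(self, name, bits):
        addr = self._addr(bits)
        if len(self.buckets[addr]) >= CAPACITY:
            self._grow()
            self.insert(name, bits)  # retry with the larger directory
            return
        self.buckets[addr].append((name, bits))

    def _grow(self):
        # Double the directory: one more bit, twice the buckets.
        self.depth += 1
        old = [rec for b in self.buckets.values() for rec in b]
        self.buckets = {i: [] for i in range(2 ** self.depth)}
        for name, bits in old:
            self.buckets[self._addr(bits)].append((name, bits))

dh = DynamicHash()
dh.insert("D1", "1001")
dh.insert("D2", "0101")
dh.insert("D3", "1010")  # collides with D1 on the leading bits
print(dh.depth)          # -> 3 : 1001 and 1010 share their first two bits
```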

45. Explain about Indexed sequential access method (ISAM) File Organization
Indexed sequential access method, also known as the ISAM method, is an upgrade to the
conventional sequential file organization method; you can say it is an advanced version of
sequential file organization. In this method, the primary key of the record is stored together
with an address, and this address is mapped to the address of a data block in memory. This
address field works as an index of the file.
In this method, reading and fetching a record is done using the index of the file. The index
field contains the address of a data record in memory, which can be used to read and fetch
the record quickly.
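A minimal sketch of such an index, mapping primary keys to hypothetical block addresses and supporting both point lookup and range retrieval over the sorted keys:

```python
import bisect

# ISAM-style sketch: a sorted index of (primary key, block address) pairs.
# The block addresses are illustrative.

keys = [101, 104, 109, 115, 120]                 # primary keys, kept sorted
addrs = ["blk7", "blk2", "blk9", "blk4", "blk1"]  # matching data block addresses

def lookup(key):
    # Binary search on the sorted index, then follow the address.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return addrs[i]
    return None

def range_lookup(lo, hi):
    # Range retrieval: addresses of all records with key in [lo, hi].
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return addrs[i:j]

print(lookup(109))             # -> blk9
print(range_lookup(104, 115))  # -> ['blk2', 'blk9', 'blk4']
```

Keeping the index sorted is what makes the range retrieval cheap, and it is also why insertions and deletions require the reorganization and cleanup noted in the disadvantages.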

Advantages of ISAM
1. Searching a record is faster in ISAM file organization compared to other file organization
methods, as the primary key can be used to identify the record, and since the primary key is
stored with the record's address, the data can be read and fetched from memory directly.
2. This method is more flexible than other methods, as it allows generating the index field
(address field) for any column of the record. This makes searching easier and more efficient,
since searches can be done using multiple column fields.
3. This allows range retrieval of records: since the address file is stored along with the primary
key of the record, we can retrieve records based on a range of primary key values.
4. This method allows partial searches as well. For example, an employee name starting with "St"
can be used to search for all employees whose names begin with the letters "St"; this
returns all such records.
Disadvantages of ISAM
1. Requires additional space in memory to store the index field.
2. After adding a record to the file, the file needs to be reorganized to maintain the sequence
based on the primary key column.
3. Requires memory cleanup: when a record is deleted, the space it used needs to be released
so that it can be used by another record.
4. There are performance issues if records are deleted frequently, as every deletion needs a
memory cleanup and optimization.

46. What is Indexing in DBMS


Indexing is a technique for improving database performance by reducing the number of disk
accesses necessary when a query is run. An index is a form of data structure. It’s used to
swiftly identify and access data and information present in a database table.
Structure of Index
We can create indices using some columns of the database.

• The search key is the index's first column; it contains a duplicate or copy of the
table's candidate key or primary key. The primary key values are saved in sorted order so that
the related data can be accessed quickly.
• The data reference is the index's second column. It contains a group of pointers that
point to the disk block where the value of the corresponding key can be found.
Methods of Indexing
Ordered Indices
To make searching easier and faster, the indices are frequently arranged/sorted. Ordered
indices are indices that have been sorted.
Example
Let’s say we have a table of employees with thousands of records, each of which is ten bytes
large. If their IDs begin with 1, 2, 3, …, etc., and we are looking for the employee with ID 543:
• Without an index, we must search the disk blocks from the beginning until we reach record
543; the DBMS reads 543*10 = 5430 bytes before finding the record.
• With an index (assuming each index entry is two bytes), the DBMS reads only
542*2 = 1084 bytes before locating the record, which is significantly less than in the prior
example.
Primary Index
• Primary indexing refers to the process of creating an index based on the table’s primary key.
These primary keys are specific to each record and establish a 1:1 relationship between
them.
• The searching operation is fairly efficient because primary keys are stored in sorted order.
• There are two types of primary indexes: dense indexes and sparse indexes.
Dense Index
Every search key value in the data file has an index record in the dense index. It speeds up
the search process. The total number of records present in the index table and the main
table are the same in this case. It requires extra space to hold the index record. A pointer to
the actual record on the disk and the search key are both included in the index records.

Sparse Index
Only some of the items in the data file have index records; each index entry points to a
block. Rather than pointing to every record in the main table, the index points to records
in the main table at intervals, with gaps between them.
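The difference between the two can be sketched as follows; the block size and key values are illustrative:

```python
import bisect

# Dense vs. sparse index sketch over a sorted data file of keys.
data = [101, 102, 103, 104, 105, 106, 107, 108]
BLOCK = 4  # records per block (illustrative)

# Dense index: one entry per search-key value.
dense = {key: i // BLOCK for i, key in enumerate(data)}

# Sparse index: one entry per block (each block's first key).
sparse_keys = [data[i] for i in range(0, len(data), BLOCK)]  # [101, 105]

def sparse_lookup(key):
    # Find the last sparse entry <= key, then scan that block.
    b = bisect.bisect_right(sparse_keys, key) - 1
    block = data[b * BLOCK:(b + 1) * BLOCK]
    return b if key in block else None

print(dense[106])          # -> 1 : direct block number from the dense index
print(sparse_lookup(106))  # -> 1 : found after scanning within block 1
```

The dense index answers in one step but stores an entry per record; the sparse index stores far fewer entries at the cost of a short scan inside the located block.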
Clustering Index
• A clustering index is defined on an ordered data file. Indices are sometimes built on
non-primary-key columns, which may or may not be unique for each record.
• In this situation, we join two or more columns to obtain a unique value and build an
index on them to make it easier to find the record. This method is called a clustering
index.
• Records with comparable properties are grouped together, and indices for these groups are
constructed.
Example
Assume that each department in a corporation has numerous employees. Assume we utilise
a clustering index, in which all employees with the same Dept_ID are grouped together into a
single cluster, and index pointers refer to the cluster as a whole. Dept_Id is a non-unique key
in this case.

Because one disk block may be shared by records from different clusters, the structure above
is somewhat inefficient. Using distinct disk blocks for separate clusters is considered the
better strategy.
Secondary Index
When using sparse indexing, the size of the mapping grows in sync with the size of the table.
These mappings are frequently stored in primary memory to speed up address fetching. The
secondary memory then searches the actual data using the address obtained through
mapping. Fetching the address becomes slower as the mapping size increases. The sparse
index will be ineffective in this scenario, so secondary indexing is used to solve this problem.
Secondary indexing introduces another level of indexing to reduce the size of the mapping.
In this method, wide ranges of column values are chosen first, so the first-level mapping
stays small; each range is then subdivided into smaller groups.
Because the first level’s mapping is kept in primary memory, fetching the addresses is faster.
The second-level mapping, as well as the actual data, are kept in secondary memory (or hard
disk).

Example
• To find the record for roll 111 in the diagram, we first look for the largest entry in the
first-level index that is less than or equal to 111; this gives the value 100.
• Then, in the second-level index, we again find the largest entry less than or equal to 111
and obtain 110. Using address 110, we go to the data block and scan each record until we
find 111.
• Searches are carried out in this manner; inserts, updates, and deletes work in the same
way.
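The two-level search just described can be sketched as follows; the index values follow the roll 111 example, while the block contents are illustrative:

```python
import bisect

# Two-level secondary index sketch: level 1 lives in primary memory,
# level 2 and the data blocks in secondary memory.

level1 = [1, 100]                       # first-level entries (wide ranges)
level2 = {1: [1, 50], 100: [100, 110]}  # second-level entries per range
blocks = {110: [110, 111, 112]}         # data block at address 110 (illustrative)

def find(roll):
    # Largest level-1 entry <= roll.
    r1 = level1[bisect.bisect_right(level1, roll) - 1]
    # Largest level-2 entry <= roll within that range.
    l2 = level2[r1]
    addr = l2[bisect.bisect_right(l2, roll) - 1]
    # Scan the data block for the record.
    return roll if roll in blocks.get(addr, []) else None

print(find(111))  # level 1 -> 100, level 2 -> 110, then scan finds 111
```

Because only `level1` must fit in primary memory, the mapping stays fast to consult even as the table grows.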

47. Explain about Ordered indices, Primary Index


Ordered Indices
To make searching easier and faster, the indices are frequently arranged/sorted. Ordered
indices are indices that have been sorted.
Example
Let’s say we have a table of employees with thousands of records, each of which is ten bytes
large. If their IDs begin with 1, 2, 3, …, etc., and we are looking for the employee with ID 543:
• Without an index, we must search the disk blocks from the beginning until we reach record
543; the DBMS reads 543*10 = 5430 bytes before finding the record.
• With an index (assuming each index entry is two bytes), the DBMS reads only
542*2 = 1084 bytes before locating the record, which is significantly less than in the prior
example.
Primary Index
• Primary indexing refers to the process of creating an index based on the table’s primary key.
These primary keys are specific to each record and establish a 1:1 relationship between
them.
• The searching operation is fairly efficient because primary keys are stored in sorted order.
• There are two types of primary indexes: dense indexes and sparse indexes.
Dense Index
Every search key value in the data file has an index record in the dense index. It speeds up
the search process. The total number of records present in the index table and the main
table are the same in this case. It requires extra space to hold the index record. A pointer to
the actual record on the disk and the search key are both included in the index records.

48. Explain about Clustering Index and secondary Index


Clustering Index
• A clustering index is defined on an ordered data file. Indices are sometimes built on
non-primary-key columns, which may or may not be unique for each record.
• In this situation, we join two or more columns to obtain a unique value and build an
index on them to make it easier to find the record. This method is called a clustering
index.
• Records with comparable properties are grouped together, and indices for these groups are
constructed.
Example
Assume that each department in a corporation has numerous employees. Assume we utilise
a clustering index, in which all employees with the same Dept_ID are grouped together into a
single cluster, and index pointers refer to the cluster as a whole. Dept_Id is a non-unique key
in this case.

Because one disk block may be shared by records from different clusters, the structure above
is somewhat inefficient. Using distinct disk blocks for separate clusters is considered the
better strategy.

Secondary Index
When using sparse indexing, the size of the mapping grows in sync with the size of the table.
These mappings are frequently stored in primary memory to speed up address fetching. The
secondary memory then searches the actual data using the address obtained through
mapping. Fetching the address becomes slower as the mapping size increases. The sparse
index will be ineffective in this scenario, so secondary indexing is used to solve this problem.
Secondary indexing introduces another level of indexing to reduce the size of the mapping.
In this method, wide ranges of column values are chosen first, so the first-level mapping
stays small; each range is then subdivided into smaller groups.
Because the first level’s mapping is kept in primary memory, fetching the addresses is faster.
The second-level mapping, as well as the actual data, are kept in secondary memory (or hard
disk).

Example
• To find the record for roll 111 in the diagram, we first look for the largest entry in the
first-level index that is less than or equal to 111; this gives the value 100.
• Then, in the second-level index, we again find the largest entry less than or equal to 111
and obtain 110. Using address 110, we go to the data block and scan each record until we
find 111.
• Searches are carried out in this manner; inserts, updates, and deletes work in the same
way.

49. Explain about B trees with suitable example

A B-Tree is a specialized m-way tree designed to optimize data access, especially on disk-based
storage systems.

• In a B-Tree of order m, each node can have up to m children and m-1 keys, allowing it to
efficiently manage large datasets.

• The value of m is decided based on disk block and key sizes.

• One of the standout features of a B-Tree is its ability to store a significant number of keys
within a single node, including large key values. It significantly reduces the tree’s height,
hence reducing costly disk operations.

• B-Trees allow faster data retrieval and updates, making them an ideal choice for systems
requiring efficient and scalable data management.

• By maintaining a balanced structure at all times, B-Trees deliver consistent and efficient
performance for critical operations such as search, insertion, and deletion.
Following is an example of a B-Tree of order 5.

Properties of a B-Tree

A B Tree of order m can be defined as an m-way search tree which satisfies the following
properties:

1. All leaf nodes of a B tree are at the same level, i.e. they have the same depth (height of the
tree).

2. The keys of each node of a B tree (in case of multiple keys), should be stored in the
ascending order.

3. In a B tree, all non-leaf nodes (except the root node) should have at least ⌈m/2⌉ children.

4. All nodes (except the root node) should have at least ⌈m/2⌉ - 1 keys.

5. If the root node is a leaf node (only node in the tree), then it will have no children and will
have at least one key. If the root node is a non-leaf node, then it will have at least 2
children and at least one key.

6. A non-leaf node with n-1 key values should have n non NULL children.
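Search in a B-Tree follows the sorted keys within each node, descending into the child between the keys that bracket the target. A minimal sketch of this, with illustrative nodes and keys (insertion and rebalancing are omitted):

```python
import bisect

# Minimal B-Tree search sketch. Each node holds sorted keys and, for
# internal nodes, one more child than it has keys.

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list => leaf node

def search(node, key):
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True                       # key found in this node
    if not node.children:
        return False                      # leaf reached without a match
    return search(node.children[i], key)  # descend between bracketing keys

# An order-5 flavoured example: root with two keys and three children.
root = Node([20, 40], [Node([5, 10]), Node([25, 30, 35]), Node([45, 50])])
print(search(root, 30))  # -> True
print(search(root, 33))  # -> False
```

Because every node packs many keys, the tree stays shallow, so only a handful of nodes (and hence disk blocks) are touched per search.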

50. Explain about B+ trees with suitable example
