Key Characteristics of DBMS Explained
4. Data Consistency
Since data redundancy is minimized, data consistency is automatically improved.
This means that whenever data is updated in one place, all references to that data show the
updated value.
• Example: If a student's phone number changes, it needs to be updated only once.
5. Data Security
DBMS provides strong security mechanisms to protect sensitive data.
• Only authorized users can access or modify the database.
• Security is maintained using authentication, authorization, roles, and permissions.
• Sensitive data like salaries or health records can be protected using encryption.
8. ACID Properties
DBMS follows four critical properties to ensure reliable transaction processing, known as
ACID:
• Atomicity – A transaction either completes fully or has no effect at all (all-or-nothing).
• Consistency – Transactions take the database from one valid state to another.
• Isolation – Transactions are independent of each other.
• Durability – Once a transaction is committed, it remains so, even after a crash.
These properties ensure safe and correct database operations.
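The atomicity and durability behaviour described above can be sketched with Python's built-in sqlite3 module. The accounts table and the "no negative balance" rule are hypothetical, chosen only to illustrate commit and rollback:

```python
import sqlite3

# In-memory database for illustration; the table and rule are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; undo everything on any failure."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # Business rule: balances must not go negative.
        (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                              (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()          # durability: changes persist once committed
    except Exception:
        conn.rollback()        # atomicity: the partial update is undone

transfer(conn, "alice", "bob", 500)   # fails, so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Because the oversized transfer is rolled back, both balances are exactly as they were before the transaction started.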
Functions of a DBMS:
Function | Description
Data Integrity | DBMS ensures that rules and constraints (like valid data ranges, unique keys) are followed.
Concurrent Access | DBMS handles multiple users accessing data at the same time without conflicts (concurrency control).
Backup and Recovery | DBMS provides tools to create data backups and recover from system failures.
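The Data Integrity function can be illustrated with a short sketch using Python's sqlite3 module. The student table, its UNIQUE phone column, and the CHECK rule on age are made-up examples of the kinds of constraints a DBMS enforces:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a UNIQUE key and a CHECK constraint encode integrity rules.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        phone   TEXT UNIQUE,
        age     INTEGER CHECK (age BETWEEN 5 AND 120)
    )
""")
conn.execute("INSERT INTO student VALUES (1, '555-0101', 20)")

rejected = []
for row in [(2, '555-0101', 21),   # duplicate phone -> UNIQUE violated
            (3, '555-0202', 200)]: # impossible age  -> CHECK violated
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Both invalid rows are rejected by the DBMS itself, without any application-level checking.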
Types of DBMS:
• Relational DBMS (RDBMS) — Data in tables (e.g., MySQL, PostgreSQL).
• Hierarchical DBMS — Data in tree structure.
• Network DBMS — Data in graph structure.
• Object-Oriented DBMS (OODBMS) — Data as objects.
Example: MySQL, Oracle, SQL Server, MongoDB.
2. Data Modeling
Data Modeling is the design phase of building a database.
It is the process of developing a model that describes:
• What data will be stored,
• How data elements are related,
• What rules and standards (naming, data types) will be followed.
Good data modeling ensures high data quality by maintaining:
• Consistency in naming conventions,
• Default values for fields,
• Clear semantics (clear meaning behind data),
• Strong security rules.
Data models can be visualized using diagrams like Entity-Relationship Diagrams (ERD),
which show entities, attributes, and relationships.
Example:
Modeling a University Database with Entities like Student, Courses, Professors, and their
Relationships.
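As an illustration, the university model above might be translated into tables roughly as follows. This is a sketch using Python's sqlite3; all table and column names are assumptions, not fixed by the model:

```python
import sqlite3

# Entities become tables; relationships become foreign keys or junction tables.
ddl = """
CREATE TABLE student   (stud_no INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course    (course_no TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE professor (prof_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE teaches (
    prof_id   INTEGER REFERENCES professor(prof_id),
    course_no TEXT    REFERENCES course(course_no)
);
CREATE TABLE enrolls (
    stud_no   INTEGER REFERENCES student(stud_no),
    course_no TEXT    REFERENCES course(course_no),
    PRIMARY KEY (stud_no, course_no)
);
"""
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

Each ERD entity maps to one table, and each many-to-many relationship (ENROLLS) maps to a junction table whose primary key combines the two foreign keys.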
3. Database Programming
Once data is modeled and the DBMS is set up, database programming is required to interact
with the database efficiently.
There are three major approaches to programming with databases:
➔ a) Navigational Approach
• In this approach, programmers navigate through records using pointers.
• It provides maximum control over how data is accessed and processed.
• Complex and difficult to program.
Example: Used in Hierarchical and Network databases where you manually follow links
between records.
➔ c) Object-Oriented Approach
• In this method, data is treated as objects, just like in object-oriented programming
languages (like Java or C++).
• Objects contain both data and methods.
• Very useful for handling complex, real-world data like images, audio, videos.
Example:
An object Student with attributes like Name and Age and methods like enrollCourse() and
updateProfile().
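A minimal Python sketch of such an object; the attribute and method names follow the example above, while the internal representation (a course list) is an assumption:

```python
class Student:
    """Objects bundle data (attributes) with behaviour (methods)."""
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.courses = []          # state lives inside the object

    def enrollCourse(self, course):
        """Add a course to this student's enrolment list."""
        self.courses.append(course)

    def updateProfile(self, name=None, age=None):
        """Change profile fields that were supplied."""
        if name is not None:
            self.name = name
        if age is not None:
            self.age = age

s = Student("Asha", 19)
s.enrollCourse("DBMS")
s.updateProfile(age=20)
```

The data (Name, Age) and the operations on it (enrollCourse, updateProfile) travel together, which is exactly the property the object-oriented approach exploits.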
Relationship – Link between two or more entities (e.g., Student ENROLLS in Course).
1. Hierarchical Model
➔ Structure:
• Data is organized in a tree-like structure.
• Parent-Child relationship.
• Each child has only one parent.
➔ Example:
➔ Features:
• Fast data retrieval when hierarchy is simple.
• Difficult to handle many-to-many relationships.
• Poor flexibility.
2. Network Model
➔ Structure:
• Data is organized like a graph.
• Many-to-many relationships are allowed.
• Records are connected through pointers.
➔ Example:
➔ Features:
• More flexible than the hierarchical model.
• Complex to maintain.
• Faster access for connected data.
4. Object-Oriented Model
➔ Structure:
• Data stored as objects.
• Objects contain both data (fields) and methods (functions).
➔ Example:
➔ Features:
• Good for complex systems (CAD, multimedia).
• Supports inheritance, encapsulation, polymorphism.
Types of DBMS Architecture:
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture
1-Tier Architecture
• In a 1-tier architecture, the database is directly accessible to the user.
• This means the user interacts with the DBMS directly on the same machine.
• Any modifications made by the user are applied directly to the database itself.
• This architecture isn't considered user-friendly for typical end-users.
• The 1-tier model is primarily used for local application development, where developers need
direct and immediate database access for quick development and testing.
2-Tier Architecture
• The 2-tier architecture closely resembles the basic client-server model.
• In this setup, client-side applications communicate directly with the database residing on the
server.
• This communication is facilitated by APIs (Application Programming Interfaces) such as ODBC
(Open Database Connectivity) and JDBC (Java Database Connectivity).
• The client side handles the user interface and application programs.
• The server is responsible for core database functionalities, including query processing and
transaction management.
• To enable communication, the client-side application establishes a connection with the
server.
3-Tier Architecture
• The 3-tier architecture introduces an intermediate layer between the client and the server.
• In this architecture, the client does not directly communicate with the database server.
• Instead, the client-side application interacts with an application server, which then
communicates with the database system.
• This design provides a level of abstraction, where the end-user is unaware of the database's
existence beyond the application server.
• Similarly, the database has no direct knowledge of users beyond the application server.
• The 3-tier architecture is commonly employed in large-scale web applications.
Types of Data Independence
There are two types of data independence.
• Logical data independence
• Physical data independence
Logical Data Independence
• Changing the logical schema (conceptual level) without changing the external schema (view
level) is called logical data independence.
• It is used to keep the external schema separate from the logical schema.
• If we make any changes at the conceptual level of data, it does not affect the view level.
• This happens at the user interface level.
• For example, it is possible to add or delete new entities, attributes to the conceptual schema
without making any changes to the external schema.
Physical Data Independence
• Making changes to the physical schema without changing the logical schema is called
physical data independence.
• If we change the storage size of the database system server, it will not affect the conceptual
structure of the database.
• It is used to keep the conceptual level separate from the internal level.
• This happens at the logical interface level.
• Example – Changing the location of the database from C drive to D drive.
Difference Between Physical and Logical Data Independence
Physical Data Independence | Logical Data Independence
Changes at the physical level generally do not require changes at the application program level. | Changes at the logical level may require changes at the application program level.
It tells about the internal schema. | It tells about the conceptual schema.
Conclusion
The data independence property of the database is a desirable property that relies on
separating the logical and physical aspects of storing and accessing data. This means that it is
easy to make structural modifications to the database without affecting the applications that
use it. This helps the organization remain adaptable in a dynamic business environment and
keeps its systems interoperable with technological advancements over the long term.
Entity
Entity Set
An entity set is a collection of similar types of entities that share the same attributes.
For example: All students of a school form an entity set of Student entities.
Key Terminologies used in Entity Set:
• Attributes: Attributes are the properties or traits of an entity. They describe the data that
may be associated with an entity.
• Entity Type: A category or class of entities that share the same attributes is referred to as an
entity type.
• Entity Instance: An entity instance is a particular occurrence of an individual entity within an
entity type. Each entity instance has a unique identity, usually given by the primary key.
• Primary Key: A primary key is a unique identifier for every entity instance within an entity
type.
It can be classified into two types:
Strong Entity Set
Strong entity sets exist independently and each instance of a strong entity set has a unique
primary key.
Example: A Car entity with attributes such as:
• Registration Number
• Model
• Name, etc.
Strong Entity
Weak Entity
Kinds of Entities
There are two types of Entities:
Tangible Entity
• A tangible entity is a physical object or a physical thing that can be physically touched, seen
or measured.
• It has a physical existence or can be seen directly.
• Examples of tangible entities are physical goods or physical products (for example, "inventory
items" in an inventory database) or people (for example, customers or employees).
Intangible Entity
• Intangible entities are abstract or conceptual objects that are not physically present but have
meaning in the database.
• They are typically defined by attributes or properties that are not directly visible.
• Examples of intangible entities include concepts or categories (such as “Product Categories”
or “Service Types”) and events or occurrences (such as appointments or transactions).
Entity Types in DBMS
• Strong Entity Types: These are entities that exist independently and have a unique
identifier.
• Weak Entity Types: These entities depend on another entity for their existence and do not
have a unique identifier of their own.
The example of Strong and Weak Entity Types in DBMS is:
Example
• Associative Entity Types: These represent relationships between two or more entities and
may have attributes of their own.
• Derived Entity Types: These entities are derived from other entities through a process or
calculation.
• Multi-Valued Entity Types: These entities can have more than one value for an attribute.
Conclusion
In a database management system (DBMS), entities are the fundamental components that
represent the objects or concepts that exist in the real world. They are represented by
attributes, the primary key, and they can be either strong or weak. Together with
relationships, entities play an important role in structured data management and database
design.
Simple Attribute
2. Composite Attribute
An attribute that can be split into components is a composite attribute.
Example: The address can be further split into house number, street number, city, state,
country, and pin code; the name can also be split into first name, middle name, and last
name.
Composite Attribute
3. Single-Valued Attribute
The attribute which takes up only a single value for each entity instance is a single-valued
attribute.
Example: The age of a student, Aadhar card number.
Single-Valued
4. Multi-Valued Attribute
The attribute which takes up more than a single value for each entity instance is a multi-
valued attribute. It is represented by a double oval shape.
Example: Phone number of a student: Landline and mobile.
Multi-valued
5. Stored Attribute
Stored attributes are those attributes whose values are kept directly in the database and do
not need to be computed from other attributes.
Example: DOB(Date of birth) is the stored attribute.
Stored-attribute
6. Derived Attribute
An attribute that can be derived from other attributes is a derived attribute. It is
represented by a dotted oval shape.
Example: Total and average marks of a student, age of an employee that is derived from date
of birth.
Derived-attribute
7. Complex Attribute
Attributes formed by nesting composite and multi-valued attributes are called "complex
attributes". They are rarely used in a DBMS (DataBase Management System).
Example: Address is both composite (it contains street, city, state, and PIN code) and
multi-valued (one person can have more than one house address).
Complex-attribute
Representation
Complex attributes are the nesting of two or more composite and multi-valued attributes.
Therefore, these multi-valued and composite attributes are called ‘Components’ of complex
attributes.
These components are grouped between parentheses ‘( )’ and multi-valued attributes
between curly braces ‘{ }’, Components are separated by commas ‘, ‘.
For example: let us consider a person having multiple phone numbers, emails, and an
address.
Here, phone number and email are examples of multi-valued attributes and address is an
example of the composite attribute, because it can be divided into house number, street,
city, and state.
Complex attributes
Components
Email, Phone number, Address(All are separated by commas and multi-valued components
are represented between curly braces).
Complex Attribute: Address_EmPhone(You can choose any name).
8. Key attribute
Key attributes are those attributes that can uniquely identify the entity in the entity set.
Example: Roll-No is the key attribute because it can uniquely identify the student.
9. Null Attribute
This attribute takes a NULL value when an entity does not have a value for it.
Example: The 'Net Banking Active' attribute indicates whether a particular customer has the
net banking facility activated or not.
For a customer who has not activated net banking, this attribute in the customer table
remains NULL until the facility is activated.
10. Descriptive Attribute
Descriptive attributes give information about a relationship set, as in the example given
below. Here Start Date is the descriptive attribute of the Manages relationship.
Descriptive-Attribute
Many-to-Many Relationship
A many-to-many relationship is a relationship in which multiple records in one table are
associated with multiple records in another table. This relationship is mainly implemented
using a junction table.
Example:
Consider two entities "Student" and "Course" where each student can enroll in multiple
courses and each course can have multiple students enrolled in it.
Self-Referencing Relationships
A self-referencing relationship, also known as a recursive relationship, is useful in cases
where a table has a relationship with itself. It is used for representing hierarchical data.
Example:
An "Employee" entity where each employee can have a manager who is also an employee.
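A sketch of this employee-manager case using a self-join in Python's sqlite3 (the names and ids are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        manager_id INTEGER REFERENCES employee(emp_id)  -- self-reference
    );
    INSERT INTO employee VALUES
        (1, 'Priya', NULL),   -- top of the hierarchy: no manager
        (2, 'Arun',  1),
        (3, 'Meena', 1);
""")
# A self-join: the same table plays both the employee and the manager role.
rows = conn.execute("""
    SELECT e.name, m.name AS manager
    FROM employee e
    LEFT JOIN employee m ON m.emp_id = e.manager_id
    ORDER BY e.emp_id
""").fetchall()
```

The LEFT JOIN keeps the top-level employee, whose manager column comes back as NULL.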
Key Terms
• Attribute: Attributes are the properties that define an entity. e.g. ROLL_NO, NAME,
ADDRESS.
• Relation Schema: A relation schema defines the structure of the relation and represents the
name of the relation with its attributes. e.g. STUDENT (ROLL_NO, NAME, ADDRESS, PHONE,
and AGE) is the relation schema for STUDENT. If a schema has more than one relation, it is
called a relational schema.
• Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples one
of which is shown as:
12. Define and explain about Domains, Attributes, Tuples, and Relations
Domain:
• In the relational model, data is modeled using atomic values, that is, values in a domain
that are indivisible. For example, First Name is a domain whose values are character strings
representing people's names.
• In a database, a domain is the set of permissible values for a column, defined by a data
type. Data types can be built-in (such as integers or strings) or custom types that define
constraints on the data.
• A SQL domain is a named set of valid values defined by the user. It specifies the data type
that the domain's values must belong to and, for character string types, the domain's default
collation.
Example:
In a table, a domain is the set of values that can be assigned to an attribute. The domain of
a month attribute can accept January, February, and so on; a domain of integers can accept
whole numbers that are negative, positive, or zero.
Tuple:
Tuples are one of the most used items in a Database Management System (DBMS). A tuple
in DBMS is simply a row holding inter-related data about a particular entity (which can be
any object).
• This data is spread across columns holding various attributes such as name, age, gender,
marks, etc. Tuples are mostly seen in Relational Database Management Systems (RDBMS),
as an RDBMS works on the relational model (tabular format).
What Is Tuple In DBMS?
In Database Management System (DBMS), most of the time we need to store the data in
tabular format . This kind of data storage model is also called a Relational model and the
system which leverages the relational model is called Relational Database Management
System (RDBMS). These relations (or tables) consist of rows and columns. In DBMS
terminology, the rows are called "tuples"; a single row is one tuple.
Let us see Tuple in DBMS in detail. Let us understand this with the help of a real-life example.
Example Of Single Record Or Tuple
Consider the table given below. We have data of some students, like their id, name, age, etc.
Here, each row has all the information of the respective student. The first row has all the
information about a student named "Sufiyan"; similarly, all other rows contain information
about other students. Hence, a single row is also termed a "record", as it contains all the
information of one student. This row or record is termed a tuple in DBMS.
Hence Tuple in DBMS is just a row representing some inter-related data of a particular entity
such as student, employee, user, etc.
Table for reference:
Based on the number of entity types that are connected we have the following degree of
relationships:
• Unary
• Binary
• Ternary
• N-ary
Unary (degree 1)
A unary relationship exists when both the participating entity types are the same. When
such a relationship is present, we say that the degree of the relationship is 1.
Binary (degree 2)
A binary relationship exists when exactly two entity types participate. When such a
relationship is present, we say that the degree is 2. This is the most common degree of
relationship. Such relationships are easy to deal with, as they can be easily converted into
relational tables.
Ternary(degree 3)
A ternary relationship exists when exactly three entity types participate. When such a
relationship is present, we say that the degree is 3. As the number of entities in the
relationship increases, it becomes complex to convert them into relational tables.
N-ary (n degree)
An N-ary relationship exists when 'n' entity types participate, so any number of entities can
take part in a relationship; there is no limit on the maximum number that can participate.
However, relationships of higher degree are not common, because converting them into
relational tables gets complex. We build an E-R model because it can easily be converted
into another model for implementing the database, and this benefit is lost with higher-
degree relationships. Binary relationships are therefore the most popular and widely used:
although a relationship can involve any number of entity types, in practice we rarely use
more than two.
We represent an N-ary relationship as follows:
Cardinality :
In the context of databases, cardinality refers to the uniqueness of the data values
contained in a column. High cardinality means the column contains a large proportion of
unique values; low cardinality means the column has many repeated values.
Cardinality between the tables can be of type one-to-one, many-to-one or many-to-many.
Mapping Cardinality
It is expressed as the number of entities to which another entity can be associated via a
relationship set.
For binary relationship set there are entity set A and B then the mapping cardinality can be
one of the following −
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
One-to-one relationship
One entity of A is associated with one entity of B.
Example
Given below is an example of the one-to-one relationship in the mapping cardinality. Here,
one department has one head of the department (HOD).
One-to-many relationship
An entity in set A is associated with any number of entities in B (possibly zero), and an
entity in B is associated with at most one entity in A.
Example
Given below is an example of the one-to-many relationship in the mapping cardinality. Here,
one department has many faculties.
Many-to-one relationship
An entity in A is associated with at most one entity in B, and an entity in B can be
associated with any number of entities in A (possibly zero).
Example
Given below is an example of the many-to-one relationship in the mapping cardinality. Here,
many faculties work in one department.
Many-to-many relationship
Many entities of A are associated with many entities of B.
An entity in A is associated with many entities of B and an entity in B is associated with many
entities of A.
A many-to-many relationship can be viewed as a combination of a many-to-one and a one-to-many relationship.
Example
Given below is an example of the many-to-many relationship in the mapping cardinality.
Here, many employees work on many projects.
Schema
Schema is of three types: Logical Schema, Physical Schema, and View Schema.
• Logical Schema – It describes the database designed at a logical level.
• Physical Schema – It describes the database designed at the physical level.
• View Schema – It defines the design of the database at the view level.
Example:
Suppose we have a teacher table in our database named school. The teacher table requires
name, dob, and doj columns, so we design the structure as:
Teacher table
name: String
doj: date
dob: date
Advantages of Schema
• Consistency: Guarantees uniform storage of data, allowing easy access and expandability.
• Structure: Helps arrange the database in an organized manner, making it easy to
comprehend.
• Data Integrity: Puts in place restrictions that ensure the data remains accurate and
reliable.
Disadvantages of Schema
• Rigidity: Once defined, a schema can be rigid; altering it may require a large amount of
effort.
• Complexity: Designing a schema can be difficult and time-consuming for large databases.
What is Instance?
An instance of a database refers to the actual data present in the database at a particular
point in time. In other words, the instance is the content of the database that fills the
structure defined by the schema at a given moment.
Example
Suppose a teacher table in our database named School has 50 records today; the instance of
the database currently has 50 records. If we add another fifty records tomorrow, the
instance will then have a total of 100 records. This is what is called an instance.
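The same teacher example can be sketched with Python's sqlite3: the schema stays fixed while the instance grows (the column names follow the schema example above; the row data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: the fixed structure of the teacher table.
conn.execute("CREATE TABLE teacher (name TEXT, dob TEXT, doj TEXT)")

def instance_size(conn):
    """Number of records in the current instance."""
    return conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]

# Today's instance: 50 records.
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)",
                 [(f"t{i}", "2000-01-01", "2020-01-01") for i in range(50)])
today = instance_size(conn)

# Tomorrow we add 50 more; the schema is unchanged, only the instance grew.
conn.executemany("INSERT INTO teacher VALUES (?, ?, ?)",
                 [(f"t{i}", "2000-01-01", "2021-01-01") for i in range(50, 100)])
tomorrow = instance_size(conn)
```

The CREATE TABLE statement (the schema) never changes, while the instance moves from 50 to 100 rows through ordinary CRUD operations.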
Advantages of Instance
• Real-Time Representation: It represents the data in the database at a certain point in time,
as may be required for analysis or for performing operations.
• Flexibility: While a schema remains fixed in time, instances can be quite volatile, as data is
written, updated, or deleted.
Disadvantages of Instance
• Volatility: Instances are dynamic and change over time, which can make them difficult to
track without proper controls.
• Data Integrity Issues: If not well regulated, the data in an instance can become
inconsistent or even incorrect.
Difference Between Schema and Instance
Schema | Instance
The schema is the same for the whole database. | Data in instances can be changed using addition, deletion, and updation.
Affects the entire database structure. | Affects only the current state of the data.
Requires significant effort and planning to change. | Easily altered by performing CRUD (Create, Read, Update, Delete) operations.
Table STUDENT (STUD_NO, SNAME, ADDRESS, PHONE)
• The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE
STUD_NO | TEACHER_NO | COURSE_NO
1 | 001 | C001
2 | 056 | C005
Primary Key
There can be more than one candidate key in relation out of which one can be chosen as the
primary key. For Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation
STUDENT but STUD_NO can be chosen as the primary key (only one out of many candidate
keys).
• A primary key is a unique key, meaning it can uniquely identify each record (tuple) in a table.
• It must have unique values and cannot contain any duplicate values.
• A primary key cannot be NULL, as it needs to provide a valid, unique identifier for every
record.
• A primary key does not have to consist of a single column. In some cases, a composite
primary key (made of multiple columns) can be used to uniquely identify records in a table.
• Databases typically store rows ordered in memory according to primary key for fast access of
records using primary key.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE) , STUD_NO is a primary key
Table STUDENT
STUD_NO | SNAME | ADDRESS | PHONE
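The uniqueness and NOT NULL rules above can be demonstrated with Python's sqlite3. This is a sketch; note the key column is declared INT rather than INTEGER so that SQLite treats it as an ordinary primary key column and enforces NOT NULL strictly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stud_no INT NOT NULL PRIMARY KEY,  -- must be unique and non-NULL
        sname   TEXT
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

errors = []
try:
    conn.execute("INSERT INTO student VALUES (1, 'Ravi')")    # duplicate key
except sqlite3.IntegrityError:
    errors.append("duplicate")
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'Ravi')") # NULL key
except sqlite3.IntegrityError:
    errors.append("null")
```

Both the duplicate and the NULL primary key are rejected, so every row keeps a valid, unique identifier.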
Alternate Key
An alternate key is any candidate key in a table that is not chosen as the primary key. In
other words, all the keys that are not selected as the primary key are considered alternate
keys.
• An alternate key is also referred to as a secondary key because it can uniquely identify
records in a table, just like the primary key.
• An alternate key can consist of one or more columns (fields) that can uniquely identify a
record, but it is not the primary key
• E.g., SNAME and ADDRESS are alternate keys.
Example:
Consider the table shown above. STUD_NO as well as PHONE are both candidate keys for
the relation STUDENT, but PHONE will be an alternate key (only one of the many candidate
keys is chosen as the primary key).
Table STUDENT_COURSE
STUD_NO | TEACHER_NO | COURSE_NO
1 | 005 | C001
2 | 056 | C005
It may be worth noting that, unlike the primary key of a relation, a foreign key can be
NULL and may contain duplicate values, i.e., it need not follow the uniqueness
constraint. For example, STUD_NO in the STUDENT_COURSE relation is not unique. It has
been repeated for the first and third tuples. However, the STUD_NO in STUDENT relation is a
primary key and it needs to be always unique, and it cannot be null.
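This behaviour of foreign keys (repeats and NULLs allowed, but no dangling references) can be sketched with Python's sqlite3; note that foreign key checking must be switched on explicitly in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # FK checks are off by default in SQLite
conn.executescript("""
    CREATE TABLE student (stud_no INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE student_course (
        course_no TEXT,
        stud_no   INTEGER REFERENCES student(stud_no)  -- the foreign key
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    -- The FK may repeat (student 1 appears twice) and may be NULL:
    INSERT INTO student_course VALUES ('C001', 1), ('C002', 1), ('C003', NULL);
""")
ok = True
try:
    # But it may not reference a non-existent student:
    conn.execute("INSERT INTO student_course VALUES ('C004', 99)")
except sqlite3.IntegrityError:
    ok = False
rows = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
```

The duplicated and NULL foreign key values are accepted, while the reference to a student that does not exist is rejected.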
15. Write about Relational Algebra Operations from Set Theory used in SQL
Relational algebra is a formal system for manipulating and querying relations (tables) in a
relational database. It operates on sets and uses set theory principles to define operations
on these relations. The operations in relational algebra are typically used to form complex
queries and are foundational to SQL. Here’s a breakdown of the common relational algebra
operations derived from set theory:
1. Selection (σ)
• Operation: Selects rows from a relation that satisfy a given predicate (condition).
• SQL Equivalent: SELECT ... WHERE ...
• Set Theory Equivalent: Selection corresponds to forming a subset, in which only the
elements (tuples) that satisfy the specified condition are retained.
• Example:
o Relational Algebra: σ_{age > 30}(Employees)
o SQL: SELECT * FROM Employees WHERE age > 30;
o This operation retrieves all employees who are older than 30.
2. Projection (π)
• Operation: Extracts certain columns (attributes) from a relation, effectively removing
duplicates and producing a result that only includes the specified attributes.
• SQL Equivalent: SELECT ...
• Set Theory Equivalent: The projection operation is akin to a set’s projection in mathematics,
where only the specified dimensions of the tuples are retained.
• Example:
o Relational Algebra: π_{name, age}(Employees)
o SQL: SELECT name, age FROM Employees;
o This retrieves only the name and age columns from the Employees relation.
3. Union (∪)
• Operation: Combines the results of two relations, eliminating duplicates.
• SQL Equivalent: UNION
• Set Theory Equivalent: The union operation in set theory combines two sets to produce a
new set that contains all distinct elements from both sets.
• Example:
o Relational Algebra: Students ∪ Faculty
o SQL: SELECT * FROM Students UNION SELECT * FROM Faculty;
o This combines the Students and Faculty tables, removing any duplicates.
4. Difference (−)
• Operation: Returns the set of tuples that are in one relation but not in another.
• SQL Equivalent: EXCEPT or NOT IN
• Set Theory Equivalent: The difference operation is similar to subtracting one set from
another in set theory.
• Example:
o Relational Algebra: Employees − Managers
o SQL: SELECT * FROM Employees WHERE id NOT IN (SELECT id FROM Managers);
o This retrieves all employees who are not managers.
5. Cartesian Product (×)
• Operation: Combines every tuple of one relation with every tuple of another relation,
producing a new relation with all possible pairs of tuples.
• SQL Equivalent: JOIN (without a condition, or using CROSS JOIN)
• Set Theory Equivalent: The Cartesian product of two sets contains all ordered pairs from
both sets.
• Example:
o Relational Algebra: Employees × Departments
o SQL: SELECT * FROM Employees CROSS JOIN Departments;
o This produces a combination of every employee with every department.
6. Rename (ρ)
• Operation: Changes the name of a relation or the names of its attributes.
• SQL Equivalent: AS
• Set Theory Equivalent: This operation is akin to changing the labels (or identifiers) of sets or
their elements.
• Example:
o Relational Algebra: ρ_{E}(Employees)
o SQL: SELECT * FROM Employees AS E;
o This renames the Employees table to E for use in further operations.
7. Join (⨝)
• Operation: Combines tuples from two relations based on a common attribute or condition.
• SQL Equivalent: JOIN (typically INNER JOIN, LEFT JOIN, etc.)
• Set Theory Equivalent: Join is a special case of the Cartesian product, but with a condition
that pairs only matching tuples.
• Example:
o Relational Algebra: Employees ⨝ Departments
o SQL: SELECT * FROM Employees JOIN Departments ON Employees.department_id =
Departments.department_id;
o This performs an inner join between Employees and Departments on the
department_id field.
8. Intersection (∩)
• Operation: Returns the set of tuples that are present in both relations.
• SQL Equivalent: INTERSECT
• Set Theory Equivalent: This is equivalent to the set intersection in mathematics, which
returns common elements from both sets.
• Example:
o Relational Algebra: Employees ∩ Managers
o SQL: SELECT * FROM Employees INTERSECT SELECT * FROM Managers;
o This retrieves employees who are also managers.
9. Division (÷)
• Operation: Used when you want to find tuples in one relation that match every tuple in
another relation. This operation is often used in queries like "find employees who work in all
departments."
• SQL Equivalent: There is no direct SQL equivalent, but it can be done using GROUP BY and
HAVING clauses.
• Set Theory Equivalent: Division is analogous to finding a subset of elements that meet a set
of conditions.
• Example:
o Relational Algebra: Employees ÷ Departments
o This operation would return employees who work in all departments.
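A common way to emulate division with GROUP BY and HAVING, sketched here in Python's sqlite3; the Works_in and Department tables are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept TEXT PRIMARY KEY);
    CREATE TABLE works_in   (emp TEXT, dept TEXT);
    INSERT INTO department VALUES ('HR'), ('IT');
    INSERT INTO works_in VALUES
        ('Asha', 'HR'), ('Asha', 'IT'),   -- Asha works in ALL departments
        ('Ravi', 'HR');                   -- Ravi works in only one
""")
# Division emulated with GROUP BY / HAVING: keep employees whose distinct
# department count equals the total number of departments.
result = [r[0] for r in conn.execute("""
    SELECT emp
    FROM works_in
    GROUP BY emp
    HAVING COUNT(DISTINCT dept) = (SELECT COUNT(*) FROM department)
""")]
```

Only the employee who appears with every department survives the HAVING filter, which is exactly what the division operator computes.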
17. Explain about selection, projection, cross product operators used in SQL
1. Selection (σ)
• Relational Algebra: The selection operator (σ) selects rows (tuples) from a relation (table)
that satisfy a given condition or predicate. It is used to filter rows based on specific criteria.
• SQL Equivalent: The WHERE clause in SQL performs the selection operation. It filters rows
based on a condition, similar to how selection works in relational algebra.
• Purpose: To retrieve specific rows from a table that meet a particular condition or criteria.
• Example:
o Relational Algebra: σ_{age > 30}(Employees)
▪ This retrieves all employees who are older than 30.
o SQL: SELECT * FROM Employees WHERE age > 30;
2. Projection (π)
• Relational Algebra: The projection operator (π) retrieves specific columns from a table.
• SQL Equivalent: The column list of the SELECT clause performs the projection operation; it
returns the selected columns (attributes) from a table.
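Both operators can be tried out with Python's built-in sqlite3 module; the Employees rows below are made up for illustration:

```python
import sqlite3

# A small sketch of selection (WHERE) and projection (column list),
# using a hypothetical Employees(name, age) table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (name TEXT, age INTEGER)")
cur.executemany("INSERT INTO Employees VALUES (?, ?)",
                [("ann", 25), ("bob", 35), ("carl", 40)])

# Selection sigma_{age > 30}(Employees): filters whole rows.
selected = cur.execute("SELECT * FROM Employees WHERE age > 30").fetchall()

# Projection pi_{name}(Employees): keeps only the chosen column.
projected = cur.execute("SELECT name FROM Employees").fetchall()

print(selected)   # [('bob', 35), ('carl', 40)]
print(projected)  # [('ann',), ('bob',), ('carl',)]
```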
StudentCourse Table
Let’s look at an example of the INNER JOIN clause and understand how it works. This query
will show the names and ages of students enrolled in different courses.
Query:
SELECT StudentCourse.COURSE_ID, Student.NAME, Student.AGE FROM Student
INNER JOIN StudentCourse
ON Student.ROLL_NO = StudentCourse.ROLL_NO;
Output
2. SQL LEFT JOIN
A LEFT JOIN returns all rows from the left table, along with matching rows from the right
table. If there is no match, NULL values are returned for columns from the right table. LEFT
JOIN is also known as LEFT OUTER JOIN.
Syntax
SELECT table1.column1,table1.column2,table2.column1,….
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.
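A runnable sketch of LEFT JOIN behaviour (Python's sqlite3, with made-up Student/StudentCourse rows): unmatched rows from the left table come back with NULL, which sqlite3 surfaces as None:

```python
import sqlite3

# Hypothetical tables for the LEFT JOIN sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (roll_no INTEGER, name TEXT)")
cur.execute("CREATE TABLE StudentCourse (roll_no INTEGER, course_id INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(1, "Harsh"), (2, "Rohit")])
cur.executemany("INSERT INTO StudentCourse VALUES (?, ?)", [(1, 101)])

# Rohit has no course row, so the right-hand columns come back NULL.
rows = cur.execute("""
    SELECT Student.name, StudentCourse.course_id
    FROM Student
    LEFT JOIN StudentCourse
    ON Student.roll_no = StudentCourse.roll_no
""").fetchall()
print(rows)  # [('Harsh', 101), ('Rohit', None)]
```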
3. SQL FULL JOIN
Syntax
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
Key Terms
• table1: First table.
• table2: Second table
• matching_column: Column common to both the tables.
FULL JOIN Example
This example demonstrates the use of a FULL JOIN, which combines the results of both LEFT
JOIN and RIGHT JOIN. The query retrieves all rows from
the Student and StudentCourse tables. If a record in one table does not have a matching
record in the other table, the result set will include that record with NULL values for the
missing fields.
Query:
SELECT Student.NAME, StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output
NAME      COURSE_ID
HARSH     1
PRATIK    2
RIYANKA   2
DEEP      3
SAPTARHI  1
DHANRAJ   NULL
ROHIT     NULL
NIRAJ     NULL
NULL      4
NULL      5
NULL      4
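Not every engine supports FULL JOIN directly (SQLite before version 3.39 and MySQL, for example); a common portable workaround, sketched below with sqlite3 and made-up rows, is to UNION two LEFT JOINs:

```python
import sqlite3

# Emulating FULL JOIN as: (left LEFT JOIN right) UNION (right LEFT JOIN left).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Student (roll_no INTEGER, name TEXT)")
cur.execute("CREATE TABLE StudentCourse (roll_no INTEGER, course_id INTEGER)")
cur.executemany("INSERT INTO Student VALUES (?, ?)",
                [(1, "HARSH"), (2, "DHANRAJ")])
cur.executemany("INSERT INTO StudentCourse VALUES (?, ?)",
                [(1, 1), (3, 4)])

rows = cur.execute("""
    SELECT Student.name, StudentCourse.course_id
    FROM Student
    LEFT JOIN StudentCourse ON Student.roll_no = StudentCourse.roll_no
    UNION
    SELECT Student.name, StudentCourse.course_id
    FROM StudentCourse
    LEFT JOIN Student ON Student.roll_no = StudentCourse.roll_no
""").fetchall()
# Matched pair, a student with no course, and a course with no student:
print(sorted(rows, key=str))
```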
Employee
Emp_id  Emp_name  Dept_id
1       Ram       10
2       Jon       30
3       Bob       50
Department
Dept_id  Dept_name
10       IT
30       HR
40      TIS
Result of joining Employee and Department on Dept_id:
Emp_id  Emp_name  Dept_id  Dept_id  Dept_name
1       Ram       10       10       IT
2       Jon       30       30       HR
20. Explain how to convert a database specification in E/R notation to a relational schema
21. What is the Importance of a good schema design
A good schema design is crucial for the success of a relational database and its efficient
performance. The schema serves as the structure for the data stored in a database,
organizing it into tables, columns, and relationships. A well-designed schema can help ensure
that the database is flexible, efficient, easy to manage, and scalable. Here's why a good
schema design is important:
1. Data Integrity and Consistency
• Ensures Valid Data: A good schema enforces data integrity through constraints such as
primary keys, foreign keys, unique constraints, and check constraints. These constraints
ensure that the data is accurate, valid, and consistent across the database.
• Prevents Data Redundancy: By using techniques like normalization, a well-designed schema
reduces the chances of data duplication, which ensures that the data is consistent and avoids
unnecessary redundancy.
2. Improved Query Performance
• Efficient Data Retrieval: A good schema design can significantly improve the performance of
SQL queries. By structuring data in a way that makes logical sense and supports indexing and
joins, the schema allows for quicker and more efficient queries.
• Indexing and Search Optimization: Schema design enables the creation of indexes on
frequently queried columns, which can dramatically speed up data retrieval. For example,
indexing primary keys, foreign keys, or frequently queried fields helps the database engine
find data faster.
3. Scalability
• Handles Growth: A well-designed schema can scale efficiently as the volume of data grows.
Proper normalization and thoughtful organization of tables ensure that the database can
handle larger datasets without compromising performance.
• Supports Additional Features: As the database grows and more features are added, a well-
designed schema can be easily modified or extended to accommodate new data
requirements. For example, adding new tables or columns can be done without causing
major disruptions.
4. Flexibility and Maintainability
• Easy to Maintain: A good schema design is clear, logical, and easy to maintain. When data is
organized in a structured way with clear relationships between tables, it becomes easier for
developers and database administrators to manage and troubleshoot.
• Adaptability to Changes: A well-thought-out schema is flexible enough to accommodate
changes in business requirements or new types of data without requiring extensive changes
or causing data integrity issues.
5. Data Redundancy and Anomalies Prevention
• Normalization: One of the key aspects of a good schema is normalization, which organizes
data in such a way that it avoids unnecessary duplication (redundancy). This prevents issues
like:
o Update Anomalies: Where changes to a single piece of data might need to be made
in multiple places.
o Insert Anomalies: Where certain data cannot be inserted unless other irrelevant
data is also inserted.
o Delete Anomalies: Where deleting data in one place might unintentionally remove
necessary information.
6. Security and Access Control
• Fine-Grained Access Control: A good schema design can help implement security measures
by controlling access at the table or column level. Sensitive information (e.g., passwords,
credit card numbers) can be isolated in separate tables, and access permissions can be
restricted accordingly.
• Role-Based Permissions: A well-designed schema allows for the use of role-based access
control (RBAC), where different users or applications can be given specific permissions to
read, write, or modify data.
7. Ease of Data Integration
• Integrating with External Systems: A good schema design makes it easier to integrate the
database with other systems or third-party services. For instance, a standardized schema
that follows naming conventions and contains structured relationships allows easier data
export, import, and synchronization with external systems.
• Consistency in Data Models: A clear and well-organized schema facilitates the integration of
data from multiple sources, ensuring that data from different systems aligns and can be
merged seamlessly.
8. Reduced Data Duplication and Storage Overhead
• Minimizing Redundancy: When a schema is properly normalized, it minimizes the amount of
redundant data, leading to efficient storage use. Reducing redundancy means that the
database consumes less disk space, and the risk of inconsistent data is also reduced.
• Optimized Storage Management: A well-organized schema helps in optimizing storage
because it ensures that data is stored in the most efficient structure, avoiding wasted space
caused by redundant or unnecessary information.
9. Better Reporting and Data Analysis
• Supports BI Tools: A good schema facilitates data analysis and reporting. When the database
is well-structured, it’s easier for Business Intelligence (BI) tools or reporting software to
query and extract meaningful insights from the data.
• Clear Relationships for Reporting: Data that is organized with clear relationships (such as
using foreign keys and normalized tables) enables more accurate and efficient reporting. You
can more easily generate complex reports and insights by performing joins or aggregations
on well-structured data.
10. Avoiding Future Problems
• Fewer Future Modifications: A good schema anticipates future requirements, which means
that developers won't need to make constant changes to the schema. This reduces the risk of
creating structural problems as the database evolves.
• Reduced Risk of Data Corruption: Poorly designed schemas can lead to issues like data
corruption or inconsistent data when changes are made or when data is inserted incorrectly.
Best Practices for Good Schema Design
1. Normalization: Break down the data into related tables to eliminate redundancy and ensure
that the database is scalable and efficient.
2. Use of Primary and Foreign Keys: Ensure that primary keys uniquely identify records, and
foreign keys maintain relationships between tables.
3. Clear Naming Conventions: Use meaningful names for tables, columns, and constraints so
that the schema is easy to understand.
4. Consider Future Growth: Design the schema to accommodate future data growth and
changing business requirements.
5. Indexes for Performance: Use indexes on frequently queried columns to speed up search
and retrieval operations.
6. Data Types and Constraints: Choose appropriate data types for columns, and use constraints
like NOT NULL, UNIQUE, and CHECK to enforce data integrity.
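Several of these practices can be seen together in one small sketch (Python's sqlite3; all table, column, and index names are illustrative):

```python
import sqlite3

# Keys, constraints and an index in one toy schema (names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK checks by default
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,            -- primary key
        dept_name TEXT NOT NULL UNIQUE            -- NOT NULL + UNIQUE
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL CHECK (salary >= 0),          -- CHECK constraint
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    );
    CREATE INDEX idx_employee_dept ON employee(dept_id);  -- query speed-up
""")
conn.execute("INSERT INTO department VALUES (10, 'IT')")
conn.execute("INSERT INTO employee VALUES (1, 'Ram', 50000, 10)")

# The CHECK constraint rejects invalid data at insert time.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'Jon', -5, 10)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```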
22. What are the problems encountered with bad schema designs
A bad schema design can lead to a wide range of problems, some of which can have severe
consequences for both the performance and maintainability of the database. Here are some
of the key problems encountered with poor schema designs:
1. Data Redundancy
• Problem: Bad schema design often leads to data duplication. This redundancy occurs when
the same piece of data is stored in multiple places within the database.
• Consequences:
o Inconsistent data: When one copy of the data is updated and others are not, it can
lead to inconsistent or outdated information.
o Wasted storage: Duplicate data unnecessarily increases the size of the database and
consumes valuable storage space.
o Increased risk of errors: More data means more chances for mistakes during updates
or deletions, as multiple places need to be updated.
• Solution: Apply normalization techniques to break down large, redundant tables into smaller
ones and ensure data is stored only once.
2. Data Anomalies
• Problem: Bad schema design can lead to anomalies like update anomalies, insert
anomalies, and delete anomalies.
• Consequences:
o Update Anomalies: If data is repeated across tables, a change needs to be made in
multiple places, which can lead to inconsistency.
o Insert Anomalies: Certain data cannot be inserted unless other irrelevant data is also
inserted.
o Delete Anomalies: Deleting a record might unintentionally remove necessary data
due to improper relationships or missing foreign keys.
• Solution: Normalization and proper use of foreign keys can prevent these anomalies by
ensuring each piece of data is stored in the correct place.
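The update anomaly is easy to demonstrate with plain Python, using made-up student/department rows:

```python
# Update anomaly sketch: in the denormalized rows, a department's
# building is repeated once per student, so updating only one copy
# leaves the data inconsistent; the normalized split stores it once.
denormalized = [
    # (roll_no, name, dept_name, dept_building) -- illustrative values
    (42, "abc", "CO", "A4"),
    (44, "xyz", "CO", "A4"),
]

# Update CO's building in just one row -> two answers for one department.
denormalized[0] = (42, "abc", "CO", "B1")
buildings = {row[3] for row in denormalized if row[2] == "CO"}
print(sorted(buildings))  # ['A4', 'B1'] -- inconsistent

# Normalized: the building is stored exactly once, so one update suffices.
students = {42: ("abc", "CO"), 44: ("xyz", "CO")}
departments = {"CO": "A4"}
departments["CO"] = "B1"
print(departments["CO"])  # B1 -- no inconsistency possible
```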
3. Poor Query Performance
• Problem: A poorly designed schema can severely degrade query performance, especially
when the schema is not optimized for querying.
• Consequences:
o Slow queries: Lack of proper indexing and inefficient table structures can lead to
long-running queries that negatively impact application performance.
o Complex joins: Poorly designed schemas might require multiple, complex joins to
retrieve relevant data, increasing the computational load.
o Inefficient data retrieval: Data retrieval can become slow and inefficient if the
schema does not group related data together logically.
• Solution: Use indexes on frequently queried columns and organize the schema to minimize
the number of joins needed for common queries.
4. Difficult Maintenance and Scalability
• Problem: Bad schema designs can make maintaining and scaling the database difficult as the
application grows.
• Consequences:
o Difficult to modify: If a schema isn’t designed with future growth in mind, adding
new features or fields may require a complete overhaul of the database.
o Hard to scale: If the schema is not optimized for large volumes of data, performance
can degrade as the database grows. Queries may become slower, and the system
might not handle the increased load effectively.
o Complex modifications: As business requirements evolve, making changes to the
schema can become error-prone and time-consuming.
• Solution: Design the schema with scalability in mind, and ensure it’s flexible enough to
accommodate future changes without requiring major redesigns.
5. Inconsistent Data Integrity
• Problem: Without proper constraints (e.g., primary keys, foreign keys), a poorly designed
schema can result in data integrity issues.
• Consequences:
o Invalid data: If data integrity constraints aren’t enforced, invalid or incomplete data
can be inserted into the database, leading to data corruption.
o Broken relationships: If foreign keys aren’t properly defined, you can end up with
orphan records or invalid references.
• Solution: Implement primary keys, foreign keys, and other constraints (such as NOT NULL
and CHECK) to enforce data integrity.
6. Inflexibility for Reporting and Data Analysis
• Problem: A bad schema design can make it difficult to generate reports and perform data
analysis effectively.
• Consequences:
o Complex reporting: If the schema does not follow logical patterns, generating reports
may require complex queries, which can be time-consuming and error-prone.
o Difficulty in aggregating data: If data is not structured appropriately, it can be difficult
to aggregate and summarize data in a meaningful way.
• Solution: Use clear relationships between tables and ensure the schema is optimized for
data analysis, with proper normalization and indexing.
7. Security Issues
• Problem: A bad schema design can lead to security vulnerabilities.
• Consequences:
o Lack of access control: Without a good design, it may be difficult to implement role-
based access control or restrict access to sensitive data at the column or table level.
o Exposed sensitive data: If sensitive data (e.g., passwords, financial details) is not
stored properly (e.g., without encryption), it can be exposed to unauthorized users.
• Solution: Implement data encryption, access control, and proper data segmentation to
ensure that sensitive information is protected.
8. Overly Complex Schema
• Problem: A bad schema may be overly complex with too many tables, unnecessary
relationships, or too much normalization, making it hard to navigate and understand.
• Consequences:
o Difficult to understand: Developers and DBAs may find it hard to understand the
schema, leading to mistakes and inefficiencies.
o Increased development time: When the schema is too complex, it takes more time
to develop and maintain the application.
o Performance bottlenecks: Complex schemas with unnecessary relationships and
excessive joins can create performance issues.
• Solution: Keep the schema simple and intuitive, and apply normalization only to the level
required. Avoid overcomplicating the design with unnecessary relationships.
9. Data Duplication Across Tables
• Problem: Without careful design, the same data may be stored in multiple tables, creating
dependencies between unrelated tables.
• Consequences:
o Inconsistent updates: When data changes in one table but not in others, it creates
inconsistencies across the database.
o Increased storage requirements: Storing the same data in multiple places increases
the overall size of the database, leading to higher storage costs.
• Solution: Use foreign keys to relate tables and avoid data duplication, and make sure
normalization rules are followed to store data only once.
10. Poor User Experience
• Problem: If the database schema is inefficient or difficult to navigate, it can lead to poor
performance, slow applications, and a subpar user experience.
• Consequences:
o Slow-loading web pages or applications.
o Unresponsive queries that affect real-time interactions with the application.
• Solution: Ensure that the schema is well-optimized and can handle the load placed upon it
by the application, including using appropriate indexing and reducing unnecessary complex
queries.
roll_no  name  dept_name  dept_building
42       abc   CO         A4
43       pqr   IT         A3
44       xyz   CO         A4
45       xyz   IT         A3
46       mno   EC         B2
47       jkl   ME         B2
From the above table we can conclude some valid functional dependencies:
• roll_no → {name, dept_name, dept_building}. Here, roll_no can determine the values of the
fields name, dept_name, and dept_building, hence a valid functional dependency.
• roll_no → dept_name. Since roll_no can determine the whole set {name, dept_name,
dept_building}, it can also determine its subset dept_name.
• dept_name → dept_building. Each dept_name is associated with exactly one dept_building
in the table, so dept_name determines dept_building.
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name,
dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name: students with the same name can have different dept_name values,
hence this is not a valid functional dependency.
• dept_building → dept_name: there can be multiple departments in the same building. For
example, in the above table the departments ME and EC are in the same building B2, hence
dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no,
dept_building → roll_no, etc.
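Whether a proposed dependency actually holds in a given table instance can be checked mechanically: X → Y fails exactly when two rows agree on X but disagree on Y. A small sketch over the table above:

```python
# Rows follow the student table above: (roll_no, name, dept_name, dept_building).
rows = [
    (42, "abc", "CO", "A4"),
    (43, "pqr", "IT", "A3"),
    (44, "xyz", "CO", "A4"),
    (45, "xyz", "IT", "A3"),
    (46, "mno", "EC", "B2"),
    (47, "jkl", "ME", "B2"),
]
COLS = {"roll_no": 0, "name": 1, "dept_name": 2, "dept_building": 3}

def fd_holds(rows, lhs, rhs):
    """X -> Y holds iff rows that agree on X also agree on Y."""
    seen = {}
    for row in rows:
        key = tuple(row[COLS[a]] for a in lhs)
        val = tuple(row[COLS[a]] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

print(fd_holds(rows, ["roll_no"], ["dept_name"]))        # True
print(fd_holds(rows, ["dept_name"], ["dept_building"]))  # True
print(fd_holds(rows, ["dept_building"], ["dept_name"]))  # False (EC and ME share B2)
print(fd_holds(rows, ["name"], ["roll_no"]))             # False (two students named xyz)
```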
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-trivial functional dependency
3. Semi non-trivial functional dependency
4. Multivalued functional dependency
5. Transitive functional dependency
6. Fully functional dependency
7. Partial functional dependency
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e.,
if X → Y and Y is a subset of X, then it is called a trivial functional dependency.
Symbolically: A → B is a trivial functional dependency if B is a subset of A.
The following dependencies are also trivial: A → A and B → B.
Example 1 :
• ABC -> AB
• ABC -> A
• ABC -> ABC
Example 2:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name
is a subset of determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example
of trivial functional dependency.
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial functional
dependency.
Example 1 :
• Id -> Name
• Name -> DOB
Example 2:
roll_no  name  age
42       abc   17
43       pqr   18
44       xyz   18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is
not a subset of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial
functional dependency, since age is not a subset of {roll_no, name}.
3. Semi Non Trivial Functional Dependencies
A semi non-trivial functional dependency occurs when part of the dependent attribute (right-
hand side) is included in the determinant (left-hand side), but not all of it. This is a middle
ground between trivial and non-trivial functional dependencies. X -> Y is called semi non-
trivial when X intersect Y is not NULL.
Example:
Consider a relation with attributes Student_ID, Course_ID, and Course_Name.
Functional Dependency:
{Student_ID, Course_ID} → {Course_ID, Course_Name}
This is semi non-trivial because:
• Part of the dependent set (Course_ID) is already included in the determinant
({Student_ID, Course_ID}), so the intersection of the two sides is not empty.
• However, the dependency is not completely trivial, because Course_Name is not part of
the determinant.
4. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on
each other. i.e. If a → {b, c} and there exists no functional dependency between b and c, then
it is called a multivalued functional dependency.
Example:
bike_model  manuf_year  color
tu1001      2007        Black
tu1001      2007        Red
tu2012      2008        Black
tu2012      2008        Red
tu2222      2009        Black
tu2222      2009        Red
In this table:
• X: bike_model
• Y: color
• Z: manuf_year
For each bike model (bike_model):
1. There is a group of colors (color) and a group of manufacturing years (manuf_year).
2. The colors do not depend on the manufacturing year, and the manufacturing year does not
depend on the colors. They are independent.
3. The sets of color and manuf_year are linked only to bike_model.
That’s what makes it a multivalued dependency.
In this case, these two columns are said to be multivalued dependent on bike_model. These
dependencies can be written as:
bike_model →→ manuf_year
bike_model →→ color
5. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e.
If a → b & b → c, then according to axiom of transitivity, a → c. This is a transitive functional
dependency.
Example:
enrol_no  name  dept  building_no
42        abc   CO    4
43        pqr   EC    2
44        xyz   IT    1
45        abc   EC    2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of
transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect
functional dependency, hence called Transitive functional dependency.
6. Fully Functional Dependency
In a fully functional dependency, an attribute (or set of attributes) is determined by the
whole of the determinant and not by any proper subset of it. If a relation R has a
dependency X → Y, it is fully functional when Y does not depend on any proper subset of X.
7. Partial Functional Dependency
In a partial functional dependency, a non-key attribute depends on part of a composite key
rather than on the whole key. If a relation R has attributes X, Y, Z, where {X, Y} is the
composite key and Z is a non-key attribute, then X → Z is a partial functional dependency in
an RDBMS.
Types of Decomposition
Lossless Decomposition
A decomposition is lossless when we can regain the original relation R by joining the
relations formed after decomposition. It is used to remove redundant data from the
database while retaining the useful information. A lossless decomposition ensures the
following:
• While regaining the original relation, no information is lost.
• If we perform a join operation on the sub-divided relations, we get back exactly the
original relation.
Example:
There is a relation R(A, B, C):
A   B   C
55  16  27
48  52  89
It is decomposed into R1(A, B):
A   B
55  16
48  52
and R2(B, C):
B   C
16  27
52  89
After performing the join operation R1 ⨝ R2 we get the same original relation:
A   B   C
55  16  27
48  52  89
Lossy Decomposition
As the name suggests, lossy decomposition means that when we perform a join operation
on the sub-relations, it does not result in the same relation that was decomposed. After the
join operation, we find extraneous (spurious) tuples. These extra tuples make it difficult for
the user to identify the original tuples.
Example:
We have a relation R(A, B, C):
A  B  C
1  2  1
2  5  3
3  3  3
It is decomposed into R1(A, C):
A  C
1  1
2  3
3  3
and R2(B, C):
B  C
2  1
5  3
3  3
After performing the join operation R1 ⨝ R2 on the common attribute C we get:
A  B  C
1  2  1
2  5  3
2  3  3
3  5  3
3  3  3
The tuples (2, 3, 3) and (3, 5, 3) are extraneous: the join does not reproduce the original
relation, so the decomposition is lossy.
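Both behaviours can be checked programmatically. The sketch below projects each relation and joins the parts back; for the lossy case it assumes the split R1(A, C), R2(B, C) joined on C, which is the one split of this data that reproduces the five-tuple join result above:

```python
from itertools import product

def natural_join(r1, r2, i1, i2):
    """Pair tuples where r1[i1] == r2[i2]; drop r2's join column."""
    out = set()
    for a, b in product(r1, r2):
        if a[i1] == b[i2]:
            out.add(a + b[:i2] + b[i2 + 1:])
    return out

# Lossless case: R(A, B, C) split on B into R1(A, B) and R2(B, C).
R = {(55, 16, 27), (48, 52, 89)}
R1 = {(a, b) for a, b, c in R}        # project onto (A, B)
R2 = {(b, c) for a, b, c in R}        # project onto (B, C)
joined = natural_join(R1, R2, 1, 0)   # join on B -> (A, B, C)
print(joined == R)  # True: the original relation is recovered

# Lossy case: R'(A, B, C) split on C into R1'(A, C) and R2'(B, C).
Rp = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}
R1p = {(a, c) for a, b, c in Rp}      # project onto (A, C)
R2p = {(b, c) for a, b, c in Rp}      # project onto (B, C)
# Joining on C yields (A, C, B); reorder the columns back to (A, B, C).
joined_p = {(a, b, c) for a, c, b in natural_join(R1p, R2p, 1, 1)}
print(Rp <= joined_p)            # True: all original tuples are there ...
print(len(joined_p) - len(Rp))   # 2 ... plus two spurious tuples
```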
Properties of Decomposition
• Lossless: Every decomposition we perform in a database management system should be
lossless: no information should be lost when performing the join on the sub-relations to get
back the original relation. This helps remove redundant data from the database.
• Dependency Preservation: Dependency preservation is an important technique in
database management systems. It ensures that the functional dependencies between the
entities are maintained while performing decomposition. It helps improve database
efficiency and maintain consistency and integrity.
• Lack of Data Redundancy: Data redundancy means duplicate or repeated data. This
property states that the decomposition should not suffer from redundant data; it helps us
get rid of unwanted data and focus only on the useful information.
Axioms
• Axiom of Reflexivity: If A is a set of attributes and B is a subset of A, then A → B holds. If
B ⊆ A then A → B. Such dependencies are trivial.
• Axiom of Augmentation: If A → B holds and C is any set of attributes, then AC → BC also
holds. Adding the same attributes to both sides of a dependency does not change its
validity.
• Axiom of Transitivity: As with the transitive rule in algebra, if A → B holds and B → C
holds, then A → C also holds. If X → Y and Y → Z, then X → Z.
Example:
Let’s assume the following functional dependencies:
{A} → {B}
{B} → {C}
{A, C} → {D}
1. Reflexivity: Since any set of attributes determines its subset, we can immediately infer the
following:
• {A} → {A} (A set always determines itself).
• {B} → {B}.
• {A, C} → {A}.
2. Augmentation: If we know that {A} → {B}, we can add the same attribute (or set of
attributes) to both sides:
• From {A} → {B}, we can augment both sides with {C}: {A, C} → {B, C}.
• From {B} → {C}, we can augment both sides with {A}: {A, B} → {C, B}.
3. Transitivity: If we know {A} → {B} and {B} → {C}, we can infer that:
• {A} → {C} (Using transitivity: {A} → {B} and {B} → {C}).
Although Armstrong’s axioms are sound and complete, there are additional rules for
functional dependencies that are derived from them. These rules are introduced to simplify
operations and make the process easier.
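Armstrong's axioms are the basis of the attribute-closure algorithm: repeatedly apply any FD whose left side is already covered. A sketch using the three dependencies above:

```python
# FDs from the example above, each as a (lhs, rhs) pair of attribute sets.
fds = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"A", "C"}, {"D"}),
]

def closure(attrs, fds):
    """Compute X+, the set of all attributes determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers lhs, transitivity adds rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

print(sorted(closure({"A"}, fds)))  # ['A', 'B', 'C', 'D']
print(sorted(closure({"B"}, fds)))  # ['B', 'C']
```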
Secondary Rules
These rules can be derived from the above axioms.
• Union: If A→B holds and A→C holds, then A→BC holds. If X→Y and X→Z then X→YZ.
• Composition: If A→B and X→Y hold, then AX→BY holds.
• Decomposition: If A→BC holds then A→B and A→C hold. If X→YZ then X→Y and X→Z.
• Pseudo Transitivity: If A→B holds and BC→D holds, then AC→D holds.
If X→Y and YZ→W then XZ→W.
Example:
Let’s assume we have the following functional dependencies in a relation schema:
{A} → {B}
{A} → {C}
{X} → {Y}
{Y, Z} → {W}
Now, let’s apply the Secondary Rules to derive new functional dependencies.
1. Union Rule: If A → B and A → C, then by the Union Rule, we can infer:
• A → BC This means if A determines both B and C, it also determines their combination, BC.
2. Composition Rule: If A → B and X → Y hold, then by the Composition Rule, we can infer:
• AX → BY
3. Decomposition Rule: If A → BC holds, then by the Decomposition Rule, we can infer:
• A → B and A → C
4. Pseudo Transitivity Rule: If A → B and BC → D hold, then by the Pseudo Transitivity Rule,
we can infer:
• AC → D
Company  Product       Agent
C1       TV            Aman
C1       AC            Aman
C2       Refrigerator  Mohan
C2       TV            Mohit
Table: R1
Company  Product
C1       TV
C1       AC
C2       Refrigerator
C2       TV
Table: R2
Product       Agent
TV            Aman
AC            Aman
Refrigerator  Mohan
TV            Mohit
Company  Product       Agent
C1       TV            Aman
C1       TV            Mohit
C1       AC            Aman
C2       Refrigerator  Mohan
C2       TV            Aman
C2       TV            Mohit
Here, we can see that we got two additional tuples after performing the join, i.e.
(C1, TV, Mohit) and (C2, TV, Aman). These are known as spurious tuples, and their presence
means the join dependency does not hold for (R1, R2) alone. Therefore, we will create
another relation R3 and perform its natural join with (R1 ⨝ R2). So, here it is:
Table: R3
Company  Agent
C1       Aman
C2       Mohan
C2       Mohit
Company  Product       Agent
C1       TV            Aman
C1       AC            Aman
C2       Refrigerator  Mohan
C2       TV            Mohit
Now, we got our original relation, that we had earlier decomposed, in this way you can
decompose the original relation and check for the join dependency among them.
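The whole check can be replayed in a few lines of Python: project the relation onto R1, R2, R3, join R1 and R2 on Product (which introduces spurious tuples), then join with R3, which filters them out:

```python
from itertools import product

# Original relation R(Company, Product, Agent) from the example above.
R = {("C1", "TV", "Aman"), ("C1", "AC", "Aman"),
     ("C2", "Refrigerator", "Mohan"), ("C2", "TV", "Mohit")}

R1 = {(c, p) for c, p, a in R}   # (Company, Product)
R2 = {(p, a) for c, p, a in R}   # (Product, Agent)
R3 = {(c, a) for c, p, a in R}   # (Company, Agent)

# R1 join R2 on Product: spurious tuples can appear here ...
step = {(c, p, a) for (c, p), (p2, a) in product(R1, R2) if p == p2}
# ... then joining with R3 on (Company, Agent) filters them out.
rejoined = {(c, p, a) for (c, p, a) in step if (c, a) in R3}

print(len(step) > len(R))  # True: the two-way join has spurious tuples
print(rejoined == R)       # True: the three-way join recovers R exactly
```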
Importance of Join Dependencies
Join dependencies are important for several reasons: they help maintain data integrity,
support normalization, and aid query optimization within a database. Let us see each point
in detail:
• Data Integrity: Join dependencies help maintain data integrity in a database. Database
designers can make sure that queries return consistent results by checking for these
dependencies. In a lossless join, no information is lost, which keeps the data accurate. In
this way, a join dependency acts as a constraint that maintains data integrity.
• Query Optimization: Query optimization leads to improving the performance of the
database system. The database designers can choose the best join order to execute the
queries which in turn reduces the computational costs, memory utilization and i/o
operations to get the queries executed quickly.
Stages of Transaction
Note: The updated value of Account A = 450₹ and Account B = 850₹.
All instructions before committing come under a partially committed state and are stored in
RAM. When the commit is read the data is fully accepted and is stored on a Hard Disk.
If the transaction fails anywhere before committing, we have to go back and start from the
beginning; we can’t continue from the same state. This is known as Roll Back.
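Commit and roll back can be sketched with Python's sqlite3; the starting balances (A = 500, B = 800) and the 50-unit transfer are assumptions chosen to match the updated values in the note above:

```python
import sqlite3

# Hypothetical accounts: A = 500, B = 800, transferring 50 from A to B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 500), ("B", 800)])
conn.commit()

# A failure after the debit but before the commit: roll back everything.
try:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
    raise RuntimeError("simulated crash before crediting B")
except RuntimeError:
    conn.rollback()  # back to the state before the transaction began
print(dict(conn.execute("SELECT name, balance FROM account")))
# {'A': 500, 'B': 800} -- the half-done transfer left no trace

# The same transfer run to completion and committed is durable.
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.commit()
print(dict(conn.execute("SELECT name, balance FROM account")))
# {'A': 450, 'B': 850} -- matching the note above
```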
Desirable Properties of Transaction (ACID Properties)
Transaction management in a Database Management System (DBMS) ensures that database
transactions are executed reliably and follow ACID properties: Atomicity, Consistency,
Isolation, and Durability. These principles help maintain data integrity, even during failures
or concurrent user interactions, ensuring that all transactions are either fully completed or
rolled back if errors occur.
For a transaction to be performed in DBMS, it must possess several properties often
called ACID properties.
• A – Atomicity
• C – Consistency
• I – Isolation
• D – Durability
Transaction States
Transactions can be implemented using SQL queries and Servers. In the diagram, you can see
how transaction states work.
The transaction has four properties. These are used to maintain consistency in a database,
before and after the transaction.
Property of Transaction:
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity
• States that all operations of the transaction take place at once; if not, the transaction is
aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated
as one unit and either runs to completion or is not executed at all.
• Atomicity involves the following two operations:
• Abort: If a transaction stops or fails, none of the changes it made will be saved or visible.
• Commit: If a transaction completes successfully, all the changes it made will be saved and
visible.
Consistency
• The rules (integrity constraint) that keep the database accurate and consistent are followed
before and after a transaction.
• When a transaction is completed, it leaves the database either as it was before or in a new
stable state.
• This property means every transaction works with a reliable and consistent version of the
database.
• A transaction transforms the database from one consistent state to another consistent
state.
Isolation
• It means that the data being used during the execution of one transaction cannot be used
by a second transaction until the first one is completed.
• In isolation, if transaction T1 is being executed and is using the data item X, then that data
item can’t be accessed by any other transaction T2 until transaction T1 ends.
• The concurrency-control subsystem of the DBMS enforces the isolation property.
Durability
• The durability property guarantees that once a transaction is committed, its changes are
permanent.
• They cannot be lost by the erroneous operation of a faulty transaction or by system
failure. When a transaction is completed, the database reaches a consistent state, and that
state cannot be lost, even in the event of a system failure.
• The recovery subsystem of the DBMS is responsible for the durability property.
Implementing of Atomicity and Durability
The recovery-management component of a database system can support atomicity and
durability by a variety of schemes. E.g. the shadow-database scheme:
Shadow copy
• In the shadow-copy scheme, a transaction that wants to update the database first creates a
complete copy of the database.
• All updates are done on the new database copy, leaving the original copy, the shadow copy,
untouched. If at any point the transaction has to be aborted, the system merely deletes the
new copy. The old copy of the database has not been affected.
• This scheme, based on making copies of the database called shadow copies, assumes that
only one transaction is active at a time.
• The scheme also assumes that the database is simply a file on disk. A pointer called the
db-pointer is maintained on disk; it points to the current copy of the database.
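The shadow-copy idea can be illustrated with a minimal Python sketch, assuming (as the scheme does) that the database is a single file on disk. The function name and file layout here are illustrative, not part of any real DBMS:

```python
import os
import shutil

def run_transaction(db_path, update_fn):
    """Shadow-copy scheme sketch: copy the whole database, apply all
    updates to the copy, and redirect the 'db pointer' (here modeled
    by an atomic rename) only on commit."""
    shadow = db_path + ".shadow"
    shutil.copyfile(db_path, shadow)   # complete copy of the database
    try:
        update_fn(shadow)              # all updates go to the new copy
    except Exception:
        os.remove(shadow)              # abort: delete the new copy;
        raise                          # the original is untouched
    os.replace(shadow, db_path)        # commit: atomic pointer switch
```

Because `os.replace` is atomic, a crash leaves the database pointing at either the old copy or the fully updated one, which is exactly the atomicity and durability guarantee the scheme provides.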
Transaction Isolation Levels in DBMS
If a transaction fails, other transactions may already have used values it produced, so those
transactions must be rolled back as well (cascading rollback). The SQL standard defines four
isolation levels:
• Read Uncommitted: The lowest isolation level. A transaction may read changes made by
other transactions that are not yet committed, thereby allowing dirty reads. At this level,
transactions are not isolated from each other.
• Read Committed: This level guarantees that any data read was committed at the moment
it is read, so it does not allow dirty reads. The transaction holds a read or write lock on the
current row, preventing other transactions from reading, updating, or deleting it.
• Repeatable Read: A more restrictive level. The transaction holds read locks on all rows it
references and write locks on all rows it inserts, updates, or deletes. Since other
transactions cannot read, update, or delete these rows, non-repeatable reads are avoided.
• Serializable: The highest isolation level. A serializable execution is one in which
concurrently executing transactions appear to execute serially, one after another.
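The four levels and the read anomalies each one permits can be summarized in a small lookup table. This sketch follows the SQL standard's definitions; real engines are often stricter than the standard requires:

```python
# Read anomalies permitted by each SQL-standard isolation level.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty_read", "non_repeatable_read", "phantom_read"},
    "READ COMMITTED":   {"non_repeatable_read", "phantom_read"},
    "REPEATABLE READ":  {"phantom_read"},
    "SERIALIZABLE":     set(),
}

def allows(level, anomaly):
    """Return True if the given isolation level permits the anomaly."""
    return anomaly in PERMITTED[level]
```

For example, `allows("READ COMMITTED", "dirty_read")` is False, reflecting the text above: Read Committed forbids dirty reads but still allows non-repeatable and phantom reads.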
Failure Classification
To find that where the problem has occurred, we generalize a failure into the following
categories:
• Transaction failure
• System crash
• Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from which
it cannot proceed. If only a few transactions or processes are affected, this is called a
transaction failure.
Reasons for a transaction failure include:
1. Logical errors: The transaction cannot complete due to a code error or an internal error
condition.
2. System errors: The DBMS itself terminates an active transaction because the database
system cannot execute it. For example, the system aborts an active transaction in case of
deadlock or resource unavailability.
2. System Crash
System failure can occur due to power failure or other hardware or software failure.
Example: Operating system error.
• Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be
corrupted.
3. Disk Failure
• Disk failure occurs when hard-disk or storage drives fail. This was a common problem in the
early days of technology evolution.
• It can be caused by the formation of bad sectors, a disk head crash, unreachability of the
disk, or any other failure that destroys all or part of disk storage.
Serializability
It is an important aspect of transactions. Simply put, serializability is a way to check whether
transactions working concurrently on a database maintain database consistency.
It is of two types:
1. Conflict Serializability
2. View Serializability
Schedule
A schedule, as the name suggests, is the process of lining up transactions and executing them
one by one. When multiple transactions run concurrently, the order of their operations must
be set so that the operations do not overlap; scheduling is used for this, and the transactions
are timed accordingly.
It is of two types:
1. Serial Schedule
2. Non-Serial Schedule
Uses of Transaction Management
• The DBMS schedules concurrent access to data, so multiple users can access data from the
database without interfering with each other. Transactions are used to manage this
concurrency.
• Transactions are used to satisfy the ACID properties.
• They are used to resolve read/write conflicts.
• They are used to implement recoverability, serializability, and cascading rollback.
• Transaction management also underlies concurrency-control protocols and the locking of
data.
Advantages of using a Transaction
• Maintains a consistent and valid database after each transaction.
• Makes certain that updates to the database don’t affect its dependability or accuracy.
• Enables simultaneous use of numerous users without sacrificing data consistency.
Disadvantages of using a Transaction
• It may be difficult to change the information within the transaction database by end-users.
• We need to always roll back and start from the beginning rather than continue from the
previous state.
Transaction States
These are different types of Transaction States :
1. Active State – This is the first stage of a transaction, when the transaction’s instructions
are being executed.
• It is the first stage of any transaction when it has begun to execute. The execution of the
transaction takes place in this state.
• Operations such as insertion, deletion, or update are performed during this state.
• During this state, the data records are under manipulation and they are not saved to the
database, rather they remain somewhere in a buffer in the main memory.
2. Partially Committed –
• The transaction has finished its final operation, but the changes are still not saved to the
database.
• After completing all read and write operations, the modifications are initially stored in main
memory or a local buffer. If the changes are made permanent on the DataBase then the state
will change to “committed state” and in case of failure it will go to the “failed state”.
3. Failed State – If any transaction-related operation causes an error during the active or
partially committed state, further execution of the transaction stops and it is brought into a
failed state. Here, the database recovery system works to return the database to a
consistent state.
4. Aborted State – If a transaction reaches the failed state, the database recovery system will
attempt to restore the database to a consistent state. If recovery is not possible, the
transaction is rolled back (aborted) to ensure the database remains consistent.
In the aborted state, the DBMS recovery system performs one of two actions:
• Kill the transaction: The system terminates the transaction to prevent it from affecting other
operations.
• Restart the transaction: After making necessary adjustments, the system reverts the
transaction to an active state and attempts to continue its execution.
5. Committed – This state is reached when all the transaction-related operations have
executed successfully along with the commit operation, i.e. the data is saved into the
database after the required manipulations. This marks the successful completion of a
transaction.
6. Terminated State – If there is no rollback pending, or the transaction comes from the
committed state, the system is consistent and ready for a new transaction, and the old
transaction is terminated.
Example of Transaction States
Imagine a bank transaction where a user wants to transfer $500 from Account A to Account
B.
Transaction States:
1. Active State:
The transaction begins. It reads the balance of Account A and checks if it has enough funds.
• Example: Read balance of Account A = $1000.
2. Partially Committed State:
The transaction performs all its operations but hasn’t yet saved (committed) the changes to
the database.
• Example: Deduct $500 from Account A’s balance ($1000 – $500 = $500) and
temporarily update Account B’s balance (add $500).
3. Committed State:
The transaction successfully completes, and the changes are saved permanently in the
database.
• Example: Account A’s new balance = $500; Account B’s new balance = $1500.
Changes are written to the database.
4. Failed State:
If something goes wrong during the transaction (e.g., power failure, system crash), the
transaction moves to this state.
• Example: System crashes after deducting $500 from Account A but before adding it
to Account B.
5. Aborted State:
The failed transaction is rolled back, and the database is restored to its original state.
• Example: Account A’s balance is restored to $1000, and no changes are made to
Account B.
2. Timestamp-Based Protocols
Timestamp-based protocols use a global timestamp to order transactions and determine the
order in which they can access data. Each transaction is given a unique timestamp when it
starts. The idea is to prevent conflicts by enforcing an order in which transactions can access
the data based on their timestamps.
Types of Timestamp Protocols:
• Basic Timestamp Ordering (TO): Each transaction is assigned a timestamp when it begins.
The protocol ensures that the operations (reads and writes) are performed in the order of
their timestamps. Specifically:
o A read operation by a transaction is allowed only if no later transaction has written
to the same data item.
o A write operation by a transaction is allowed only if no later transaction has read or
written to the same data item.
• Thomas’ Write Rule: A variation of the basic timestamp ordering protocol that allows certain
writes to be ignored if they are no longer relevant (i.e., the data has already been
overwritten by a transaction with a later timestamp). This helps improve efficiency without
sacrificing consistency.
Advantage:
• No Locks: Timestamp-based protocols don't require locks, so they avoid the possibility of
deadlocks.
Disadvantage:
• Rollback Overhead: Transactions might need to be rolled back and restarted if they violate
the timestamp order, leading to additional overhead.
Step | T1          | T2
-----|-------------|-----------
1    | lock-X(B)   |
2    | read(B)     |
3    | B := B - 50 |
4    | write(B)    |
5    |             | lock-S(A)
6    |             | read(A)
7    |             | lock-S(B)
8    | lock-X(A)   |
9    | ……          | ……
1. Deadlock
In the given execution scenario, T1 holds an exclusive lock on B, while T2 holds a shared lock
on A. At Statement 7, T2 requests a lock on B, and at Statement 8, T1 requests a lock on A.
This situation creates a deadlock, as both transactions are waiting for resources held by the
other, preventing either from proceeding with their execution.
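Deadlocks like the one above are detected with a wait-for graph: an edge Ti to Tj means Ti is waiting for a lock held by Tj, and a cycle means deadlock. A minimal sketch (the dictionary format is an illustrative choice):

```python
def has_deadlock(waits_for):
    """waits_for maps each transaction to the set of transactions it
    waits on; a cycle in this wait-for graph means deadlock."""
    visiting, done = set(), set()

    def visit(t):
        visiting.add(t)
        for u in waits_for.get(t, ()):
            if u in visiting:
                return True          # back edge: cycle found
            if u not in done and visit(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False

    return any(t not in done and visit(t) for t in waits_for)
```

For the scenario above, T1 waits for T2 (the lock on A) while T2 waits for T1 (the lock on B), so the graph contains the cycle T1, T2, T1 and a deadlock is reported.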
2. Starvation
Starvation is also possible if concurrency control manager is badly designed. For example: A
transaction may be waiting for an X-lock on an item, while a sequence of other transactions
request and are granted an S-lock on the same item. This may be avoided if the concurrency
control manager is properly designed.
Step | Transaction-1 | Transaction-2
-----|---------------|--------------
1    | R(a)          |
2    | W(a)          |
3    |               | R(b)
4    |               | W(b)
5    | R(b)          |
6    |               | R(a)
7    | W(b)          |
8    |               | W(a)
We can observe that Transaction-2 begins its execution before Transaction-1 has finished,
and both work on the same data items, i.e., "a" and "b", interchangeably. Here "R" = Read
and "W" = Write.
Serializability testing
We can utilize the Serialization Graph or Precedence Graph to examine a schedule's
serializability. A schedule's full transactions are organized into a Directed Graph, what a
serialization graph is.
Precedence Graph
It can be described as a graph G(V, E) with vertices V = {T1, T2, T3, ..., Tn} and directed
edges E = {E1, E2, E3, ..., En}. The edges are derived from the READ and WRITE operations
performed by the transactions: an edge Ti -> Tj means transaction Ti performs a conflicting
read or write before transaction Tj.
Types of Serializability
There are two ways to check whether any non-serial schedule is serializable.
Types of Serializability - Conflict & View
1. Conflict serializability
Conflict serializability is a form of serializability that maintains database consistency by
ensuring that conflicting operations on the same data items are executed in an order
consistent with some serial schedule. Ordering every conflicting pair ensures that no two
conflicting operations take effect simultaneously.
For example, consider an order table and a customer table: each order is associated with
one customer, even though a single customer may place many orders. For two operations
to conflict, the following conditions must hold:
1. The two operations belong to different transactions.
2. Both operations access the same data item.
3. At least one of the two operations is a write operation.
Example
Three transactions, t1, t2, and t3, are active on a schedule "S" at once. Let's create its
precedence graph.
Step | t1   | t2   | t3
-----|------|------|------
1    | R(a) |      |
2    |      |      | R(b)
3    |      | R(b) |
4    |      | W(b) |
5    | W(a) |      |
6    |      |      | W(a)
7    |      | R(a) |
8    |      | W(a) |
Because the precedence graph (a DAG) has no cycles, the schedule is conflict serializable,
and we can determine an equivalent serial order of the transactions.
DAG of transactions
As there is no incoming edge on transaction t1, t1 is executed first. t3 runs second because it
depends only on t1. Due to its dependence on both t1 and t3, t2 is executed last.
Therefore, the equivalent serial order is: t1 --> t3 --> t2
Note: A conflict serializable schedule is always serializable and therefore preserves
consistency. A schedule that is not conflict serializable, on the other hand, may or may not
be serializable; we use the idea of view serializability to examine its serial behavior further.
2. View Serializability
View serializability is a form of serializability in which a schedule must produce the same
results as some serial execution of the same transactions. In contrast to conflict
serializability, view serializability is concerned directly with avoiding database inconsistency
in terms of what each transaction reads and writes.
To understand view serializability, consider two schedules S1 and S2 over the same
transactions T1 and T2. For the two schedules to be equivalent, they must satisfy the three
conditions listed below.
1. The first prerequisite is that the same set of transactions appears in both schedules. If one
schedule commits a transaction that does not appear in the other schedule, the schedules
are not equivalent to one another.
2. The second requirement is that the schedules must contain the same write operations. For
example, two schedules are not equivalent if schedule S1 has two write operations on a
data item while schedule S2 has only one. The number of write operations must be the
same in both schedules; the number of read operations may differ.
3. The final requirement is that the execution order of conflicting operations on a single data
item must not differ between the schedules. Assume, for instance, that both transaction T1
and transaction T2 write data item A. If the order of those writes differs between S1 and S2,
the schedules are not equivalent; they are equivalent only when the writes to each data
item occur in the same order.
What is view equivalency?
Schedules S1 and S2 must satisfy the following requirements in order to be view equivalent:
1. Initial read: the first read of each data item must be the same. For instance, if transaction
t1 reads "A" first in schedule S1, then t1 must also read A first in schedule S2.
2. Final write: the final write of each data item must be the same. For example, if transaction
t1 performs the last update of A in S1, it must also perform the final write of A in S2.
3. Intermediate reads must match as well. For example, if in S1 t1 reads A and t2 then
updates A, then in S2 t1 should likewise read A before t2 updates it.
View Serializability refers to the process of determining whether a schedule's views are
equivalent.
Example
We have a schedule "S" with two concurrently running transactions, "t1" and "t2."
Schedule - S:
Step | t1   | t2
-----|------|------
1    | R(a) |
2    | W(a) |
3    |      | R(a)
4    |      | W(a)
5    | R(b) |
6    | W(b) |
7    |      | R(b)
8    |      | W(b)
By moving t2's operations to run after t1's, we can create its view-equivalent serial schedule
(S').
Schedule - S':
Step | t1   | t2
-----|------|------
1    | R(a) |
2    | W(a) |
3    | R(b) |
4    | W(b) |
5    |      | R(a)
6    |      | W(a)
7    |      | R(b)
8    |      | W(b)
Serializability Testing
Serializability testing involves checking whether a given schedule of transactions is
serializable, i.e., whether it can be rearranged into an equivalent serial schedule without
violating any data consistency or integrity rules.
Steps to Test Serializability
1. Conflict Graph Method (Precedence Graph Method): The most common method for testing
conflict serializability is to use a precedence graph (also called a conflict graph or
serializability graph). This method works as follows:
o Step 1: Build the Precedence Graph
▪ Create a directed graph where each node represents a transaction.
▪ Draw a directed edge from transaction T1 to transaction T2 if an operation
of T1 conflicts with a later operation of T2. Two operations conflict if:
▪ They belong to different transactions and access the same data item.
▪ At least one of the two operations is a write.
▪ The direction of the edge indicates the order of execution: if T1 writes a data
item and T2 reads or writes the same data item later, draw an edge from T1
to T2.
o Step 2: Check for Cycles
▪ If the graph contains any cycles, the schedule is not conflict serializable
because cycles indicate conflicting transactions that cannot be reordered
into a serial schedule.
▪ If the graph is acyclic, the schedule is conflict serializable and can be
transformed into a serial schedule by following the topological order of the
transactions in the graph.
Example of a conflict graph:
o Suppose we have two transactions, T1 and T2, with the following operations:
▪ T1: Write(A)
▪ T2: Read(A)
▪ T1: Write(B)
▪ T2: Write(A)
o We would create a graph with nodes for T1 and T2, and draw directed edges based
on conflicts:
▪ There's a conflict between T1: Write(A) and T2: Read(A) (T1 → T2).
▪ There's a conflict between T1: Write(B) and T2: Write(A) (T1 → T2).
o If there are no cycles in the graph, the schedule is conflict-serializable.
2. Serializable Schedule Definition via Transaction Graphs:
o Transaction graphs can also be used to model schedules. These graphs represent the
transaction dependencies (i.e., which transactions must wait for others).
o A schedule is serializable if there exists a serial execution that respects these
dependencies.
3. Lock-based Serializability Testing:
o Lock-based concurrency control methods, like Two-Phase Locking (2PL), can be used
to test serializability by observing whether a schedule conforms to the rules of
locking and whether the transactions can be reordered without violating isolation.
o In lock-based methods, the database ensures serializability by acquiring locks for
each operation (read or write). The success of the lock acquisition process can be
checked to verify if the transactions are serializable.
4. Serialization Graph Method (also known as a Serializable Precedence Graph):
o This method builds a directed graph to model the dependency between
transactions. The graph is built based on read and write operations, and edges are
added between transactions that have conflicts (e.g., one writes and another reads
or writes the same data).
o After constructing the graph, a topological sort is attempted. If the graph contains a
cycle, no topological order exists and the schedule is not serializable; otherwise, the
schedule is serializable.
Operation | Transaction | Data Item
----------|-------------|----------
Write     | T1          | A
Read      | T2          | A
Write     | T1          | B
Write     | T2          | B
1. Conflict analysis:
o T1 and T2 both access A, and T1 writes it while T2 reads it. Therefore, there is a
conflict, and we add a directed edge from T1 to T2 (T1 → T2).
o T1 and T2 both access B, and T1 writes it while T2 writes it too. There is a conflict,
and we add another directed edge from T1 to T2 (T1 → T2).
2. Precedence graph:
o Nodes: T1, T2
o Directed edges: T1 → T2 (for both A and B).
3. Checking for cycles:
o There is no cycle in the graph. Therefore, the schedule is conflict serializable.
4. Result:
o The schedule is conflict serializable, and we can find an equivalent serial schedule by
following the topological order (in this case, T1 → T2).
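The steps above can be sketched in Python: build the precedence graph from conflicting operation pairs, then check for cycles with a topological sort (Kahn's algorithm). The tuple-based schedule format is an illustrative choice:

```python
def precedence_graph(schedule):
    """schedule: list of (transaction, op, item), op in {"R", "W"}.
    Returns the set of directed edges (Ti, Tj) for each conflict where
    Ti's operation precedes Tj's."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    """Conflict serializable iff the precedence graph is acyclic."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:                      # Kahn's algorithm
        n = queue.pop()
        seen += 1
        for u, v in edges:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return seen == len(nodes)         # all nodes ordered => no cycle
```

Running it on the worked example (T1 writes A, T2 reads A, T1 writes B, T2 writes B) yields the single edge T1 to T2 and reports the schedule conflict serializable, matching the analysis above.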
Closed Hashing
• Quadratic probing: Quadratic probing is similar to linear probing, but instead of a fixed
linear interval between successive buckets, a quadratic function of the probe number
determines the next bucket address.
• Double Hashing: Double hashing is another method similar to linear probing. The interval
between probes is fixed for a given key, as in linear probing, but that interval is computed
by a second hash function; hence the name double hashing.
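The three probe sequences can be compared side by side in a short sketch. The integer-key hash and the particular second hash function are illustrative assumptions; only the shape of each probe formula matters here:

```python
def linear_probe(key, i, size):
    """i-th probe: fixed interval of 1 between buckets."""
    return (key + i) % size

def quadratic_probe(key, i, size):
    """i-th probe: interval grows quadratically with the probe number."""
    return (key + i * i) % size

def double_hash_probe(key, i, size):
    """i-th probe: interval fixed per key, computed by a second hash."""
    h2 = 1 + (key % (size - 1))  # second hash; must never be 0
    return (key + i * h2) % size
```

With a table of size 7 and key 10, linear probing visits buckets 3, 4, 5, ..., quadratic probing visits 3, 4, 0, ..., and double hashing (h2 = 5 for this key) visits 3, 1, 6, ..., illustrating how each scheme spreads collisions differently.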
Dynamic Hashing
The drawback of static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. In Dynamic hashing, data buckets grow or shrink (added or
removed dynamically) as the records increase or decrease. Dynamic hashing is also known
as extendible hashing. In dynamic hashing, the hash function is made to produce a large
number of values. For example, suppose there are three data records D1, D2, and D3, and
the hash function generates the addresses 1001, 0101, and 1010 respectively. This method
of storing considers only part of the address, in this case only the first bit, so it tries to place
all three records at addresses 0 and 1.
But the problem is that no bucket address remains for D3. The bucket directory has to grow
dynamically to accommodate D3: the address is expanded from 1 bit to 2 bits, the existing
data is updated to use 2-bit addresses, and then D3 is accommodated.
45. Explain about Indexed sequential access method (ISAM) File Organization
The indexed sequential access method, also known as ISAM, is an upgrade to the
conventional sequential file organization method. In this method, the primary key of each
record is stored together with an address that maps to a data block in memory. This address
field works as an index of the file.
In this method, reading and fetching a record is done using the index of the file. Index field
contains the address of a data record in memory, which can be quickly used to read and
fetch the record from memory.
Advantages of ISAM
1. Searching a record is faster in ISAM than in other file organization methods: the primary
key identifies the record, and since the index stores the record's address alongside the
primary key, the data can be read and fetched from memory directly.
2. This method is more flexible than other methods, as it allows generating an index (address
field) for any column of the record. This makes searching easier and more efficient, since
searches can be done using multiple column fields.
3. It allows range retrieval of records: since the address field is stored with the primary key,
we can retrieve records for a given range of primary key values.
4. It also allows partial searches. For example, a search for employee names starting with "St"
will return all records where the employee name begins with those letters.
Disadvantages of ISAM
1. Requires additional space in the memory to store the index field.
2. After adding a record to the file, the file needs to be re-organized to maintain the sequence
based on primary key column.
3. Requires memory cleanup because when a record is deleted, the space used by the record
needs to be released in order to be used by the other record.
4. Performance suffers if records are deleted frequently, as every deletion requires memory
cleanup and re-optimization.
• The search key is the database’s first column, and it contains a duplicate or copy of the
table’s candidate key or primary key. The primary key values are saved in sorted order so that
the related data can be quickly accessible.
• The data reference is the database’s second column. It contains a group of pointers that
point to the disk block where the value of a specific key can be found.
Methods of Indexing
Ordered Indices
To make searching easier and faster, the indices are frequently arranged/sorted. Ordered
indices are indices that have been sorted.
Example
Let's say we have a table of employees with thousands of records, each ten bytes large. If
their IDs begin 1, 2, 3, ..., and we are looking for the employee with ID 543:
• Without an index, we must scan the disk block from the beginning until we reach ID 543;
the DBMS reads 543*10 = 5430 bytes before finding the record.
• With an index, the search uses the index entries instead; the DBMS reads only 542*2 =
1084 bytes before locating the record, which is significantly less than in the previous case.
Primary Index
• Primary indexing refers to the process of creating an index based on the table’s primary key.
These primary keys are specific to each record and establish a 1:1 relationship between
them.
• The searching operation is fairly efficient because primary keys are stored in sorted order.
• There are two types of primary indexes: dense indexes and sparse indexes.
Dense Index
Every search key value in the data file has an index record in the dense index. It speeds up
the search process. The total number of records present in the index table and the main
table are the same in this case. It requires extra space to hold the index record. A pointer to
the actual record on the disk and the search key are both included in the index records.
Sparse Index
In a sparse index, index records exist only for some of the search-key values in the data file.
Each index record points to a block; rather than pointing to every record in the main table,
the index points to one record per block, leaving gaps between indexed entries.
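A sparse-index lookup can be sketched as "find the largest indexed key at or below the search key, then scan that block". The list-and-dict data layout below is an illustrative assumption:

```python
import bisect

def sparse_lookup(index, blocks, key):
    """index: sorted list of (key, block_no), one entry per block.
    blocks: block_no -> list of (record_key, record).
    Returns the record for `key`, or None if absent."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1  # largest indexed key <= key
    if i < 0:
        return None
    block_no = index[i][1]
    for rec_key, rec in blocks[block_no]:   # scan just this block
        if rec_key == key:
            return rec
    return None
```

Note the trade-off the section describes: the index holds one entry per block rather than per record, so lookups pay a short in-block scan in exchange for a much smaller index.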
Clustering Index
• A clustering index is defined on an ordered data file. Indices are sometimes built on
non-primary-key columns, which may not be unique for each record.
• In this situation, we’ll join two or more columns to acquire the unique value and generate an
index out of them to make it easier to find the record. A clustering index is a name for this
method.
• Records with comparable properties are grouped together, and indices for these groups are
constructed.
Example
Assume that each department in a corporation has numerous employees. Assume we utilise
a clustering index, in which all employees with the same Dept_ID are grouped together into a
single cluster, and index pointers refer to the cluster as a whole. Dept_Id is a non-unique key
in this case.
Because one disk block may be shared by records from different clusters, the structure above
can be ambiguous. Using separate disk blocks for separate clusters is considered the better
strategy.
Secondary Index
With sparse indexing, the size of the mapping grows in proportion to the size of the table. These mappings are frequently stored in primary memory so that addresses can be fetched quickly; secondary memory is then searched for the actual data using the address obtained from the mapping. As the mapping grows, fetching an address becomes slower, and the sparse index becomes ineffective. Secondary indexing is used to solve this problem.
In secondary indexing, another level of indexing is introduced to reduce the size of the mapping. Wide ranges of the column values are chosen first, resulting in a small first-level mapping. Each range is then subdivided into smaller groups. Because the first-level mapping is kept in primary memory, fetching addresses from it is fast. The second-level mapping, as well as the actual data, is kept in secondary memory (the hard disk).
Example
• To find the record for roll number 111 in the diagram, the search looks for the largest entry in the first-level index that is less than or equal to 111 and obtains 100.
• It then looks for the largest value ≤ 111 in the second-level index and obtains 110. Using address 110, it navigates to the data block and scans each record until it finds 111.
• A search is carried out in this manner; insertions, updates, and deletions proceed in the same way.
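The two-level lookup above can be sketched in Python. The roll numbers, block names, and block layout below are illustrative assumptions modelled loosely on the roll-111 example; they are not taken from the diagram.

```python
# Sketch of two-level (secondary) indexing: the first-level index (held in
# primary memory) maps wide key ranges to second-level index slots; the
# second level maps keys to data-block addresses on disk.

# First level: (range start, slot in second_level).
first_level = [(100, 0), (200, 1)]

# Second level: each slot is a sorted list of (key, data block address).
second_level = [
    [(100, "blk_100"), (110, "blk_110")],
    [(200, "blk_200"), (210, "blk_210")],
]

# Data blocks: address -> records (roll numbers) stored in that block.
data_blocks = {
    "blk_100": [100, 101, 105],
    "blk_110": [110, 111, 115],
    "blk_200": [200, 204],
    "blk_210": [210, 213],
}

def largest_leq(entries, key):
    """Return the entry with the largest first element <= key."""
    best = entries[0]
    for entry in entries:
        if entry[0] <= key:
            best = entry
    return best

def find(roll):
    _, slot = largest_leq(first_level, roll)            # e.g. 111 -> range 100
    _, address = largest_leq(second_level[slot], roll)  # e.g. 111 -> blk_110
    return roll in data_blocks[address]                 # linear scan of block

print(find(111))  # -> True
```

Only the small first-level list must live in fast memory; the second level and the data blocks can stay on disk.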
A B-Tree is a specialized m-way tree designed to optimize data access, especially on disk-based
storage systems.
• In a B-Tree of order m, each node can have up to m children and m-1 keys, allowing it to
efficiently manage large datasets.
• One of the standout features of a B-Tree is its ability to store a significant number of keys
within a single node, including large key values. This significantly reduces the tree's height,
and hence the number of costly disk operations.
• B-Trees allow faster data retrieval and updates, making them an ideal choice for systems
that require efficient and scalable data management.
• By maintaining a balanced structure at all times, B-Trees deliver consistent and efficient
performance for critical operations such as search, insertion, and deletion.
Following is an example of a B-Tree of order 5.
Properties of a B-Tree
A B-Tree of order m can be defined as an m-way search tree which satisfies the following
properties:
1. All leaf nodes of a B-Tree are at the same level, i.e. they have the same depth (the height
of the tree).
2. The keys within each node of a B-Tree (in the case of multiple keys) are stored in
ascending order.
3. In a B-Tree, all non-leaf nodes (except the root node) have at least ⌈m/2⌉ children.
4. All nodes (except the root node) have at least ⌈m/2⌉ - 1 keys.
5. If the root node is a leaf node (the only node in the tree), then it has no children and at
least one key. If the root node is a non-leaf node, then it has at least 2 children and at
least one key.
6. A non-leaf node with n-1 key values has n non-NULL children.
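The search procedure that these properties enable can be sketched in Python. This is a minimal illustration of node-by-node descent only; the tree below is hand-built for the example, and insertion, deletion, and rebalancing are omitted.

```python
# Minimal sketch of B-Tree search: keys in each node are sorted, and an
# internal node with n keys has n+1 children, so at each node we either
# find the key or descend into exactly one subtree.

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted list of keys
        self.children = children or []     # empty for leaf nodes

def search(node, key):
    i = 0
    # Skip past keys smaller than the search key.
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return True                        # found in this node
    if not node.children:
        return False                       # reached a leaf without a match
    return search(node.children[i], key)   # descend into the i-th subtree

# A small hand-built B-Tree of order 5 (each node holds at most 4 keys).
root = BTreeNode(
    [30, 60],
    [BTreeNode([5, 10, 20]), BTreeNode([40, 50]), BTreeNode([70, 80, 90])],
)

print(search(root, 50))  # -> True
print(search(root, 35))  # -> False
```

Because every root-to-leaf path has the same length and each node packs many keys, the number of nodes visited (and hence disk reads) stays logarithmic in the number of records.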