Understanding ACID Transactions in DBMS

The document discusses the concept of transactions in database management systems (DBMS), detailing transaction states, ACID properties, and concurrency control mechanisms. It outlines the lifecycle of transactions, including active, partially committed, committed, failed, aborted, and terminated states, while emphasizing the importance of ACID properties for data integrity. Additionally, it addresses concurrency issues such as dirty reads and lost updates, and introduces methods for ensuring serializability and maintaining consistency during concurrent executions.

UNIT-V

Transaction Concept:

Transaction State, ACID Properties, Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for Serializability, Lock-Based, Timestamp-Based, and Optimistic Concurrency Protocols, Deadlocks, Failure Classification, Storage, Recovery and Atomicity, Recovery Algorithm.

Introduction to Indexing Techniques:

B+ Trees, operations on B+Trees, Hash Based Indexing


Transaction Concept

Transaction State

A transaction can be viewed as a set of operations that performs a logical unit of work. Transactions are used to change data in the database by inserting new data, modifying existing data, or deleting existing data. A transaction must follow the ACID properties (Atomicity, Consistency, Isolation, Durability) to ensure data integrity.

What are the Transaction States in DBMS?

The different stages a transaction goes through during its lifecycle are known as the transaction
states. The following is a diagrammatic representation of the different stages of a transaction.

We shall discuss all the different stages as can be seen in the diagram above.

 Active state: This is the very first state of the transaction. While the read-write operations of the transaction are executing, the transaction is in the active state. If any operation fails, it goes to the failed state. If all operations are successful, the transaction moves to the partially committed state. All the changes made in this stage are stored in buffer memory.

Syntax:
 START TRANSACTION;
 UPDATE accounts SET balance = balance - 500 WHERE account_no = 101;

The transaction is active while this statement is executing.


 Partially Committed state: Once all the instructions of the transaction are successfully executed, the transaction enters the Partially Committed state. If the changes are then made permanent from buffer memory, the transaction enters the Committed state; otherwise, if there is any failure, it enters the failed state. This state exists because a transaction can involve a large number of changes to the database, and if a power failure or other technical problem brings the system down before those changes are made permanent, the transaction would otherwise leave inconsistent changes in the database.

Syntax:
 UPDATE accounts SET balance = balance - 500 WHERE account_no = 101;
 UPDATE accounts SET balance = balance + 500 WHERE account_no = 102;

After executing these statements, the transaction moves to the Partially Committed state.

 Committed state: Once all the operations are successfully executed and the transaction
is out of the partially committed state, all the changes become permanent in the database.
That is the Committed state. There’s no going back! The changes cannot be rolled back
and the transaction goes to the terminated state.

Syntax:
 COMMIT;

The changes are now permanently stored in the database.

 Failed state: In case there is any failure in carrying out the instructions while the
transaction is in the active state, or there are any issues while saving the changes
permanently into the database (i.e. in the partially committed stage) then the transaction
enters the failed state.

Syntax:
 UPDATE accounts SET balance = balance - 500 WHERE account_no = 101;

-- System crashes before transferring to account_no = 102

The transaction fails and moves to the Failed state.

 Aborted state: If the transaction reaches the failed state, the database recovery system aborts (rolls back) the transaction, undoing all of its changes and restoring the database to the consistent state that existed before the transaction began. This leaves the database in a consistent state even though the transaction failed in the middle of its work.

Syntax:
 ROLLBACK;

This command undoes all changes made by the transaction.

 Terminated state: After a transaction is aborted, the DBMS can recover in two ways: one is by restarting the transaction, and the other is by terminating it and freeing the system for other transactions. The latter, where the transaction has finished (whether committed or rolled back) and released its resources, is known as the terminated state.

Example:

Online Money Transfer (Step-by-Step Transaction States)

Step 1: Active State

 User initiates a ₹500 transfer from Account A to Account B.

 System reads the balance and starts the process.

Step 2: Partially Committed State

 ₹500 is deducted from Account A.

 Amount is yet to be credited to Account B.

Step 3a: Committed State (Successful Transaction)

 ₹500 is successfully credited to Account B.

 Transaction is permanently stored in the database.

 Transaction Completed.

OR

Step 3b: Failed State (Error Occurs)

 Network issue or server failure occurs.

 ₹500 is deducted from Account A, but not credited to Account B.

Step 4: Aborted State (Rollback Process)


 Since the transaction failed, ₹500 is returned to Account A.

Step 5: Terminated State

 The transaction ends successfully (either committed or rolled back).

Final Result:

 If successful: ₹500 moves from A → B.

 If failed: ₹500 is returned to A (rollback).
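The walkthrough above can be sketched as a minimal Python simulation; the account names and the in-memory "database" dictionary are hypothetical stand-ins for real tables, used only to trace the state sequence.

```python
# Minimal sketch of the money-transfer lifecycle above (illustrative only).
db = {"A": 1000, "B": 500}          # committed values (the "disk")

def transfer(src, dst, amount, fail=False):
    states = ["ACTIVE"]
    buffer = dict(db)               # active-state changes live in buffer memory
    buffer[src] -= amount           # deduct from the source account
    if fail:                        # e.g. server failure before crediting dst
        states += ["FAILED", "ABORTED", "TERMINATED"]  # buffer is discarded
        return states
    buffer[dst] += amount
    states.append("PARTIALLY_COMMITTED")   # all statements executed
    db.update(buffer)                      # COMMIT: changes made permanent
    states += ["COMMITTED", "TERMINATED"]
    return states
```

A successful call walks Active → Partially Committed → Committed → Terminated; a failing call walks Active → Failed → Aborted → Terminated and leaves `db` untouched.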

ACID properties

ACID properties are a set of properties that guarantee reliable processing of transactions in a
database management system (DBMS). Transactions are a sequence of database operations that
are executed as a single unit of work, and the ACID properties ensure that transactions are
processed reliably and consistently in a DBMS.

ACID stands for Atomicity, Consistency, Isolation, and Durability.

1. Atomicity (All or Nothing Rule)

 Atomicity ensures that a transaction is either completely executed or completely rolled back if any part fails.

 If a transaction has multiple steps, either all steps execute successfully or none.

 If a failure occurs at any step, all previous changes made by the transaction are undone
(rolled back).
 No partial execution is allowed.

Example of Atomicity:
Transaction T1: Transferring $500 from Account A to Account B

Step 1: Deduct $500 from A.

Step 2: Add $500 to B. (Fails due to system crash)

Problem:

 The money is deducted from Account A but not added to Account B due to failure.

 This breaks Atomicity because the transaction is partially executed.

Solution:

 The DBMS will rollback the transaction and restore Account A's balance.
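The rollback described above can be sketched in a few lines of Python; the account dictionary and the simulated "crash" are illustrative assumptions, not a real DBMS mechanism.

```python
# Sketch of the rollback that restores Account A's balance after a crash.
accounts = {"A": 1000, "B": 500}

def transfer(src, dst, amount, crash_midway=False):
    snapshot = dict(accounts)       # remember the state before the transaction
    try:
        accounts[src] -= amount     # Step 1: deduct $500 from A
        if crash_midway:
            raise RuntimeError("system crash")
        accounts[dst] += amount     # Step 2: add $500 to B
    except RuntimeError:
        accounts.update(snapshot)   # rollback: undo every partial change
        return "rolled back"
    return "committed"
```

Either both steps take effect or neither does — no partial execution survives.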

2. Consistency (Data Integrity Must Be Maintained)

 The database must always remain in a valid state before and after a transaction.

 Data must follow all integrity constraints and business rules before and after execution.

 No transaction should violate referential integrity (foreign keys, unique constraints, etc.).

 A transaction should transform the database from one consistent state to another.

Example of Consistency:

Transaction T2: Transferring $200 from A to B

Before Transaction:

o Account A = $1000

o Account B = $500

o Total = $1500 (consistent)

After Transaction:

o Account A = $800

o Account B = $700

o Total = $1500 (consistent)


Problem (If Consistency is Violated):

 If Account A is deducted but B is not credited, the total sum of money ($1300 instead of
$1500) becomes inconsistent.

Solution:

 DBMS ensures that data follows integrity constraints and remains consistent before and
after transaction execution.
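The invariant behind the example — the total balance must be the same before and after the transfer — can be written as an explicit check (a minimal sketch using the values from the text; the helper names are hypothetical):

```python
# The total-balance invariant checked before and after a transfer.
accounts = {"A": 1000, "B": 500}

def total():
    return sum(accounts.values())

def transfer(src, dst, amount):
    before = total()
    accounts[src] -= amount
    accounts[dst] += amount
    assert total() == before, "consistency violated"
```

After `transfer("A", "B", 200)`, A holds 800 and B holds 700, and the total is still 1500.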

3. Isolation (Transactions Should Not Interfere With Each Other)

 Multiple transactions running concurrently should not interfere with each other.

 One transaction should not read or modify uncommitted changes made by another
transaction.

 Prevents issues like dirty reads, lost updates, and unrepeatable reads.

 The result should be the same as if transactions were executed sequentially.

Example of Isolation:
Two Transactions Running Concurrently

 Transaction T3: Deducts $100 from Account A.

 Transaction T4: Reads balance of Account A before T3 completes.

 If Transaction T4 reads before T3 commits, it may see an inconsistent value (Dirty Read
Problem).

Solution:

 Isolation levels (READ COMMITTED, REPEATABLE READ, SERIALIZABLE) ensure proper execution order.

 Locks prevent a transaction from reading uncommitted data.
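How READ COMMITTED avoids the dirty read above can be sketched like this: the reader is only ever handed the last committed value. The class and names are hypothetical, a toy model rather than a real DBMS implementation.

```python
# Sketch of READ COMMITTED: uncommitted writes are never exposed to readers.
class DataItem:
    def __init__(self, value):
        self.committed = value      # last committed value
        self.uncommitted = None     # in-flight, uncommitted write

    def write(self, value):         # T3 updates but has not committed yet
        self.uncommitted = value

    def read_committed(self):       # T4 reading under READ COMMITTED
        return self.committed       # the dirty value is never returned

    def commit(self):
        if self.uncommitted is not None:
            self.committed = self.uncommitted
            self.uncommitted = None

balance = DataItem(1000)
balance.write(900)                      # T3 deducts $100 (uncommitted)
seen_by_t4 = balance.read_committed()   # T4 still sees 1000: no dirty read
balance.commit()
```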

4. Durability (Changes Must Persist After Commit)

 Once a transaction is committed, its changes must be permanent, even if the system
crashes.

 The database system must store committed changes in non-volatile memory (like a hard
disk).
 Ensures that data remains safe even in case of system failure.

 Data should not be lost after a successful transaction.

Example of Durability:
Transaction T5:

Step 1: User books a train ticket online.

Step 2: Payment is processed and confirmed.

Step 3: Transaction commits successfully.

Step 4: Power failure occurs.

Problem (Without Durability):

 If the transaction is not saved permanently, the ticket booking may be lost after a system
crash.

Solution:

 DBMS writes committed data to disk (WAL - Write-Ahead Logging).

 After a crash, recovery mechanisms ensure changes are restored.
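The WAL idea above can be sketched minimally: the commit record is appended to the log before the data page changes, so recovery can redo committed work after a crash. The log list, data dictionary, and record layout are illustrative assumptions, not any real engine's format.

```python
# Minimal write-ahead-logging sketch.
log = []                  # stands in for the on-disk log file
data = {"ticket": None}   # stands in for the database pages

def commit(txn_id, key, value):
    log.append(("commit", txn_id, key, value))  # 1. log record written first
    data[key] = value                           # 2. then applied to the pages

def recover():
    for rec, _, key, value in log:              # redo committed changes
        if rec == "commit":
            data[key] = value

commit("T5", "ticket", "CONFIRMED")
data["ticket"] = None     # power failure wipes the volatile page
recover()                 # durability: the committed booking is restored
```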

Why Are ACID Properties Important?

 Ensures data reliability in databases.


 Prevents inconsistencies and corruption.
 Provides fault tolerance during system failures.
 Ensures correct concurrent execution.

Real-Life Example Where ACID is Used:

 Banking Transactions (Ensures correct balance updates).

 E-commerce Orders (Ensures payments are processed correctly).

 Railway/Airline Ticket Booking (Prevents duplicate or lost bookings).


Concurrent Executions

Concurrent execution in a multi-user database system enables multiple users to access and
perform operations on the database at the same time. This improves efficiency but introduces
challenges such as lost updates, dirty reads, and unrepeatable reads. To maintain consistency,
transactions must execute in an interleaved manner, ensuring that no operation affects others
incorrectly. Effective concurrency control mechanisms, such as locking protocols, timestamp
ordering, and validation techniques, help manage these challenges and preserve data integrity.

Concurrency execution Problems

The problems that arise when numerous transactions execute simultaneously in an uncontrolled manner are referred to as Concurrency Control Problems.

1. Dirty Read Problem

The dirty read problem in DBMS occurs when a transaction reads the data that has been updated
by another transaction that is still uncommitted. It arises due to multiple uncommitted
transactions executing simultaneously.

Example: Consider two transactions T1 and T2 performing read/write operations on a data item DT in the database DB. The current value of DT is 1000.

The following table shows the read/write operations in the T1 and T2 transactions.

Explanation of the Steps

1. t1: T1 reads DT = 1000.

2. t2: T1 updates DT to 1500, but it is not yet committed.

3. t3: T2 reads the uncommitted value (1500) instead of 1000 → Dirty Read Occurs.

4. t4: T2 performs calculations using DT = 1500.


5. t5: T1 fails/rolls back → DT reverts to 1000.

6. t6: T2 has already used incorrect value (1500) for further operations, leading to data
inconsistency.
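The timeline t1..t6 can be traced step by step in code (an illustrative simulation; the variables model T1's uncommitted buffer and T2's local copies):

```python
# Step-by-step simulation of the dirty-read timeline above.
DT_committed = 1000
DT = DT_committed          # t1: T1 reads DT = 1000

DT = 1500                  # t2: T1 writes DT = 1500, not yet committed
t2_read = DT               # t3: T2 reads the uncommitted 1500 (dirty read)
t2_result = t2_read + 100  # t4: T2 calculates using DT = 1500
DT = DT_committed          # t5: T1 fails and rolls back; DT reverts to 1000
# t6: T2 already used 1500, so its result no longer matches the database
```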

2. Unrepeatable Read Problem

The unrepeatable read problem occurs when two or more different values of the same data are
read during the read operations in the same transaction.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000.

The following table shows the read/write operations in the A and B transactions.

Transactions A and B initially read the value of DT as 1000. Transaction A then modifies DT from 1000 to 1500; when transaction B reads the value again, it finds 1500. Transaction B thus obtains two different values of DT in its two read operations.

3. Phantom Read Problem

In the phantom read problem, data is read through two different read operations in the same
transaction. In the first read operation, a value of the data is obtained but in the second operation,
an error is obtained saying the data does not exist.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000. The following table shows the read/write operations in the A and B transactions.
Transaction B initially reads the value of DT as 1000. Transaction A deletes the data DT from
the database DB and then again transaction B reads the value and finds an error saying the data
DT does not exist in the database DB.

4. Lost Update Problem

The Lost Update problem arises when two different transactions update the same data item and one update overwrites the other, so one of the updates is lost.

Example: Consider two transactions A and B performing read/write operations on a data item DT in the database DB. The current value of DT is 1000.

The following table shows the read/write operations in the A and B transactions.


Transaction A initially reads the value of DT as 1000. Transaction A modifies DT from 1000 to 1500, and then transaction B modifies it to 1800. When transaction A reads DT again, it finds 1800; the update made by transaction A has therefore been lost.
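The lost update is most easily seen as a read-modify-write race, sketched below. This rendering (both transactions read before either writes) differs slightly from the table's exact interleaving but shows the same anomaly; the variables are illustrative.

```python
# Sketch of a lost update: B's write, based on a stale read, clobbers A's.
DT = 1000

a_read = DT           # Transaction A reads 1000
b_read = DT           # Transaction B reads 1000 as well
DT = a_read + 500     # A writes 1500
DT = b_read + 800     # B writes 1800 over it: A's +500 is lost
```

A serial execution of the same two updates would have produced 2300, not 1800.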

Concurrency Control

Concurrency Control refers to the techniques used to manage concurrent transactions while
ensuring correctness, isolation, and serializability.

 It ensures that transactions execute safely without interfering with each other.

Key Concepts in Concurrency Control:

1. Need for Concurrency Control

o Prevents data inconsistencies.

o Ensures transactions follow ACID properties (Atomicity, Consistency, Isolation, Durability).

2. Ensuring Serializability

o Conflict Serializability: Transactions must execute in a way that maintains a conflict-free order.

o View Serializability: The final result of concurrent execution should match that
of some serial execution.

3. Concurrency Control Mechanisms (General Methods to Handle Concurrency)

o Locking Mechanisms (Restrict access to data).

o Timestamp Mechanisms (Order transactions based on their start time).

o Validation Mechanisms (Optimistically validate transactions before committing).

Serializability

Serializability of schedules ensures that a non-serial schedule is equivalent to a serial schedule. It allows transactions to execute concurrently, in an interleaved manner, while still preserving database consistency. In simple words, serializability is a way to check whether the execution of two or more transactions maintains the consistency of the database.

Schedules and Serializable Schedules in DBMS

A schedule in DBMS is a sequence of operations from multiple transactions. Operations include:

 R(X): Reading the value of X.

 W(X): Writing the value of X.

Schedules in DBMS are of two types:

1. Serial Schedule

A schedule in which only one transaction is executed at a time, i.e., one transaction is executed
completely before starting another transaction.

Example:
Here, we can see that Transaction-2 starts its execution after the completion of Transaction-1.

Serial schedules are always serializable because the transactions only work one after the other. Also, for n transactions, there are n! possible serial schedules.

2. Non-serial Schedule

A schedule in which the transactions are interleaving or interchanging. There are several
transactions executing simultaneously as they are being used in performing real-world database
operations. These transactions may be working on the same piece of data. Hence, the
serializability of non-serial schedules is a major concern so that our database is consistent before
and after the execution of the transactions.

Example:

We can see that Transaction-2 starts its execution before the completion of Transaction-1, and
they are interchangeably working on the same data, i.e., "a" and "b". Convert it into an equivalent
serial schedule (Serializable Schedule).

What is a serializable schedule?

A non-serial schedule is called a serializable schedule if it can be converted to an equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule produce the same result, then the non-serial schedule is called a serializable schedule.
Types of Serializability

Serializability of any non-serial schedule can be verified using two types mainly: Conflict
Serializability and View Serializability.

One more way to check serializability is by forming an equivalent serial schedule that results in
the same as the original non-serial schedule. Since this process only focuses on the output rather
than the operations taking place in between the switching of transactions, it is not practically
used. Now let's discuss Conflict and View Serializability in detail.

Conflict Serializability and Conflict Serializable Schedules

A non-serial schedule is conflict serializable if it can be transformed into a serial schedule by swapping its non-conflicting operations. It is checked by comparing the non-serial schedule with an equivalent serial schedule. This process of checking is called Conflict Serializability in DBMS.

It is tedious to use if we have many operations and transactions, as it requires a lot of swapping.

Checking can also be done with the Precedence Graph technique (discussed in the Testing of Serializability section): first identify the conflicting pairs of operations (read-write, write-read, and write-write), then form directed edges between the transactions of those conflicting pairs. If the graph contains a cycle, the schedule is not conflict serializable; otherwise, it is surely a conflict serializable schedule.

Conflicting Operations

Two operations become conflicting if all of the following conditions are satisfied:

1. Both belong to separate transactions.

2. They have the same data item.

3. They contain at least one write operation.

If two transactions perform operations on different data items, the operations never conflict. If the operations are on the same data item, they may or may not conflict, based on the following four cases:

T1        T2
Read(A)   Read(A)  - Non-conflict
Read(A)   Write(A) - Conflict
Write(A)  Read(A)  - Conflict
Write(A)  Write(A) - Conflict

1. Non-Conflict Case: Read(A) - Read(A)

Swapping Allowed

Swapped:

Since Read(A) - Read(A) is non-conflicting, swapping is allowed.

2. Conflict Case: Read(A) - Write(A)

Swapping NOT Allowed

Swapping Attempt:

This is incorrect! Read(A) - Write(A) is a conflicting pair, so swapping is not allowed.


Example:

The step-by-step swapping process, in table format, to convert the Non-serial Schedule (S1) into a Serial Schedule (S2) using Conflict Serializability:

Non-Serial Schedule (S1):

Swap 1: Move Read(A) (T1) above Read(A) (T2)

Swap 2: Move Read(A) (T1) above Read(B) (T2)

Swap 3: Move Read(A) (T1) above Write(B) (T2)


S1 is converted to S2 successfully by swapping its non-conflicting operations, so S1 is a conflict serializable schedule; that is, it will not violate the consistency of the DB.

View Serializability and View Serializable Schedules

If a non-serial schedule is view equivalent to some other serial schedule then the schedule is
called View Serializable Schedule. It is needed to ensure the consistency of a schedule.

What is view equivalency?

The two conditions needed by schedules(S1 and S2) to be view equivalent are:

1. Initial read must be on the same piece of data.

Example: If transaction t1 is reading "A" from database in schedule S1, then in schedule S2, t1
must read A.

2. Update Read must be on the same piece of data.

3. Final write must be on the same piece of data.

Example: If a transaction t1 updated A at last in S1, then in S2, t1 should perform final write
as well.

Example1: Non-Serial Schedule (S1):

DataItem: A

Initial read(A): T1

Update Read(A): T2

Final Write(A): T2

DataItem: B

Initial read(B): T1

Update Read(B): T2

Final Write(B): T2

Serial Schedule (S2):

DataItem: A

Initial read(A): T1

Update Read(A): T2

Final Write(A): T2

DataItem: B

Initial read(B): T1

Update Read(B): T2

Final Write(B): T2

Since the initial reads, update reads, and final writes match for both data items, S1 is view equivalent to S2, so S1 is a view serializable schedule.

Example2: Non-Serial Schedule (S1):

DataItem: Q

Initial read(Q): T1

Update Read(Q): NIL

Final Write(Q): T1

Serial Schedule (S2):

DataItem: Q

Initial read(Q): T1

Update Read(Q): NIL

Final Write(Q): T2

Since the final write on Q is performed by T1 in S1 but by T2 in S2, the two schedules are not view equivalent, so S1 is not view serializable with respect to S2.

Testing of Serializability

To test the serializability of a schedule, we can use a Serialization Graph or Precedence Graph. A serialization graph is a directed graph built over all the transactions of a schedule.

It can be defined as a graph G(V, E) consisting of a set of vertices V = {V1, V2, V3, ..., Vn}, one for each transaction, and a set of directed edges E = {E1, E2, E3, ..., En}. An edge from Ti to Tj is added whenever a READ or WRITE operation of Ti conflicts with a later operation of Tj.

Graph Representation:

 Vertices (V): Each transaction.


 Edges (E): Directed edges from T_i to T_j indicate T_i performs a conflicting operation
before T_j.

If there is a cycle present in the serialization graph then the schedule is non-serializable, because a cycle means that one transaction is dependent on the other and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, no cycle means that the non-serial schedule is serializable.

What is a conflicting pair in transactions?

Two operations inside a schedule are called conflicting if they meet these three conditions:

1. They belong to two different transactions.

2. They are working on the same data piece.

3. One of them is performing the WRITE operation.

To conclude, let’s take two operations on data: "a". The conflicting pairs are:

1. READ(a) - WRITE(a)

2. WRITE(a) - WRITE(a)

3. WRITE(a) - READ(a)

There can never be a read-read conflict as there is no change in the data.

For example:
Explanation:

Read(A): In T1, no subsequent writes to A, so no new edges


Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges

Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle; that is why schedule S1 is non-serializable.

Example 2:
Explanation:

Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges

Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle; that is why schedule S2 is serializable.
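The precedence-graph test used in both examples can be sketched in Python: add a directed edge for every conflicting pair (same item, different transactions, at least one write), then look for a cycle. The schedule encoding as (transaction, operation, item) triples and the helper names are illustrative assumptions.

```python
# Sketch of the precedence-graph test: cycle => not conflict serializable.
def precedence_edges(schedule):
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op1, op2):
                edges.add((ti, tj))        # ti's conflicting op comes first
    return edges

def has_cycle(edges):
    seen, stack = set(), set()
    def dfs(u):
        seen.add(u); stack.add(u)
        for a, b in edges:
            if a == u and (b in stack or (b not in seen and dfs(b))):
                return True
        stack.discard(u)
        return False
    nodes = {n for e in edges for n in e}
    return any(dfs(n) for n in nodes if n not in seen)

serial_like = [("T1", "R", "A"), ("T1", "W", "A"),
               ("T2", "R", "A"), ("T2", "W", "A")]   # only T1 -> T2 edges
cyclic = [("T1", "R", "A"), ("T2", "W", "A"),        # T1 -> T2 on A
          ("T2", "R", "B"), ("T1", "W", "B")]        # T2 -> T1 on B: cycle
```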

Recoverability
Irrecoverable schedules/ non-recoverable schedules:

If a transaction performs a dirty read from an uncommitted transaction and commits before the transaction from which it read the value, then such a schedule is called an irrecoverable schedule.

Example

Let us consider a two transaction schedules as shown below –

The above schedule is irrecoverable because of the reasons mentioned below:

 The transaction T2 performs a dirty read operation on A.

 The transaction T2 also commits before the completion of transaction T1.

 The transaction T1 fails later and is rolled back.

 The transaction T2 has therefore read an incorrect value.

 Finally, the transaction T2 cannot be recovered because it has already committed.

Recoverable Schedules
If a transaction performs a dirty read from an uncommitted transaction, but its commit operation is delayed until the uncommitted transaction either commits or rolls back, then such a schedule is called a recoverable schedule.

Example:

Let us consider two transaction schedules as given below


The above schedule is a recoverable schedule because of the reasons mentioned below −

 The transaction T2 performs a dirty read operation on A.

 The commit operation of transaction T2 is delayed until transaction T1 commits or rolls back.

 In the above schedule, transaction T2 is not allowed to commit while T1 is still uncommitted.

 If transaction T1 fails, transaction T2 still has a chance to recover by rolling back.

However, not all recoverable schedules are the same. There are three types based on how they
handle rollback situations:

1. Cascading Schedule

2. Cascadeless Schedule

3. Strict Schedule

1. Cascading Schedule

A cascading schedule allows a transaction to read uncommitted data from another transaction. If
the first transaction fails and rolls back, all dependent transactions must also rollback, causing a
cascading effect.

Multiple rollbacks can happen, slowing down the system.


Example

 T2 reads uncommitted A from T1.


 T3 reads uncommitted A from T2.
 If T1 fails, T2 and T3 must also rollback.

This cascading rollback wastes time and resources.

2. Cascadeless Schedule

A cascadeless schedule ensures that a transaction only reads committed data, preventing
cascading rollbacks.

Advantage:

 No rollback propagation

 Better performance

Example
 T2 waits until T1 commits before reading A.
 No cascading rollback possible.
 Faster and more efficient.

3. Strict Schedule

A strict schedule prevents any transaction from reading OR writing an uncommitted value from
another transaction.

Stricter than cascadeless schedules: transactions must wait until the writing transaction commits or rolls back.

Advantage:

 No cascading rollback.

 No dirty reads (reading uncommitted data).

 No dirty writes (overwriting uncommitted data).

Example

 T2 cannot read OR write A until T1 commits.


 Ensures strong consistency.
 Prevents dirty reads and dirty writes.

Strict schedules allow concurrency but make transactions wait only when they try to read/write
uncommitted data.
They are NOT the same as sequential execution because independent transactions can still
execute in parallel.
Implementation of Isolation

Isolation is one of the ACID (Atomicity, Consistency, Isolation, Durability) properties of a database transaction. Isolation ensures that transactions execute independently without interfering with each other, maintaining database consistency. It prevents issues like dirty reads, lost updates, and non-repeatable reads.

Levels of Isolation:

Isolation is divided into four levels. Higher isolation levels constrain the ability of users to access the same data concurrently. The higher the isolation level, the more system resources are required and the greater the likelihood that database transactions will block one another.

o “Serializable” ensures the final result is the same as if the transactions executed one by
one (serially), but they can still run concurrently.

o Repeatable Read ensures that if a transaction reads the same row twice, the value will always be the same (until the transaction ends). It prevents dirty reads and non-repeatable reads, but does NOT prevent phantom reads.

o Read Committed allows reading only data that has been committed. A transaction cannot read uncommitted changes from another transaction. It eliminates dirty reads, but non-repeatable reads and phantom reads are still possible.

o Read Uncommitted is the lowest level of isolation, allowing a transaction to read data that another transaction has modified but not yet committed.

The lower the isolation level, the more prone users are to read phenomena such as uncommitted dependencies, often known as dirty reads, where data is read from a row that has been modified by another user but has not yet been committed to the database.

Why is Isolation Important?

Without isolation, several issues can occur:

 Dirty Reads – Reading uncommitted changes.


 Lost Updates – One transaction overwrites another.
 Non-Repeatable Reads – Re-reading data gives different values.
 Phantom Reads – A query returns different results due to other transactions.

Techniques for Isolation:


1. Lock-Based Protocols

 Uses Shared (S) and Exclusive (X) locks.

 Two-Phase Locking (2PL): Acquires locks in growing phase and releases in shrinking
phase.

 Strict 2PL: Holds all locks until commit/rollback.

2. Timestamp-Based Protocols

 Assigns timestamps (TS) to transactions.

 A transaction is rolled back if it tries to access data in the wrong order.

3. Optimistic Concurrency Control (OCC)

 Transactions execute independently, and validation is done at commit time.

 If conflicts exist, the transaction restarts.

4. Multi-Version Concurrency Control (MVCC)

 Multiple versions of data are stored.

 Readers see old values while writers update new versions.

Each technique has its own advantages. Strict 2PL ensures isolation but can cause deadlocks,
while MVCC improves performance by allowing concurrent access.
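Strict 2PL's bookkeeping can be sketched minimally: locks are acquired during execution (growing phase) and all of them are released only at commit or rollback. This is a hypothetical lock manager with exclusive locks only, not a full implementation.

```python
# Minimal sketch of Strict 2PL: all locks held until commit/rollback.
class Strict2PL:
    def __init__(self):
        self.locks = {}                     # data item -> holding transaction

    def acquire(self, txn, item):
        holder = self.locks.get(item)
        if holder is not None and holder != txn:
            return False                    # conflicting lock: txn must wait
        self.locks[item] = txn              # growing phase
        return True

    def commit(self, txn):
        # the shrinking phase happens only here: release everything at once
        self.locks = {k: v for k, v in self.locks.items() if v != txn}

mgr = Strict2PL()
mgr.acquire("T1", "A")
blocked = mgr.acquire("T2", "A")    # T2 must wait while T1 holds the lock
mgr.commit("T1")
granted = mgr.acquire("T2", "A")    # granted once T1 has committed
```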

Concurrency Control Protocols

 Multiple users can access and use the same database at one time, which is known as the concurrent execution of the database.
 It ensures that database transactions are performed concurrently and accurately.
 It confirms that transactions produce correct results without violating the data integrity of the respective database.
 Concurrency Control is the working concept that is required for controlling and managing the concurrent execution of database operations.
 It avoids inconsistencies in the database.
 The concurrency control protocols ensure the atomicity, consistency, isolation, durability, and serializability of the concurrent execution of the database transactions.

Lock Based Protocol

 It is very essential in concurrency control which controls concurrent access to a data item.

 It ensures that one transaction does not read or write a record while another transaction is performing a write operation on it.

 A lock is a data variable which is associated with a data item.

 This lock signifies which operations can be performed on the data item.

 All lock requests are made to the concurrency-control manager.

 Transactions proceed only once the lock request is granted.

Types of Lock Based Protocol:

1. Shared Lock (S)

2. Exclusive Lock (X)

Example:

✔ A traffic light signal indicates stop and go: one signal is allowed to pass at a time while the
other signals are locked.
✔ In the same way, in a database only one transaction is performed on a locked data item at a
time while the other transactions wait.

Shared Lock

 It is also known as a Read-only lock.

 A transaction holding a shared lock can only read the data item, without making any
changes to it.

 Other transactions can also read the same data item, but they cannot update it until the
read is completed.

 Shared Locks are represented by S.


Exclusive Lock

 The data item can be both read and written by the transaction holding the lock.
 Under this lock, multiple transactions cannot modify the same data simultaneously.
 Exclusive Locks are represented by X.
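A minimal sketch of how a concurrency-control manager might grant S and X locks according to the compatibility rules above (S is compatible only with S). The `LockManager` class is a hypothetical illustration; a real manager would also queue and block waiting transactions:

```python
# Sketch of a lock manager enforcing the Shared/Exclusive compatibility
# matrix: S is compatible with S; X is compatible with nothing.
class LockManager:
    def __init__(self):
        self.locks = {}  # item -> (mode, set of holder transaction ids)

    def request(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)          # S + S: compatible, grant the lock
            return True
        return False                  # any combination involving X: denied

lm = LockManager()
print(lm.request("T1", "A", "S"))  # True: first lock on A
print(lm.request("T2", "A", "S"))  # True: shared locks coexist
print(lm.request("T3", "A", "X"))  # False: X conflicts with held S locks
```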
Timestamp Ordering Protocol

 The Timestamp Ordering Protocol orders transactions based on their timestamps.
 The order of transactions is simply the ascending order of their creation.
 An older transaction has higher priority, which is why it executes first.
 To determine the timestamp of a transaction, this protocol uses the system time, a logical
counter, or another unique value.
 The timestamp ordering protocol also maintains the timestamp of the last 'read' and 'write'
operation on each data item.

Timestamp ordering protocol functions

1. TS(Ti): Timestamp of transaction Ti.

Example: TS(T1) = 10, TS(T2) = 20, TS(T3) = 30

2. R_TS(X): Timestamp of the last (youngest) transaction that read X successfully.

Example: R_TS(A) = 30

3. W_TS(X): Timestamp of the last (youngest) transaction that wrote X successfully.

Example: W_TS(A) = 20
Timestamp Ordering Rules: Read()

Whenever a transaction Ti issues a Read(A) operation, check the following conditions:

a. If W_TS(A) > TS(Ti), the operation is rejected and Ti is rolled back. (Not allowed)

b. Otherwise, execute Read(A) and set R_TS(A) = max(R_TS(A), TS(Ti)). (Allowed)

Timestamp Ordering Rules: Write()

Whenever a transaction Ti issues a Write(A) operation, check the following conditions:

a. If R_TS(A) > TS(Ti), the operation is rejected and Ti is rolled back. (Not allowed)

b. If W_TS(A) > TS(Ti), the operation is rejected and Ti is rolled back. (Not allowed)

c. Otherwise, execute Write(A) and set W_TS(A) = TS(Ti). (Allowed)
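The Read() and Write() checks above can be sketched in Python; R_TS is updated to the maximum of its old value and TS(Ti). The function names are hypothetical:

```python
# Sketch of the timestamp-ordering checks. R_TS/W_TS map each data item
# to the timestamp of its last successful read/write.
def ts_read(item, ts_ti, R_TS, W_TS):
    if W_TS.get(item, 0) > ts_ti:
        return "rollback"            # a younger transaction already wrote
    R_TS[item] = max(R_TS.get(item, 0), ts_ti)
    return "read allowed"

def ts_write(item, ts_ti, R_TS, W_TS):
    if R_TS.get(item, 0) > ts_ti or W_TS.get(item, 0) > ts_ti:
        return "rollback"            # a younger transaction already read or wrote
    W_TS[item] = ts_ti
    return "write allowed"

R_TS, W_TS = {"A": 30}, {"A": 20}
print(ts_read("A", 10, R_TS, W_TS))   # W_TS(A)=20 > 10 -> rollback
print(ts_write("A", 40, R_TS, W_TS))  # 40 exceeds both timestamps -> allowed
```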


Optimistic Concurrency Control (OCC)

Optimistic Concurrency Control (OCC) is a method used in databases to manage multiple
transactions at the same time without locking data. It assumes that most transactions don’t
conflict, so instead of blocking access, it checks for conflicts only when a transaction is about to
save its changes.

Phases of Optimistic Concurrency Control

OCC operates in three distinct phases:

1. Read Phase (Begin-Transaction Phase)

 The transaction begins and reads the required data from the database.

 A copy of the data is stored in local memory (workspace) of the transaction.

 No locks are applied to the data, so other transactions can freely access and modify the
same data.

Example:
Aravind and Ashish open the booking website.

 Both see Seat No. 10 is available (because the database still shows it as free).

 They select the seat and proceed to enter their details.

At this point, no actual booking has been made. The system allows both users to proceed without
locking the seat.
2. Validation Phase (Before Commit Phase)

 The transaction prepares to commit.

 The system checks whether any other transactions have modified the data that this
transaction has read.

 If no conflict is found, the transaction is allowed to proceed to the write phase.

 If a conflict is detected (i.e., another transaction modified the same data), the transaction
is aborted and must restart.

Example:

 Aravind and Ashish now try to confirm their bookings at the same time.

 The system checks:

o Has Seat No. 10 been booked by someone else since Aravind or Ashish last
checked?

 If no change → The booking proceeds.

 If a change is detected → One transaction fails.

Here, suppose Aravind confirms his booking first. The system now marks Seat No. 10 as booked.

3. Write Phase (Commit Phase)

 If the validation is successful, the transaction writes changes to the database.

 If validation fails, the transaction is aborted and restarted.

 Since changes are applied only in this phase, the database remains consistent.

Example:

 Since Aravind’s transaction passes validation, the system confirms his booking.

 Ashish’s transaction now reaches validation and sees Seat No. 10 is no longer available.

 Ashish’s transaction is cancelled, and he must choose another seat.

Ashish receives a message: "Sorry, the seat is no longer available. Please select another seat."
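The seat-booking scenario can be sketched as code: each transaction snapshots a version counter in its read phase, and validation at commit compares that snapshot against the current version. All names here (`Seats`, `begin`, `commit`) are hypothetical:

```python
# Sketch of OCC for the seat-booking example. Read phase: snapshot the
# version seen. Validation phase: fail if the version changed since.
# Write phase: apply the booking and bump the version.
class Seats:
    def __init__(self):
        self.version = {10: 0}        # seat no -> version counter
        self.booked = {10: None}

    def begin(self):
        return {"reads": dict(self.version)}   # snapshot of versions read

    def commit(self, txn, user, seat):
        if self.version[seat] != txn["reads"][seat]:
            return "aborted: seat changed, please restart"   # validation fails
        self.booked[seat] = user                             # write phase
        self.version[seat] += 1
        return "booked"

db = Seats()
aravind, ashish = db.begin(), db.begin()      # both read: seat 10 is free
print(db.commit(aravind, "Aravind", 10))      # validation passes -> booked
print(db.commit(ashish, "Ashish", 10))        # conflict detected -> aborted
```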

Advantages of Optimistic Concurrency Control


1. Suitable for applications with low conflict rates.
2. Reduces overhead since no locks are used.
3. Provides better performance in read-heavy workloads.

Disadvantages of Optimistic Concurrency Control

1. Transactions may restart frequently if conflicts occur often.


2. Not ideal for high-conflict environments like banking transactions.

Deadlock

Deadlock occurs when two or more transactions cannot proceed because each is waiting for the
other to release resources. This results in a cycle of dependencies where no transaction can be
completed. Deadlocks typically happen when transactions hold some resources and wait for
others, leading to a halt. DBMS uses deadlock detection and resolution techniques, like timeouts
or transaction aborts, to break the cycle and restore progress.

Types of Deadlock in DBMS

There are two types of deadlocks in Database management system such as:

 Resource Deadlocks

 Communication Deadlocks

1. Resource Deadlocks
Resource Deadlocks occur when multiple processes require access to resources that are held by
other processes, leading to a cycle of waiting.

For example, if Process A holds Resource 1 and waits for Resource 2, while Process B holds
Resource 2 and waits for Resource 1, a deadlock situation arises.

2. Communication Deadlocks

Communication Deadlocks are less common but can occur in distributed systems where
processes communicate through messages.

For example, process A waits for a signal from B, B waits for a signal from C, and C waits for a
signal from A. As each process depends on another to proceed, none can move forward, creating
a deadlock where all processes are stuck, and unable to progress.

Coffman Conditions for Deadlock

Deadlocks are characterized by four necessary conditions, commonly known as the Coffman
Conditions. These conditions provide a theoretical framework for understanding deadlocks in
DBMS:

1. Mutual Exclusion
The Mutual Exclusion condition states that only one process can use a resource at a time. If
multiple processes are attempting to access the same resource, the resource must be locked by
one process, preventing others from accessing it at the same time.

2. Hold and Wait

The Hold and Wait condition occurs when a process is using one resource and is waiting for
additional resources held by other processes. This creates a cycle where each process is waiting
for resources that are locked by others.

3. No Preemption

The no preemption is the condition where resources cannot be taken from a process by force. A
process can only release a resource voluntarily after it has completed its task.

4. Circular Wait

It is the condition where a set of processes are waiting for each other in a circular chain. For
example, Process A waits for Resource 1, Process B waits for Resource 2, and Process C waits
for Resource 3, but ultimately, they all depend on each other, causing a circular wait.

Deadlock Handling in DBMS

Deadlock handling in DBMS is the process of managing and understanding deadlocks to prevent
them from impacting the system's performance and reliability. There are several strategies for
handling deadlocks effectively:

1. Deadlock Avoidance

This is a technique in a database management system (DBMS) that prevents deadlocks from
occurring by monitoring the system's state and making decisions to keep processes from getting
stuck. It's a proactive strategy that's often better than recovering from a deadlock, which can
waste time and resources.

2. Deadlock Detection

Deadlock detection is a process that identifies if any processes in a system are stuck waiting for
each other, preventing them from moving forward. This can be done by using a Wait-for Graph,
where the system monitors the relationships between processes and resources. If the graph
contains a cycle, a deadlock is detected, and necessary actions are taken.

3. Deadlock Prevention
Deadlock prevention is a method to ensure that processes do not get stuck waiting for each other
and cannot move forward. It involves establishing rules to manage resource usage so that
processes do not get into a deadlock situation.

4. Deadlock Recovery

This is the process of breaking a deadlock, which is when two or more transactions cannot
proceed because they are waiting for resources held by other transactions.

Applications of Deadlock in DBMS

Here are some applications of deadlock in database management systems such as:

 Deadlock detection and resolution ensure data consistency and the proper execution of
multiple concurrent transactions.

 Timeouts prevent transactions from waiting indefinitely for locks. When a transaction
times out, it can be forced to release its current resources and try again later.

 A deadlock creates a cycle of dependencies where no transaction can continue.

 Deadlocks can significantly affect how well a system performs.

 Resource ordering allocates resources in a specific order to prevent circular waits.

Drawbacks of Deadlocks in DBMS

Here are some drawbacks of deadlocks in database management systems such as:

 Deadlocks can cause the system to stop working, which can result in a loss of revenue
and productivity for businesses that use the DBMS.

 When transactions are blocked, the resources they require remain unused, resulting in a
drop in system efficiency and wasted resources.

 Deadlocks can lead to a decrease in system concurrency, which can result in slower
transaction processing and reduced throughput.

 Resolving a deadlock can be a complex and time-consuming process that requires system
administrators to manually get involved.

 In some cases, recovery algorithms may require rolling back the state of one or more
processes, which can lead to data loss or corruption.
Wait-for Graph in Deadlock Detection

A Wait-for Graph (WFG) is a directed graph used for deadlock detection in operating systems
and relational database systems. It represents the dependencies among processes and resources
in a system, helping to identify potential deadlocks.

Construction of the Wait-for Graph

In a WFG, processes are represented as nodes, and edges indicate the waiting relationship
between processes. An edge from process Pj to Pk represents that Pj is waiting for Pk to release
a lock on a resource. If a process is waiting for more than one resource to become available,
multiple edges may represent a conjunctive (and) or disjunctive (or) set of different resources.

Detection of Deadlocks

The possibility of a deadlock is implied by graph cycles in the conjunctive case, and by knots in
the disjunctive case. A cycle in the WFG indicates that a process is waiting for another process
to release a resource, which in turn is waiting for a third process to release a resource, and so on.
This creates a circular dependency, leading to a deadlock.
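Cycle detection on a wait-for graph is a depth-first search that looks for a back edge. A minimal sketch, with the WFG given as an adjacency dict (hypothetical helper name):

```python
# Deadlock detection on a wait-for graph: nodes are transactions, an edge
# Tj -> Tk means Tj is waiting for Tk. A cycle in the graph means deadlock.
def has_cycle(wfg):
    visited, on_stack = set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in wfg.get(node, []):
            if nxt in on_stack:          # back edge -> cycle found
                return True
            if nxt not in visited and dfs(nxt):
                return True
        on_stack.discard(node)
        return False

    return any(dfs(n) for n in wfg if n not in visited)

# T1 waits for T2, T2 waits for T3, T3 waits for T1: deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": ["T3"]}))                # False
```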

Failure Classification in DBMS

In a DBMS there are several transactions running in a specified schedule. However, sometimes
these transactions fail due to several reasons.

Failures in DBMS are classified as follows:


1. Transaction failure

2. Underlying System crash

3. Data transfer failure / Disk failure

1. Transaction Failure

A transaction is a set of statements; if a transaction fails, it means there is a statement in the
transaction that is not able to execute. This can happen due to various reasons, such as:

Logical Error: If the logic used in the statement itself is wrong, the statement can fail.

System Error: The transaction is executing, but due to a fault in the system, the transaction
fails abruptly.

2. Underlying System Crash

The system on which the transactions are running can crash and that can result in failure of
currently running transactions.

System can crash due to various reasons such as:

 Power supply disruptions

 Software issues such as Operating system issues

 Hardware issues

3. Disk Failure

A disk failure can also cause transaction failure. When transactions are reading and writing data
on the disk, a failure of the underlying disk can cause the failure of currently running transactions,
because they are unable to read and write data on a disk that is not working properly. This can
result in loss of data as well.

There can be several reasons for a disk failure, such as: formation of bad sectors on the disk,
corruption of the disk, viruses, or insufficient resources available on the disk.
Introduction to Indexing Techniques

B+ Trees

B+ Tree is a type of self-balancing tree structure commonly used in databases and file systems to
maintain sorted data in a way that allows for efficient insertion, deletion, and search operations.

Unlike binary trees, B+ trees maintain balance by keeping all leaf nodes at the same level.

The data pointers are present only at the leaf nodes on a B+ tree whereas the data pointers are
present in the internal, leaf or root nodes on a B-tree.

The leaves are not connected with each other on a B-tree whereas they are connected on a B+
tree.

Operations on a B+ tree are faster than on a B-tree.

Properties of a B+ Tree

1. All leaves are at the same level.

2. All data records are stored at leaf nodes.

3. The root has at least two children.

4. Each node except the root can have a maximum of m children and at least ⌈m/2⌉ children.

5. Each node can contain a maximum of m - 1 keys and a minimum of ⌈m/2⌉ - 1 keys.

Operations on B+ Trees

1. Search: Starts at the root and traverses down the tree, guided by the key values in each
node, until it reaches the appropriate leaf node.

2. Insert: Inserts a new key-value pair and then reorganizes the tree as needed to maintain
its properties.

3. Delete: Removes a key-value pair and then reorganizes the tree, again to maintain its
properties.
Insertion on a B+ Tree

Inserting an element into a B+ tree consists of three main events: searching the appropriate
leaf, inserting the element and balancing/splitting the tree.

Insertion Operation

Before inserting an element into a B+ tree, these properties must be kept in mind.

 The root has at least two children.

 Each node except the root can have a maximum of m children and at least ⌈m/2⌉ children.

 Each node can contain a maximum of m - 1 keys and a minimum of ⌈m/2⌉ - 1 keys.

The following steps are followed for inserting an element.

1. Since every element is inserted into the leaf node, go to the appropriate leaf node.

2. Insert the key into the leaf node.

Case I

1. If the leaf is not full, insert the key into the leaf node in increasing order.

Case II

1. If the leaf is full, insert the key into the leaf node in increasing order and balance the tree
in the following way.

2. Break the node at m/2th position.

3. Add m/2th key to the parent node as well.

4. If the parent node is already full, follow steps 2 to 3.

Insertion Example

The elements to be inserted are 5, 15, 25, 35, 45.

1. Insert 5

2. Insert 15

3. Insert 25

4. Insert 35

5. Insert 45
Deletion from a B+ Tree

Deleting an element from a B+ tree consists of three main events: searching the node where the
key to be deleted exists, deleting the key, and balancing the tree if required. Underflow is the
situation in which a node holds fewer keys than the minimum number of keys it should
hold.

Deletion Operation

Before going through the steps below, one must know these facts about a B+ tree of degree m (the values in parentheses are for m = 3).

1. A node can have a maximum of m children. (i.e. 3)

2. A node can contain a maximum of m - 1 keys. (i.e. 2)

3. A node should have a minimum of ⌈m/2⌉ children. (i.e. 2)

4. A node (except root node) should contain a minimum of ⌈m/2⌉ - 1 keys. (i.e. 1)

While deleting a key, we have to take care of the keys present in the internal nodes (i.e. indexes)
as well because the values are redundant in a B+ tree. Search the key to be deleted then follow
the following steps.

Case I

The key to be deleted is present only at the leaf node, not in the indexes (or internal nodes). There
are two cases for it:

1. The node has more than the minimum number of keys. Simply delete the key.

2. The node has exactly the minimum number of keys. Delete the key and borrow a key from
the immediate sibling. Add the median key of the sibling node to the parent.

Case II

The key to be deleted is also present in the internal nodes. Then we have to remove it from
the internal nodes as well. There are the following cases for this situation:

1. If the node has more than the minimum number of keys, simply delete the key
from the leaf node and delete the key from the internal node as well.
Fill the empty space in the internal node with the inorder successor.

2. If the node has exactly the minimum number of keys, delete the key and borrow a key
from its immediate sibling (through the parent).
Fill the empty space created in the index (internal node) with the borrowed key.

3. This case is similar to Case II(1), but here the empty space is generated above the
immediate parent node. After deleting the key, merge the empty space with its sibling and
fill the empty space in the grandparent node with the inorder successor.
Case III

In this case, the height of the tree shrinks. It is a little complicated. Deleting 55 from the
tree below leads to this condition. It can be understood in the illustrations below.

Searching on a B+ Tree

The following steps are followed to search for data in a B+ Tree of order m. Let the data to be
searched be k.

1. Start from the root node. Compare k with the keys at the root node [k1, k2, k3, ..., km-1].

2. If k < k1, go to the left child of the root node.

3. Else if k == k1, compare with k2. If k < k2, k lies between k1 and k2, so search in the left
child of k2.

4. If k > k2, continue with k3, k4, ..., km-1 as in steps 2 and 3.


5. Repeat the above steps until a leaf node is reached.

6. If k exists in the leaf node, return true else return false.

Searching Example on a B+ Tree

Let us search k = 45 on the following B+ tree.

Compare k with the root node.

Since k > 25, go to the right child.

Compare k with 35. Since k > 35, compare k with 45.


Since k ≥ 45, go to the right child.

k is found.
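The search walk above can be sketched on a toy B+ tree of order 3 built from the keys 5, 15, 25, 35, 45. The dict-based node layout here is a hypothetical simplification, not a full implementation:

```python
# Sketch of B+ tree search: descend through internal nodes guided by the
# separator keys, then look for k in the leaf (all data lives in leaves).
import bisect

def bplus_search(node, k):
    # Internal node: {"keys": [...], "children": [...]}; leaf: {"keys": [...]}
    while "children" in node:
        # bisect_right picks the child whose key range contains k
        # (keys equal to a separator go to the right subtree).
        i = bisect.bisect_right(node["keys"], k)
        node = node["children"][i]
    return k in node["keys"]              # true only if k is in a leaf

leaf1 = {"keys": [5, 15]}
leaf2 = {"keys": [25, 35]}
leaf3 = {"keys": [45]}
root = {"keys": [25, 45], "children": [leaf1, leaf2, leaf3]}
print(bplus_search(root, 45))   # True
print(bplus_search(root, 20))   # False
```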

Hash Based Indexing

Hash-Based Indexing is a technique used to locate data in a database efficiently using a hash
function.

 Instead of searching sequentially, it computes a hash value from a key and places or finds
the record at a position corresponding to that hash value.

 It is used mainly for equality searches (e.g., WHERE id = 101).


Hashing is a technique that maps data directly to its location using a hash function. Instead of
traversing indexes, hashing allows direct access to data in constant time (O(1)) by computing a
hash value.

A hash function takes an input (usually a key) and returns the address of a data block where the
corresponding record is stored.

This reduces search time and provides faster data retrieval.

Important Terminologies

1. Data Bucket:
Memory location where actual data records are stored.

2. Hash Function:
A mathematical function used to compute the address of a data bucket using the record's
key (usually the primary key).
Example: h(x) = x mod 7

3. Hash Index:
The result (address) generated by the hash function which points to the bucket.

4. Linear Probing:
If the computed bucket is full (collision occurs), linear probing checks the next
available bucket sequentially.

5. Quadratic Probing:
Instead of searching linearly, it uses a quadratic formula like i^2 to find the next
available bucket.

6. Bucket Overflow:
Occurs when a hash function maps multiple records to the same bucket (collision),
causing the bucket to exceed its capacity.

Types of Hashing in DBMS

Hashing is broadly classified into:

1. Static Hashing
2. Dynamic Hashing

1. Static Hashing in DBMS

In Static Hashing, the number of buckets is fixed in advance. The hash function will always
map keys to these fixed buckets. The hash function will always return the same bucket address
for the same key.
The number of buckets does not change even if the number of records increases or decreases.

Hash Function example:

h(x) = x % 5
Here, the modulus operator (% 5) ensures all keys are mapped to buckets 0 to 4.

Types of Static Hashing

Static Hashing is mainly divided into two types based on how it handles collisions:

i) Open Addressing

ii) Closed Addressing (Chaining)

i) Open Addressing

In Open Addressing, if the target bucket (calculated by the hash function) is already occupied,
the system searches for the next available bucket inside the hash table itself.

Techniques of Open Addressing:

a) Linear Probing

b) Quadratic Probing

c) Double Hashing (optional, often treated as a separate improvement)

a) Linear Probing (Open Addressing) in Static Hashing

Linear Probing is a collision resolution technique in Open Addressing hashing: if a collision
occurs (i.e., two keys hash to the same index), we linearly search for the next available
empty bucket (slot) and place the key there.

Linear Probing Algorithm Steps:

1. Compute the hash address using the hash function:
h(k) = k % table_size, or a provided function such as h(k) = (2k + 1) % table_size.
2. If the computed bucket is empty, insert the key there.

3. If the bucket is already occupied, then:

o Move to the next bucket (index + 1) % table_size.

o Keep moving linearly (one step at a time) until you find an empty bucket.

4. Wrap around to the start if you reach the end of the table (circular).

Example of Linear Probing

Hash Function:

h(k) = k % 7
Table size = 7

Keys to Insert:

50, 700, 76, 85, 92, 73

Step-by-step insertion:

Final Hash Table
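The insertion of 50, 700, 76, 85, 92, 73 with h(k) = k % 7 can be reproduced with a short sketch:

```python
# Linear probing with h(k) = k % 7: on collision, step to the next slot,
# wrapping around at the end of the table.
def linear_probe_insert(table, k):
    size = len(table)
    i = k % size
    while table[i] is not None:      # collision: move one slot forward
        i = (i + 1) % size
    table[i] = k

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73]:
    linear_probe_insert(table, key)
print(table)  # -> [700, 50, 85, 92, 73, None, 76]
```

For example, 85 hashes to slot 1 (occupied by 50) and lands in slot 2; 92 also hashes to slot 1 and must probe to slot 3.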

b) Quadratic Probing (Open Addressing) in Static Hashing

Quadratic Probing is another collision resolution technique under Open Addressing, where
instead of moving linearly (like linear probing), we move in quadratic steps (i.e., 1², 2², 3², …)
to find the next empty slot.
Quadratic Probing Formula:

When a collision occurs at h(k), we probe in this sequence:

h(k) + 1², h(k) + 2², h(k) + 3², ... (mod table_size)

General formula:

Index = (h(k) + i²) % table_size

where i = 0, 1, 2, 3, ...

Example of Quadratic Probing

Hash Function:

h(k) = k % 7
Table size = 7

Keys to Insert:

50, 700, 76, 85, 92, 73

Step-by-step insertion:

Final Hash Table
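The same keys inserted with quadratic probing, (h(k) + i²) % 7, give a different final table:

```python
# Quadratic probing with h(k) = k % 7: on collision, probe offsets
# 1^2, 2^2, 3^2, ... from the home slot.
def quadratic_probe_insert(table, k):
    size = len(table)
    for i in range(size):
        idx = (k % size + i * i) % size
        if table[idx] is None:
            table[idx] = k
            return idx
    raise RuntimeError("probe sequence exhausted")

table = [None] * 7
for key in [50, 700, 76, 85, 92, 73]:
    quadratic_probe_insert(table, key)
print(table)  # -> [700, 50, 85, 73, None, 92, 76]
```

Note how 92 (home slot 1) lands in slot 5 here, since its probes are 1+1=2 (occupied) and then 1+4=5, rather than the linear 1, 2, 3 sequence.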


c) Double Hashing (Open Addressing) in Static Hashing

Double Hashing is a collision resolution technique where two different hash functions are used:

 One for the initial hash position.

 The second for calculating the step size (gap) to jump on collision.

This helps to reduce both primary and secondary clustering problems.

Double Hashing Formula

When a collision occurs, we compute:

Index = (h₁(k) + i · h₂(k)) % table_size

Where:

 h₁(k) = k % table_size ➡️ Main hash function

 h₂(k) = R - (k % R) ➡️ Step hash function (R < table size and usually a prime)

 i = 0, 1, 2, 3...

Example of Double Hashing

Table Size = 7

Prime number R = 5 (smaller than 7)

Hash Functions:

 h₁(k) = k % 7

 h₂(k) = 5 - (k % 5)

Keys to Insert:

50, 700, 76, 85, 92

Step-by-step Insertion
Final Hash Table
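A sketch of double hashing with h₁(k) = k % 7 and h₂(k) = 5 - (k % 5), on the keys from the example:

```python
# Double hashing: the home slot comes from h1, and on collision we jump
# in steps of h2(k), a key-dependent stride that breaks up clustering.
def double_hash_insert(table, k):
    size, R = len(table), 5
    h1, h2 = k % size, R - (k % R)
    for i in range(size):
        idx = (h1 + i * h2) % size
        if table[idx] is None:
            table[idx] = k
            return idx
    raise RuntimeError("probe sequence exhausted")

table = [None] * 7
for key in [50, 700, 76, 85, 92]:
    double_hash_insert(table, key)
print(table)  # -> [700, 50, None, 92, 85, None, 76]
```

Here 85 (h₁ = 1, h₂ = 5) probes 1, 6, 4 and lands in slot 4, while 92 (h₁ = 1, h₂ = 3) probes 1, 4, 0, 3 and lands in slot 3.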

ii) Closed Addressing (Chaining) in Static Hashing

Closed Addressing means that all elements that hash to the same index are stored together in a
list or bucket at that index (instead of probing to find another empty slot like in open addressing).
If two keys hash to the same index, just "chain" them together using a linked list or another
structure!

How it works:

 Hash Table is an array of linked lists (or other dynamic structures like arrays or trees).

 When collision occurs, we append the item to the linked list at that slot.

Steps:

1. Compute index = h(k) % table_size.

2. If slot is empty ➡️ insert directly.

3. If slot has other keys (collision) ➡️ chain the new key at the end of the linked list.

Example of Chaining
Table Size = 7

Hash Function:

h(k) = k % 7

Keys to Insert:

50, 700, 76, 85, 92, 73

Step-by-step Insertion

Final Hash Table
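Chaining with h(k) = k % 7 on the same keys can be sketched with a list per slot:

```python
# Chaining (closed addressing): each slot holds a list, and colliding keys
# are simply appended to the list at their hashed slot.
def chain_insert(table, k):
    table[k % len(table)].append(k)

table = [[] for _ in range(7)]
for key in [50, 700, 76, 85, 92, 73]:
    chain_insert(table, key)
print(table)  # -> [[700], [50, 85, 92], [], [73], [], [], [76]]
```

Unlike open addressing, no probing is needed: 50, 85, and 92 all chain together at slot 1.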

Dynamic Hashing

o The dynamic hashing method is used to overcome the problems of static hashing, like
bucket overflow.

o In this method, data buckets grow or shrink as the number of records increases or
decreases. This method is also known as the Extendable hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting
in poor performance.

Terminology:

Global Depth (GD): Number of bits used to index the directory.

Local Depth (LD): Number of bits used to index into a specific bucket.

Directory: An array of pointers to buckets. Size = 2^GD.

Bucket: Where the actual records (keys) are stored. Each bucket has a fixed capacity.

How to search a key

o First, calculate the hash address of the key.

o Check how many bits are used in the directory; this number of bits is called i.

o Take the least significant i bits of the hash address. This gives an index into the directory.

o Now, using the index, go to the directory and find the bucket address where the record
might be.

How to insert a new record

o Firstly, you have to follow the same procedure for retrieval, ending up in some bucket.

o If there is still space in that bucket, then place the record in it.

o If the bucket is full, then we will split the bucket and redistribute the records.

For example:

Consider the following grouping of keys into buckets, depending on the prefix of their hash
address:
The last two bits of 2 and 4 are 00. So it will go into bucket B0. The last two bits of 5 and 6 are
01, so it will go into bucket B1. The last two bits of 1 and 3 are 10, so it will go into bucket B2.
The last two bits of 7 are 11, so it will go into B3.

Insert key 9 with hash address 10001 into the above structure:

o Since key 9 has hash address 10001, its last two bits are 01, so it must go into bucket B1.
But bucket B1 is full, so it will get split.

o The splitting will separate 5, 9 from 6 since last three bits of 5, 9 are 001, so it will go
into bucket B1, and the last three bits of 6 are 101, so it will go into bucket B5.

o Keys 2 and 4 are still in B0. The records in B0 are pointed to by the 000 and 100 entries
because the last two bits of both entries are 00.

o Keys 1 and 3 are still in B2. The records in B2 are pointed to by the 010 and 110 entries
because the last two bits of both entries are 10.

o Key 7 is still in B3. The record in B3 is pointed to by the 111 and 011 entries because the
last two bits of both entries are 11.
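The split-and-double behaviour described above can be sketched with a toy extendible hash table (bucket capacity 2, keys hashed to themselves). All names are hypothetical and the structure is simplified:

```python
# Toy sketch of extendible (dynamic) hashing: a directory of 2^GD pointers
# to buckets; a key goes to the bucket selected by its GD least-significant
# bits. A full bucket splits; if its local depth equals GD, the directory
# doubles first.
BUCKET_CAP = 2

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth (LD)
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.gd = 1                           # global depth (GD)
        self.dir = [Bucket(1), Bucket(1)]     # directory of size 2^GD

    def _slot(self, key):
        return key & ((1 << self.gd) - 1)     # GD least-significant bits

    def insert(self, key):
        b = self.dir[self._slot(key)]
        if len(b.keys) < BUCKET_CAP:
            b.keys.append(key)
            return
        if b.depth == self.gd:                # bucket as deep as directory:
            self.dir = self.dir + self.dir    # double the directory
            self.gd += 1
        b.depth += 1                          # split bucket b on the next bit
        new = Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        old_keys, b.keys = b.keys + [key], []
        for i, p in enumerate(self.dir):      # repoint half of b's entries
            if p is b and (i & bit):
                self.dir[i] = new
        for k in old_keys:                    # redistribute the keys
            self.insert(k)

h = ExtendibleHash()
for k in [2, 4, 5, 6, 1, 3, 7, 9]:
    h.insert(k)
print(h.gd)  # -> 3 (the directory doubled twice)
```

Inserting 9 into a full bucket whose local depth equals the global depth is what forces the second directory doubling, mirroring the split of B1 in the example above.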
Advantages of dynamic hashing

o In this method, the performance does not decrease as the data grows in the system. It
simply increases the size of memory to accommodate the data.

o In this method, memory is well utilized, as it grows and shrinks with the data. There will
not be any unused memory lying idle.

o This method is good for the dynamic database where data grows and shrinks frequently.

Disadvantages of dynamic hashing

o In this method, if the data size increases, then the number of buckets also increases. The
addresses of the data are maintained in the bucket address table, and they keep changing
as the buckets grow and shrink. If there is a huge increase in data, maintaining the bucket
address table becomes tedious.

o Bucket overflow can still occur in this case, but it might take longer to reach this
situation than in static hashing.
