Understanding Database Management Systems

The document provides an overview of Data Base Management Systems (DBMS), defining data, databases, and the role of DBMS in managing and organizing data. It outlines the applications of DBMS across various sectors, its advantages and disadvantages, and the different types of database users. Additionally, it describes the components of DBMS, the three-schema architecture, and the 1-Tier architecture for implementing databases.

Uploaded by

anfilofficial3
Copyright
© © All Rights Reserved

Database Management System

What is Data?
Data is a collection of distinct small units of information. It can take a variety of forms, such as text,
numbers, media, or bytes, and it can be stored on paper or in electronic memory.
What is a Database?
A database is a collection of interrelated data that can be easily accessed, managed, and
updated. The data is organized into structures such as tables, schemas, views, and reports. Databases
support electronic storage and manipulation of data, which makes data management easier.
For example, a college database organizes data about students, such as name, age, roll number,
and department.
Using the database, you can easily retrieve, insert, and delete information.
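The college example can be tried out with a small script. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the database; the table name, column names, and sample students are hypothetical and only illustrate inserting, retrieving, and deleting information.

```python
import sqlite3

# Hypothetical in-memory college database; all names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, age INTEGER, department TEXT)"
)

# Insert a few student records.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(1, "Asha", 19, "Physics"), (2, "Ravi", 20, "Chemistry"), (3, "Meera", 21, "Physics")],
)

# Retrieve the names of all students in one department.
physics = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM students WHERE department = ? ORDER BY roll_no", ("Physics",)
    )
]

# Delete one record and count what remains.
conn.execute("DELETE FROM students WHERE roll_no = ?", (2,))
remaining = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]

print(physics, remaining)
```

The same three operations (insert, retrieve, delete) apply to any other relational DBMS; only the connection step would differ.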
Database Management System
o A database management system (DBMS) is software used to manage a database. For
example, MySQL and Oracle are popular database management systems used in many
applications.
o A DBMS provides an interface for operations such as creating a database, creating tables in it,
storing data, updating data, and a lot more.
o It provides protection and security for the database. When multiple users access the data, it also
maintains data consistency.
Applications of DBMS
Database Management Systems (DBMS) are used across various fields and applications to manage,
store, and retrieve data efficiently. Here are some common and notable applications of DBMS:
1. Financial Services
o Banking Systems: Manages customer accounts, transactions, loans, and financial records.
Ensures data security and handles large volumes of transactional data.
2. Healthcare
o Electronic Health Records (EHR): Stores patient information, medical history, and treatment
plans, improving patient care and data accessibility.
o Hospital Management Systems: Manages patient admissions, billing, scheduling, and other
administrative tasks.
3. Education
o Student Information Systems: Manages student records, grades, course registrations, and
academic progress.
o Learning Management Systems (LMS): Stores course materials, student progress, and
assessments, facilitating online and blended learning environments.
4. E-Commerce
o Online Retail Platforms: Manages product listings, customer orders, inventory, and transactions,
providing a seamless shopping experience.
5. Government and Public Sector
o Citizen Databases: Manages records related to citizenship, taxation, and public services.
o Voting Systems: Stores and processes voting data to ensure accurate and fair elections.
6. Media and Entertainment
o Content Management Systems (CMS): Manages digital content such as articles, videos, and
images, facilitating content creation, publishing, and retrieval.
7. Travel and Hospitality
o Reservation Systems: Manages booking information for airlines, hotels, and rental services,
optimizing availability and customer experience.
A DBMS allows users to perform the following tasks:
o Data Definition: Creating, modifying, and removing the definitions that describe the
organization of data in the database.
o Data Update: Inserting, modifying, and deleting the actual data in the
database.
o Data Retrieval: Retrieving data from the database so that it can be used by
applications for various purposes.
o User Administration: Registering and monitoring users, maintaining data integrity,
enforcing data security, dealing with concurrency control, monitoring performance, and
recovering information corrupted by unexpected failure.
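The first three tasks map onto familiar SQL statements. The sketch below assumes SQLite through Python's sqlite3 module; the courses table and its columns are hypothetical. User administration is left out because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create and modify a table definition (hypothetical "courses" table).
conn.execute("CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE courses ADD COLUMN credits INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(courses)")]

# Data update: insert and modify actual rows.
conn.execute("INSERT INTO courses VALUES (101, 'Databases', 3)")
conn.execute("UPDATE courses SET credits = 4 WHERE course_id = 101")

# Data retrieval: read a value back.
credits = conn.execute(
    "SELECT credits FROM courses WHERE course_id = 101"
).fetchone()[0]

# Data definition again: remove the table definition.
conn.execute("DROP TABLE courses")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(columns, credits, tables)
```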
Characteristics of DBMS
Traditionally, data was organized in files. The DBMS was introduced as a new concept, and research on
it aimed to overcome the deficiencies of the traditional style of data management. A modern DBMS has
the following characteristics:
Less Redundancy: Data redundancy occurs when the same piece of data is stored in multiple
places within the database. For example, storing a customer's address in multiple tables can
lead to inconsistencies if the address changes and is not updated everywhere. A DBMS follows the
rules of normalization, which split a relation when any of its attributes holds redundant
values. Normalization is a formal, mathematically grounded process that reduces data
redundancy.
Consistency: In databases, consistency means that data is accurate and synchronized
across different systems or instances. It ensures that data is not contradictory and that updates
are reflected in all relevant locations.
Relation-based tables: A DBMS allows entities and the relations among them to be stored as
tables. A user can understand the architecture of a database just by looking at the table
names.
Real-world entity: A modern DBMS is more realistic and uses real-world entities to design its
architecture. It also uses their behaviour and attributes. For example, a college database may use
students as an entity and their age as an attribute.

[Figure: the entity Student with the attribute age]

Query language: Query languages are specialized languages used to interact with databases.
They allow users to retrieve, manipulate, and manage data within a database. The most
prominent query language is SQL (Structured Query Language)
ACID properties: DBMS follows the concept of Atomicity, Consistency, Isolation and Durability.
These concepts are applied on transactions which manipulate data in the database. ACID
properties are essential for ensuring that a database maintains integrity and reliability in the
face of concurrent transactions, system failures, and other issues.
Multiuser and concurrent access: In a DBMS, multiuser and
concurrent access refer to the system's ability to handle multiple users or processes accessing
the database simultaneously. These features are crucial for ensuring that a database can
support multiple users interacting with it at the same time while maintaining data integrity and
consistency.
Multiple views: This refers to the capability to create different virtual representations or
perspectives of the same underlying data. These views are designed to simplify data access,
enhance security, and provide tailored data representations for different users or applications.
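Of the characteristics above, atomicity is the easiest to demonstrate in a few lines. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the accounts table, the CHECK constraint, and the transfer function are hypothetical. It shows that a transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # the debit violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to account 2 runs first and is then rolled back, which is the behaviour atomicity promises.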
Advantages of DBMS
o Controls database redundancy: Data redundancy is the unnecessary duplication of data
within a database or system. It occurs when the same piece of data is stored in multiple places,
and it can lead to increased storage costs, potential inconsistencies, and
inefficiencies in data management. A DBMS can control redundancy because it keeps all the data
in one centralized database instead of in scattered copies.
o Data sharing: Authorized users of an organization can share the data among themselves, and
many users can be authorized to access the same data simultaneously.
o Data Security: Data security is the protection of the database from unauthorized access. To ensure
security, a DBMS provides mechanisms such as usernames and passwords, and it implements user
permissions and roles to control who can access or modify data, so that sensitive
information is protected.
o Easy Maintenance: A DBMS is easy to maintain because of the centralized nature of the database
system.
o Graphical User Interfaces (GUIs): Provides user-friendly interfaces for managing and interacting
with data, making it easier for non-technical users to perform tasks.
o Reduced time: It reduces development time and maintenance needs.
o Minimized Duplication: Reduces the need for data duplication through normalization and
efficient data organization.
o Backup and Recovery: It provides backup and recovery subsystems that automatically back up
data to protect against hardware and software failures and restore the data if required.
o Multiple user interfaces: It provides different types of user interfaces, such as graphical user
interfaces and application program interfaces.

Disadvantages of DBMS
o Cost of Hardware and Software: Running DBMS software requires a fast processor and a large
amount of memory. High-performance databases may require expensive hardware or
cloud resources.
o Size: A DBMS occupies a large amount of disk space and needs a large amount of memory to run efficiently.
o Complexity: Databases can be complex to set up and maintain. This includes tasks like schema
design, indexing, and managing relationships between tables.
o Higher impact of failure: A failure has a large impact because, in most
organizations, all the data is stored in a single database. If that database is damaged by a
power failure or database corruption, the data may be lost forever.
o Migration challenges: Migrating data from one database system to another can be a complex and
error-prone process, often requiring careful planning and execution.
Users of a Database
Database users can access the database and retrieve the data from the database using applications and
interfaces provided by the Database Management System (DBMS).
Database users in DBMS can be categorized based on their interaction with the databases. According to
the tasks performed by the database users on the databases, we can categorize them into seven
categories as follows:
 Database Administrators (DBA)
 Database Designers
 System Analysts
 Application Programmers / Back-End Developers
 Naive Users / Parametric Users
 Sophisticated Users
 Casual Users / Temporary Users
Database Administrators (DBA): Database Administrators are the most important type of database
users in a DBMS. A Database Administrator is an individual or a team of users who defines the database
schema and takes charge of controlling various levels of the database within the organization. DBAs
have full control of the database and are sometimes known as the superusers of the database.
They work alongside developers to discuss and design the overall
structure of the database, including layouts, functionalities, and workflow. DBAs
can grant or revoke authorization permissions for all other users at any time, and they provide
login credentials (account ID and password) to other users who need to access the database.
DBAs are responsible for securing the
database by restricting unauthorized users from accessing it.
Database Designers: As the name suggests, Database Designers are the users who design and
create the structure of the database, including triggers, indexes, schemas, entity relationships, tables,
and constraints. Database designers gather information about the
requirements for the database, such as its layout, appearance, functioning, cost,
technologies to be used, and implementation techniques, and then produce the final design of the
database for programmers to code its logic. They decide which form of
data needs to be stored, what kinds of relations exist among the different entities of the database, what
the types of the attributes will be, and so on.
System Analysts: System Analysts are the type of database users who analyze the
requirements of Naive / Parametric End users. It is their responsibility to check whether all the
requirements of end users are satisfied.
Application Programmers / Back-End Developers: Application Programmers, also known as Back-End
Developers, are computer professionals who are responsible for developing the application
programs (in languages such as C, C++, Java, PHP, and Python) or the user interface so that other users can use these
applications to interact with the database.
Application Programmers have detailed knowledge of DBMS and databases.
They interact with the database using DML (Data Manipulation Language) queries to store data inside
the database and, when needed, to fetch data from it.
Naive Users / Parametric Users: Naive users, also known as Parametric End users, have no
knowledge of DBMS but still frequently use database applications to get the desired results. Through
the interface provided by the DBMS applications, naive users mostly use the database to fill
in or retrieve information (the view level of the database).
Sophisticated Users: Sophisticated users are the type of database users who know DBMS (DDL
and DML commands) and are familiar with the database. Sophisticated users can be business analysts,
engineers, scientists, system analysts, etc.
Casual Users / Temporary Users: Casual users, also known as temporary users, are the type of database
users who use the database services only occasionally. Whenever these users
access the database, they want all the information sorted in place. Casual users have little
knowledge about DBMS, and each time they access the database, they require new information.
Components of DBMS

A DBMS has several components, and each component has a significant task.
The components are:
 Hardware
 Software
 Data
 People/Users
 Procedures
 Data Access Language
Hardware: The hardware components play a crucial role in ensuring the system runs efficiently and
effectively. The hardware is the actual computer system used for storage and retrieval of the
database. This includes computers (PCs, workstations, servers, and supercomputers), storage devices
(hard disks, magnetic tapes), network devices (hubs, switches, routers), and other supporting devices
for keeping and retrieving data.
Software: DBMS software acts as a bridge between the user and the database. All requests from users for
access to the database are handled by the DBMS. In general, software is a collection of programs,
procedures, and routines that instruct the computer hardware about its work.
DBMS software consists of several software components that
handle various tasks such as data definition, data manipulation, data security, and data integrity.
Data: It is the most important component of the DBMS environment. The database contains operational
data and metadata. For effective storage and retrieval of information, data is organized as
fields, records, and files.
Fields: A field is the smallest unit of stored data. Each field consists of data of a specific type.
Records: A record is a collection of related fields.
File: A file is a collection of all occurrences of the same type of record.
Users: The people who interact with the database, such as the database administrators, designers,
application programmers, and end users described above.
Procedures: Procedures are general instructions or guidelines for the use of a DBMS. These
instructions include how to install and set up the database, how to log in and log
out of the database, how to manage the database, how to take a backup of the database, and how
to generate reports from the database. The main purpose of procedures is to guide the user
during the management and operation of the database.
Data Access Language: Data Access Language (DAL) in a Database Management System refers to
the set of commands used to interact with the data stored in the database. These commands allow
users to perform various operations such as querying, updating, inserting, and deleting data.

Three-schema architecture of DBMS


The three-schema architecture of a Database Management System (DBMS) is a framework that
separates the database's structure into three levels, promoting data independence and data abstraction and
providing different views for various users.
1. External Schema: It is also called the view level because several users can each view the data they
need from the database. The user doesn't need to know database schema details such as data
structures or table definitions. The user is only concerned with the data that is retrieved from the
database. (These are different views of the data tailored for specific users. Each view shows only the
data that is relevant to the user.) The external level is the top level of the schema.
2. Conceptual Schema: It is also called the logical level. The whole design of the database, such as
the relationships among data and the schema of the data, is described at this level. Database constraints and
security are also implemented at this level. This level is maintained by the DBA. (This defines what data is in
the database and how it is related. It provides a complete view of all the data.)
3. Internal Schema: This level is also known as the physical level. It describes how the data is actually
stored on the storage devices and is also responsible for allocating space to the data. This is the
lowest level. (This is how data is physically stored in the database. It includes details about files, storage
methods, and data structures.)
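The relationship between the conceptual and external levels can be illustrated with a table and a view. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the employees table and the employee_directory view are hypothetical. The internal level (how SQLite lays out pages on disk) stays hidden from both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical definition of the data.
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 50000), (2, "Ravi", "IT", 65000)],
)

# External level: a view that exposes only what one group of users needs (no salary).
conn.execute("CREATE VIEW employee_directory AS SELECT name, department FROM employees")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
view_columns = [d[0] for d in cursor.description]
directory = cursor.fetchall()
print(view_columns, directory)
```

Users of the view never see the salary column, and the underlying table could be reorganized without changing what the view returns.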
DBMS Architecture
1 Tier Architecture: In 1-Tier Architecture the database is directly available to the user, who
works directly on the DBMS. That is, the client, server, and database are all present on the same
machine. A request made by the client doesn't require a network connection to perform the action on the
database.
 Simple Architecture: 1-Tier Architecture is the simplest architecture to set up, as only a
single machine is required to maintain it.
 Cost-Effective: No additional hardware is required for implementing 1-Tier Architecture, which
makes it cost-effective.
 Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is mostly used in
small projects.
2 Tier Architecture: The 2-Tier Architecture is a client-server model where the application is divided
into two layers: the client tier and the server tier. This architecture is often used in database-driven
applications and is characterized by direct communication between the client and the server. The
user interface resides on the client machine. The client application runs on the user's device and is
responsible for displaying data and capturing user input. The server side contains the database
management system (DBMS) and handles data storage, retrieval, and management. It processes
requests from the client and executes queries against the database. An advantage of this type is that
it is easier to maintain and understand, and it is compatible with existing systems. However, this model
gives poor performance when there are a large number of users. 2-Tier Architecture is cheaper than
3-Tier Architecture.

3 Tier Architecture: In 3-Tier Architecture, there is another layer between the client and the server.
The client does not communicate directly with the database server. Instead, it interacts with an application server,
which in turn communicates with the database system, where query processing and transaction
management take place. This intermediate layer acts as a medium for the exchange of partially
processed data between the server and the client. This type of architecture is used for large
web applications. 3-Tier Architecture also improves security: because the client cannot interact directly
with the database server, access to unauthorized data is reduced.
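The separation of tiers can be sketched in a single process. The example below is only an illustration under stated assumptions: the class names DataTier and ApplicationTier, the products table, and the quote function are hypothetical, and in a real deployment each tier would usually run on its own machine and communicate over a network.

```python
import sqlite3


class DataTier:
    """Database tier: owns the connection and runs SQL."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, price REAL)")
        self._conn.executemany(
            "INSERT INTO products VALUES (?, ?)", [("pen", 1.5), ("notebook", 4.0)]
        )

    def price_of(self, name):
        row = self._conn.execute(
            "SELECT price FROM products WHERE name = ?", (name,)
        ).fetchone()
        return None if row is None else row[0]


class ApplicationTier:
    """Middle tier: validates requests and applies business rules."""

    def __init__(self, data_tier):
        self._data = data_tier

    def quote(self, name, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        price = self._data.price_of(name)
        if price is None:
            raise KeyError(name)
        return price * quantity


def client_request(app, name, quantity):
    """Client tier: talks only to the application tier, never to the database."""
    return f"{quantity} x {name} = {app.quote(name, quantity):.2f}"


app = ApplicationTier(DataTier())
message = client_request(app, "pen", 4)
print(message)
```

Because the client holds no database connection, every request passes through the validation in the middle tier, which is the security benefit described above.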
Data Models
A data model in a DBMS is a set of concepts and tools used to describe the structure of the
database. Data models give us a clear picture of the data, which helps in creating the actual
database, and they take us from the design of the data to its implementation.
1. Hierarchical Database Model: It is a way to organize data in a tree-like structure, where each record
(except the root) has a single parent and can have multiple children.
Records: Basic units of data, organized in a tree format. Each record contains fields (attributes).
Parent-Child Relationship: Each parent record can link to multiple child records, but each child has only
one parent. This creates a hierarchy.
Root Record: The top-level record in the hierarchy, from which all other records descend.
Navigation: Data retrieval involves moving from parent to child, following the hierarchy.

Advantages: The tree structure is easy to understand and navigate, and data retrieval is quick because of
the predictable paths through the hierarchy.
Disadvantages: Changes to the hierarchy (like adding new relationships) can be
complicated. The model supports only one-to-many relationships; many-to-many relationships are difficult to
implement.
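The parent-child structure can be sketched with a small tree class. This is an illustrative sketch only; the Record class and the college hierarchy are hypothetical and do not correspond to any particular hierarchical DBMS.

```python
class Record:
    """A record in a hierarchy: exactly one parent (None for the root), many children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add_child(self, name):
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Walk up from this record to the root and return the path from the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


root = Record("College")
science = root.add_child("Science")
arts = root.add_child("Arts")
physics = science.add_child("Physics")

physics_path = physics.path()
root_children = [child.name for child in root.children]
print(physics_path, root_children)
```

Each record stores a single parent reference, which is why a many-to-many relationship does not fit this structure without duplicating records.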
2. Network Model: It is a way to organize data using a graph structure, where records are connected by
links.
Records: Basic units of data, like a row in a table.
Links: Connections between records that show how they relate to each other.
Sets: Groups of records linked together, defining relationships.
Navigation: You find data by following links between records.
Advantages: The model supports many-to-many connections easily, and linked records can be
navigated quickly.
Disadvantages: It can be difficult to understand and manage, and users need to know
how the data is linked to retrieve it effectively.
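Navigation by links can be sketched with a simple graph of records. This is a simplified illustration, not the set semantics of any specific network DBMS; the "teaches" set, the record names, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Each named set links an owner record to its member records.
links = defaultdict(set)


def connect(set_name, owner, member):
    links[(set_name, owner)].add(member)


connect("teaches", "Prof. Rao", "Databases")
connect("teaches", "Prof. Iyer", "Databases")
connect("teaches", "Prof. Rao", "Networks")


def members(set_name, owner):
    """Follow links from an owner to its members."""
    return sorted(links.get((set_name, owner), set()))


def owners(set_name, member):
    """Follow links in the other direction, from a member to its owners."""
    return sorted(o for (s, o), ms in links.items() if s == set_name and member in ms)


rao_courses = members("teaches", "Prof. Rao")
database_teachers = owners("teaches", "Databases")
print(rao_courses, database_teachers)
```

A course can have several teachers and a teacher several courses, which a strict tree could not express.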

3. Relational Model
The relational model is a way to organize data in a database using tables. It helps store and manage
information efficiently, making it easy to retrieve and manipulate.
Key Components
Tables: Data is organized into tables, where each table represents a different type of entity (e.g.,
students, classes). Each table consists of rows (records) and columns (attributes).
Rows: Each row in a table represents a single entry or record. For example, a row in a Students table
could represent one student.
Columns: Each column represents a specific piece of information about the entity. For example, in a
Students table, columns might include Name, Age, and Grade.
Primary Key: Each table has a primary key, which is a unique identifier for each record. For instance,
Student ID could be the primary key in the Students table.
Foreign Key: A foreign key is a column that creates a link between two tables. It references the primary
key in another table, allowing for relationships between data. For example, a Class ID in a Students table
could link to a Classes table.
Relationships: Tables can be related to each other. For example, a student can be enrolled in multiple
classes, creating a many-to-many relationship that might require a separate table, like an Enrollments
table.
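The Students, Classes, and Enrollments example can be written out as tables. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Junction table: one row per (student, class) pair models the many-to-many relationship.
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(student_id),
        class_id   INTEGER REFERENCES classes(class_id),
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO students VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO classes  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Follow the foreign keys to list which student is enrolled in which class.
rows = conn.execute(
    """
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN classes  c ON c.class_id   = e.class_id
    ORDER BY s.name, c.title
    """
).fetchall()
print(rows)
```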

4. Entity-Relationship Model (ER Model): It is a diagrammatic way to represent data and its relationships
in a database.
Key Components
Entities: Objects or things in the real world that have a distinct existence (e.g., Student, Course).
Represented as rectangles in ER diagrams.
Attributes: Characteristics or properties of entities (e.g., Student ID, Name, Course Title). Represented as
ovals connected to their respective entities.
Relationships: Connections between entities that describe how they interact (e.g., Enrollment between
Student and Course). Represented as diamonds in ER diagrams.
Cardinality: Describes the number of instances of one entity that can be associated with instances of
another entity (e.g., one-to-many, many-to-many).

5. Object-Oriented Data Model


The Object-Oriented Data Model (OODM) is a way to organize and manage data using concepts from
object-oriented programming.
Key Ideas
Objects: Think of objects as real-world things, like a car or a person. Each object has data (like color or
name) and actions (like driving or talking).
Classes: A class is like a blueprint for creating objects. For example, a "Car" class describes what
attributes (like make, model) and methods (like drive, stop) all car objects will have.
Encapsulation: This means hiding the details of how an object works. You interact with the object
through its methods without needing to know its inner workings.
Inheritance: This allows one class to inherit properties and behaviors from another. For example, if you
have a "Vehicle" class, both "Car" and "Bike" can inherit from it.
Polymorphism: This lets you use the same method name for different types of objects. For example,
both a car and a bike can have a "move" method, but they might work differently.
Relationships: Objects can be connected in various ways, like having one object include another (like a
team having players) or having objects that depend on each other.

It mirrors real-world entities and their interactions, making it easier to understand and design.
It effectively manages complex data and relationships that traditional models might struggle with.
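The key ideas above come directly from object-oriented programming, so they can be shown in ordinary code. This is a minimal sketch; the Vehicle, Car, and Bike classes follow the examples in the text, while the attribute values are made up.

```python
class Vehicle:
    """Base class: the attributes and behaviour shared by every vehicle."""

    def __init__(self, make, model):
        # Leading underscores mark internal details (encapsulation by convention).
        self._make = make
        self._model = model

    def describe(self):
        return f"{self._make} {self._model}"

    def move(self):
        raise NotImplementedError("subclasses define how they move")


class Car(Vehicle):  # inheritance: Car reuses describe() from Vehicle
    def move(self):
        return f"{self.describe()} drives on four wheels"


class Bike(Vehicle):
    def move(self):
        return f"{self.describe()} rolls on two wheels"


# Polymorphism: the same method name behaves differently for each object type.
fleet = [Car("Acme", "Sedan"), Bike("Acme", "Roadster")]
movements = [vehicle.move() for vehicle in fleet]
print(movements)
```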

Entity Relationship Model [ERM]


The Entity-Relationship (ER) model is a conceptual framework used for designing and representing the
structure of a database. It provides a way to visually map out the entities within a system and their
relationships, making it easier to understand and design the logical structure of a database.
Entities
 Definition: An entity represents a real-world object or concept that can have data stored about
it. In a database, entities are typically represented as tables.
 Example: In a university database, entities might include Student, Course, and Instructor.
 Weak Entity: A weak entity is an entity that cannot be uniquely
identified by its own attributes alone; it depends on a related entity for its identification.
Attributes
 Definition: Attributes are properties or characteristics of an entity. They represent the data
stored about each instance of an entity.
 Example: For the Student entity, attributes might include StudentID, Name, DateOfBirth, and
Email.
Relationships
 Definition: Relationships define how entities are associated with one another. They illustrate
the connections and interactions between different entities.
 Example: In a university database, a Registration relationship might link Student entities with
Course entities to indicate which students are enrolled in which courses.

[Figure: the Registration relationship linking the Student and Course entities]

Keys
 Primary Key: A primary key is a unique identifier for each record in a table. It ensures that no
two rows have the same value for the primary key attribute(s) and cannot contain null values.
For example, StudentID in the Student entity.
Characteristics:
1. Uniqueness: Each value must be unique within the table.
2. Non-nullable: A primary key cannot contain null values.
3. Immutable: Ideally, the value of a primary key should not change over time.
 Foreign Key: A foreign key is an attribute (or a set of attributes) in one table that references the
primary key in another table. It establishes a link between the two tables and enforces
referential integrity.
 Candidate Key: A Candidate Key is a column or set of columns that can uniquely identify a row
in a table. A table can have multiple candidate keys, but only one is chosen as the primary key.
All candidate keys must be unique and not null.
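How a DBMS enforces these keys can be observed by attempting inserts that violate them. The sketch below assumes SQLite through Python's sqlite3 module; the student and registration tables, the email column (standing in for a second candidate key), and the rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is enabled
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE registration (
           reg_id INTEGER PRIMARY KEY,
           student_id INTEGER NOT NULL REFERENCES student(student_id))"""
)
conn.execute("INSERT INTO student VALUES (1, 'asha@example.com')")


def rejected(sql, params):
    """Return True if the DBMS refuses the statement because of a key constraint."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_primary_key = rejected("INSERT INTO student VALUES (?, ?)", (1, "other@example.com"))
duplicate_candidate_key = rejected("INSERT INTO student VALUES (?, ?)", (2, "asha@example.com"))
dangling_foreign_key = rejected("INSERT INTO registration VALUES (?, ?)", (100, 99))
valid_foreign_key = rejected("INSERT INTO registration VALUES (?, ?)", (101, 1))
print(duplicate_primary_key, duplicate_candidate_key, dangling_foreign_key, valid_foreign_key)
```

The first three inserts are refused, while the last one succeeds because student 1 exists, which is what referential integrity means in practice.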
Symbols Used in ER Model
The ER Model is used to model the logical view of the system from a data perspective. It uses
these symbols:
 Rectangles: Rectangles represent Entities in the ER Model.
 Ellipses: Ellipses represent Attributes in the ER Model.
 Diamond: Diamonds represent Relationships among Entities.
 Lines: Lines link attributes to entities and entity sets to relationship types.
 Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
 Double Rectangle: Double Rectangle represents a Weak Entity.

Types of Attributes:
1. Simple (Atomic) Attributes
Definition: Simple attributes, also known as atomic attributes, cannot be further divided into smaller
parts. They represent a single value.
Example:
 Name: A simple attribute where the value is a single string.
 Age: A simple attribute where the value is a single integer.

2. Composite Attributes
Definition: Composite attributes are attributes that can be divided into smaller sub-parts, each
representing a distinct piece of information.
Example:
 Address: A composite attribute that can be divided into Street, City, State, and ZIP Code.
 Name: A composite attribute that can be divided into First Name and Last Name.

[Figure: the composite attribute Name divided into First Name and Last Name]
3. Derived Attributes
Definition: Derived attributes are attributes whose values can be derived from other attributes or
entities in the database. They are not physically stored but are computed based on other attributes.
Example:
 Age: Can be derived from the Date of Birth attribute.
 Full Name: Can be derived from combining First Name and Last Name.
4. Multi-Valued Attributes
Definition: Multi-valued attributes are attributes that can hold multiple values for a single entity
instance. They are used when an attribute can have more than one value.
Example:
 Phone Numbers: A person might have multiple phone numbers (e.g., home, mobile, work).
 Email Addresses: A person might have multiple email addresses.
5. Single-Valued Attributes
Definition: Single-valued attributes are attributes that can hold only a single value for a single entity instance.
Example:
 Age: Age attribute is single-valued because each person can only have one age at a time.
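The attribute types can be illustrated with a small data class. This is a sketch under stated assumptions: the Person class, its field names, and the age_on method are hypothetical, and a real database would typically store a multi-valued attribute in a separate table.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Person:
    first_name: str      # the composite attribute Name is split into two simple parts
    last_name: str
    date_of_birth: date  # simple, single-valued, stored
    phone_numbers: List[str] = field(default_factory=list)  # multi-valued

    @property
    def full_name(self):
        """Derived attribute: computed from first and last name, not stored."""
        return f"{self.first_name} {self.last_name}"

    def age_on(self, today):
        """Derived attribute: age computed from the date of birth."""
        birthday = (self.date_of_birth.month, self.date_of_birth.day)
        had_birthday = (today.month, today.day) >= birthday
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


person = Person("Asha", "Nair", date(2000, 6, 15), ["555-0100", "555-0101"])
age_before_birthday = person.age_on(date(2024, 6, 14))
age_on_birthday = person.age_on(date(2024, 6, 15))
print(person.full_name, age_before_birthday, age_on_birthday, len(person.phone_numbers))
```

Because age is derived rather than stored, it never becomes stale as time passes.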

Relationship
In a Database Management System (DBMS), relationships define how data in different tables or entities
interact with one another. Understanding these relationships is crucial for designing an efficient and
effective database. Here are the main types of relationships:
1. One-to-One (1:1) Relationship
 Definition: A single record in Table A is related to a single record in Table B.
 Example: A person and their passport. Each person has one passport, and each passport is
assigned to one person.

[Figure: Person (1) Has (1) Passport]

2. One-to-Many (1:N) Relationship


 Definition: A single record in Table A can relate to multiple records in Table B, but a record in
Table B relates to only one record in Table A.
 Example: A customer and their orders. A customer can place many orders, but each order is
associated with only one customer.

[Figure: Customer (1) Places (n) Order]
3. Many-to-One (N:1) Relationship
 Definition: This is essentially the inverse of the one-to-many relationship, where multiple
records in Table A relate to a single record in Table B.
 Example: Many students belong to one college.

[Figure: Students (n) Joined (1) College]

4. Many-to-Many (M:N) Relationship
 Definition: Multiple records in Table A can relate to multiple records in Table B.
 Example: Students and courses. A student can enroll in many courses, and a course can have
many students.

n n
 Cc Joined Courses
Students

Relational algebra
Relational algebra is a formal system for manipulating relations (tables) in a database. It consists of a set
of operations that take one or two relations as input and produce a new relation as output without
changing the actual data in the database. It is a relational intermediate language. Here are some of the key
operations in relational algebra:
Basic Operations
1. Select (σ): Filters rows based on a specified condition.
o Syntax: σ_condition(R)
o Example: σ_age>30(Employees) retrieves all employees older than 30.
2. Project (π): Selects specific columns from a relation.
o Syntax: π_column1, column2(R)
o Example: π_name, age(Employees) retrieves the names and ages of all employees.
3. Union (∪): Combines the tuples from two relations, removing duplicates.
o Syntax: R ∪ S
o Example: EmployeesA ∪ EmployeesB combines employees from two different branches.
4. Set Intersection (∩): Returns the tuples that appear in both relations, exactly as set intersection
does in set theory.
o Syntax: R ∩ S
o Example: EmployeesA ∩ EmployeesB retrieves the employees common to both branches.
5. Set Difference (−): Retrieves tuples from one relation that are not in another.
o Syntax: R − S
o Example: EmployeesA − EmployeesB retrieves employees in branch A who are not in
branch B.
6. Cartesian Product (×): Combines all tuples from two relations.
o Syntax: R × S
o Example: Employees × Departments creates combinations of all employees with all
departments.
7. Join (⨝): Combines tuples from two relations based on a related attribute.
o Syntax: R ⨝_condition S
o Example: Employees ⨝ Departments retrieves tuples where employees belong to specific
departments.
8. Rename (ρ): Changes the name of a relation or its attributes.
o Syntax: ρ_newName(R) or ρ_newAttr1, newAttr2(R)
o Example: ρ_EmpList(Employees) renames the Employees relation to EmpList.
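
The operations above can be sketched with ordinary Python sets, since a relation is a set of tuples. This is only an illustrative sketch: the sample rows, the dept_id attribute, and the helper names select, project, product, and join are assumptions made for the example, not part of any DBMS.

```python
from collections import namedtuple

# Hypothetical sample relations (the data is invented for illustration).
Employee = namedtuple("Employee", ["name", "age", "dept_id"])
Department = namedtuple("Department", ["dept_id", "dept_name"])

employees_a = {Employee("Asha", 34, 1), Employee("Ravi", 28, 2)}
employees_b = {Employee("Ravi", 28, 2), Employee("Meena", 41, 1)}
departments = {Department(1, "Sales"), Department(2, "HR")}


def select(relation, predicate):
    """Select (sigma): keep only the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, *attrs):
    """Project (pi): keep only the named attributes; the set drops duplicates."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}


def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s."""
    return {(t, u) for t in r for u in s}


def join(r, s, condition):
    """Join: a Cartesian product followed by a selection on the condition."""
    return {(t, u) for (t, u) in product(r, s) if condition(t, u)}


older_than_30 = select(employees_a | employees_b, lambda e: e.age > 30)
names_and_ages = project(employees_a, "name", "age")
union = employees_a | employees_b          # R ∪ S
common = employees_a & employees_b         # R ∩ S
only_in_a = employees_a - employees_b      # R − S
all_pairs = product(employees_a, departments)
emp_dept = join(employees_a, departments, lambda e, d: e.dept_id == d.dept_id)
```

None of these calls modifies employees_a, employees_b, or departments; each one returns a new set, which mirrors the point that relational algebra produces new relations without changing stored data.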

Relational Calculus
Relational calculus is a non-procedural query language used in database systems, particularly in the
context of relational databases. It focuses on what data to retrieve rather than how to retrieve it. There
are two main forms of relational calculus:
1. Tuple Relational Calculus (TRC): This form specifies queries using tuples. A TRC expression typically
looks like { T | P(T) }, where T is a tuple variable and P(T) is a predicate that defines the conditions tuples
must satisfy to be included in the result.
2. Domain Relational Calculus (DRC): This form focuses on the individual fields (domains) of tuples
rather than the tuples themselves. A DRC expression might look like { (a1, a2, ..., an) | P(a1, a2, ..., an) },
where a1, a2, ..., an are variables that range over the domains of the attributes.
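
Python comprehensions read much like relational calculus, because they state which rows qualify rather than how to find them. The sketch below is only an analogy, using an invented Employees relation with name and age attributes.

```python
# Hypothetical relation; the rows are invented for illustration.
employees = [
    {"name": "Asha", "age": 34},
    {"name": "Ravi", "age": 28},
    {"name": "Meena", "age": 41},
]

# TRC style: { T | T in Employees and T.age > 30 }
# The variable t ranges over whole tuples.
trc_result = [t for t in employees if t["age"] > 30]

# DRC style: { (n) | there exists a such that (n, a) in Employees and a > 30 }
# The variables n and a range over individual attribute values.
drc_result = {(n,) for (n, a) in ((e["name"], e["age"]) for e in employees) if a > 30}
```

In the TRC form the result contains whole tuples, while in the DRC form it contains only the attribute values named on the left of the bar.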

Attribute data types


In a Database Management System (DBMS), attribute data types define the kind of data that can be
stored in a column of a table. Here are some common attribute data types used in various DBMS:
1. Numeric Data Types:
INT: Integer values (e.g., -2147483648 to 2147483647).
SMALLINT: Smaller range of integers (e.g., -32768 to 32767).
BIGINT: Larger range of integers (e.g., -9223372036854775808 to 9223372036854775807).
FLOAT: Floating-point numbers for approximate values.
DOUBLE: Double-precision floating-point numbers.
DECIMAL/NUMERIC: Fixed-point numbers with a defined precision (e.g., DECIMAL(10, 2)).
2. String Data Types:
CHAR(n): Fixed-length character string (n characters).
VARCHAR(n): Variable-length character string (up to n characters).
TEXT: Variable-length string for larger texts.
NCHAR(n): Fixed-length Unicode character string.
NVARCHAR(n): Variable-length Unicode string.
3. Date and Time Data Types:
DATE: Stores date values (year, month, day).
TIME: Stores time values (hours, minutes, seconds).
DATETIME: Combines date and time.
4. Boolean Data Type:
BOOLEAN: Represents true/false values (some databases may use TINYINT instead).
Integrity Constraints

o Integrity constraints are a set of rules used to maintain the quality of information.

o Integrity constraints ensure that data insertion, updating, and other operations are performed in
such a way that data integrity is not affected.

o Thus, integrity constraints guard against accidental damage to the database.

Types of Integrity Constraint

1. Domain constraints

o Domain constraints can be defined as the definition of a valid set of values for an attribute.

o The data type of domain includes string, character, integer, time, date, currency, etc. The value
of the attribute must be available in the corresponding domain.

Example:

2. Entity integrity constraints

o The entity integrity constraint states that a primary key value can't be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the
primary key has a null value, then we can't identify those rows.

o Fields other than the primary key may contain null values.

Example:

3. Referential Integrity Constraints

o A referential integrity constraint is specified between two tables.

o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.

Example:
4. Key constraints

o A key is an attribute (or set of attributes) used to uniquely identify an entity within its entity set.

o An entity set can have multiple keys, but one of them is chosen as the primary key. A primary
key value must be unique and cannot be null in the relational table.

Example:

Common Types of Integrity Constraints:


1. Primary Key Constraint:
Ensures that a column (or a combination of columns) uniquely identifies each row in a table.
No duplicate values and no NULLs are allowed.
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Name VARCHAR(100)
);
2. Foreign Key Constraint:
Establishes a link between the data in two tables.
Ensures that a value in one table corresponds to a value in another table, maintaining referential
integrity.
CREATE TABLE Orders (
OrderID INT PRIMARY KEY,
EmployeeID INT,
FOREIGN KEY (EmployeeID) REFERENCES Employees(EmployeeID)
);
3. Unique Constraint:
Ensures that all values in a column (or a combination of columns) are unique across the table.
NULLs are allowed unless the column is also a primary key.
CREATE TABLE Products (
ProductID INT PRIMARY KEY,
SKU VARCHAR(50) UNIQUE
);
4. Check Constraint:
Specifies a condition that must be met for values in a column.
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
Age INT CHECK (Age >= 18)
);
5. Not Null Constraint:
Ensures that a column cannot have NULL values.
Useful for mandatory fields.
CREATE TABLE Customers (
CustomerID INT PRIMARY KEY,
Email VARCHAR(255) NOT NULL
);
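
To see these constraints being enforced, the sketch below uses Python's built-in sqlite3 module with an in-memory database. It assumes SQLite as the engine, where foreign keys are checked only after PRAGMA foreign_keys = ON; the combined Employees table, the sample rows, and the rejected helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        Name VARCHAR(100) NOT NULL,
        Email VARCHAR(255) UNIQUE,
        Age INT CHECK (Age >= 18)
    )
""")
conn.execute("""
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        EmployeeID INT,
        FOREIGN KEY (EmployeeID) REFERENCES Employees(EmployeeID)
    )
""")
conn.execute("INSERT INTO Employees VALUES (1, 'Asha', 'asha@example.com', 30)")


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "duplicate primary key": rejected(
        "INSERT INTO Employees VALUES (1, 'Ravi', 'ravi@example.com', 25)"),
    "null in NOT NULL column": rejected(
        "INSERT INTO Employees VALUES (2, NULL, 'x@example.com', 25)"),
    "duplicate UNIQUE value": rejected(
        "INSERT INTO Employees VALUES (3, 'Ravi', 'asha@example.com', 25)"),
    "CHECK (Age >= 18) fails": rejected(
        "INSERT INTO Employees VALUES (4, 'Ravi', 'ravi@example.com', 16)"),
    "foreign key has no match": rejected(
        "INSERT INTO Orders VALUES (1, 99)"),
    "foreign key is NULL": rejected(
        "INSERT INTO Orders VALUES (2, NULL)"),
    "foreign key matches": rejected(
        "INSERT INTO Orders VALUES (3, 1)"),
}
```

The first five statements are rejected, while the last two are accepted: a foreign key may be NULL or must match an existing primary key value, as the referential integrity rule states.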
Redundancy
Redundancy in a Database Management System (DBMS) refers to the unnecessary duplication of data
within the database. While some level of redundancy can be useful for performance optimization or for
providing backup options, excessive redundancy can lead to various problems, including data
inconsistency, increased storage requirements, and maintenance difficulties.
Causes of Redundancy
Poor Database Design: Lack of normalization can lead to tables that contain repeated data.
Improperly structured relationships between tables can also contribute to redundancy.
Data Entry Errors: Manual data entry can lead to unintentional duplication of records.
Lack of Referential Integrity: If relationships between tables are not well-defined, it can result in
redundant or orphaned records.
Effects of Redundancy
Data Inconsistency: When the same data exists in multiple places, it can become inconsistent if one copy
is updated while others are not.
Increased Storage Costs: Storing duplicate data unnecessarily consumes more disk space.
Maintenance Overhead: Keeping data consistent across multiple locations increases the complexity and
time required for updates.
Complex Queries: Queries may become more complex and less efficient as they need to account for
duplicated data.
Managing Redundancy
Normalization: Apply normalization techniques (such as First, Second, and Third Normal Forms) to
eliminate redundancy by organizing data into related tables.
Implement Referential Integrity: Use primary and foreign keys to enforce relationships between tables,
ensuring that data is stored in one place.
Data Validation Controls: Implement controls to prevent duplicate entries during data entry processes.
Regular Audits: Periodically review the database for redundancy and clean up any duplicate records.
Use of Unique Constraints: Define unique constraints on columns that should not contain duplicate
values to prevent redundancy at the data entry level.
Example of Redundancy
Before Redundancy Removal:
Imagine a Customers table that contains redundant information:
CustomerID | Name       | Address     | OrderID | Amount
1          | John Doe   | 123 Main St | 1001    | 250.00
1          | John Doe   | 123 Main St | 1002    | 150.00
2          | Jane Smith | 456 Elm St  | 1003    | 300.00
Here, John Doe’s information is duplicated for each order.
After Normalization:
By normalizing the database, you could create two tables: Customers and Orders.
Customers Table:
CustomerID | Name       | Address
1          | John Doe   | 123 Main St
2          | Jane Smith | 456 Elm St

Orders Table:
OrderID | CustomerID | OrderAmount
1001    | 1          | 250.00
1002    | 1          | 150.00
1003    | 2          | 300.00
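
The same split can be sketched in a few lines of Python, using the rows from the example above. The variable names are illustrative; the point is that the two smaller tables can be joined back into the original rows, so no information is lost.

```python
# Rows of the redundant table: (CustomerID, Name, Address, OrderID, Amount)
flat = [
    (1, "John Doe", "123 Main St", 1001, 250.00),
    (1, "John Doe", "123 Main St", 1002, 150.00),
    (2, "Jane Smith", "456 Elm St", 1003, 300.00),
]

# Customers: each customer is stored once (the set removes the duplicates).
customers = sorted({(cid, name, addr) for cid, name, addr, _, _ in flat})

# Orders: each order keeps only a reference to its customer.
orders = [(oid, cid, amount) for cid, _, _, oid, amount in flat]

# Joining the two tables on CustomerID rebuilds the original rows.
by_id = {cid: (name, addr) for cid, name, addr in customers}
rejoined = [(cid, *by_id[cid], oid, amount) for oid, cid, amount in orders]
```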

Functional Dependency
A functional dependency is a relationship between two attributes (or sets of attributes) in which the
value of one determines the value of the other. It typically exists between the primary key and a
non-key attribute within a table.
X → Y
The left side of an FD is known as the determinant, and the right side is known as the dependent.

For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the employee table, because if we
know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
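
Whether a dependency X → Y holds in a given set of rows can be checked mechanically: no two rows may agree on X and differ on Y. The sketch below assumes invented sample rows for the employee table, and holds is a hypothetical helper name. Note that such a check can only show that a dependency is violated by the data; that it holds in general is a design decision.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later row with
        # the same key must carry the same value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two different employees share the name "Asha".
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Pune"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Delhi"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Kochi"},
]

id_determines_name = holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_id = holds(employees, ["Emp_Name"], ["Emp_Id"])
```

Here Emp_Id → Emp_Name holds, but Emp_Name → Emp_Id does not, because the name "Asha" appears with two different Emp_Id values.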
Types of Functional dependency

1. Trivial functional dependency


A → B is a trivial functional dependency if B is a subset of A.
Dependencies such as A → A and B → B are also trivial.
Example:
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies
too.
2. Non-trivial functional dependency
A → B is a non-trivial functional dependency if B is not a subset of A.
When the intersection of A and B is empty, A → B is called completely non-trivial.
Example:
ID → Name,
Name → DOB
Anomaly
An anomaly refers to an inconsistency or error that can occur during data operations, particularly when
inserting, updating, or deleting data. Anomalies often arise due to poor database design, especially in
non-normalized tables, and they can lead to data integrity issues. There are three main types of
anomalies: insertion anomalies, deletion anomalies, and update anomalies.
1. Insertion Anomalies
An insertion anomaly occurs when certain data cannot be added to the database without the presence
of other data. This often happens when a table is not properly normalized.
Example: Consider a table that stores information about students and their enrolled courses:
StudentID | StudentName | CourseName
1         | Alice       | Math
1         | Alice       | Science
2         | Bob         | Math
If a new course, "History," is introduced but no students are enrolled yet, you cannot insert this course
into the table without also inserting student information. This limitation can hinder the ability to
manage courses independently of students.
2. Deletion Anomalies
A deletion anomaly occurs when the deletion of data leads to the unintended loss of other important
data. This often arises from a lack of proper normalization.
Example: Using the same student-course table, if you delete a row corresponding to a student (e.g.,
Alice) who is enrolled in multiple courses, you may inadvertently lose information about the courses
themselves:
StudentID | StudentName | CourseName
1         | Alice       | Math
2         | Bob         | Math
If Alice's rows are deleted from the table, her association with "Math" and "Science" is lost, and because
"Science" was associated only with her, all information about that course is lost as well.
3. Update Anomalies
An update anomaly occurs when changes to a data item require multiple updates across different
records. This can lead to inconsistencies if some records are updated and others are not.
Example: If you need to change the course name from "Math" to "Mathematics" in the same table, you
would have to update every instance of "Math." If you forget to update one of the records, you will have
inconsistent data:
StudentID | StudentName | CourseName
1         | Alice       | Mathematics
1         | Alice       | Science
2         | Bob         | Math
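
The deletion and update anomalies can be reproduced with the student-course rows from the examples above. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
enrolments = [
    {"StudentID": 1, "StudentName": "Alice", "CourseName": "Math"},
    {"StudentID": 1, "StudentName": "Alice", "CourseName": "Science"},
    {"StudentID": 2, "StudentName": "Bob", "CourseName": "Math"},
]

# Deletion anomaly: removing Alice also removes every trace of "Science".
after_delete = [r for r in enrolments if r["StudentName"] != "Alice"]
courses_before = {r["CourseName"] for r in enrolments}
courses_after = {r["CourseName"] for r in after_delete}
lost_courses = courses_before - courses_after

# Update anomaly: renaming "Math" in only the first matching row leaves
# the table with two names for the same course.
partially_updated = [dict(r) for r in enrolments]
for row in partially_updated:
    if row["CourseName"] == "Math":
        row["CourseName"] = "Mathematics"
        break  # simulates forgetting the remaining rows
names_in_use = {r["CourseName"] for r in partially_updated}
```

After the partial update, both "Math" and "Mathematics" are in use, which is exactly the inconsistency shown in the table above.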

Features of good relational design


A good relational database design is essential for ensuring that the database operates efficiently,
maintains data integrity, and is easy to manage and extend. Here are the key features of a good
relational design:
1. Normalization
Elimination of Redundancy: A well-normalized design reduces data duplication by organizing data into
separate tables based on their relationships. This minimizes the risk of update anomalies.
Minimized Anomalies: Proper normalization helps prevent insertion, deletion, and update anomalies,
ensuring that changes to the database maintain consistency.
2. Clear Entity Identification
Distinct Entities: Each entity should be clearly defined and represented by a separate table. This helps
avoid confusion and enhances data organization.
Unique Primary Keys: Each table should have a primary key that uniquely identifies each record,
ensuring data integrity and enabling efficient access.
3. Appropriate Use of Foreign Keys
Referential Integrity: Foreign keys establish relationships between tables and enforce referential
integrity, ensuring that references between tables remain consistent.
Clear Relationships: Relationships should be clearly defined and accurately represented, making it easy
to understand how entities interact.
4. Descriptive Naming Conventions
Intuitive Names: Use clear and descriptive names for tables and columns that reflect their content,
which improves readability and maintainability.
Consistency: Maintain consistent naming conventions throughout the schema to avoid confusion.
5. Data Integrity Constraints
Enforced Constraints: Implement constraints such as NOT NULL, UNIQUE, CHECK, and foreign key
constraints to maintain data accuracy and integrity.
Business Rules: Reflect business rules through constraints, ensuring that data adheres to the required
standards.
6. Appropriate Data Types
Optimized Storage: Choose appropriate data types for each attribute to optimize storage and improve
performance (e.g., using INT for numerical values, VARCHAR for strings).
7. Security Measures
Access Control: Implement user roles and permissions to protect sensitive data and control access based
on the principle of least privilege.
Data Encryption: Consider encrypting sensitive data to protect it from unauthorized access, both at rest
and in transit.
Normalization
Normalization is the process of organizing data to reduce redundancy and improve data integrity. It
involves structuring a relational database in a way that minimizes duplicate data and ensures that
relationships between tables are properly defined. The primary goal of normalization is to ensure that
data is stored efficiently and can be retrieved easily without anomalies. Minimizing duplicate data also
saves storage space and avoids inconsistencies.
First Normal Form (1NF)
A table is in First Normal Form (1NF) if it satisfies the following conditions:
Atomicity: Each column must contain atomic (indivisible) values. In other words, there should be no
repeating groups or arrays in a single column. Each entry in a column should be a single value.
Uniqueness of Columns: Each column must have a unique name, and the order in which data is stored
does not matter.

Why is 1NF important?


Eliminates Redundancy: By ensuring that each column contains only one value, 1NF helps eliminate
redundancy that arises from storing multiple values in a single column.
Improves Data Integrity: With atomic values, data is easier to manage and less prone to inconsistencies.
Facilitates Queries: Queries become simpler and more efficient when the data structure adheres to 1NF.
Example:
Let’s look at a table that is not in 1NF:
In the above table:
The "Courses" column contains multiple values (e.g., "Math, Physics") in a single cell, which violates 1NF
because the values are not atomic.
Convert to 1NF:
To convert this table into 1NF, we need to make sure that each column contains only one atomic value
per row. So, instead of listing multiple courses in a single cell, we will create separate rows for each
course.

Now, the table is in 1NF because:

Each column has atomic values (no lists or multiple values in a single column).
Each row is uniquely identified by the combination of StudentID and Course.
Key Points to Remember:
Atomic values: Each column must contain only one value per row.
No multiple entries in a single column.
1NF focuses on ensuring that the database table has a structure where each field contains a single,
indivisible value, thereby avoiding repeating groups and multi-valued attributes. This is the first step
toward organizing data in a way that is easier to manage and query.
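
The conversion to 1NF can be sketched as follows. The rows are hypothetical (the Name column and the student names are invented); only the idea of a "Courses" cell holding a list such as "Math, Physics" comes from the example above.

```python
# Hypothetical unnormalized rows: the Courses cell holds several values.
unnormalized = [
    {"StudentID": 1, "Name": "Alice", "Courses": "Math, Physics"},
    {"StudentID": 2, "Name": "Bob", "Courses": "Chemistry"},
]

# 1NF: one row per (student, course) pair, so every cell is atomic.
first_normal_form = [
    {"StudentID": r["StudentID"], "Name": r["Name"], "Course": c.strip()}
    for r in unnormalized
    for c in r["Courses"].split(",")
]

# Each row is now identified by the combination of StudentID and Course.
keys = [(r["StudentID"], r["Course"]) for r in first_normal_form]
```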
Second Normal Form (2NF)
It is a level of normalization designed to eliminate certain types of redundancy and anomalies in
relational database design. To achieve 2NF, a database must satisfy the following two conditions:

It must be in First Normal Form (1NF):


This means that the database is free from repeating groups or arrays, and every record is uniquely
identifiable (usually via a primary key).
It must not have partial dependency:
This means that every non-key attribute must be fully dependent on the whole primary key, not just part
of it. In other words, if the primary key is composed of multiple columns (a composite key), each non-
key column should depend on the entire composite key, not just a part of it.
Example of Partial Dependency:
Suppose you have a table Orders with the following columns: OrderID, ProductID, ProductName,
QuantityOrdered.

Primary Key: (OrderID, ProductID)


If ProductName is only dependent on ProductID (not on OrderID), then we have a partial dependency.
ProductName should be removed from this table and placed in a separate table where ProductID is the
primary key.

Steps to convert to 2NF:


Ensure 1NF: Remove repeating groups, ensure each column contains atomic values, and each row is
unique.
Remove Partial Dependencies: Identify and remove any partial dependencies by breaking the table into
multiple tables and ensuring non-key attributes depend on the whole primary key.
Example:
Before 2NF (with partial dependency):

Here, the primary key is (OrderID, ProductID).


ProductName is dependent only on ProductID, not on OrderID.
After 2NF (remove partial dependency):
Split into two tables:

Now, the ProductName is no longer part of the Orders table, which eliminates the partial dependency
and brings the database into 2NF.
By following these principles, you can improve data integrity, reduce redundancy, and make the
database more efficient for queries and updates.
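
Partial dependencies can be found mechanically once the key and the functional dependencies are written down. The sketch below is a simplification that assumes the table has a single candidate key; partial_dependencies is a hypothetical helper name, and the dependencies are those of the Orders example above.

```python
def partial_dependencies(key, fds):
    """Return the dependencies whose determinant is only part of the key
    and whose dependent attributes are not themselves key attributes."""
    key = set(key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) < key and not set(rhs) <= key
    ]


# Orders(OrderID, ProductID, ProductName, QuantityOrdered)
key = ("OrderID", "ProductID")
fds = [
    (("OrderID", "ProductID"), ("QuantityOrdered",)),
    (("ProductID",), ("ProductName",)),
]

violations = partial_dependencies(key, fds)
```

Only ProductID → ProductName is reported, which is why ProductName moves to a separate table keyed by ProductID.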
Third Normal Form (3NF)
Third Normal Form (3NF) is a higher level of database normalization aimed at eliminating transitive
dependencies and ensuring that data is stored in a more efficient and logically consistent way.
To achieve 3NF, a table must meet the following conditions:
It must be in Second Normal Form (2NF): This means:
The table must first satisfy 1NF (atomic values, no repeating groups).
There must be no partial dependencies (i.e., all non-key attributes must depend on the entire primary
key).
It must not have transitive dependencies: A transitive dependency occurs when one non-key attribute
depends on another non-key attribute, rather than directly depending on the primary key.
What is a Transitive Dependency?
A transitive dependency exists when:
An attribute (say, C) depends on another attribute (B), and
B depends on the primary key (A).
Thus, C depends on A via B, creating an indirect dependency. This transitive dependency means that the
table structure is not fully normalized, and it could result in unnecessary redundancy.

Example of a Transitive Dependency:


Consider a table that stores Employee details, where the primary key is EmployeeID:

Primary Key: EmployeeID


Non-key Attributes: EmployeeName, DepartmentID, DepartmentName, Manager
Here’s the issue:
The DepartmentName and Manager are dependent on DepartmentID, which in turn depends on
EmployeeID.
So, DepartmentName and Manager are transitively dependent on the primary key EmployeeID through
DepartmentID.
Why is this a problem?
If the same department is used by multiple employees, you would have to repeat the department
information (name, manager) for each employee, leading to redundant data.
If a department's name or manager changes, you'd have to update it in multiple rows, which increases
the risk of data inconsistency.
To Remove Transitive Dependencies and Achieve 3NF we can decompose the table by creating separate
tables for each entity that depends on non-key attributes.
We can break the original table into two tables:
One for Employee details.
One for Department details, since the DepartmentName and Manager depend on the DepartmentID.
Now the structure satisfies 3NF:
No transitive dependencies:
DepartmentName and Manager now depend only on DepartmentID (which is the primary key in the
Department table).
The employee attributes (EmployeeName, DepartmentID) depend directly on EmployeeID (the primary key of
the Employee table).
Reduced redundancy: Department details are stored only once, and if a department's name or manager
changes, it only needs to be updated in one place (the Department table).
Key points:
2NF requirement: The table must first be in 2NF.
Eliminate transitive dependencies: All non-key attributes must depend directly on the primary key, not on
other non-key attributes.
Create separate tables for related entities: By decomposing tables in this way, you minimize redundancy,
improve consistency, and reduce the likelihood of update anomalies.
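
The Employee/Department decomposition can be tried out with Python's built-in sqlite3 module. This is a sketch assuming SQLite as the engine; the table and column names follow the example above, while the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (
        DepartmentID INTEGER PRIMARY KEY,
        DepartmentName TEXT NOT NULL,
        Manager TEXT NOT NULL
    );
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        EmployeeName TEXT NOT NULL,
        DepartmentID INTEGER REFERENCES Department(DepartmentID)
    );
    INSERT INTO Department VALUES (10, 'Sales', 'Priya');
    INSERT INTO Employee VALUES (1, 'Asha', 10), (2, 'Ravi', 10);
""")

# The manager is stored once, so a single-row update is enough.
conn.execute("UPDATE Department SET Manager = 'Karan' WHERE DepartmentID = 10")

# A join rebuilds the combined view; every employee sees the new manager.
rows = conn.execute("""
    SELECT e.EmployeeName, d.DepartmentName, d.Manager
    FROM Employee e
    JOIN Department d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.EmployeeID
""").fetchall()
```

Because the manager's name lives in exactly one row of Department, it cannot become inconsistent between employees of the same department.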
Practical Example in 3NF:
Consider a table for storing Course Registration details:

In this table:
InstructorEmail is dependent on the Instructor, not the StudentID or CourseID.
This is a transitive dependency: InstructorEmail depends on Instructor, and Instructor in turn depends on
the table's key (StudentID and CourseID).
Step 1: Break it into Student-Course and Instructor tables.

This ensures that 3NF is achieved, and there are no transitive dependencies. Each attribute now
depends directly on the primary key of its table.

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) is a stricter version of Third Normal Form (3NF). It addresses specific
anomalies that 3NF does not fully resolve. The main difference between 3NF and BCNF is that BCNF
requires every determinant to be a super key, whereas 3NF also accepts a dependency whose determinant
is not a super key as long as the dependent attribute is prime (part of some candidate key).

Definition of BCNF:
A relation (table) is in Boyce-Codd Normal Form (BCNF) if, and only if, for every functional dependency
𝑋→𝑌, the set of attributes X (the determinant) must be a super key.

In 3NF, we eliminate transitive dependencies (dependencies where a non-prime attribute depends on
another non-prime attribute). However, 3NF still allows a determinant that is not a super key when the
attribute it determines is prime, which can lead to redundancy. BCNF resolves this by requiring that every
determinant must be a super key.

BCNF Violation Example:

Primary Key: (StudentID, CourseID)


Functional Dependencies:
StudentID, CourseID → Instructor (Each course for a student is taught by a specific instructor).
Instructor → InstructorEmail (Each instructor has a unique email).
Problem with BCNF:
In the Student-Course table:
The dependency Instructor → InstructorEmail is problematic for BCNF because Instructor is not a super
key: the primary key of this table is (StudentID, CourseID), and Instructor alone cannot identify a row.
Instructor determines InstructorEmail without being a super key, and this violates BCNF.
Converting to BCNF:
To convert the table to BCNF, we decompose it to ensure that all determinants are super-keys.
In the Instructor Table, Instructor is the super-key, which means that the dependency Instructor →
InstructorEmail is now valid according to BCNF (because the determinant is a super-key).
The Student-Course Table still maintains the relationship between students and the courses they take,
and the Instructor in this table is not responsible for determining InstructorEmail, so the table is now in
BCNF.
Another Example of BCNF Violation and Fix:

Functional Dependencies:
BookID → AuthorID, AuthorName, PublisherName, PublisherPhone
AuthorID → AuthorName (Each author has a unique name)
PublisherName → PublisherPhone (Each publisher has a unique phone number)
The primary key is BookID.
Problem: The dependency PublisherName → PublisherPhone violates BCNF because PublisherName is
not a superkey (the primary key is BookID).
Decomposing to BCNF:
To bring the table into BCNF, we decompose it into two tables:
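
The BCNF test used in these examples (every determinant must be a super key) can be automated with the standard attribute-closure computation. The sketch below applies it to the functional dependencies listed for the book table; closure and bcnf_violations are hypothetical helper names.

```python
def closure(attrs, fds):
    """Return every attribute that is determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial dependencies whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                # skip trivial dependencies
        and closure(lhs, fds) != set(all_attrs)    # determinant is not a super key
    ]


attrs = {"BookID", "AuthorID", "AuthorName", "PublisherName", "PublisherPhone"}
fds = [
    (("BookID",), ("AuthorID", "AuthorName", "PublisherName", "PublisherPhone")),
    (("AuthorID",), ("AuthorName",)),
    (("PublisherName",), ("PublisherPhone",)),
]

violations = bcnf_violations(attrs, fds)
```

On these dependencies the check reports PublisherName → PublisherPhone, and it also reports AuthorID → AuthorName, because AuthorID is not a super key of the book table either.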
Fourth Normal Form (4NF)
Fourth Normal Form (4NF) is a level of database normalization that builds on Boyce-Codd Normal Form
(BCNF) and aims to remove multivalued dependencies. 4NF specifically addresses scenarios where a
relation contains more than one independent multi-valued fact about an entity. These situations can
lead to redundancy and anomalies when inserting, updating, or deleting data.
A relation (table) is in Fourth Normal Form (4NF) if:
It is in Boyce-Codd Normal Form (BCNF), and
It has no multivalued dependencies. (A multivalued dependency occurs when one attribute (or set of
attributes) determines two or more independent sets of attributes. In a multivalued dependency, for
each value of the determining attribute, there can be multiple values for the dependent attributes, and
these dependent attributes are independent of each other.)
Example of Multivalued Dependency:
Consider the following Student-Subject-Interest table:

Key: (StudentID, Subject, Hobby), because StudentID alone repeats across rows


Multivalued Dependency:
StudentID →→ Subject (A student can have multiple subjects.)
StudentID →→ Hobby (A student can have multiple hobbies.)
Here, the multivalued dependency is that for each StudentID, there are independent lists of Subjects
and Hobbies, meaning that the subject list does not depend on the hobby list, and vice versa.
In this example, if a student has multiple subjects and hobbies, you will have to repeat the StudentID
multiple times, leading to redundancy. For instance, if we want to change a student’s Hobby (e.g., from
"Music" to "Painting"), we would need to update each row where that student appears. This redundancy
can lead to inconsistencies and anomalies.
To Bring a Table to 4NF:
To convert a table into 4NF, we need to eliminate multivalued dependencies by decomposing the table
into smaller, more focused tables, such that each table contains a single multivalued fact.

Decomposing the Example to 4NF:


We have the table:

The multivalued dependency can be resolved by decomposing it into two tables:


Now, in both tables, the dependencies are single-valued (each student has a single subject or hobby per
row), and there are no multivalued dependencies. This decomposition ensures that both tables are in
4NF because each table represents a single multivalued fact and there is no redundancy in the data.
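
The effect of this decomposition can be checked with a small sketch. The data is hypothetical: one student with two subjects and two hobbies. Storing both independent facts in one table forces every subject to be paired with every hobby, while the two decomposed tables hold each fact once and can still be joined back without loss.

```python
from itertools import product

# Hypothetical data: student 1 has two subjects and two hobbies.
subjects = {1: ["Math", "Physics"]}
hobbies = {1: ["Music", "Chess"]}

# Single table: every subject must be paired with every hobby.
combined = [
    (sid, s, h)
    for sid in subjects
    for s, h in product(subjects[sid], hobbies[sid])
]

# 4NF decomposition: one table per independent multivalued fact.
student_subject = sorted({(sid, s) for sid, s, _ in combined})
student_hobby = sorted({(sid, h) for sid, _, h in combined})

# Joining the two tables on StudentID gives back the combined rows.
rejoined = sorted(
    (sid, s, h)
    for sid, s in student_subject
    for sid2, h in student_hobby
    if sid == sid2
)
```

The combined table needs four rows for one student, while the decomposed tables need two rows each, and the gap grows multiplicatively as subjects and hobbies are added.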

Example of a 4NF Violation:


Consider a Project-Employee-Task table:

Key: (ProjectID, EmployeeID, Skill, Task), because a (ProjectID, EmployeeID) pair repeats across rows


There are two multivalued dependencies:
ProjectID, EmployeeID →→ Skill
ProjectID, EmployeeID →→ Task
Each project-employee combination can have multiple skills and tasks, and these are independent of
each other.
How to Bring this to 4NF:
We need to decompose the table to remove the multivalued dependencies:
