Lecture #02
Advanced Database Systems
Abdul Rafio Soomro
The Shaikh Ayaz University Shikarpur
Department of Computer Sciences
abdulrafio33@[Link]
Chapter 1
Introduction to Databases
Objective
• Introduction to database systems
• Why we need databases
• History of databases
• Types of databases
• Database users
• DBMS
Why Study Databases?
• Databases are useful
– Many computing applications deal with large amounts of information
– Database systems give a set of tools for storing, searching and managing this information
• Databases in CS
– Databases are a ‘core topic’ in computer science
– Basic concepts and skills with database systems are part of the skill set you will be assumed to have as a CS graduate
What is a Database?
• “A set of information held in a computer”
Oxford English Dictionary
• “One or more large structured sets of
persistent data, usually associated with
software to update and query the data”
Free On-Line Dictionary of Computing
• “A collection of data arranged for ease and
speed of search and retrieval”
[Link]
Database
• Definition
– A collection of self-describing and integrated data
files
• System catalog
– Meta data
– Data dictionary
• Data abstraction
Databases
• Web indexes
• Library catalogues
• Medical records
• Bank accounts
• Stock control
• Personnel systems
• Product catalogues
• Telephone directories
• Train timetables
• Airline bookings
• Credit card details
• Student records
• Customer histories
• Stock market prices
• Discussion boards
• and so on…
File-Based Systems
• An early attempt to computerize the manual
filing system
• Collection of application programs that
perform services for the end users (e.g.
reports).
• Each program defines and manages its own
data.
Manual Filing Systems
• Work well
– while the number of items to be stored is small
– for a large number of items, when they only need to be
stored and retrieved
File-Based Processing
Limitations of File-Based
Approach
Separation and isolation of data
Each program maintains its own set of data.
Users of one program may be unaware of potentially
useful data held by other programs.
For example, producing a list of all houses that match
clients' requirements means combining data held in separate files.
Duplication of data
Decentralized approach taken by each department.
Same data is held by different programs.
Wasted space and potentially different values and/or
different formats for the same item.
Limitations of File-Based
Approach..
Data dependence
File structure is defined in the program code.
Incompatible file formats
Programs are written in different languages, and so cannot
easily access each other’s files.
Fixed queries / proliferation of application
programs
Programs are written to satisfy particular functions.
Any new requirement needs a new program.
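Data dependence can be illustrated with a small sketch. It assumes a hypothetical fixed-width property file; the layout, the field names and the helper `parse_record` are invented for illustration and do not come from the lecture. Because the record layout lives in the program code, widening one field in the file silently corrupts what the unchanged program reads.

```python
# Hypothetical fixed-width layout, hard-coded in the program (an assumption
# for illustration): field name and width in characters.
PROPERTY_LAYOUT = [("property_no", 4), ("city", 10), ("rent", 5)]


def parse_record(line, layout=PROPERTY_LAYOUT):
    """Slice one fixed-width line into named fields using the given layout."""
    record, pos = {}, 0
    for name, width in layout:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


# A line written in the format the program expects.
old_line = "PA14" + "Aberdeen".ljust(10) + "00650"

# Another department later widens the city field to 12 characters.
new_line = "PA14" + "Aberdeen".ljust(12) + "00650"

ok = parse_record(old_line)       # rent is read correctly
broken = parse_record(new_line)   # rent is read from the wrong columns

print(ok["rent"], broken["rent"])
```

Every program that reads this file would need the same layout change, which is the maintenance burden the database approach tries to remove.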
Database Approach
Arose because:
Definition of data was embedded in application programs,
rather than being stored separately and independently.
No control over access and manipulation of data beyond
that imposed by application programs.
Result:
the database and Database Management System (DBMS).
History of Database
Systems
Roots of the DBMS
Apollo moon-landing project, 1960s
NAA (North American Aviation), prime
contractor for the project
Developed software called GUAM (Generalized
Update Access Method), based on a hierarchical structure
In the mid-1960s IBM joined NAA; the result was
IMS (Information Management System)
History of Database
Systems..
IDS (Integrated Data Store)
Developed by General Electric in the mid-1960s; a network system
CODASYL ( Conference on Data Systems
Languages)
DBTG (Data Base Task Group)
History of Database
Systems..
Components of the DBTG proposal (1971)
The network schema: the logical
organization of the entire database as seen
by the DBA – which includes a definition of
the database name, the type of each record, and
the components of each record type.
The subschema: the part of the database as
seen by the user or application program;
A data management language to define the
data characteristics and the data structure, and
to manipulate the data.
History of Database
Systems..
DBTG specified three languages
A schema Data Definition Language (DDL),
which enables the DBA to define the schema.
A subschema DDL, which allows the
application programs to define the parts of
the database they require.
A Data Manipulation Language (DML), to
manipulate the data.
History of Database
Systems..
E. F. Codd, 1970
IBM Research Laboratory
Relational model
System R project at IBM's San Jose
Research Laboratory, California
Result of this project
Development of SQL
Commercial relational DBMS products e.g. DB2,
SQL/DS from IBM, Oracle from Oracle Corp.
History of Database Systems
• First generation
– Hierarchical model
• Information Management System (IMS)
– Network model
• Conference on Data System Languages (CODASYL)
• Data Base Task Group (DBTG)
– Limitation
• Complex programs needed for simple queries
• Minimal data independence
• No theoretical foundation
• Second generation
– Relational model
• E. F. Codd
• DB2, Oracle
– Limitation
• Limited data modeling
• Third generation
– Object-relational DBMS
– Object-oriented DBMS
Evolution of Databases
History of Database Systems
• File-based systems
– File-based systems appeared in the 1960s and were widely used. They store information and organize it on storage devices such as
hard disks, CD-ROMs, USB drives, SSDs and floppy disks.
• Relational model
– The relational model was introduced by E. F. Codd in 1969. In this model data is represented as tuples, which are grouped into
one or more tables. The tables are related to each other through common attributes.
• dBase
– Databases such as dBase went on sale in the 1980s. dBase, developed by Cecil Wayne Ratliff, was one of the first database
management systems for microcomputers.
• Centralized DBMS and data warehousing
– In the 1990s, centralized DBMS servers were used. The period also saw the introduction of MS Access. In addition, users began
working on the Internet, and data warehousing was introduced.
• NoSQL
– NoSQL and Big Data arrived in 2008. Big Data describes large volumes of both structured and unstructured data, so large
that traditional databases cannot process them.
• Hadoop
– Hadoop and MongoDB were launched in 2009. Hadoop uses a distributed file system to store big data and MapReduce to process it.
It excels at storing and processing huge volumes of data in various formats, such as arbitrary, semi-structured and unstructured data.
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability and easy scalability.
It works on the concept of collections and documents.
• HBase
– HBase was introduced in 2010 and is a database built on top of HDFS. It provides fast lookups for large tables.
Database Systems
• A database system consists of
– Data (the database)
– Software
– Hardware
– Users
• We focus mainly on the software
• Database systems allow users to
– Store
– Update
– Retrieve
– Organise
– Protect
their data.
Database Users
• End users
– Use the database system to achieve some goal
• Application developers
– Write software to allow end users to interface with the database system
• Data Administrator (DA)
– Database planning
– Development and maintenance of standards, policies and procedures
• Database Administrator (DBA)
– Designs & manages the database system
• Database systems programmer
– Writes the database software itself
Database Management Systems
• A database is a collection of information
• A database management system (DBMS) is the software that controls that information
• Used to create, maintain, and access databases
• Examples:
– Oracle
– DB2 (IBM)
– MS SQL Server
– MS Access
– Ingres
– PostgreSQL
– MySQL
– OpenOffice Base
– Corel Paradox
What Does the DBMS Do?
• Provides users with
– Data definition language (DDL): permits specification of data types, structures and any data constraints
– Data manipulation language (DML): general enquiry facility (query language) of the data
– Data control language (DCL)
• Often these are all the same language
• DBMS provides
– Concurrency
– Integrity
– Security
– Data independence
– Backup & recovery system
• Data Dictionary
– Describes the database itself
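The split between DDL and DML can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `student` table, its columns and the sample rows are invented, and SQLite is used because it needs no server. SQLite has no DCL statements such as GRANT or REVOKE, so DCL is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the structure, data types and constraints.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        year       INTEGER CHECK (year BETWEEN 1 AND 4)
    )
""")

# DML: insert, update and query the data.
conn.executemany(
    "INSERT INTO student (student_id, name, year) VALUES (?, ?, ?)",
    [(1, "Ali", 1), (2, "Sara", 2)],
)
conn.execute("UPDATE student SET year = 3 WHERE student_id = 2")
rows = conn.execute(
    "SELECT name, year FROM student ORDER BY student_id"
).fetchall()

# Integrity: the DBMS itself rejects a row that violates the CHECK constraint.
try:
    conn.execute("INSERT INTO student VALUES (3, 'Omar', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Note that both kinds of statement are written in SQL, which matches the point that the DDL and DML are often the same language.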
Views
Views allow each user to have his or her own
view of the database.
A view is essentially some subset of the
database.
Views - Benefits
Reduce complexity
Provide a level of security
Provide a mechanism to customize the
appearance of the database
Present a consistent, unchanging
picture of the structure of the
database, even if the underlying
database is changed
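A view and two of its benefits (security and a stable picture of the structure) can be sketched as follows, again using Python's `sqlite3` module. The `staff` table, the `staff_directory` view and the sample data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, "
    "branch TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?, ?)",
    [("S1", "Ayesha", "B1", 30000), ("S2", "Bilal", "B2", 24000)],
)

# The view exposes a subset of the table: the salary column is hidden.
conn.execute(
    "CREATE VIEW staff_directory AS SELECT staff_no, name, branch FROM staff"
)
cursor = conn.execute("SELECT * FROM staff_directory ORDER BY staff_no")
view_columns = [d[0] for d in cursor.description]
view_rows = cursor.fetchall()

# The underlying table changes, but users of the view see the same structure.
conn.execute("ALTER TABLE staff ADD COLUMN phone TEXT")
cursor = conn.execute("SELECT * FROM staff_directory ORDER BY staff_no")
columns_after = [d[0] for d in cursor.description]

print(view_columns, view_rows, columns_after)
```

Users who query only `staff_directory` never see salaries and are unaffected by the added column.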
Components of DBMS
Environment
• Hardware
– Client-server architecture
– Can range from a PC to a network of computers
• Software
– DBMS, operating system, network software, application programs
• Data
– Schema, subschema, table, attribute
• People
– Data administrator & database administrator
– Database designer: logical & physical
– Application programmer
– End-user: naive & sophisticated
• Procedure
– Start, stop, log on, log off, back up, recovery
Advantages of DBMS
• Control redundancy
• Consistency
• Integrity
• Security
• Concurrency control
• Backup & recovery
• Data standard
• More information
• Data sharing & conflict control
• Productivity & accessibility
• Economy of scale
• Maintenance
Limitations of DBMS
• Complexity
• Size
• Cost
– Software
– Hardware
– Conversion
• Performance
• Vulnerability
Data Dictionary - Metadata
• The dictionary or catalogue stores information about the database itself
• This is data about data or ‘metadata’
• Almost every aspect of the DBMS uses the dictionary
• The dictionary holds
– Descriptions of database objects (tables, users, rules, views, indexes,…)
– Information about who is using which data (locks)
– Schemas and mappings
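As a rough illustration of a data dictionary, SQLite keeps its catalogue in a table named `sqlite_master`, and `PRAGMA table_info` lists the columns of a table. Other DBMSs expose their catalogues differently; the `book` table, index and view below are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_book_title ON book (title)")
conn.execute("CREATE VIEW book_titles AS SELECT title FROM book")

# Metadata: the catalogue describes the objects stored in the database.
# Names starting with "sqlite" are internal objects and are filtered out.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(book)")]

print(objects)
print(columns)
```

The queries return descriptions of tables, views and indexes rather than user data, which is what "data about data" means in practice.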
File Based Systems
• File based systems
– Data is stored in files
– Each file has a specific format
– Programs that use these files depend on knowledge about that format
• Problems:
– No standards
– Data duplication
– Data dependence
– No way to generate ad hoc queries
– No provision for security, recovery, concurrency, etc.
Relational Systems
• Problems with early databases
– Navigating the records requires complex programs
– There is minimal data independence
– No theoretical foundations
• Then, in 1970, E. F. Codd wrote “A Relational Model of Data for Large Shared Data Banks” and introduced the relational model
Relational Systems
• Information is stored as tuples or records in relations or tables
• There is a sound mathematical theory of relations
• Most modern DBMS are based on the relational model
• The relational model covers 3 areas:
– Data structure
– Data integrity
– Data manipulation
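The three areas of the relational model can be sketched in a few lines with Python's `sqlite3` module. The `department` and `student` tables and their contents are assumptions made up for illustration: the tables show data structure, the foreign key shows data integrity, and the join shows data manipulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Data structure: relations (tables) whose rows are tuples.
conn.execute(
    "CREATE TABLE department (dept_id INTEGER PRIMARY KEY, "
    "dept_name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    INTEGER NOT NULL REFERENCES department (dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (10, 'Ali', 1)")

# Data integrity: a student cannot reference a department that does not exist.
try:
    conn.execute("INSERT INTO student VALUES (11, 'Sara', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Data manipulation: combine the two relations through the common attribute.
joined = conn.execute("""
    SELECT s.name, d.dept_name
    FROM student AS s
    JOIN department AS d ON s.dept_id = d.dept_id
""").fetchall()

print(orphan_rejected, joined)
```

The query states what result is wanted and leaves record navigation to the DBMS, in contrast to the complex navigation programs required by earlier systems.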
DBMS vs File System
• The following are the differences between a
DBMS and a file system: