This document provides an introduction to database concepts including definitions of data, information, and databases. It discusses the data processing cycle and differences between manual and computerized data processing. It also describes database users like system analysts, application programmers, and end users. Additionally, it covers database features such as redundancy control, data integrity, data sharing, and security. It discusses data abstraction, database models including hierarchical, network and relational models, as well as normalization. Other topics include database architecture, physical and logical data independence, and entity-relationship diagrams.
3. Introduction
Prof. K. Adisesha (Ph. D)
Definition:
Data:
Data is a collection of facts, numbers, letters or symbols that the computer processes
into meaningful information.
Information:
Information is data that has been processed into a meaningful form; it can be stored or
transmitted by a computer.
Database:
A Database is a collection of logically related data organized in a way that data can
be easily accessed, managed and updated.
4. Introduction
Applications of Database:
Banking: For customer information, accounts and loans, and banking transactions.
Colleges: For student information, course registrations and grades.
Credit card transactions: For purchases on credit cards and generation of monthly
statements.
Finance: For storing information about holdings, sales and purchases of financial
instruments such as stocks and bonds.
Telecommunication: For keeping records of calls made, generating monthly bills, and
storing information about the communication networks.
Voter id/Aadhaar database: This is one of the largest databases in the world, storing data about
more than a billion people residing in India.
Sales: For customer, product, and purchase information.
5. Introduction
Difference between Manual and Computerized data processing:
Manual Data Processing | Computerized Data Processing
• The volume of data that can be processed is limited. | • The volume of data that can be processed is large.
• Requires a large quantity of paper. | • Requires less paper.
• Speed and accuracy are limited. | • Execution is faster and more accurate.
• Labor cost is high. | • Labor cost is low.
• Storage medium is paper. | • Storage medium is a hard disk, etc.
6. Data processing cycle
Data processing cycle:
The order in which information is processed in a computer information management
system is called the data processing cycle.
The data processing cycle involves the following stages:
Data Collection
Data Input
Data Processing
Data storage
Output
Communication
7. Data processing cycle
Data processing cycle:
Each stage of the data processing cycle is described below.
Data Collection: It is the systematic gathering of data that have been observed, recorded and
organized from various sources.
Data Input: The raw data is put into the computer using a keyboard, mouse or other devices such as
the scanner, microphone and the digital camera.
Data Processing: Processing is the series of actions or operations on the input data to generate
outputs.
Data storage: Data and information should be stored in memory so that it can be accessed later.
Output: The result obtained after processing the data must be presented to the user in user
understandable form.
Communication: Computers can communicate over network connections; data may be
transmitted as e-mail or posted to a website where online services are rendered.
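The stages above can be sketched as a small Python pipeline. This is only a minimal illustration: the function names, the sample marks and the in-memory list standing in for storage are assumptions made for the example and do not come from the slides.

```python
# Hypothetical sketch of the data processing cycle; all names and data are illustrative.

def collect():
    # Data collection: raw facts gathered from some source (hard-coded here).
    return ["Asha,78", "Ravi,64", "Meena,91"]

def read_input(raw_lines):
    # Data input: convert the raw text into (name, marks) pairs.
    records = []
    for line in raw_lines:
        name, marks = line.split(",")
        records.append((name, int(marks)))
    return records

def process(records):
    # Data processing: turn data into information (here, the class average).
    total = sum(marks for _, marks in records)
    return total / len(records)

storage = []  # Data storage: stand-in for a file or a database.

def store(records, average):
    storage.append({"records": records, "average": average})

def output(average):
    # Output: present the result in a user-understandable form.
    return f"Class average: {average:.1f}"

records = read_input(collect())
average = process(records)
store(records, average)
report = output(average)
print(report)  # Communication would pass this report on, e.g. by e-mail.
```

Each function corresponds to one stage, so the order of the calls at the bottom mirrors the order of the cycle.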
8. Features of Database
Features or advantages of Database:
Redundancy can be minimized or controlled: In a DBMS environment, if redundancy is
present, it can be controlled by propagating updates to every place where the
redundant data is present.
Data Integrity: Data Integrity refers to the correctness of the data in the database. In
other words, the data available in the database is reliable data.
Data Sharing: In a DBMS, data is stored in a centralized database and all
permitted users can access the same piece of information at the same time.
Database Security: A DBMS provides a variety of security mechanisms for users to
protect the data they store in the database.
Supports Concurrent access: A DBMS supports concurrent access to the same data
stored in the database by applying locking and timestamp mechanisms.
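As one possible illustration of the data integrity feature, the sketch below uses Python's built-in sqlite3 module. The account table, its columns and the rule that a balance may not be negative are assumptions chosen for the example; they are not taken from the slides.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " holder TEXT NOT NULL,"
    " balance REAL NOT NULL CHECK (balance >= 0))"  # assumed integrity rule
)
conn.execute("INSERT INTO account VALUES (1, 'Asha', 500.0)")

try:
    # This row breaks the CHECK constraint, so the DBMS refuses to store it.
    conn.execute("INSERT INTO account VALUES (2, 'Ravi', -50.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, row_count)
```

Because the rule lives in the database rather than in each application program, every program that writes to the table is held to the same constraint.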
9. Database users
Database users:
To design, use and maintain the database, many people are involved.
The people who work with the database include:
System Analysts
Application programmers
Database Administrators (DBA)
End Users (Database Users)
10. Database users
Database users:
System Analysts: System analysts determine the requirements of the users (especially
end users) in order to create a solution for their business needs, and focus on both
non-technical and technical aspects.
Application programmers: These are the computer professionals who implement the
specifications given by the system analysts and develop the application programs.
Database Administrators (DBA): The DBA is a person who has central control over both
data and applications. The responsibilities of the DBA are access authorization, schema
definition and modification, software installation, and security enforcement and
administration.
End Users (Database Users): These are the people who interact with the database in order
to query and update it and to generate reports.
11. Data Abstraction
Data Abstraction:
A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained.
There are three levels of data abstraction:
Physical Level (Internal level)
Conceptual Level (Logical level)
View Level (External level)
12. Data Abstraction
Data Abstraction:
Physical Level:
It is the lowest level of abstraction that describes how the data are actually stored.
The physical level describes complex low-level data structures in detail.
It contains the definition of the stored records, the method of representing the data fields
and the access aids used.
13. Data Abstraction
Data Abstraction:
Conceptual Level:
It is the next higher level of abstraction that describes what data are stored in the
database and what relationships exist among those data.
It also contains the method of deriving the objects in the conceptual view from the
objects in the internal view.
14. Data Abstraction
Data Abstraction:
View Level:
It is the highest level of abstraction that describes only part of the entire database.
It also contains the method of deriving the objects in the external view from the objects
in the conceptual view.
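The relationship between the conceptual level and the view level can be sketched with Python's built-in sqlite3 module. The student table, its columns and the student_marks view are hypothetical names chosen for this example. The physical level is handled inside SQLite's storage engine and never appears in the code, which is the point of the abstraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: what data are stored and how they are related.
conn.execute(
    "CREATE TABLE student ("
    " regno INTEGER PRIMARY KEY, name TEXT, marks INTEGER, fees_due REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", 78, 0.0), (2, "Ravi", 64, 1500.0)],
)

# View level: one group of users sees only part of the database.
# The view is derived from the objects of the conceptual level.
conn.execute("CREATE VIEW student_marks AS SELECT regno, name, marks FROM student")

cursor = conn.execute("SELECT * FROM student_marks ORDER BY regno")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

A user of student_marks never sees the fees_due column, although it is still stored in the underlying table.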
15. DBMS Architecture
DBMS Architecture:
The design of a Database Management System depends heavily on its architecture.
It can be centralized, decentralized or hierarchical.
Database architecture is logically divided into three types:
Logical one-tier (1-tier) Architecture
Logical two-tier Client/Server Architecture
Logical three-tier Client/Server Architecture
16. DBMS Architecture
Logical one-tier (1-tier) Architecture:
In 1-tier architecture, the DBMS is the only entity; the user works directly on the DBMS.
Any changes made here are applied directly to the DBMS itself.
It does not provide handy tools for end users, so single-tier architecture is mainly used by
database designers and programmers.
17. DBMS Architecture
Logical two-tier Client/Server Architecture:
In two-tier Client/Server architecture, the User Interface programs and
Application Programs run on the client side.
An interface called ODBC (Open Database Connectivity) provides an API that allows
client-side programs to call the DBMS.
Most DBMS vendors provide ODBC drivers.
A client program may connect to several DBMSs.
18. DBMS Architecture
Logical three-tier Client/Server Architecture:
Three-tier Client/Server database architecture is a commonly used architecture for web
applications. An intermediate layer, called the Application server or Web server, stores
the web connectivity software and the business logic (constraints) part of the application,
which is used to access the right amount of data from the database server.
This layer acts as a medium for sending partially processed data between the database
server and the client.
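The division of work between the three tiers can be sketched in a single Python file. In a real deployment the tiers would run as separate processes or machines; here an in-memory SQLite database stands in for the database server, and the product table, the products_under function and the price rule are all assumptions made for illustration.

```python
import sqlite3

# Tier 3, database server: an in-memory SQLite database stands in for it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 10.0), (2, "Book", 120.0), (3, "Bag", 450.0)],
)

# Tier 2, application server: holds the business logic (constraints) and
# fetches only the data the client needs.
def products_under(max_price):
    if max_price <= 0:
        raise ValueError("max_price must be positive")
    rows = db.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY price",
        (max_price,),
    )
    return [{"name": name, "price": price} for name, price in rows]

# Tier 1, client: only presents what the application server returns.
def render(items):
    return ", ".join(f"{item['name']} ({item['price']:.2f})" for item in items)

page = render(products_under(150))
print(page)
```

The client never issues SQL itself; it sees only the partially processed data handed over by the middle tier.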
19. Database Model
Database Model:
A data model is a collection of conceptual tools for describing data, data relationships, data
semantics and constraints.
The term is used in two senses:
Data model theory, which is a formal description of how data may be structured and used.
Data model instance, which is a practical data model designed for a particular
application.
In the history of database design, three models have been in use:
Hierarchical Model
Network Model
Relational Model
20. Database Model
Hierarchical data model:
The Hierarchical data model organizes data in a tree structure. In this data model, data is
represented by a collection of records and the relationships are represented by links.
In this model, each entity has only one parent but can have several children.
At the top of hierarchy, there is only one entity, which is called Root node.
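A tree of this kind can be sketched in Python as follows. The Node class and the college example are hypothetical; the sketch only illustrates the rule that every record has exactly one parent, except the root, which has none.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    name: str
    parent: Optional["Node"] = None               # exactly one parent (None for the root)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name):
        # The link from parent to child is created at the same time as the child.
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        # Walk the parent links upwards; there is only one route to the root.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))

root = Node("College")                  # the single root node
science = root.add_child("Science")
commerce = root.add_child("Commerce")
physics = science.add_child("Physics")
print(physics.path())
```

Since each node stores a single parent, a record can be reached along only one path, which is the main restriction of the hierarchical model.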
21. Database Model
Hierarchical data model:
Advantages:
Simplicity: The relationship between the various layers is logically simple.
Data Security: The data security is provided by the DBMS.
Data Integrity: There is always a link between the parent segment and the child segment
under it.
Efficiency: It is very efficient when the database contains a large number of
one-to-many relationships and when the user requires a large number of transactions.
22. Database Model
Hierarchical data model:
Disadvantages:
Implementation complexity
Database management problem
Lack of structural Independence.
Operational Anomalies
23. Database Model
Network data model:
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the
network model. In this model, data is represented by a collection of records and the
relationships are represented by links.
Each record is a collection of fields, each of which contains only one data value. A link is an
association between two records. In the network model, entities are organized in a graph,
in which some entities can be accessed through several paths.
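The idea that a record can be reached through several paths can be sketched with a small graph in Python. The record names and the link and owners_of helpers are assumptions for illustration only and do not reflect CODASYL syntax.

```python
from collections import defaultdict

# Each link is an association between two records: an owner and a member.
links = defaultdict(list)  # owner record -> list of member records

def link(owner, member):
    links[owner].append(member)

link("Department: CS", "Student: Asha")
link("Club: Robotics", "Student: Asha")   # a second owner for the same record
link("Department: CS", "Student: Ravi")

def owners_of(member):
    # Every owner gives one access path to the member record.
    return sorted(owner for owner, members in links.items() if member in members)

asha_owners = owners_of("Student: Asha")
print(asha_owners)
```

Unlike a tree, the graph lets the record for Asha be reached both from the department and from the club.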
24. Database Model
Network data model:
Advantages:
It is simple and easy to implement.
It can handle many relationships within the organization.
It has better data independence compared to hierarchical model.
Disadvantages:
More complex system of database structure
Lack of structural independence.
25. Database Model
Relational Data Model:
E. F. Codd developed the relational data model in 1970. Unlike the hierarchical and network
models, there are no physical links. All data is maintained in the form of tables consisting
of rows and columns.
Each row (record) represents an entity and each column (field) represents an attribute of the
entity.
In this model, data is organized in two-dimensional tables called relations.
The tables or relations are related to each other.
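Two related tables can be sketched with Python's built-in sqlite3 module. The student and course tables and their columns are hypothetical. The tables are related only through a shared value (course_id), not through physical links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each table is a relation: rows are records, columns are attributes.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, course_id INTEGER)"
)
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "Physics"), (20, "Commerce")])
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 10)],
)

# The relationship is followed by matching values in the two tables.
joined = conn.execute(
    "SELECT s.name, c.title FROM student s"
    " JOIN course c ON s.course_id = c.course_id ORDER BY s.regno"
).fetchall()
print(joined)
```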
26. Database Model
Normalization:
Normalization is a step-by-step process of removing the different kinds of redundancy
and anomaly from the database, one at a time.
E. F. Codd introduced normalization for the relational data model in 1970.
Normalization rules are divided into normal forms, such as first (1NF), second (2NF) and
third (3NF) normal form.
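As an informal illustration of the kind of redundancy that normalization removes, the Python sketch below splits one repetitive table into two. The enrolment data and column names are invented for the example, and it shows only a single decomposition step rather than a full treatment of the normal forms.

```python
# Unnormalized rows: the course title is repeated for every student on the course,
# so renaming a course would mean updating many rows (an update anomaly).
enrolments = [
    {"regno": 1, "name": "Asha", "course_id": 10, "course_title": "Physics"},
    {"regno": 2, "name": "Ravi", "course_id": 20, "course_title": "Commerce"},
    {"regno": 3, "name": "Meena", "course_id": 10, "course_title": "Physics"},
]

# Decomposition: the title depends on course_id, not on the student,
# so it moves to its own table and is stored once per course.
courses = {row["course_id"]: row["course_title"] for row in enrolments}
students = [
    {"regno": row["regno"], "name": row["name"], "course_id": row["course_id"]}
    for row in enrolments
]

# The title is now changed in exactly one place.
courses[10] = "Applied Physics"
titles = [courses[student["course_id"]] for student in students]
print(titles)
```

After the split, each course title is stored once, so the update cannot leave two students with different titles for the same course.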
28. Data Independence
Data Independence:
The capacity to change the schema at one level without affecting the schema at the
next higher level is called data independence.
The two types of data independence are:
Physical Data Independence
  File Organization
  Data Model
Logical Data Independence
  Relational Data Model
  Entity Relationship
29. Data Independence
Physical data independence :
It is the capacity to change the internal level without having to change either the
schemas at the conceptual or external level.
Changes to the internal schema may be needed because some physical files had to be
reorganized.
Physical data independence refers to the insulation of an application from the
physical storage structure only, so it is easier to achieve than logical data independence.
The topics related to physical data independence are:
File Organization
Database Architecture
Database Models
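Physical data independence can be illustrated with Python's built-in sqlite3 module: adding an index changes how the data are accessed internally, but the application's query and its result stay the same. The student table, the city column and the index name are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Mysuru"), (2, "Ravi", "Bengaluru"), (3, "Meena", "Mysuru")],
)

# The application's query, written against the conceptual schema only.
query = "SELECT name FROM student WHERE city = ? ORDER BY name"
before = conn.execute(query, ("Mysuru",)).fetchall()

# A change at the internal level: a new access structure is added.
conn.execute("CREATE INDEX idx_student_city ON student (city)")

# The same query runs unchanged and returns the same rows.
after = conn.execute(query, ("Mysuru",)).fetchall()
print(before == after)
```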
30. File organization
File organization Methods:
The different file organization methods are:
Serial File Organization
Direct Access File Organization
Index sequential file organization (ISAM)
31. File organization
File organization Methods:
The differences between serial and direct access file organization are as follows.
Serial File Organization:
Organization is continuous and simple.
Data processing, which requires the use of all records, is best suited to use this
method.
Direct Access File Organization
The type of storage device used is comparatively expensive.
It is less efficient in the usage of storage space compared to the sequential
organization.
32. File organization
Index sequential file organization (ISAM):
The index sequential file organization is a combination of sequential file organization
and an index file. It is also referred to as ISAM (indexed sequential access method).
Data is stored physically in adjacent storage locations, and a logical relationship
among the stored data is maintained by an ordering field. An additional file, called the
index file, is created, which contains n records.
Each record of the index file has two fields:
The first field is of the same data type as the ordering key field, and
the second field is a pointer to a disk block (a block address).
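A lookup through such an index file can be sketched in Python. The sketch assumes one index entry per block, holding the first key stored in that block; the slides do not specify this, and the block contents and the lookup function are likewise invented for illustration.

```python
from bisect import bisect_right

# Data file: blocks of records kept in order of the ordering key (regno).
blocks = [
    [(101, "Asha"), (104, "Ravi")],
    [(110, "Meena"), (115, "Kiran")],
    [(120, "Divya"), (127, "Arun")],
]

# Index file: one (key, block address) record per block.
# Assumption: the key is the first ordering-key value stored in the block.
index = [(block[0][0], address) for address, block in enumerate(blocks)]
index_keys = [key for key, _ in index]

def lookup(regno):
    # Binary search in the index for the last entry whose key is <= regno.
    position = bisect_right(index_keys, regno) - 1
    if position < 0:
        return None  # smaller than every key in the file
    _, address = index[position]
    # Sequential search inside the single block the pointer leads to.
    for key, name in blocks[address]:
        if key == regno:
            return name
    return None

print(lookup(115), lookup(105))
```

Only the small index is searched in full; the data file is read one block at a time, which is what makes the method faster than a purely sequential scan.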
33. E-R diagram
Components of E-R model:
ER-Diagram is a visual representation of data that describes how data is related to each
other.
Entity:
An Entity can be any object, place, person or class.
Attribute:
An Attribute describes a property or characteristic of an entity.
Example: Roll_No, Name and Birth date can be attributes of a student
Relationship:
A relationship type is a meaningful association between entity types.
Relationship types are represented on the E-R diagram by a series of lines.
34. E-R diagram
Different notations of E-R diagram:
Entity: An entity is represented using rectangles.
Attribute: Attributes are represented by means of ellipses.
Relationship: A relationship is represented using a diamond-shaped box.
35. E-R diagram
Relationship:
A Relationship describes relations between entities. A relationship is represented using
a diamond-shaped box.
There are three types of relationships that exist between entities:
Binary Relationship
Recursive Relationship
Ternary Relationship
36. E-R diagram
Binary Relationship:
It means relation between two entities.
This is further divided into three types.
One to One
One to Many
Many to Many
One to One:
This type of relationship is rarely seen in the real world.
For example, one student can enroll for only one course and a
course will have only one student. This is not what you will usually see in a relationship.
37. E-R diagram
Binary Relationship:
One to Many:
It reflects the business rule that one instance of an entity is associated with many
instances of another entity.
For example, a Student enrolls for only one Course but a Course can have many
Students.
Many to Many:
It reflects the business rule that many instances of an entity are associated with many
instances of another entity.
For example, a student can enroll for more than one course, and a course can have
many students.
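In a relational database, a many-to-many relationship is commonly stored in a third table that pairs the keys of the two entities. The sketch below uses Python's built-in sqlite3 module; the enrolment table and all names and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (regno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);
-- One row per (student, course) pair records the many-to-many relationship.
CREATE TABLE enrolment (
    regno INTEGER REFERENCES student(regno),
    course_id INTEGER REFERENCES course(course_id),
    PRIMARY KEY (regno, course_id)
);
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course VALUES (10, 'Physics'), (20, 'Maths');
INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

# One student, many courses.
courses_of_asha = [title for (title,) in conn.execute(
    "SELECT c.title FROM enrolment e JOIN course c ON c.course_id = e.course_id"
    " WHERE e.regno = 1 ORDER BY c.title")]

# One course, many students.
students_in_physics = [name for (name,) in conn.execute(
    "SELECT s.name FROM enrolment e JOIN student s ON s.regno = e.regno"
    " WHERE e.course_id = 10 ORDER BY s.name")]

print(courses_of_asha, students_in_physics)
```

A one-to-many relationship needs no third table: a course_id column in the student table would be enough, since each student then points to exactly one course.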
38. Relational Keys
Keys used in database:
The different types of keys are:
Primary key:
It is a field in a table which uniquely identifies each row/record in a database table.
Primary keys must contain unique values.
A primary key column cannot have NULL values.
Ex: In Relation STUDENT, Regno serves as a primary key.
Candidate Key:
When more than one attribute or group of attributes can serve as a unique identifier,
each of them is called a candidate key.
39. Relational Keys
Keys used in database:
The different types of keys are:
Alternate Key:
The alternate keys of a table are those candidate keys that are not currently
selected as the primary key. An alternate key is also known as a secondary key.
Foreign key:
A key used to link two tables together is called a foreign key, also known as a
referencing key.
A foreign key is a field that matches the primary key column of another table.
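How a DBMS enforces these keys can be sketched with Python's built-in sqlite3 module. The student and result tables, the email column used as an alternate key and the try_insert helper are hypothetical. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# regno and email are both candidate keys; regno is chosen as the primary key,
# which leaves email as an alternate key (declared UNIQUE).
conn.execute(
    "CREATE TABLE student ("
    " regno INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT)"
)
# result.regno is a foreign key that must match a primary key value in student.
conn.execute(
    "CREATE TABLE result ("
    " result_id INTEGER PRIMARY KEY,"
    " regno INTEGER NOT NULL REFERENCES student(regno), marks INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'asha@example.com', 'Asha')")
conn.execute("INSERT INTO result VALUES (1, 1, 78)")

def try_insert(sql):
    # Hypothetical helper: report whether the DBMS accepted the row.
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

duplicate_primary = try_insert("INSERT INTO student VALUES (1, 'ravi@example.com', 'Ravi')")
duplicate_alternate = try_insert("INSERT INTO student VALUES (2, 'asha@example.com', 'Ravi')")
dangling_foreign = try_insert("INSERT INTO result VALUES (2, 99, 50)")
print(duplicate_primary, duplicate_alternate, dangling_foreign)
```

All three inserts are refused: the first repeats a primary key value, the second repeats an alternate key value, and the third refers to a student that does not exist.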
40. Generalization
Generalization:
Generalization is a bottom-up approach in which two or more lower-level entities combine to
form a higher-level entity.
In generalization, a number of entities are brought together into one generalized entity
based on their similar characteristics.
For example, Student and Parent can both be generalized as a group ‘Person’ that holds
their personal details.
41. Specialization
Specialization:
Specialization is a top-down approach in which one higher-level entity is broken
down into two or more lower-level entities.
Specialization is the opposite of generalization.
In specialization, a group of entities is divided into sub-groups based on their
characteristics.
Take a group ‘Person’ for example. A person has a name, date of birth, gender, etc.
In a school database, persons can be specialized as teacher, student or staff,
based on the role they play in the school.
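Generalization and specialization resemble class inheritance in programming, so the Person example can be sketched with Python classes. This is only an analogy: the attribute names and sample values are assumptions, and an E-R model is a design notation rather than code.

```python
class Person:
    # Generalized entity: attributes shared by every kind of person.
    def __init__(self, name, date_of_birth):
        self.name = name
        self.date_of_birth = date_of_birth

class Student(Person):
    # Specialized entity: adds what is specific to the student role.
    def __init__(self, name, date_of_birth, regno):
        super().__init__(name, date_of_birth)
        self.regno = regno

class Teacher(Person):
    # Specialized entity: adds what is specific to the teacher role.
    def __init__(self, name, date_of_birth, subject):
        super().__init__(name, date_of_birth)
        self.subject = subject

people = [
    Student("Asha", "2005-04-12", 101),
    Teacher("Mr. Rao", "1980-09-30", "Physics"),
]
names = [person.name for person in people]
print(names)
```

Reading the class hierarchy from the bottom up corresponds to generalization, and reading it from the top down corresponds to specialization.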
42. Relational Algebra
Relational Algebra:
Relational algebra is a procedural query language that consists of a set of operations
that take one or more relations as input and produce a new relation as output.
The relational algebraic operations can be divided into:
Basic set-oriented operations:
Union, Set difference, Cartesian product
Relational-oriented operations:
Selection, Projection, Division, Joins
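Several of these operations can be sketched in Python by treating a relation as a set of tuples. The sample relations and the helper names select, project and product are assumptions made for the example; division and joins are not shown.

```python
# A relation is modelled as a set of tuples; columns are identified by position.
student = {(1, "Asha", "Mysuru"), (2, "Ravi", "Bengaluru"), (3, "Meena", "Mysuru")}
alumni = {(3, "Meena", "Mysuru"), (4, "Kiran", "Hubli")}
course = {(10, "Physics"), (20, "Maths")}

def select(relation, predicate):
    # Selection: keep the rows that satisfy a condition.
    return {row for row in relation if predicate(row)}

def project(relation, *positions):
    # Projection: keep only the chosen columns (duplicates vanish in a set).
    return {tuple(row[i] for i in positions) for row in relation}

def product(left, right):
    # Cartesian product: every row of one relation paired with every row of the other.
    return {a + b for a in left for b in right}

union = student | alumni          # Union
difference = student - alumni     # Set difference
from_mysuru = select(student, lambda row: row[2] == "Mysuru")
names = project(student, 1)
pairs = product(student, course)
print(len(union), len(difference), len(pairs))
```

Every operation returns a new relation, so the results can be fed into further operations; a join, for instance, can be built from a product followed by a selection.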
43. Data Warehouse
Data Warehouse:
A data warehouse is a repository of an organization's electronically stored data.
Data warehouses are designed to facilitate reporting and support data analysis.
The concept of data warehouses was introduced in the late 1980s.
The components of data warehouse are:
Data Source
Data Transformation
Reporting
Metadata
Additional components are Dependent data marts, Logical Data marts,
Operational Data store.
44. Data Mining
Data Mining:
Data mining is a process of discovering patterns in large data sets involving
methods at the intersection of machine learning, statistics, and database systems.
Data mining is the process of finding anomalies, patterns and correlations within
large data sets to predict outcomes.
Data mining allows you to:
Sift through all the chaotic and repetitive noise in your data.
Understand what is relevant and then make good use of
that information to assess likely outcomes.
Accelerate the pace of making informed decisions.
45. Questions
Important Questions:
Define the following database terms:
a. Data Model b. Tuple c. Domain d. Primary key e. Foreign key
Write the difference between manual and electronic data processing.
Explain any five applications of database.
Briefly explain the data processing cycle.
Write the difference between Hierarchical data model and network data model.
What is normalization? Explain second normal form with an example.
What is database model? Explain Hierarchical model.
Explain 3-level DBMS architecture.
What is data warehouse? Briefly explain its components.