Tuesday, July 23, 2013

History of Databases and Database Management System

Having data is not a problem but managing data is a big problem. File System failed to manage data in many ways that is why RDBMS was born.

We will understand what are the advantage of RDBMS over File System.

Lets start from very basic, assume you are a small organization and your company is maintaining data in a excel file. What Problem you may face while maintaining data in excel file?

Problem 1 (Size)- Excel can store maximum 1048576 rows what if you want to store lot more rows?

Problem 2 (Data Modification)-You can share your data file to multiple user, but what if 200 users are accessing the file and few of them want to modify the data at same point of time.

Problem 3 (Data Accuracy)-When you enter data manually into the excel file, you may press wrong key and end up by entering wrong data.It doesn't check your data, because no data constraints.

Problem 4 (Security)-You data file contains some sensitive information and you don't to share that information among other users. what if you want to share some part of data?

Problem 5 (Data Redundancy)-You may enter same record couple of time, It will allow duplicates rows.

Problem 6 (Data Importance)-Every single row of data is very very important, in system failure you may loss your important data.System failure can corrupt your data file.

To over come from the above problems RDBMS was born.



Data Vs Information


Data:

Data is collection of raw facts or unorganized form (such as alphabets, numbers, or symbols)

Information (Collection of facts): Systematic and meaningful form of data. Information helps human beings in their decision making.




What are Flat Files?

             During earlier days of computing what we had were File Systems i.e. Flat files. Data was stored in flat files (Ex: Editors, Notepad, Text Pad etc.)




Defects:

1)No Proper Formatting of Text
2)No Search capabilities
3)Redundancy


What are Spread Sheets?

              A spreadsheet is a program that stores data in a grid of rows and columns allowing for calculations and the generation of graphs. Spreadsheets are easy to create and maintain. Some of the most commonly used are Microsoft Excel. You can define mathematical formulas and statistical calculations for individual data cells. Excel is better at a lot of things—displaying charts, showing PivotTables, displaying different types of data on the same worksheet.




Defects:

1)Redundancy.
2)Multi-User Access
3)Security


Database:

A database is an integrated collection of logically-related records or files consolidated into a common pool that provides data for one or more multiple uses.

A database is a systematic collection of information that is organized so that it can be easily searched, accessed, managed and updated.

Databases consist of software-based "containers" that are structured to collect and store information so users can retrieve, add, update or remove such information in an automatic fashion.

   A database concept can be understand and explained as a collection of information (i.e. meaningful data) or in other words it is a set of inter-related data. Database provides data storage and data retrieval functionality and makes the management of data an easy process. There is countless number of examples for usage of database.

Let’s take some examples to understand database.
  • Your bank account is stored in the database of your bank to maintain your data such as name, contact number, address, transaction history, account summary and lot more. 
  • We all use Facebook to communicate with our friends and get in touch with the outer world. Facebook also stores data of their members such as their friends, activities, messages, etc in its database. 
Shape of Database:




DBMS (Database Management System):

DBMS stands for Database Management System.A set of programs to access the interrelated the data.
DBMS is a Software system for creating, organizing and managing the database.DBMS contains information about a particular enterprise.

It provides an environment to the user to perform operations on the database for creation, Insertion, deletion, updating and retrieval of data.

It is software that enables computer to maintain data or in other words it is a collection of programs which allows its users to access database, query database to manipulate data, representation of the output data.

A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and the use of a database. It allows organizations to place control of database development in the hands of Database Administrations (DBA's) and other specialists.

A DBMS is a system software package that helps the use of integrated collection of data records and files known as databases. It allows different user application programs to easily access the same database.

Characteristics of DBMS are as follows :-
  • It allows many applications to share the data. 
  • It helps in controlling data redundancy. 
  • The correct, consistent, and inter relationship of data with respect to the application which uses the data leads to the maintenance of integrity of data. 
  • It supports for transaction control and recovery such as update takes place physically after a logical transaction is complete. 
  • It provides tools by which DBA can ensure about the security of the database. 
DBMS may use any of a variety of database models, such as the Hierarchical model, network model or relational model.

Examples of Database Management Systems – Microsoft Access, MySQL, Microsoft SQL Server, Oracle, File-Maker Pro and lot more.




Advantages of DBMS:

  • Controlling Data Redundancy: Data is recorded into only one place in the database and it is not duplicated.
  • Data Consistency: Data item appears only once, and the updated value is immediately available to all users.
  • Control over concurrency: In a computer-file based system in updating, one may overwrite the values recorded by the other.
  • Backup and Recovery procedures: Automatically create the backup of data and restore data if required.
  • Data Independence: Separation of data structure of database from application program that uses the data is called independence.

Disadvantages of DBMS:

  • Cost of Hardware and Software: Processor with high speed of data processing and memory of large size is required.
  • Cost of Data Conversion: Very difficult and costly method to convert data of data file into database.
  • Database Damage: All data is integrated into a single database. If database is damaged due to electrical failure or database is corrupted on the storage media, then your valuable data may be lost forever.

Evolution has started with: Flat files – Word Processors – Spreadsheets- Simple Databases – DBMS/RDBMS





Data Models:
                    A data model is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects. As the name implies, the data model focuses on what data is required and how it should be organized rather than what operations will be performed on the data. To use a common analogy, the data model is equivalent to an architect’s building plans or Engineers build a model of a car to work out any details before putting it into production.

A data model is independent of hardware or software constraints. Rather than try to represent the data as a database would see it, the data model focuses on representing the data as the user sees it in the “real world”. It serves as a bridge between the concepts that make up real-world events and processes and the physical representation of those concepts in a database.


  • Describes structure of database.
  • Aim to support the development of information systems by providing the definition and format of data.

Classification:

 High- Level Model (Entity, Attribute, Relationship)
               Ensures data requirement of the users. Not concerned with representation, but it’s a conceptual form.

 Representation Model (Hierarchical, Relational, Network)
                Represent of data stored inside a database. Describes the physical structure of database.

Low-Level Model


Different Data Models :-




Hierarchical Database Model:

  • The hierarchical model was developed by IBM in 1968.
  • The data is organize in a tree structure (Parent- Child relation ship).
  • Each box is called a ‘Node’. A nodes represent a record type.
  • Since the data is organized in a tree structure, the parent node has the links to its child nodes.
  • If we want to search a record, we have to traverse the tree from the root through all its parent nodes to reach the specific record. Thus, searching for a record is very time consuming.
  • SYSTEM2000 is an example of hierarchical database.
  • Its suited for One-to-many relation ship but difficult to implement many-to-many relation ship.






Relational Database Model (RDBMS)

                        RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
  • The Relational Model is most common model.
  • Developed in 1970 by E.F.Codd, it became commercial in the 80’s.
  • E. F. Codd laid down 12 rules (known as Codd's 12 rules) that outline the minimum functionality of a RDBMS. A RDBMS must comply with at least 6 of the rules.
  • organizes data in the form of independent tables (consisting of rows and columns) that are related to each other. 




Who is the Father of RDBMS

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model.

Edgar Frank "Ted" Codd (August 23, 1923 – April 18, 2003) was a British computer scientist who, while working for IBM, invented the relational model for database management

E. F. Codd introduced the term “Relational Model” in his seminal paper "A Relational Model of Data for Large Shared Data Banks", published in 1970. In this paper and later papers he defined what he meant by relational. One well-known definition of what constitutes a relational database system is Codd's 12 rules.

A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables.

Codd's Rule

E.F Codd was a Computer Scientist who invented Relational model for Database management. Based on relational model, Relation database was created. Codd proposed 13 rules popularly known as Codd's 12 rules to test DBMS's concept against his relational model. Codd's rule actually define what quality a DBMS requires in order to become a Relational Database Management System(RDBMS). Till now, there is hardly any commercial product that follows all the 13 Codd's rules. Even Oracle follows only eight and half out(8.5) of 13. The Codd's 12 rules are as follows.

Rule zero

This rule states that for a system to qualify as an RDBMS, it must be able to manage database entirely through the relational capabilities.

Rule 1 : Information rule

All information(including metadeta) is to be represented as stored data in cells of tables. The rows and columns have to be strictly unordered.

Rule 2 : Guaranteed Access

Each unique piece of data(atomic value) should be accessible by : Table Name + primary key(Row) + Attribute(column).

NOTE : Ability to directly access via POINTER is a violation of this rule.

Rule 3 : Systematic treatment of NULL

Null has several meanings, it can mean missing data, not applicable or no value. It should be handled consistently. Primary key must not be null. Expression on NULL must give null.
Rule 4 : Active Online Catalog

Database dictionary(catalog) must have description of Database. Catalog to be governed by same rule as rest of the database. The same query language to be used on catalog as on application database.

Rule 5 : Powerful language

One well defined language must be there to provide all manners of access to data. Example: SQL. If a file supporting table can be accessed by any manner except SQL interface, then its a violation to this rule.
Rule 6 : View Updation rule

All view that are theoretically updatable should be updatable by the system.

Rule 7 : Relational Level Operation

There must be Insert, Delete, Update operations at each level of relations. Set operation like Union, Intersection and minus should also be supported.

Rule 8 : Physical Data Independence

The physical storage of data should not matter to the system. If say, some file supporting table were renamed or moved from one disk to another, it should not effect the application.

Rule 9 : Logical Data Independence

If there is change in the logical structure(table structures) of the database the user view of data should not change. Say, if a table is split into two tables, a new view should give result as the join of the two tables. This rule is most difficult to satisfy.
Rule 10 : Integrity Independence

The database should be able to conforce its own integrity rather than using other programs. Key and Check constraints, trigger etc should be stored in Data Dictionary. This also make RDBMS independent of front-end.

Rule 11 : Distribution Independence

A database should work properly regardless of its distribution across a network. This lays foundation of distributed database.

Rule 12 : Non subversion rule

If low level access is allowed to a system it should not be able to subvert or bypass integrity rule to change data. This can be achieved by some sort of looking or encryption.


Data Integrity

Data integrity ensures quality of data. It helps keep the data unchanged and unique. Data types ensure that the data accepted by the column is restricted to the type specified. For e.g. an integer data type cannot accept strings

The following categories of the data integrity exist with each RDBMS:

Entity Integrity: There are no duplicate rows in a table.

Domain Integrity: Enforces valid entries for a given column by restricting the type, the format, or the range of values.

Referential integrity: Rows cannot be deleted, which are used by other records.

User-Defined Integrity: Enforces some specific business rules that do not fall into entity, domain or referential integrity.


Terminologies:


Data Values: Alphanumeric raw data.


Column (Field):

Every table is broken up into smaller entities called Column. A column is a vertical entity in a table that contains all information associated with a specific field in a table.


Record or Row:

A record, also called a row of data, is each individual entry that exists in a table. A record is a horizontal entity in a table.


Table:

The data in RDBMS is stored in database objects called tables. A table is a collection of related data entries and it consists of columns and rows.


Key:

Identifier (uniquely identifies a row in the table. It can be value of single or multiple column)


Users:

People who interact with the database. E.g. Application Programmers, End Users and DBA’s.


Software:

Lies between the stored data and Users. E.g. DBMS, Application Software, User Interface.


Hardware:

Physical device on which database resides. E.g.: Computers, Hard Disks


Network Database model
  • The Network model supports many-to many relations.
  • RDM (Raima Database Manager) Server is an example of a database management system that implements the network model.
  • Record relationship in the network model is implemented by using pointers.
  • Record relationship implementation is very complex since pointers are used. It supports many-to-many relationships and simplified searching of records since a record has many access paths.
  • DBTG Codasyl was the first network database.




Object-Oriented DBMS

           An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming.
  • The objects to be stored in the database have attributes (i.e. gender, ager) and methods that define what to do with the data.
  • PostgreSQL is an example of an object oriented relational DBMS.
  • When you integrate database capabilities with object programming language capabilities, the result is an object-oriented database management system or ODBMS. An ODBMS makes database objects appear as programming language objects in one or more existing programming languages. 


Schema:

Logical structure of the database.

Classification:
  • Physical
  • Conceptual
  • External 


Instances:

Actual data contained in database at a particular point of time.


Difference between DBMS and RDBMS:

Database management system (DBMS)

Database management system is a software used to define, record, query, update and manage data in a database. DBMS stores data in a structured way (relational, hierarchical, flat files, objects etc), and executes operations requested by various users on these data. It interacts with user, receives commands, and runs it against the database. DBMS can interact with some other applications, if required. A fourth generation query language (SQL – Structured Query Language) is used to communicate with recorded data in Database management systems.

A few application areas of DBMS are Finance, Marketing, Sales, Banking, Manufacturing, Airlines etc. Have a look on the below image which describes the DBMS:





Relational Database management system (RDBMS)

Relational database management system is a type of database management system which stores its data in tables and columns with relationships. It is based on the relational model of Edgar F. Codd well known as E. F. Codd, an English computer scientist. Below image explains the RDBMS:




Relational Database management system (RDBMS)