Wednesday, July 24, 2013

Normalization

Definition : 


     Normalization or data normalization is a process to organize the data into tabular format (database tables). A good database design includes the normalization, without normalization a database system may slow, inefficient and might not produce the expected result. Normalization reduces the data redundancy and inconsistent data dependency.

There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. There are several benefits for using Normalization in Database.

Normalization is a process of minimizing redundant data from database by decomposing the columns of a table into multiple tables. This process increases the number of tables in the database but it minimizes the redundant (duplicate), inaccurate and inconsistent data in database. Decomposed tables are connected using relationships (primary keys and foreign keys) to reduce the insert, update and delete anomalies. During normalization process, database designer decides the actual structure of the tables and their relationships. It is easy to find and fix any design problem at this early stage. During normalization process, we have various normal forms with some conditions. A table must satisfy all conditions of that normal form to qualify be in that normal form. Lets discuss what is a normal form in next sub section.

Benefits :

Eliminate data redundancy
Improve performance
Query optimization
Faster update due to less number of columns in one table
Index improvement 

Normal Forms:

We organize the data into database tables by using normal forms rules or conditions. Normal form defines a set of standard which must be followed for a good database design. During normalization process database designer converts the database tables in normal forms and check its functional dependency. Each normal form has some predefined standards which must be followed to qualify that normal form. To qualify each normal form, a table must qualify the previous normal form conditions first i.e. a table in 3NF must satisfy all conditions of 2NF which in turn must satisfy all conditions of 1NF. Generally we organize the data up to third normal form. We rarely use the fourth and fifth normal form.

Normal forms at a glance

Its time to summarize our reading. We have below image to summarize the reading on normal forms:



To understand normal forms consider the folowing unnormalized database table. Now we will normalize the data of below table using normal forms.



First Normal Form (1NF)

A database table is said to be in 1NF if it contains no repeating fields/columns. The process of converting the UNF table into 1NF is as follows:

Separate the repeating fields into new database tables along with the key from unnormalized database table.

The primary key of new database tables may be a composite key

1NF of above UNF table is as follows:

Second Normal Form (2NF)

A database table is said to be in 2NF if it is in 1NF and contains only those fields/columns that are functionally dependent(means the value of field is determined by the value of another field(s)) on the primary key. In 2NF we remove the partial dependencies of any non-key field. The process of converting the database table into 2NF is as follows:

Remove the partial dependencies(A type of functional dependency where a field is only functionally dependent on the part of primary key) of any non-key field.

If field B depends on field A and vice versa. Also for a given value of B, we have only one possible value of A and vice versa, Then we put the field B in to new database table where B will be primary key and also marked as foreign key in parent table.

2NF of above 1NF tables is as follows:

Third Normal Form (3NF) :

A database table is said to be in 3NF if it is in 2NF and all non keys fields should be dependent on primary key or We can also said a table to be in 3NF if it is in 2NF and no fields of the table is transitively functionally dependent on the primary key.The process of converting the table into 3NF is as follows:

Remove the transitive dependecies(A type of functional dependency where a field is functionally dependent on the Field that is not the primary key.Hence its value is determined, indirectly by the primary key )

Make separate table for transitive dependent Field. 

3NF of above 2NF tables is as follows: 


Boyce Code Normal Form (BCNF)

A database table is said to be in BCNF if it is in 3NF and contains each and every determinant as a candidate key.The process of converting the table into BCNF is as follows:

Remove the non trival functional dependency.

Make separate table for the determinants.

BCNF of below table is as follows:

Fourth Normal Form (4NF)

A database table is said to be in 4NF if it is in BCNF and primary key has one-to-one relationship to all non keys fields or We can also said a table to be in 4NF if it is in BCNF and contains no multi-valued dependencies.The process of converting the table into 4NF is as follows:

Remove the multivalued dependency.

Make separate table for multivalued Fields.

4NF of below table is as follows:


Fifth Normal Form (5NF)

A database table is said to be in 5NF if it is in 4NF and contains no redundant values or We can also said a table to be in 5NF if it is in 4NF and contains no join dependencies.The process of converting the table into 5NF is as follows:

Remove the join dependency.

Break the database table into smaller and smaller tables to remove all data redundancy.

5NF of below table is as follows:

Summary

In this article I try to explain the Normalization with example. I hope after reading this article you will be able to understand Normal Forms. I would like to have feedback from my blog readers. Please post your feedback, question, or comments about this article.

#############################################################################

                                                                Example - 2
#############################################################################

1st NF:

Removes repetitive groups
Create a PK



2nd NF: 

Should be in 1st NF
Remove columns which create duplicate data in a table and creates a new table with Primary Key – Foreign Key relationship





3rd NF: 

Should be in 2nd NF
Remove those non-key attributes (which are not PK) which can be derived from other non-key attributes.

Country can be derived from State also… so removing country




BCNF NF: 

Should be in 3rd NF
If PK is composed of multiple columns then all non-key attribute should be derived from FULL PK only. If some non-key attribute can be derived from partial PK then remove it




#############################################################################

                                                                Example - 3
#############################################################################

First Normal Form – 1NF

A table can be in First Normal Form (1NF), when it satisfies below conditions:
  • All columns of a table must have a single atomic value in each row of a table i.e. a cell (intersection of rows and columns) in a table must contain a single atomic value.
  • Each row of a table should have an unique identifier to uniquely identify the rows of the table i.e. no two rows of a table can be identical in 1NF. A primary key or composite key can be used to uniquely identify records of the table.
To understand the concept of First Normal Form (1NF), consider this employee table for an example:

   



To bring the employee Table in First Normal Form; 1) Organize the ProjectId and ProjectName attribute values in single atomic values and then 2) EmpId and ProjectId attributes can uniquely identify each row of the employee table.

Have a look on employee table after bringing it in First Normal Form:
  
   


Note: EmpId and ProjectId uniquely identifies each row of employee table.

Second Normal Form – 2NF

A table can be in Second Normal Form (2NF), if satisfies below conditions:
  • The table must satisfy all the conditions of 1NF.
  • All non-key attributes (columns) are dependent on key columns i.e. Each column of the table must be fully functionally dependent on key column (or set of columns). Partial dependencies must be removed from the table in case table has a composite primary key.
Note: A table with a single column primary key is automatically in 2NF and does not need to be tested for partial dependency.

In continuation of employee table as an example, to bring this table in 2NF, we have to find and remove any partial dependency of non key columns (EmpName, ProjectName, Zip and City) on key columns (EmpId and ProjectId). We can see that column ProjectName functionally depends on ProjectId but not on EmpId attribute which shows a partial dependency in the table. To bring this table in Second Normal Form, we need to break this table in two tables, employee table and project table as below:

Employee Table in 2NF:

     


Third Normal Form – 3NF

A table must satisfy below conditions to be in Third Normal Form (3NF):
  • The table must satisfy all conditions of 2NF.
  • Transitive dependency of non-key attributes on key column must be removed i.e. if column A depends on column B and column B depends on column C, column A is transitively depends on column C. Any non-key column must not dependent on another non-key column.

To bring employee table in Third Normal Form (3NF), we have to find and remove any transitive dependency from this table. We can see that attribute City depends on Zip, and attribute Zip depends on EmpId which shows a transitive dependency of city attribute on EmpId. To bring this table in 3NF, split the employee table as below:

Employee table in 3NF:

 

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is an extension of 3NF with strict condition. A table must satisfy below conditions to be in BCNF:
  • Table must be in 3NF.
  • For any functional dependency X -> Y, X should be a super key.

In employee table, in dependency EmpId -> EmpName, Zip, EmpId is the super key and in Zip – City table, in dependency Zip -> City, Zip is the super key for city.

We might need to test the above BCNF functional dependency in tables which have overlapping composite candidate keys.