Wednesday, August 7, 2013

Data Compression of MS SQL Server

Data Compression IN SQL Server 2008


In this article I am trying to illustrate the concepts of Database Compression on SQL Server 2008. It is only the concepts behind the compression only. For details level of learning you need to follow the MSDN or any MS Notes.

Where we can do the Data Compression:

Data compression can performs only in SQL Server 2008 and not for all versions.
It takes
1.    SQL Server 2008 Enterprise Edition
2.    SQL Server 2008 Developer Editions

What is the purpose of Data Compression?

The data compression has 2 purposes
1.    It reduces the disk usages by decreasing the size of the database.
2.    It improves the I/O Performance.
However implementation of data compression takes extra CPU costs.

Type of Compressions:

SQL Server provides 2 type of data compression

1.    PAGE Compression
2.    ROW Compression

ROW compression is the lower level compression which stores the fixed character strings by using variable-length format by not storing the blank characters. NULL and 0 values across all data types are optimized and take no bytes.

PAGE compression is the higher level compression. It is as similar to table partition, index partitions. Page compression uses two types of compression.

A.    Prefix compression
B.    Dictionary compression

Prefix compression works on common values pattern across all rows on the page. It looks for common patterns in the beginning of common value on given column across all rows on the page.

Dictionary compression works on exact values match pattern across all pages. It looks for exact value matches across all the columns and rows on each page.


This simple SQL statement illustrate that the Which Objects and What Compressions is used

SELECT *
FROM sys.partitions
WHERE data_compression_desc != 'NONE'

Planning:
There is a lot of work involve while planning for compression strategy for example, Estimating the space saving, Application Workload, Workspace requirements, and mainly what to compress.
Disadvantages:
One of the biggest disadvantages of Data compression is database with ROW/PAGE compression cannot be restored, attached or used on other editions.
I think that the article is quite informative and thanking you to provide time on it

Data Compression of MS SQL Server

Introduction

The performance of MS SQL Server depends on Disk I/O Efficiency. If the data is in MS SQL server is compressed the performance increase as a fewer pages is required.
Microsoft SQL Server 2008 Enterprise Edition supports the Data Compression mechanism that we discuss in this article.

Where we can Configure it
The data compression can be configured on Table, Clustered – Index, Non Clustered – Index, Index Views and Partition of Table or on Index.

Levels of Data Compression
The Data Compression can be implemented on two levels

1.    Row
2.    Page

Row level Data Compressions
The main objective is more rows can fit into a single data page.

1.    The overhead of metadata of the records is reduced.
2.    Row-level data compression is turning fixed length data types into variable length data types, freeing up empty space.

Let's take an example to understand the point no-2

Suppose 120 are stored in an INTEGER data type. We all know that INTEGER data type takes 4 bytes to store, So the value 120 also take 4 bytes to store.
We also know that INTEGER value between 0 to 255 can store on 1 Bytes. So after compression 3 bytes release and our value 120 takes 1 byte to store.

3.    It also has the ability to ignore zero and null values, saving additional space.

Page level Data Compression
It begins with row-level data compression. With his it takes two additional compression features called prefix and dictionary compression.

Prefix compression
In this technique finds duplicate prefixes on a page for each column, and replaces each duplicate with a small reference number.

Dictionary compressionIn this technique finds duplicate values on a page, collects them into a dictionary stored after the page header but before the data rows, and replaces the duplicate values with their corresponding offsets in the dictionary.

Example of Data Compression In MS SQL Server

Step – 1 [ Create The Base Table ]

IF OBJECT_ID(N'dbo.tbl_EMPLOYEE', N'U') IS NOT NULL
   BEGIN
     DROP TABLE [dbo].[tbl_EMPLOYEE];
   END
GO
CREATE TABLE [dbo].[tbl_EMPLOYEE]
   (
     EMPID    INT,
     EMPNAME  CHAR(50)
   )
GO

Step – 2 [ Insert Some Records ]

INSERT INTO [dbo].[tbl_EMPLOYEE]
     (EMPID, EMPNAME)
VALUES(1, 'Joydeep Das');
GO 5000

Step - 3 [ Check the Space used by Original Table ]

EXEC sp_spaceused tbl_EMPLOYEE
Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
200KB
152KB
8KB
40KB


Step – 4 [ Now we Used Data Compression ROW ]

ALTER TABLE [dbo].[tbl_EMPLOYEE]
REBUILD WITH (DATA_COMPRESSION = ROW);
GO

Step – 5 [ Now Again Check the Space Used After Data Compression ROW ]

EXEC sp_spaceused tbl_EMPLOYEE

Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
144KB
120KB
8KB
16KB


Step – 6 [ Now we Used Data Compression PAGE ]

ALTER TABLE [dbo].[tbl_EMPLOYEE]
REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

Step – 7 [ Now Again Check the Space Used After Data Compression PAGE ]

EXEC sp_spaceused tbl_EMPLOYEE

Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
80KB
64KB
8KB
8KB


Step – 8 [ Now We Comare all Again ]

Original Without Compression

Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
200KB
152KB
8KB
40KB

Compression ROW

Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
144KB
120KB
8KB
16KB

Compression PAGE

Name
Rows
Reserved    
Data     
Index_size
Unused
tbl_EMPLOYEE
5000
80KB
64KB
8KB
8KB


Compression Type
Data
NO Compression (Original)
152KB
Compression ROW
120KB
Compression PAGE
64KB



Hope you like it.



Database Compression – Data Compression Part-1



Database Compression – Data Compression Part-1

In this article I am trying to illustrate the feature of Data Compression in SQL Server 2008. Here I am trying to explain how the Data compression occurs.
The real facts we faced that a database must grow. When the data is growing so fast it is difficult for a DBA to maintain the database in terms of performance and other thing. Thinking about a situation when a query of a table contains more than 10 million of rows or you are going to take the backup of a database with 5 TB size.
While the facts is that as DBA we can't stop the growth of SQL server database but SQL server 2008 and letter gives us some tools to help us better deal with all this data, and that is "compression".

SQL Server 2008 supports the Compression to reduce the size of the MDF and backup compressions can help us reduce the amount of space our backup takes.   
Please note that for Compression we can also look at "Red Gate SQL Storage Compress" but it is not under the scope of this article.

The Compression not only reduces physical size, it reduce disk I/O, which can enhance the performance of database.   

When we are thinks about compression we must think it in two ways.
1.    The data compressions which includes row-level and Page-level compression, that occurs with MDF files of our database.
2.    The Backup compression, which occurs only when the data is backed up.

Data compressions come into two forms:

Row-Level data Compression

It turning fixed length data type into variable length data type and freeing up empty space. It also has the ability to ignore zero and NULL values and saving additional space. It tries more rows can fit in a single data page.

Page-level Data Compression

It starts with row-level data compressions and then add two additional compression feature named Prefix and Dictionary compression.

The backup compression comes with single form:

Backup Compression

It does not use the row-level or page level data compression. Instead backup compression occurs at the time of backup and it's use its own proprietary compression technique.


The data compression can be used only for those database objects mentioned bellow.

1.    A Table Stored as Heap
2.    A Table stored as a clustered Index
3.    A Non-Clustered index
4.    An Indexed View
5.    Partitioned Table and Index

How the Row level Data compression works

Reducing the Amount of metadata used to store rows.

Storing fixed length numeric data type as if they were variable length data type. For example, if you store the value 1 in a BIGINT data type, storage will only take 1 byte, not 8 bytes, which the BIGINT data types normally takes.

 Storing CHAR data types as variable-length data types. For example, if you have a CHAR (100) data type, and only store 10 characters in it, blank characters are not stored, and thus reducing the space needed to the store data.

Not storing NULL or 0 values.

How the Page level Compression Works

It starts out by using row-level data compression to get as many rows as it can on a single page.

Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference that is stored in the compression information (CI) structure that immediately follows the page header of a data page.

And last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.

My next article I am going to explore the step by step procedure of data compressions.
Hope you like it.

Database Compression – Data Compression Part-2


In my previous article, I am explaining about the Data compression mechanisms. How the data compression Works. In this article I am illustrating about the Step By Step Procedure of Data Compression by T-SQL Statement
.
We can configure the data compression in 2 ways
1.    Configure through T-SQL                                           
2.    Configure through GUI

Estimating the Compression ratio

Sp_estimate_data_compression_saving

Returns the current size of the requested objects and estimate the object size for the requested compression state. Compression can be evaluated for whole tables or part of tables. This includes heaps, clustered indexes, nonclustered indexes, indexed views, and table and index partitions. The objects can be compressed by using row compression or page compression. If the table, index, or partition is already compressed, you can use this procedure to estimate the size of the table, index, or partition if it is recompressed.

The syntax is mentioned bellow

sp_estimate_data_compression_savings
     [ @schema_name = ] 'schema_name' 
   , [ @object_name = ] 'object_name'
   , [@index_id = ] index_id
   , [@partition_number = ] partition_number
   , [@data_compression = ] 'data_compression'
[;]
       
Return "0" for success and "1" for failure.

Example:

USE AdventureWorks2012;
GO
EXEC sp_estimate_data_compression_savings
     'Production',
     'WorkOrderRouting',
     NULL,
     NULL,
     'ROW' ;
GO



Return Result Set.
Column name
Data type
Description
object_name
sysname
Name of the table or the indexed view.
schema_name
sysname
Schema of the table or indexed view.
index_id
int
Index ID of an index:
0 = Heap
1 = Clustered index
> 1 = Nonclustered index
partition_number
int
Partition number. Returns 1 for a non partitioned table or index.
size_with_current_compression_setting (KB)
bigint
Size of the requested table, index, or partition as it currently exists.
size_with_requested_compression_setting (KB)
bigint
Estimated size of the table, index, or partition that uses the requested compression setting; and, if applicable, the existing fill factor, and assuming there is no fragmentation.
sample_size_with_current_compression_setting (KB)
bigint
Size of the sample with the current compression setting. This includes any fragmentation.
sample_size_with_requested_compression_setting (KB)
bigint
Size of the sample that is created by using the requested compression setting; and, if applicable, the existing fill factor and no fragmentation.

Configuring data compression using page-level compression

ALTER TABLE [dbo].[Compression_Test] REBUILD PARTITION = ALL
WITH
(
      DATA_COMPRESSION = PAGE -- Specify the compression type here
)

Compression effect with partitionTable

When you use data compression with partitioned tables and indexes, be aware of the following considerations:


·         When partitions are split by using the ALTER PARTITION statement, both partitions inherit the data compression attribute of the original partition.

·         When two partitions are merged, the resultant partition inherits the data compression attribute of the destination partition.

·         To switch a partition, the data compression property of the partition must match the compression property of the table.

·         There are two syntax variations that you can use to modify the compression of a partitioned table or index:

The following syntax rebuilds only the referenced partition:


ALTER TABLE <table_name>
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION =  <option>)

The following syntax rebuilds the whole table by using the existing compression setting for any partitions that are not referenced:

ALTER TABLE <table_name>
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS(<range>),
... )

Verify the compression setting for the table

SELECT  partition_id,
        object_name(object_id) ObjectName,
        data_compression_desc Data_Compression_Type
FROM    sys.partitions
WHERE   data_compression <>0

In my next article I am discussing about GUI method…..

Hope you like it.