Tuesday, August 6, 2013

Column Store Index in MS SQL 2012 / 2014

Column Store Index in MS SQL 2012 / 2014

Introduction

Column store index is the unique feature of Microsoft SQL Server starting from Microsoft SQL 2012 and improving with Microsoft SQL Server 2014. Here in this article I am going to discuss about the architecture of Column store Index.
Please note that this Column store Index architecture is suitable only for OLAP or BI environment not in OTLP environment, others the performance down.

Architecture of Column Store Index
Before going to the depth of the architecture of column store index in MS SQL 2012, we assume that the we all well known about the row based index and how it load in the memory to fulfill the request of a query.

First of all we look at the figure to understand the column store index properly



Segment Concept:
Each column in the column store index is it own segment. A Segment contains the value from one column only. Each and every column in a table that include in the column store index forms individual segment. Which allows the column data access individually. Each segment can from by multiple pages.

A segment has the limitation of 1 millions of rows. So for a big table it is not be possible to fit single entire columns within a single segment, so single column of a table may be fit with multiple segments.

Data is transferred from the disk to memory by segment, not by page. A segment is a highly compressed Large Object (LOB) that can contain up to one million rows. The data within each column’s segment matches row-by-row so that the rows can always be assembled correctly.

In our example the disk data came from three pages that is Page -1, Page-2 and Page-3. If we crate column store index in the column named “Name” then the segment is created from Page-1, Page-2 and Page-3.

Group Concept:
When a column is not fit in a single segment as the data of columns exceed the one million then columns takes multiple segments to store data. In such cases, multiple segments are created for each column and grouped into multiple row groups, one for each set of segments.

When a column store index is broken into multiple row groups, each row group contains a set of complete rows.  Please look at the bellow figure to understand it properly.



Dictionary Concept:
Besides column segments a column store index consists of another data storage element: dictionaries. Dictionaries are widely used in columnar storage as a means to efficiently encode large data types, like strings. The values stores in the column segments will be just entry numbers in the dictionary, and the actual values are stored in the dictionary. This technique can yield very good compression for repeated values, but yields bad results if the values are all distinct (the required storage actually increases). This is what makes large columns (strings) with distinct values very poor candidates for column store indexes. Column store indexes contain separate dictionaries for each column and string columns contain two types of dictionaries.

Primary Dictionary
This is a global dictionary used by all segments of a column.
Secondary Dictionary
This is an overflow dictionary for entries that did not fit in the primary dictionaries. It can be shared by several segments of a column: the relation between dictionaries and column segments is one-to-many.

Data Compression:
Column store indexes are based on xVelocity (formerly known as VertiPaq), an advanced storage and compression technology that originated with Power Pivot and Analysis Services but has been adapted to SQL Server 2012 databases.
New feature of Column Store in SQL 2014
  • We can create only one non-clustered column store index on a table although we can include all the columns of the table in the single index.

  • SQL Server 2014 enhances it to add support for creating Clustered Column store Index.
  • When we create a column store index and it makes the table read only.

  • With SQL Server 2014, we can create a column store index without having much impact on write-ability on the table. This means we can issue some INSERT, UPDATE, DELETE statements with a table with clustered column store index. No more tedious workaround is required for writing data to a table with column store index in this release like the previous release.

ColumnStore Index in MS SQL 2012 and Extension In MS SQL 2014

Introduction

In this article we are going to discuss about the ColumnStore Index of Microsoft SQL Server 2012. The feature is developed in MS SQL 2012 first and Modification of it came at MS SQL Server 2014. Here we discuss both of them. Hope it will be interesting and useful.

What Technology the ColumnStore used

ColumnStore indexes are based on xVelocity technology. This is formerly known as VertiPaq. It is an advanced storage and compression technology that originated with PowerPivot and Analysis Services but has been adapted to SQL Server 2012 databases.

For Which Purpose it is Created

ColumnStore Index is specially designed for handling large amount data in FACT table of Sql Server Analytical Services (SSAS) Cube.

Understanding ColumnStore Index

At the heart of this model is the columnar structure that groups data by columns rather than rows. To better understand it lets take a simple example.

First of all we create the Base Table

-- Base Tabhle
IF OBJECT_ID(N'dbo.tbl_EMPDTLS', N'U')IS NOT NULL
   BEGIN
      DROP TABLE [dbo].[tbl_EMPDTLS];
   END
GO
CREATE TABLE [dbo].[tbl_EMPDTLS]
   (
     EMPID    INT         NOT NULL IDENTITY PRIMARYKEY,
     EMPFNAME VARCHAR(50) NOT NULL,
     EMPLNAME VARCHAR(50) NOT NULL,
     EMPGRADE CHAR(1)     NOT NULL
   )
GO     

As the table named dbo.tbl_EMPDTLS contains a primary key on the columns EMPID. So a CLUSTERED INDEX is generated automatically on EMPID columns of the table. So it is a ROW store INDEX. Before understanding the ColumnStore Index we need to understand the ROW Store index first and how it’s works.

In the case of a Row Store Index, data from all the columns of the rows are stored together on the same page.



Now we put the query to retrieve data

SELECT EMPID, EMPGRADE FROM [dbo].[tbl_EMPDTLS];

Now we see what happened when the above query fires.



When the database engine processes the above query, it retrieves all three data pages into memory, fetching the entire table.  Even if most of the columns aren’t needed. Than the requested fetched and show us the result.

Now in case of ColumnStore Index store each column data in separate pages. That is column wise fashion, rather than the traditional Row Store Index, which stores data from all the columns of a row together contiguously (row wise fashion). If we query only selects a few columns of the index, it reads less pages as it needs to read data of selected columns only and improves the performance by minimizing the IO cost.



Creating ColumnStore Index

Creating ColumnStore Index is not a hard job. No extra syntax is needed. June one key word named COLUMNSTORE is needed with old Index syntax.

CREATE NONCLUSTERED COLUMNSTORE
INDEX IX_ClStr_tbl_EMPDTLS
ON [dbo].[tbl_EMPDTLS]
(EMPID, EMPFNAME, EMPLNAME, EMPGRADE);

Limitation

·         A table with a ColumnStore Index cannot be updated.
·         A table can have only one ColumnStore Index and hence you should consider including all         columns or at least all those frequently used columns of the table in the index.
·         A ColumnStore Index can only be non cluster and non unique index; you cannot specify       
      ASC/DESC or INCLUDE clauses.
·         The definition of a ColumnStore Index cannot be changed with the ALTER INDEX   
       command; you need to drop and create the index or disable it then rebuild it.
·         A ColumnStore Index cannot be created on view.
·         A ColumnStore Index cannot be created on table which uses features 
      like ReplicationChange TrackingChange Data Capture and Filestream

What’s New in 2014

A new storage engine in SQL Server 2014 overcomes that limitation.
Clustered Columnstore Index, allowing the table to operate normally when it comes to DML operations like INSERT, UPDATE and DELETE.