Friday, August 2, 2013

SQL data type

SQL data type

SQL data type defines the type and range of the data that can be used with SQL Server.

Commonly used SQL data types:

Data TypeSyntaxDescription
integerinteger Integer number.
smallintsmallint Small integer number.
numericnumeric(p,s)Where p is a precision value; s is a scale value. For example, numeric(7,3) is a number that has 4 digits before the decimal and 3 digits after the decimal.
decimaldecimal(p,s)Where p is a precision value; s is a scale value.
realrealSingle-precision floating point number
double precisiondouble precisionDouble-precision floating point number
floatfloat(p)Where p is a precision value.
characterchar(x)Where x is the number of characters to store. This data type is space padded to fill the number of characters specified.
character varyingvarchar2(x)Where x is the number of characters to store. This data type does NOT space pad.
bitbit(x)Where x is the number of bits to store.
bit varyingbit varying(x)Where x is the number of bits to store. The length can vary up to x.
datedateStores year, month, and day values.
timetimeStores the hour, minute, and second values.
timestamptimestampStores year, month, day, hour, minute, and second values.
time with time zonetime with time zoneExactly the same as time, but also stores an offset from UTC of the time specified.
timestamp with time zonetimestamp with time zoneExactly the same as timestamp, but also stores an offset from UTC of the time specified.
- See more at: http://www.beginnerstutorialexamples.com/sql-data-type/#sthash.P0O4wuAG.dpuf



DATATYPE


DataType
Storage Size (Bytes)
Values
Comment
Tinyint
1
0 to 255
Smallint
2
-2 x to 2 x - 1
(X: 15)
Int
4
-2 x to 2 x - 1
 (X: 31)
Bigint
8
-2 x to 2 x - 1
 (X: 63)
Numeric
5 to 17
Datetime
8
01/01/1753 to 12/31/9999
Accuracy: 3 miliseconds
smalldatetime
4
01/01/1900 to 06/06/2079
Accuracy: 1 minute
Datetime2
6 to 8
01/01/0001 to 12/31/9999
Accuracy: 100 nanoseconds
datetimeoffset
8 to 10
01/01/0001 to 12/31/9999
Accuracy: 100 nanoseconds
Date
3
01/01/0001 to 12/31/9999
Accuracy: 1 Day
Time
3 to 5
00:00:00.0000000 to
23:59:59.9999999
Accuracy: 100 nanoseconds
Char
As specified
Only english characters are allowed
Text
generally not used now
Varchar
as specified
max 2 GB
Only english characters are allowed
Nchar
As specified
(double the char)
Non-english characters are allowed
Ntext
generally not used now
Nvarchar
as specified
(double the varchar)
max: upto 2 GB
Can store non-english characters
Binary
As specified
Varbinary
Variable As specified
max: upto 2 GB
FILESTREAM can be used with VARBINARY(MAX)
Image
16
generally not used now
Cursor
Hierarchyid
892 bytes
Binary number maintains the
hierarchy
sql_variant
timestamp/
rowversion
nullable : varbinary(8)
non-nullable: binary(8)
Unique Binary number
throught database
uniqueidentifier
16
NEWID function can be used
Xml
max: upto 2 GB
Maximum 128 levels
Geography
max: upto 2 GB
Geometry
max: upto 2 GB
User Defined Data Types:
CREATE TYPE City FROM VARCHAR(255) NOT NULL
CREATE TABLE Info (INT, B CITY)




Data Types



As we all know the selection of proper data types is very important for a SQL Server Table objects. Without proper data types the performance is going down and it is un manageable for SQL developer.

In this article I am trying to gather and provide a chart that contains most useful data types used in Microsoft SQL Server Version 2000/2005/2008.


Datatype
Min
Max
Storage
Type
Notes
Bigint
-2^63
2^63-1
8 bytes
Exact numeric
Int
-2,147,483,648
2,147,483,647
4 bytes
Exact numeric
Smallint
-32,768
32,767
2 bytes
Exact numeric
Tinyint
0
255
1 bytes
Exact numeric
Bit
0
1
1 to 8 bit columns in the same table requires a total of 1 byte, 9 to 16 bits = 2 bytes, etc...
Exact numeric
Decimal
-10^38+1
10^38–1
Precision 1-9 = 5 bytes, precision 10-19 = 9 bytes, precision 20-28 = 13 bytes, precision 29-38 = 17 bytes 
Exact numeric
Decimal and numeric data type is exactly the same. Precision is the total number of digits. Scale is the number of decimals. For both the minimum is 1 and the maximum is 38.
Numeric
no
Money
-2^63 / 10000
2^63-1 / 10000
8 bytes
Exact numeric
Smallmoney
-214,748.3648
214,748.3647
4 bytes
Exact numeric
Float
-1.79E + 308
1.79E + 308
4 bytes when precision is less than 25 and 8 bytes when precision is 25 through 53
Approximate numeric
Precision is specified from 1 to 53.
Real
-3.40E + 38
3.40E + 38
4 bytes
Approximate numeric
Precision is fixed to 7.
Datetime
1753-01-01 00:00:00.000
9999-12-31 23:59:59.997
8 bytes
Date and time
If you are running SQL Server 2008 and need milliseconds precision, use datetime2(3) instead to save 1 byte.
Smalldatetime 
1900-01-01 00:00
2079-06-06 23:59
Date and time
Date

 (SQL2008)
0001-01-01
9999-12-31
Date and time
Time

(SQL2008)
00:00:00.0000000
23:59:59.9999999
Date and time
Specifying the precision is possible. TIME(3) will have milliseconds precision. TIME(7) is the highest and the default precision. Casting values to a lower precision will round the value.
Datetime2

(SQL2008)
0001-01-01 00:00:00.0000000
9999-12-31 23:59:59.9999999
Presicion 1-2 = 6 bytes precision 3-4 = 7 bytes precision 5-7 = 8 bytes
Date and time
Combines the date data type and the time data type into one. The precision logic is the same as for the time data type.
Datetimeoffset

(SQL2008)
0001-01-01 00:00:00.0000000 -14:00
9999-12-31 23:59:59.9999999 +14:00
Presicion 1-2 = 8 bytes precision 3-4 = 9 bytes precision 5-7 = 10 bytes
Date and time
Is a datetime2 data type with the UTC offset appended.
Char
0 chars
8000 chars
Defined width
Character string
Fixed width
Varchar
0 chars
8000 chars
2 bytes + number of chars
Character string
Variable width
Varchar(max)

(SQL2005/2008)
0 chars
2^31 chars
2 bytes + number of chars
Character string
Variable width
Text
0 chars
2,147,483,647 chars
4 bytes + number of chars
Character string
Variable width
Nchar
0 chars
4000 chars
Defined width x 2
Unicode character string
Fixed width
Nvarchar
0 chars
4000 chars
Unicode character string
Variable width
Nvarchar(max)

(SQL2005/2008)
0 chars
2^30 chars
Unicode character string
Variable width
Ntext
0 chars
1,073,741,823 chars
Unicode character string
Variable width
Binary
0 bytes
8000 bytes
Binary string
Fixed width
Varbinary
0 bytes
8000 bytes
Binary string
Variable width
Varbinary(max)

(SQL2005/2008)
0 bytes
2^31 bytes
Binary string
Variable width
Image
0 bytes
2,147,483,647 bytes
Binary string
Variable width
Sql_variant
Other
Stores values of various SQL Server-supported data types, except text, ntext, and timestamp.
Timestamp
Other
Stores a database-wide unique number that gets updated every time a row gets updated.
Uniqueidentifier
Other
Stores a globally unique identifier (GUID).
Xml
(SQL2005/2008)
Other
Stores XML data. You can store xml instances in a column or a variable.
Cursor
Other
A reference to a cursor.
Table
Other
Stores a result set for later processing.

Hope you like it.




Why nVARCHAR


One of my friends ask me a question why use nVARCHAR not the VARCHAR. He also told me when he makes any Table objects by nVARCHAR data types it takes double in length. I try to explain his query by T-SQL statements.
CREATE TABLE Table_1
       (
        Emp_name1 VARCHAR(20),
        Emp_Name2 nVARCHAR(20)
        )

GO
sp_columns Table_1
GO

COLUMN_NAME       TYPE_NAME         LENGTH
Emp_name1         varchar           20   
Emp_Name2         nvarchar          40
This article is dedicated to him to understand the difference between VARCHAR and nVARCHAR and why it takes double in length.
To explain it I am just inserting some values to my above table and observe the result closely.
INSERT INTO Table_1
            ( Emp_name1, Emp_Name2)
VALUES ('A', 'A')           

GO

SELECT DATALENGTH(Emp_name1) [VARCHAR_LENGHT],
       DATALENGTH(Emp_name2) [nVARCHAR_LENGHT]
FROM Table_1      

GO

VARCHAR_LENGHT    nVARCHAR_LENGHT
1                 2

If we observe the output closely we can see the VARCHAR takes 1 bytes and nVARCHAR takes 2 bytes.
VARCHAR is ASCII based and nVARCHAR is UNICODE based and 1 byte for ASCII and 2 byte for UNICODE.
So what it means, by using nVARCHAR we just loose the disk space?
In this article I am trying to explain the Difference UNICODE vs. ASCII and UNICODE advantage. To support my point I am gathering some facts that are mentioned bellow.
Here I am just mentioned some major differences only.
1.    UNICODE takes 2 byte. 1 byte for language page 1 byte for sign value.
2.     ASCII takes 1 byte. It doesn't contain info about language page and all bytes contain sign info.
3.     If we will use in our application different language in one time. I mean we can see record on English and Japan language - UNICODE can solve this problem as because it has language page info.
4.    If we will use in our application different language in one time. I mean we can see record on English and Japan language - ASCII can't solve this problem, because it can store info only about one language.
Hope you like it.




TIME data type


SQL Server 2008 introduced a new data type called TIME, which allow the TIME without DATE. Before SQL Server 2008 it is not possible.
In this article I am going to explain the TIME data type by creating scenario to understand it better way.

Scenario-1 [ The TIME data type ]

The Example
DECLARE @tm TIME = '16:30:12'
SELECT  @tm As [Time]
Result
Time
16:30:12.0000000

Scenario-2 [ The TIME data type Accuracy ]

The default accuracy of TIME data type is 100 nanoseconds. It also allows us to define the accuracy. This indicates how many places to the right of the decimal are stored for the second's portion. We can use 0 to 7 places to the right of the decimal.
Example
DECLARE @tm0 TIME(0) = '16:32:19.1234567',
        @tm7 TIME(7) = '16:32:19.1234567'
SELECT  @tm0 AS [Time0], @tm7 AS [Time7]
Result Set
Time0             Time7
16:32:19          16:32:19.1234567

Scenario-3 [ The TIME data type Storage ]
A TIME(0) takes three bytes to store and a TIME(7) takes five bytes to store. 

Scenario-4 [ The TIME data type Conversion ]
TIME will do the implicit conversion from DATETIME and stores only the time portion of it.
Example
DECLARE     @dt1 DATETIME = '12/29/2007 12:43:24.42',
            @tm1 TIME(2)
SELECT      @tm1 = @dt1
SELECT      @tm1 As [TimeOnly]
Result Set
TimeOnly
12:43:24.42

Scenario-5 [ The TIME data with TIME ZONE ]

The TIME() does not include any time zone information, it will accept a time with time zone information but will ignore the time zone info when displayed.
Example
DECLARE @tm TIME(0) = '12:45:11 -05:30'
SELECT  @tm AS [Time]
Result
Time
12:45:11



DATETIME vs DATETIME2


The new data type DATETIME2 is introduced in Microsoft SQL Server 2008.
It is recommended by Microsoft to use DATETIME2 instead of previous version DATETIME.
Before using the DATETIME2 we have to understand the differences between DATETIME and DATETIME2.
The differences are mentioned in the tabular format
Options
DATETIME
DATETIME2[(n)]
Min Value
1753-01-01 00:00:00
0001-01-01 00:00:00
Max Value
9999-12-31 23:59:59.997
9999-12-31 23:59:59.9999999
Storage Size
8 Bytes
6 to 8 bytes
Parameter n is optional and if it is not specified then fractional seconds precision is 7 digit and it can be from 0 to 7 digit.

For fractional seconds precision <3, takes 6 bytes
For fractional seconds precision 3 or 4 it will take 7 bytes
For fractional seconds precision >4 it will take 8 bytes
How to use it
DECLARE @param DATETIME
DECLARE @param DATETIME2(7)
Compliance
Is not an ANSI/ISO compliant
Is an ANSI/ISO compliant
Function Used
GETDATE() – It returns DB Current Date and Time of DateTime Data Type
SYSDATETIME()- It returns DB Current Date and Time of DateTime2 Data Type
Example: SELECT SYSDATETIME()
+/- of (days)
WORKS
Example: DECLARE @nowDateTime DATETIME = GETDATE()
SELECT @nowDateTime + 1
Result: 2011-09-17 13:44:31.247
FAILS – Need to use only DATEADD function
Example: DECLARE @nowDateTime2 DATETIME2= SYSDATETIME()
SELECT @nowDateTime2+1
Result: Msg 206, Level 16, State 2, Line 2
Operand type clash: datetime2 is incompatible with int


Hope you like it.



HerichyID in MS SQL 2008

One of the most tuffs' queries is to finding the hierarchy. It may be employee or other things. It is always the headache of developer to make tree like structure.
In SQL 2000 we use temporary table to solve this. But in SQL 2005 gives us the better facility to work like this by CTE. An example of SQL 2005 CTE is given bellow. Hope you understand it.
Table Structure:
CREATE TABLE my_EMP
 (empid     int         NOT NULL primary key,
  empname   varchar(50) NULL,
  managerid int)
Values Inserted:
INSERT INTO my_EMP(empid, empname, managerid)
  VALUES(1, 'Sudip Das', NULL),(2, 'Joydeep Das', 1),(3, 'Tufan',2),
        (4, 'Sangram',2)

SELECT * FROM my_EMP
The Output:
empid   empname           managerid
1              Sudip Das            NULL
2              Joydeep Das      1
3              Tufan                   2
4              Sangram              2

The CTE and Output (Hierarchy level)
WITH empharicy(empid, empname, haricyOrder)
  AS
  (
     SELECT empid, empname, 1 as haricyOrder
     FROM   my_EMP
     WHERE  managerid IS NULL
     UNION ALL
     SELECT a.empid, b.empname, haricyOrder+1
     FROM   my_EMP a
            INNER JOIN empharicy b ON a.managerid=b.empid
  )
  SELECT * FROM empharicy
empid   empname           haricyOrder
1              Sudip Das            1
2              Sudip Das            2
3              Sudip Das            3
4              Sudip Das            3

Great news in SQL 2008, no pain for developer there is a new data type namedHerichyID is your ultimate weapons.
Let's starts it.

To provide a real support of hierarchies, SQL Server 2008 introduces a new type of data: HierarchyID.
It is a managed type (.NET), handled by the SQLCLR of SQL Server. It not only stores the parent elements but also stores the set of information to identify the entire hierarchy.

My main purpose of this article to convert our existing table named "my_EMP" to newly created table which contains data type HierarchyID.
We are taking a new columns named "OrgNode" to understand the migration properly. So let's create the new table.

  CREATE TABLE My_EEMP_Converted
            (
              OrgNode      hierarchyid,
              empid        int,
              empname      varchar(50),
              managerid    int
              CONSTRAINT PK_My_EEMP_Converted
              PRIMARY KEY CLUSTERED (OrgNode)
            );

Now create a temporary table in the name of "#children" to contain columns name "Num" that contains the number of children's in each node. Create index on it.

CREATE TABLE #Children
   (
    empid     int,
    managerid int,
    Num       int
   );

GO 
CREATE CLUSTERED INDEX tmpind ON #Children(managerid, empid);

Now populate the temporary table. Here ROW_NUMBER function is used to populate the "Num" columns. I know you have idea on it.
INSERT #Children (empid, managerid, Num)
SELECT empid, managerid,
       ROW_NUMBER() OVER (PARTITION BY managerid ORDER BY managerid)
FROM   my_EMP
GO
Now review the child table.
SELECT * FROM #Children ORDER BY ManagerID, Num
The desired result set is displayed.
empid   managerid          Num
1              NULL                   1
2              1                            1
3              2                            1
4              2                            2

Now populate the table  "My_EEMP_Converted"
WITH paths(path, empid)
AS
(
      -- This section provides the value for the root of the hierarchy
    SELECT hierarchyid::GetRoot() AS OrgNode, empid
    FROM   #Children C
    WHERE ManagerID IS NULL
    UNION ALL
    -- This section provides values for all nodes except the root
    SELECT CAST(CONVERT(varchar,p.path)
                + CONVERT(varchar(30),C.Num)
                + '/' AS hierarchyid),
           C.empid
    FROM   #Children C
           INNER JOIN paths p ON C.managerid = p.empid
)


INSERT My_EEMP_Converted (OrgNode, O.empid, O.empname, O.managerid)
SELECT P.path, O.empid, O.empname, O.managerid
FROM   my_EMP AS O
       INNER JOIN Paths AS P ON O.empid = P.empid
GO

SELECT * FROM My_EEMP_Converted
To more understanding of "OrgNode" convert it to string.
SELECT CONVERT(VARCHAR,OrgNode)OrgNode , empid, empname, managerid
FROM   My_EEMP_Converted

Now drop the temp table
DROP TABLE #Children
GO
To help queries at the same level in the hierarchy, use the Get Level method to create a computed column that contains the level in the hierarchy. Create compound index on  level and the " Hierarchyid".

ALTER TABLE My_EEMP_Converted
      ADD H_Level AS OrgNode.GetLevel()
     
CREATE UNIQUE INDEX My_EEMP_Converted_idx
   ON My_EEMP_Converted(H_Level, OrgNode)
GO

SELECT * FROM My_EEMP_Converted

Now create unique index of empid.
Complete view
SELECT OrgNode, CONVERT(varchar,OrgNode) logical_level,
       empid, empname, managerid, H_Level
FROM   My_EEMP_Converted

OrgNode             logical_level      empid   empname              managerid          H_Level
0x                          /                             1              Sudip Das        NULL                    0
0x58                     /1/                           2              Joydeep Das   1                            1
0x5AC0                 /1/1/                       3              Tufan              2                            2
0x5B40                 /1/2/                       4              Sangram          2                            2




Hope you like it.




Trick- Date Range

To display continues date between 2 date ranges uses this trick
SET DATEFORMAT DMY
GO
IF NOT EXISTS(SELECT * FROM sysobjects WHERE xtype='U' and name like'Numbers')
      BEGIN
            SELECT TOP 10000 IDENTITY(int,1,1) AS Number
                     INTO Numbers
          FROM sys.columns s1
                   CROSS JOIN sys.columns s2
                   
          ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEYCLUSTERED (Number)
      END
GO   

DECLARE @Start DATETIME
DECLARE @End DATETIME

SET @Start='01-01-2011'
SET @End='28-02-2011'
SELECT
    @Start+Number-1
    FROM Numbers
    WHERE Number<=DATEDIFF(day,@Start,@End)+1
GO



Which Data Type is good For Primary Key


The most important thing of a table is defining the primary key; it could be single column based or the composite primary key.

In my development life, I saw a lot of primary key with different type or patterns.
Such as single decimal number,  some use character with decimal numbers  and  some others use character+decimal+special character like  "SOM/001/2011-2012" like this.

When I am working with SQL server, it always my mind, that what is the better method to implement primary key rather what data type I choose to implement. For searching the suitable answer I read several articles related to it.

One thing is clear, Microsoft provide you the data type UNIQUE IDENTIFITRE  witNEWID() function to generate the world wide unique primary key. It 16 byte binary value and called globally unique identifiers (GUIDs).  It takes your processor ID, NIC card ID etc to generate the Primary key.

But is it a good data type for a primary key?

To get the suitable answer, we have to understand the definition of the primary key.

1.       It is unique and NOT NULL.
2.       The primary key contains the default clustered index (Physically sorted environment).
So, is the UNIQUE IDENTIFIRE is the good choice for cluster indexed data? In my view it is not. If your primary key is not well indexed is it going for good quality of performance?  No not at all.

So in my view

For primary key always choose the Integer (int) as best data type.

If you're primary key range is very large and not supported the range of Integer then go to Big integer (bigint).

Please review my article, and give me suitable comments related to this as all that I mentioned is not listed in MSDN or any other MS guide line. It is just my point of view only.





The NEW Data Types MS SQL 2008

 


The data type of SQL server is a very important thing to building the table objects. So we must have clear point of view before using different data types.

In my professional life I have seen may DBA uses the wrong type of data type in table objects and results,  performance slow down, Time out …expire etc, as they don't have enough idea of data type related to SQL server.

In these articles, I try to define some new data types that came with SQL 2008 and not used in previous version of MS SQL server.

DATE TIME
It's much easier with SQL 2008; MS provide us better control related to Date and time data types. In SQL 2008 introduced:   DATE, TIME, DATETIME2, and DATETIMEOFFSET.

[A]
DATE

The DATE data type stores only date in YYYY-MM-DD format.
 It stores range of 0001-01-01 through 9999-12-32. It takes 3 bytes to store.

[B]
TIME

It stores only time in the format   hh:mm:ss.nnnnnnn, and range between 00:00:00.0000000 through 23:59:59:9999999 and is accurate to 100 nanoseconds. It takes 3 to 5 bytes to store.

[C]
DATETIME2
It is as similar to the SQL 2005 DATETIME but have greater range value. It runs from 6 to 8 bytes to store.

[D]
DATETIMEOFFSET 

It is similar to DATETIME2, but includes additional information to track the time zone. The format is YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm with a range of 0001-01-01 00:00:00.0000000 through 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999 (in UTC), and an accuracy of 100 nanoseconds. Storage depends on the precision and scale selected, and runs from 8 to 10 bytes.

SPECIAL
Two new spatial data types have been added- GEOMETRY and GEOGRAPHY-which you can use to natively store and manipulate location-based information, such as Global Positioning System (GPS) data.

[A]
GEOMETRY

The GEOMETRY data type is used to store planar (flat-earth) data. It is generally used to store XY coordinates that represent points, lines, and polygons in a two-dimensional space. For example storing XY coordinates in the GEOMETRY data type can be used to map the exterior of a building.

[B]
GEOGRAPHY

The GEOGRAPHY data type is used to store ellipsoidal (round-earth) data. It is used to store latitude and longitude coordinates that represent points, lines, and polygons on the earth's surface. For example, GPS data that represents the lay of the land is one example of data that can be stored in the GEOGRAPHY data type.


HIERARCHYID

The HIERARCHYID data type is used to enable database applications to model hierarchical tree structures, such as the organization chart of a business.

The HIERARCHYID data type makes it easier to express these types of relationships without requiring multiple parent/child tables and complex joins.

·         Organizational structures
·         A set of tasks that make up a larger projects (like a GANTT chart)
·         File systems (folders and their sub-folders)
·         A classification of language terms
·         A bill of materials to assemble or build a product
·         A graphical representation of links between web pages


FILESTREAM

FILESTREAM is not a data type as such, but is a variation of the VARBINARY(MAX) data type that allows unstructured data to be stored in the file system instead of inside the SQL Server database.