Tuesday, August 18, 2015

SSIS - Overview & Interview Questions

SSIS - Overview & Interview Questions

About SSIS


-          Introduced in 2005
-          Advanced version of DTS (Data Transformation Services)
-          An easy to use, highly flexible, extremely capable, highly scalable ETL (Extract Transform Load) tool
-          Creates structure called package
-          Package contains mainly following things
o   Control Flow (defines overall flow)
o   Data Flow (extracts data from multiple sources and putting into multiple destinations)
o   Event Handler
o   Package Explorer
o   Connections Tray (holds all kind of connections)



Connection Manager


-          OLEDB connection
-          Flat file connection
-          Excel connection
-          Cache connection
-          FTP connection
-          Multifile connection
-          SMTP connection


Difference between Control Flow & Data Flow Task 
      
      Click Here


Variables


       - Variables store values in SSIS Package at package, container, task level
       - Variables can be used in
             a. Script task - we can use the variable using Dts.Variables["Blah"]
             b. Expression in any transformation
             c. Expression in any Connection Manager
             d. Execute SQL Task to pass value to SP, and store value from SP
             e. Precedence Constraint to define success/failure flow
             f. Error output
       - 2 type of variables
             a. System Variable
             b. User-defined Variable

       - Variable name is case-sensitive
       - We can create 2 variables with same name at different level. e.g. Container level and then  
          Data Flow Task Level
       - In that case, local scope will get more priority.



What are the different control flow tasks you have used?


         -          Execute SQL Task (more details)
         -          Execute Package Task (more details)
         -          Send Mail Task (more details)
         -          Data Profiling Task (more details)
         -          Transfer Logins Task (more details)
         -          Bulk Insert Task (more details)
         -          Data Flow Task
         -          For Loop Container (more details)
         -          For Each Loop Container (more details)
         -          Sequence Container
         -          File System Task (more details)
         -          Script Task
         -          FTP Task
         -          Backup Database Task
         -          Rebuild Index Task
         -          Reorganize Index Task
         -          Shrink Database Task
         -          Update Statistics Task


What are the different data flow transformations you have used?


            -          Simple DFT

         -          Copy Column Transformation (more details)
         -          Derived Column & Data Conversion Transformation (more details)
         -          Conditional Split Transformation (more details)
         -          Character Map Transformation (more details)
         -          Audit Transformation (more details)
         -          Aggregate & Multicast Transformation (more details)
         -          Merge & Sort Transformation (more details)
         -          Merge Join Transformation (more details)
         -          Union All Transformation (more details)
         -          Row Count Transformation (more details)
         -          Row Sampling Transformation
         -          Fuzzy Lookup Transformation
         -          Fuzzy Grouping Transformation
         -          Cache Transformation (more details)
         -          LookupTransformation (more details)


        How would you do logging in SSIS?
             click here

   Approach for Error Handling
             click here



          Debugging and Using Data Viewer
                  click here


         How to deploy SSIS package on production server?
             To be added

        New features of SSIS 2008?
             click here

        How to pass a variable to child package?
             To be added

         Performance improvement in SSIS
             click here

         How would you restart package from point of failure? What is checkpoint?
             To be added 

      How you can achieve parallelism in SSIS?
               To be added 

         Difference between Merge, Merge Join & Union All
              

              
         Difference between Conditional Split & Multicast
               To be added 

Loading Excel into DB on 64 bit machine

Loading Excel into DB on 64 bit machine

Loading Excel into DB was much easier but it became almost impossible in SQL 2008 (64 bit machine).
Here is the fix.

Fix perfectly works with “Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)” and “Office 2010”.

One time Activity


1.      1). Install “AccessDatabaseEngine_x64.exe” from here on server
2.      2). Execute following code on Server
USE [master]
GO

EXEC master . dbo. sp_MSset_oledb_propN'Microsoft.ACE.OLEDB.12.0' , N'AllowInProcess' , 1
GO

EXEC master . dbo. sp_MSset_oledb_propN'Microsoft.ACE.OLEDB.12.0' , N'DynamicParameters' , 1
GO


Your Code

SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0','Excel12.0;IMEX=1;Database=D:\Nisarg\Sample.xlsx','SELECT * FROM [Sheet1$]')

Connect Local SQL Server when you are not part of "Sysadmin"

Connect Local SQL Server when you are not part of "Sysadmin"

There might be a situation when one is part of “Administrators” on local system but not part of “sysadmin” on local SQL Server and hence he is not able to connect to local SQL Server.

This situation can be handled and fixed by following below mentioned steps.

1.       Go to All Programs
à SQL Server 2008
à Configuration Tools
àSQL Server Configuration Manager
2.       Go to “SQL Server Services”
3.       Right click on “SQL Server (MSSQLSERVER)” and click on Properties
4.       Click on “Advance” tab
5.       Add “-m;” in the beginning of value of “Startup Parameters”. So it should look something like this.
-m;-dC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\master.mdf;-eC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Log\ERRORLOG;-lC:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\mastlog.ldf
6.       Click “OK”.
7.       Go to All Programs Ã  SQL Server 2008 Ã  (Right click) SQL Server Management Studio Ã  Run as Administrator
8.       Now you should be able to connect
9.       Add yourself in “sysadmin” group
10.   Close SSMS
11.   Follow 1 to 4 steps
12.   Remove “-m;” from beginning from value of “Startup Parameters”.
13.   Click “OK”

Execute SQL Task

Execute SQL Task


- Using this, we can
--> execute direct T-SQL statements
--> execute parameterized SPs
--> execute .sql file

PROPERTIES


1. Result Set
        - None : When you just want to execute the SP and don't want to store resultset anywhere
        - Single Row : When you want to store 1st row of a resultset in some variables
        - Full Result Set : When you want to store whole resultset in some object variable
2. SQLSourceType
        - Direct Input : When you want to store whole resultset in some object variable
        - File Connection : When you want to store whole resultset in some object variable
        - Variable : When you want to store whole resultset in some object variable
3. SQLStatement
        Here you have to give actual statement

EXAMPLE

1. Executing direct SELECT statement

          SQLSourceType : Direct Input
          SQLStatement : SELECT .......

2. Executing parameterized SP


          SQLSourceType : Direct Input
          SQLStatement : EXEC DemoProc ?,?

          This means SP has 2 parameters



Now if you want to store result set then, you can store SingleRow individually in different variables or whole resultset in 1 single Object variable. 

Storing Single Row


Storing Whole Result set 
       click here

Bulk Insert Task

Bulk Insert Task

This is being used to insert the records from some text file or csv file into a table.

Following properties should be set.
column delimiter can be comma, semicolon, tab, pipeline, etc.

If 1st row is header, then FirstRow should be set to 2.




Bulk Insert Task

Introduction

As the name suggests the BULK Insert task is used to insert bulk data from source to destination. But the transformation of data is not possible over here. It straight forwards insert bulk data from source to destination. The configuration of bulk insert task is quite simple and there is no need of  data flow components over here. Just one control flow tasks is sufficient to insert data from source to destination.
In this article I am trying to illustrate the feature and configuration of Bulk Insert task of control flow.

Type of Source and Destination
The source and destination can be any of the data sources as we have such as OLEDB, excel, etc.

Configuring Bulk Insert Task

Step-1 [ The flat file source and Destination Table ]

The flat file name is "BulkInsertSource.txt"



Destination Table objects name is "tbl_BulinsertDestination"

IF OBJECT_ID(N'tbl_BulinsertDestination', N'U') IS NOT NULL
   BEGIN
     DROP TABLE tbl_BulinsertDestination;
   END
GO
CREATE TABLE tbl_BulinsertDestination
      (UNIT             VARCHAR(50)       NOT NULL,
       [YEAR]           VARCHAR(4)        NOT NULL,
       [PERCENT]        DECIMAL(20,2)     NOT NULL);
GO      

Step-2  [ Control Flow tasks ]



Step-3 [ Bulk Insert Task Editor ]



Step-4 [ Result Set ]

SELECT * FROM tbl_BulinsertDestination;

UNIT      YEAR     PERCENT
East         2010       50.00
East         2011       55.00
East         2012       60.00
East         2013       45.00
West       2010       0.00
West       2011       40.00
West       2012       56.00
West       2013       72.00
North      2010       5.00
North      2011       12.00
North      2012       16.00
North      2013       45.00                      


Hope you like it.

Transfer Login Task

Transfer Login Task



Transfer Logins Task is used to transfer either all logins (except sa) or selected logins or all logins of selected databases from the source to the destination SQL Server instance.  After the transfer, all the transferred SQL logins are assigned random passwords and SQL logins are disabled. The DBA needs to change the password and enable the SQL login before it can be used on the destination.
Let's walk through an example. Create another package in the current project and drag aTransfer Logins Task from the Toolbox to the Control Flow. Right click on the task and select Edit to configure the task's properties as shown below.

These are the items that can be configured:
  • Connections
    • SourceConnection - specify the connection for the source SQL Server instance
    • DestinationConnection - specify the connection for the destination SQL Server instance
  • Logins
    • LoginsToTransfer - You have three options for this:
      • AllLogins - this will transfer all logins from the source.
      • SelectedLogins - this allows you to select specific logins
      • AllLoginsFromSelectedDatabases - This allows you to transfer all logins associated with one or more databases as shown in the image below.
    • LoginsList - this will allow you to select specific logins if you selectSelectedLogins for LoginsToTransfer
    • DatabaseList - this will allow you to select the databases if you select AllLoginsFromSelectedDatabases for LoginsToTransfer
  • Options
    • IfObjectExists - If the logins already exist on the destination you have three choices; first FailTask execution, second Overwrite the destination logins and third Skip the existing login and continue with the others.
    • CopySids - if you set it to True then security identifiers (SIDs) associated with logins are also copied to the destination

Data Profiling Task

Data Profiling Task


---> A new task introduced with SQL Server 2008
---> This task can be used by SSIS developers to easily and effectively profile through source system to understand the quality of data before it can be finally loaded to a data warehouse using the ETL process. 
---> There are 8 inbuilt data profiles available within Data Profiling Task. 
---> Among the 8 profiles available, 5 can be used to analyze individual columns and the remaining 3 can be used to analyze multiple columns or relationships existing between columns and tables. 
---> It internally uses the ADO.Net connection to connect with SQL Server. 
---> It returns an XML output and can be viewed by Data Profiler Viewer utility which is available at “C:\Program Files\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe.

EXAMPLE

1. Create File Connection Manager



2. Pull Data Profiling Task and set following properties


3. Click on Quick Profiler and have settings like this.


4. Execute the package and .xml file will be generated.
5. This file can be viewed by .exe mentioned above. It will look something like this.