Wednesday, August 19, 2015

Fuzzy Lookup Transform

Fuzzy Lookup Transform


Introduction

Real-world data is "dirty" because of misspellings, truncations, missing or inserted tokens, null fields, unexpected abbreviations, and other irregularities. Fuzzy lookup enable us to match input records with clean, standardize records in a reference table. To understand it properly let's take an example. Suppose we have customer information like customer name and address. During the sales transaction we take the input for customer name and address which may not be matched exactly with records in the customer reference table because of typographical or others error in the input data. Fuzzy lookup returns the best matching records from the customer reference table even if no exact match exists.

So the fizzy lookup is a very useful transform for every SSIS developer in the real-world environment.  In this article we are going to learn about it.

How we use the Fuzzy Lookup Transform

Step-1 [ Create the Fuzzy Lookup Reference Table ]

-- Fuzzy Lookup Reference table
IF OBJECT_ID(N'tbl_FUZZYREFERENCES', N'U') IS NOT NULL
   BEGIN
      DROP TABLE tbl_FUZZYREFERENCES;
   END
GO
CREATE TABLE tbl_FUZZYREFERENCES
       (FIRSTNAME VARCHAR(50)  NOT NULL,
        LASTNAME    VARCHAR(50)  NOT NULL,
        DOB         DATETIME);
GO          
--Inserting Records
INSERT INTO tbl_FUZZYREFERENCES
       (FIRSTNAME, LASTNAME, DOB)
VALUES ('Joydeep', 'Das', '12-17-1974'),
       ('Shipra', 'Roy Chowdhury', '09-22-1974'),  
       ('Deeepasree', 'Das', '01-31-2003');
GO
SELECT * FROM tbl_FUZZYREFERENCES;                  

FIRSTNAME                        LASTNAME                                         DOB
Joydeep                                  Das                                                         1974-12-17 00:00:00.000
Shipra                                     Roy Chowdhury                                      1974-09-22 00:00:00.000
Deeepasree                             Das                                                         2003-01-31 00:00:00.000

Step-2 [ The Source Flat File ]

The flat file name is "FuzzySourceRecords.txt"



Step-3 [ Data Flow ]



Step-4 [ Fuzzy Lookup Transform Editor ]





Step-5 [ Derived Column Transform Editor ]



Step-6 [ Union All Transform Editor ]



Step-7 [ OLE DB Destination Create Table ]



Step-8 [ Running the Package ]



Step-8 [ Final Destination Table ]






Hope you like it.

XML Configuration Files Effect in SSIS Package

XML Configuration Files Effect in SSIS Package


Introduction

When we are creating the package we are using different types of variable or property values with it. This variables or the property value need to be provide proper value before to run the package successfully. A package configuration is a defined property or value that can be modified without updating the package itself.

Package configurations are useful when we want to deploy packages to multiple servers, when we move our packages from a development to production environment.

Case study
We have a destination table object called "Tbl_ProdcutList". This table populated from a flat file. There are two flat file we have called "FlatFile-A.txt" and "FlatFile-B.txt". There is a package level variable called "v_FlatfileChoose". This variable is responsible to load data from flat file. For an example if the value of the variable is "FlatFile-A" then its loaded data from "FlatFile-A" and if the value of the variable is "FlatFile-B" it's loaded data from "FlatFile-B".

IF v_FlatfileChoose == "FlatFile-A" THE Load Data From "FlatFile-A"
IF v_FlatfileChoose == "FlatFile-B" THE Load Data From "FlatFile-B"

How to do that

Step-1 [ Create the Destination File ]

IF OBJECT_ID(N'Tbl_ProdcutList', N'U') IS NOT NULL
   BEGIN
      DROP TABLE Tbl_ProdcutList;
   END
GO

CREATE TABLE Tbl_ProdcutList
       (PRODID          INT         NOT NULL,
        PRODNAME        VARCHAR(50) NOT NULL);
GO

Step-2 [ The Source Flat file ]



Step-3 [ SSIS Control flow tasks ]

Here I am using Execute SQL Task to truncate the Table only.
In the SQL Statement we use

TRUNCATE TABLE Tbl_ProdcutList;



Step-4 [ The Package level Variable ]



Step-5 [ Precedence Constraint Editors Settings ]



Step-6 [ Package Configuration ]

Right click on control flow and the select package configuration from shortcut menu. Click on enable package configuration and click the Add button.



Welcome to Package configuration wizard window came. In the configuration type select the XML configuration file and then close the configuration setting directory and file name. Please look that the extension of configuration file is .dtxConfig.



In the Set property Explorer Window select the variable and click on the value.



Step-7 [ The Configuration XML File ]



Step-8 [ Run the Package with Different Type of Configuration Settings ]





Hope you like it.

SSIS SQL Server Authentication for OLE DB

SSIS SQL Server Authentication for OLE DB

Introduction

One of my friends sends me a mail related to saving password in SQL Server authentication for OLE DB Source and destination. In this article we are trying to discuss about it.

The Case Scenario
The part of my friends e-mail is mentioned here for better understanding the case scenario.

"I create a new connection Manager and use SQL Server authorization and save the password, the connection manager seems to "forget" the password. That is, when I click the 'save password' check box, and do a test connection, it connects fine. But as soon as I close that connection window, and reopen it, the password box is empty, and the 'save password' box is still checked. Even if I do a test connection at this point, it won't connect because it does not have the password."

Protection Level Property
There is a property of SSIS package called the Protection Level Property that is used to specify how the sensitive information is saved within the package. Here the sensitive information that I mean is connection string etc.

There are six type of property value found in SSIS package like.
  • DontSaveSensitive
  • EncryptSensitiveWithUserKey
  • EncryptSensitiveWithPassword
  • EncryptAllWithPassword
  • EncryptAllWithUserKey
  • ServerStorage


Now we are going to understand each one

Don't Save Sensitive
When we are choosing this value of property Protection level it means that the any kind of sensitive information is not written in the package XML file when we save the package. When we use this value and open the OLE DB connection manager the password is just BLANK even we choose the save my password check box.



Encrypt Sensitive with user Key
It encrypts the sensitive information based on the credentials of the user who create the package. If we use this value the XML file for the password looks like this.

<DTS:PASSWORD Sensitive="1" DTS:Name="Password" Encrypted="1">AQAAANCMnd8BFdERjHoAwE/Cl+...</DTS:PASSWORD>

Here in the above XML code we find that the attributes Encrypted = "1". When the developer who created the package is open the package the password is decrypted automatically. If any other user except the owner tries to open the package get an error like.



And that is the limitation of these settings.

Encrypt Sensitive With Password
Here we just provide a password in the package and this password is used to encrypt and decrypt the sensitive information in the package.



The Encrypt Sensitive With Password setting for the Protection Level property overcomes the limitation of the Encrypt Sensitive With User Key setting, allowing any user to open the package as long as they have the password.

When we execute a package with this setting using DTEXEC, we can specify the password on the command line using the /Decrypt password command line argument.

Encrypt All with Password
It allows encrypting the entire contents of the SSIS package with specified password. The settings is just like the Encrypt Sensitive With Password

Encrypt All With User Key
It allows encrypting the entire contents of the SSIS package with specified password. Only the developer who created the package can able to Open, view or modify and Run it. 

Server Storage
It writes all the sensitive information to SQL server when we save the package. It stores information in MSDB database.

How to solve the above problem
To solve the above problem we must create the SSIS XML configuration file. The Select Property to explore page of the Package configuration wizard is displayed bellow.



Those who want to go to the details of the package configuration manager please read my previous article.

Example of XML file

For Windows authenticated connection to SQL Server:

<DTSConfiguration>
<Configuration ConfiguredType="Property"   
Path="\Package.Connections[SERVERNAMEdbname].Properties[ConnectionString]" ValueType="String">
<ConfiguredValue>
Data Source=[SRVNAME];Initial Catalog=[DBNAME];Provider=SQLNCLI.1;Integrated Security=SSPI;Auto Translate=False;
</ConfiguredValue>
</Configuration>
</DTSConfiguration>

For SQL Server authenticated connection to SQL Server:

<DTSConfiguration>
<Configuration ConfiguredType="Property" Path="\Package.Connections[SERVERNAMEdbname].Properties[ConnectionString]" ValueType="String">
<ConfiguredValue>
Data Source=[SRVNAME];Initial Catalog=[DBNAME];User ID=[LOGIN];password=[PWD];Provider=SQLNCLI.1;Persist Security Info=True;Auto Translate=False
</ConfiguredValue>
</Configuration>
</DTSConfiguration>

Here the [SRVNAME] is the name of the SQL Server. [DBNAME] is the name of the Database,[LOGIN] is the name of the SQL Server login user ID and [PWD] is the Password.



Hope you like it.

Temporary file in SSIS package

Temporary file in SSIS package

Introduction
Temporary file is a very important factors for every T-SQL developer. I think all the T-SQL developer must know the value of temporary file when they are working with complex T-SQL statement. In this article I am trying to use the temporary file in my SSIS package. Is not it an easy task? Yes but we must know some settings as the temp file is deleted when the session is completed.  Hope all of you must understand what I want to mean by that.

Using Temporary file in SSIS

Step-1 [ Here in the control flow tab we are using Execute SQL Task ]
By the Execute SQL Task we are going to create our temp table in the name of "Tmp_VendorsDetails". It contains Vendor ID, Vendor Name and Vendor Grade (like Grade A, B and C).

The SQL statement to Create Temporary File (in Execute SQL Task)

IF OBJECT_ID(N'Tempdb..##Tmp_VendorsDetails') IS NOT NULL
   BEGIN
     DROP TABLE ##Tmp_VendorsDetails;
   END
GO

CREATE TABLE ##Tmp_VendorsDetails
       (VENDORID     INT         NOT NULL IDENTITY(1,1),
        VENDORNAME   VARCHAR(50) NOT NULL,
        VENDORGRADE  VARCHAR(1));
GO          

-- Inserting Data to Temporary Table
INSERT INTO ##Tmp_VendorsDetails
       (VENDORNAME, VENDORGRADE)
VALUES ('RADHARAM TRADING', 'A'),
       ('RATAN TRADERS', 'A'),
       ('BANIK AND CO.', 'C'),
       ('TAHAMAT ALI STORE', 'B');         

Please note that we must run this script at SSMS to get it from data flow tasks.

 Step-2 [ Create the  Destination Table ]

-- Destination Table
IF OBJECT_ID(N'Tbl_DestVendorsDetails') IS NOT NULL
   BEGIN
     DROP TABLE Tbl_DestVendorsDetails;
   END
GO

CREATE TABLE Tbl_DestVendorsDetails
       (VENDORID     INT         NOT NULL IDENTITY(1,1),
        VENDORNAME   VARCHAR(50) NOT NULL,
        VENDORGRADE  VARCHAR(1));
GO                     

Step-3 [ Control Flow Task of SSIS ]






Step-4 [ Data Flow Task of SSIS ]







Step-5 [ Run the Package ]



We find error in the package.

How to Solve this Problem
To fix this issue, we will need to change a property on the connection manager. The name of the property is a connection manager property. If we right click the OLE DB Connection Manager and choose properties we will see a property called RetainSameConnection. This must be set to "True".

The RetainSameConnection property means that the temp table will not be deleted when the task is completed.



Run the package again






Hope you like it.

SSIS Event Handling

SSIS Event Handling

Introduction
One thing that I found when moving with SSIS is a lot of developer no care about SSIS Event handling. As it is a very important portion of the SSIS. We can perform a lot of job by this. This may be due to Innocence. So I decide to write an article related to Event Handling of SSIS.

What is that?
Before we move forward, we have to know what the Event Handling is. Event handling the name is self explanatory. We would like to perform some job based on certain event. The job may be anything like cleaning a table after the ELT process is completed or sending an email in case of any failure etc.

Setup the Event Handling Process

Step-1 [ The Data flow of SSIS Package ]
We have a simple package that extract flat file and load the data into SQL server table.  



Please note that : To test this process we have to introduce an error within this package. In the flat file source we are going to delete the fat file from where the package extracts the data and this action caused an error for this package.

Step-2 [ The Error Log file ]
Now using SSMS we are just create a table to log the error.

IF OBJECT_ID(N'Tbl_ErrorLog', N'U') IS NOT NULL
   BEGIN
      DROP TABLE Tbl_ErrorLog;
   END
GO  
CREATE TABLE Tbl_ErrorLog
            (
                  [ID]          [int] IDENTITY(1,1) NOT NULL,
                  [PackageID]   [uniqueidentifier]  NULL,
                  [Error]             [nvarchar](MAX)     NULL,
                  [Source]      [nvarchar](100)     NULL,
                  [PackageName] [nvarchar](100)     NULL
             );
GO

Step-3 [ The Event Handler Tab ]



The event handler tabs have Executable and event handler drop down.



Here in this package we choose the Data Flow Task as Executable and ON Error as Event Handler. Then we click the hyper link called "Click here to create an 'OnError' event handler for executable 'Data Flow Tasks' ". When we click the hyper link the actual event handler is added with data flow task. To delete this event handler just click the delete button.

Different type of Event Handler is listed bellow
  • OnError: This event is raised by an executable when an error occurs.
  • OnExecStatusChanged: This event is raised by an executable when its execution status changes.
  • OnInformation: This event is raised during the validation and execution of an executable to report information. This event conveys information only, no errors or warnings.
  • OnPostExecute: This event is raised by an executable immediately after it has finished running.
  • OnPostValidate: This event is raised by an executable when its validation is finished.
  • OnPreExecute: This event is raised by an executable immediately before it runs.
  • OnPreValidate: This event is raised by an executable when its validation starts.
  • OnProgress: This event is raised by an executable when measurable progress is made by the executable.
  • OnQueryCancel: This event is raised by an executable to determine whether it should stop running.
  • OnTaskFailed: This event is raised by a task when it fails.
  • OnVariableValueChanged: This event is raised by an executable when the value of a variable changes. The event is raised by the executable on which the variable is defined.
  • OnWarning: This event is raised by an executable when a warning occurs.
Here in this point if we look at the toolbars we can see all the tasks in the control flow item. We can design a small package over here (even handler tabs) to perform its own ETL process. What you think? Is not Event Handling is so powerful.

Now we drag an Execute SQL Task over the Event Handler Tab. In the SQL statement we use the following sql statement.

INSERT INTO Tbl_ErrorLog
       ([PackageID], [Error], [Source], [PackageName])
VALUES (?, ?, ?, ?)



Now we set the parameter mapping option of Execute SQL task Editor.



Parameters Mapping

Table Columns
Variable Name
Data Type
Parameter Name
PackageID
System::PackageID
GUID
0
Error
System::ErrorDescription
NVARCHAR
1
Source
System::SourceName
NVARCHAR
2
PackageName
System::PackageName
NVARCHAR
3

Step-4 [ Run the Package and see the Error Log ]








Hope you like it.