Friday, July 17, 2015

Data Profiling Task

What do we mean by Data Profiling task in SSIS?
Data Profiling task is a control flow component and is used to analyze data of a table or view in a SQL Server database so as to identify the potential problems,data issues,patterns and constraints with the quality of data. This is a new component in MS SQL Server 2008. We can generate a XML(Xtensible Markup Language) report and save it to a file or to a SSIS variable. After saving the computeddata profiles in a file, we can use Data profile viewer to review the profile output. It helps on analyzingthe source data and resolves the data issues to build a mature centralized data ware house. Whether it’s an additional cost or not, it is a necessity these days that an ETL tool should have a data profilingfacility. The Data Profiling task works only with data that is stored in SQL Server. This task does not work with third-party or file-based data sources.
NOTE :-  How to Launch Data Profile Viewer – There are 3 ways to launch Data profile viewer to see the regular XML file as output.
1.  Double click on the Data Profiling task. Go to the Control Flow tab and it will open up the General tab. Under this we can click “Open Profile Viewer“.
2.  Go to Start Menu -> Microsoft SQL Server 2012 -> Integration Services -> Data Profile Viewer. Open the output file that was created by the Data profiling task.
3.  Go to run and enter DataProfileViewer.exe
Why do we need Data Profiling?
 1.  Data Analysis – Instead of writing a bunch of TSQL to do some data analysis on the table data in your database, SQL Server Integration services provides a feature called Data Profiling Task that helps to perform analysis much easier and faster.
2.  To fix data issues at source – It is important to trap the problems in the beginning and rectify them before sending the data for further analysis to minimize the data quality issues that might occur from source data.
3.  To have a Valid and Quality Data – For better analysis and reporting we need a valid and quality data i.e. data should be Accurate, Consistent, Complete and uniform as dirty data will lead to wrong analysis and reporting.
Features of Data Profiling Task :-
The Data Profiling Task can compute 8 different data profiles. 5 of these profiles analyze individual columns, and the remaining 3 analyze multiple columns or relationships between columns and tables.
For more information, you can Visit – http://msdn.microsoft.com/en-us/library/bb895263.aspx
Implementing Data Profiling Task with an Example –  
STEP 1.  Open Business Intelligence Development Studio (BIDS) and create a package with a desired name.
STEP 2.  Once the package is created, Go to the Toolbox and select Data Profiling Task under theControl flow items. Drag and drop this task onto the control flow pane.
Drag_Drop
STEP 3.  Now we need to configure the properties used to profile data sources. Double click on theData profiling task to launch the editor. Under the General tab, go to the destination field and choose New Connection from the drop down list.
Destination_New Connecti
STEP 4.  A file connection manager editor will pop up. You can choose either create file or existing filefrom the usage type drop down list. Next step is to browse the file location where we need to save our file. The extension of the file will be .xml
Destination_location
STEP 5.  Under General tab, go to Quick profile button. It will launch single table quick profile form. Then go to the New button to configure connection for the table whose data profile need to be analyzed. Choose the appropriate Server name, Authentication type and the database name and click Ok.
Connection
STEP 6. This will establish an ADO.NET connection with your SQL server database. Now, you can choose the table or view from the drop down list whose data profiles need to be analyzed. Choose the various profiles for the table from the check boxes and click OK.
Quick profile form
STEP 7.  Clicking OK button will lead to Profile requests tab. You can set the data profiles for the table from here also. This can be used as an alternate method.
Configuring profiles
STEP 8.  The configuration of our data profiling task is now complete and we can run it by selecting “Start Debugging” from the Debug menu or by using the F5 shortcut.
Execution of Data Profiling Task
STEP 9. To view the output in Data Profile Viewer
Go to Start Menu -> Microsoft SQL Server 2012 -> Integration Services -> Data Profile Viewer. Open the output file that was created by the Data profiling task OR Go to run and enter DataProfileViewer.exe
Output of Data Profiling task
This completes our post on Data Profiling Task in SSIS. Hope you unserstood the features and functioning of Data profiling task. Your views and comments are valuable.