Wednesday, August 19, 2015

Data Flow Task

Data Flow tasks of SSIS

Introduction
In my previous article, I am with Control Flow Tasks and Containers. In this article I am trying to discuss abut one of the control flows tasks named data flow.
If we think about a package of SSIS, it must contains one Control Flow tasks and can have zero, one or more data flow tasks.

How work with Data Flow Tasks
It is simple to work with data flow tasks. From Control Flow Items tools bar just drag the Data Flow tasks to the work space of Control Flow tabs and then double click the Data Flow Tasks.





Data Flow tasks Tool Box Objects
Data flow tasks has 3 types of Objects in tools box mentioned bellow.

1.    Data Flow Source Adapters
2.    Data Flow Transformations
3.    Data Flow Destination Adapter


Data Flow Source Adapter
It uses the package connections which points a server instance or file locations. The Source Adapter extracts data from source and moves it to direct Destination Adaptors or to Data Flow transformations and then from Data flow Transformation to Destination adapter.

Here there is one exceptions, the Raw File Adapters doesn't use any package connections. 

There are some common data flow sources adaptors are mentioned bellow.

1.    Data Reader: Make connection to a table or Query through a .NET providers
2.    Excel: Extracts data from Excel work sheet in the Excel file.
3.    Flat File: Make connection with delimited or fixed-width file.
4.    Raw File: Connect to binary type file.
5.    OLEDB: Connects to Installed OLEDB provider, such as SQL Server, Access, ORACLE etc.
6.    XML: Extracts data from XML file.

Data Flow Transformations
Transformations are used to modify or manipulate data in the data flow. It performs operations at one row at time or several rows of data at once.

There are wide verity of transformation tasks are present in the tools bar, what you choose is depends on what type of data processing is needed to complete your functionality. Depending of different nature of transformation, we can divide the transformation into 4 categories.

Category-1 [ Logical row-level transformations ]    
It's works on each and every rows of data flow one by one. `Some common used of there transformations are data type conversion, replacing NULL values, case conversion(lower case to upper case etc), row counts etc.

Category-2 [ Multi-Input and Multi-Output transformations ]    
It's works with more than one data input or can generate more than one output. It's have the ability to combine or branch data from one or more sources to one or more destination. Common examples are Union All, Merge-Join, Multicast etc.

Category-3 [ Multi-Row transformations ]    
It's performs works based on criteria from multiple inputs rows or generate multiple output rows from a single input rows. This type of transforms has more memory overhead. But from business prospective it is very important. Some common examples are Aggregate, Pivot, Sort, Row Sampling etc.

Category-4 [ Advanced Data-preparation transformations ]    
It's works on rows in the data flow pipeline. Some common examples are OLE DB Command, Fuzzy Grouping, and Scripts Components etc.

Data Flow Destination
It is the end point of the package. It is just like the Data flow source to use the package connections. It defines the destination where the processed or unprocessed data are pushed. If we take an example, suppose we want to store the final output of the package in a excel file than we must select the Excel Destination adapter.

To understand it properly
Sometimes real life examples help us to understand properly. Here I am taking a real life example to understand it properly. Think about postal services.

Step-1 [ Data Flow Source Adapters ]
Postman collects postage from different sources like post box, registered postage, and parcel from customer home.

Step-2 [ Data Flow Transformations ]
Then in the post office, they check the details of each and every postal baggage like the stamps are corrects or not, delivery address etc.. ect.. and process them.

Step-3 [ Data  Flow Destination Adapter ]
Postman goes to the delivery address and delivers the postage.