Tuesday, August 18, 2015

Differences - SSIS

Control Flow vs Data Flow




Control Flow
Data Flow
Step
1st step for creating package
2nd step for creating package, it will be part of control flow
Made up of
Containers and tasks connected with precedence constraints
Source, transformation and destinations
About
It is about data preparation and managing the process
It is about ETL activities
Smallest unit
Task
Component
Data movement
It does not move the data from one task to another task
It moves the data from source to destination
Processing
Task1 has to be completed before Task2 begins
Multiple components can be processed at the same time
Count
Only 1 control flow is available
There can be multiple data flow tasks inside control flow
Error output
Explicit Error output is not there but it can be achieved by editing precedence constraint
Error output is available in source/transformation/destination



Introduction to Control Flow and Data Flow

Introduction

When someone learning the SSIS, he/she must face a common question like What is the difference between Control Flow and Data Flow. As the Control flow and the Data flow are two basic elements of SSIS.

We already get a glance about Control Flow and Data Flow in my previous article Understanding the BIDS and in this article now I am trying to differentiate between Control Flow and Data flow. Most of the SSIS developer spent a lot of time to dealing with these two elements.

So let's Start

When we open the Business Intelligence Development Studio or BIDS (in MS SQL Server 2005/2008 2008R2/SQL Server Denali (CTP3)) we find that there are two tabs in the name of Control Flow and Data Flow.



Control Flow

Control flow controls the package flow based on completions, success or tasks failure. The smallest units of the control flow are a task. It does not move data from one task to another. It just maintains the control of data flow. Tasks are run in series if connected with precedence or in parallel. Package control flow is made up of containers and tasks connected with precedence constraints to control package flow.



Data Flow

Data Flow deals with the Actual data movement. Here the multiple components can be process data at the same time. The smallest unit of the data flow is called components. The Data flow is made up of source (From where the data is collected, may be Excel, Flat file, FTP, SQL Server etc), Transform (Modification or manipulation of data, such as data type conversion, convert to smaller to upper etc) and the destination(where the data stores, may be SQL server etc).



So we can tell that the data flow is the child of the control flow. A SSIS package at least contains a control flow. Control Flow may or may not required a data flow.

Real life Example

Don't be afraid the bookish things. To understand it properly, here I am mention a simple scenario or story.
A post man named xyz deliver the postage door to door. Let's assume, he has 3 letter for person –A , B and C. Here he person B address is far from post office. The Person B address is between A and C and Person C address is near the post office. So the postman decide to go to the Person B address first and provide him the letter, then the person B and at the end person C.

Here the decision taken by the postmen how to distribute the postage material is the Control Flow and actual handover the postage material is the Data Flow.