What do we mean by Data flow in SSIS?
Data flow encapsulates the data flow engine and consists of Source, Transformations and Target. The core strength of in MS SQL Server Integration Services (SSIS) is its capability to extract data into the server’s memory (Extraction), transform it (Transformation) and load it to an alternative destination (Loading). It means the data is fetched from the data sources, manipulated or modified through varioustransformations and loaded into the target destination. The data flow task in SSIS sends the data in series of buffer.
Example Scenario – Consider a conveyor belt in a factory. Raw material (Source data) is placed on the conveyor belt and passes through various processes (Transformations). Quality assurancecomponents might reject some material, in which case it can be scrapped (Logged) or fixed and blended back in with the quality material. Eventually, finished goods (Clean and Valid data) arrive at the end of the conveyor belt (Data warehouse).
The first step to implement a Data flow in a package is to add a data flow task to the Control flow of a package. Once the data flow task is included in the control flow of a package, we can start building the data flow of that package.
NOTE: – Arrows connecting the data flow components to create a pipeline are known as Service paths where as arrows connecting components in control flow are known as Precedence constraints. At design time, Data viewers can also be attached to the Service paths to visualize the data.
STEP 1. Creating a Data Flow will include:-
Source(s) to extract data from the databases.
- Adding Connection managers to connect to the data sources.
- Transformations to manipulate or modify the data according to the business need.
- Connecting data flow components by connecting the output of source to transformation and the output of transformation to destination.
- Destination(s) to load the data to data stores.
- Configuring components error outputs.
STEP 2. What are the Components of Data flow?
Components includes –
- Data source(s).
- Transformations.
- Destination(s).
Component 1 – Data Flow Sources
Data Flow Sources | Description |
OLE DB Source | Connects to OLE DB data source such as SQL Server, Access, Oracle, or DB2. |
Excel Source | Receives data from Excel spreadsheets. |
Flat File Source | Connects to a delimited or fixed-width file. |
Raw File Source | Do not use connection manager. It produces a specialized binary file format for data that is in transit. |
XML Source | Do not use connection manager. Retrieves data from an XML document. |
ADO.NET Source | This source is just like the OLE DB Source but only for ADO.NET based sources. |
CDC Source | Reads data out of a table that has change data capture (CDC) enabled. Used to retrieve only rows that have changed over duration of time. |
ODBC Source | Reads data out of table by using an ODBC provider instead of OLE DB. |
Component 2 – Data Flow Transformations
Transformation Categories | Transformations |
Row Transformations | Character Map |
Copy Column | |
Data Column | |
Derived Column | |
OLE DB Command | |
Rowset Transformations | Aggregate |
Sort | |
Pivot/Unpivot | |
Percentage sampling/Row sampling | |
Split and Join Transformations | Conditional split |
Look up | |
Merge | |
Merge join | |
Multicast | |
union All | |
Business intelligence transformations | Data Mining Query |
Fuzzy Look Up | |
Fuzzy Grouping | |
Term Extraction | |
Term Look up | |
Script Transformations | Script |
Other Transformations | Audit |
Cache Transform | |
Export Column | |
Import Column | |
Row Count | |
Slowly Changing Dimension |
Component 3– Data Flow Destinations
Data Flow Destinations | Description |
ADO.NET Destination | Exposes data to other external processes such as a .NET application. |
Data Reader Destination | Allows the ADO.NET Data Reader interface to consume data, similar to the ADO.NET Destination. |
OLE DB Destination | Outputs data to an OLE DB data connection like SQL Server, Oracle or Access. |
Excel Destination | Outputs data from the Data Flow to an Excel spreadsheet. |
Flat file Destination | Enables you to write data to a comma-delimited or fixed-width file. |
Raw file Destination | Outputs data in a binary format that can be used later as a Raw File Source. It’s usually used as an intermediate persistence mechanism. |
ODBC Destination | Outputs data to an OLE DB data connection like SQL Server, Oracle or Access. |
Record set Destination | Writes the records to an ADO record set. Once written, to an object variable, it can be looped over a variety of ways in SSIS like a Script Task or a Foreach Loop Container. |
SQL Server Destination | The destination that you use to write data to SQL Server. This destination has many limitations, such as the ability to only write to the SQL Server where the SSIS package is executing. For example – If you’re running a package to copy data from Server 1 to Server 2, then the package must run on Server 2. This destination is largely for backwards compatibility and should not be used. |
This completes the introduction of Data flow and it’s components in SSIS. In our next tutorial we will discuss more about the various transformations categories and there functionalities. I hope this grabs your interest towards Data Flow in SSIS. Your comments are welcome.