Friday, July 17, 2015

Data Flow and its components

What do we mean by Data flow in SSIS?
A data flow task encapsulates the data flow engine and consists of sources, transformations and destinations. The core strength of MS SQL Server Integration Services (SSIS) is its ability to extract data into the server’s memory (Extraction), transform it (Transformation) and load it into a destination (Loading). In other words, data is fetched from the data sources, manipulated or modified through various transformations and loaded into the target destination. The data flow task moves the data through the pipeline in a series of buffers.
Example Scenario – Consider a conveyor belt in a factory. Raw material (source data) is placed on the conveyor belt and passes through various processes (transformations). Quality assurance components might reject some material, in which case it can be scrapped (logged) or fixed and blended back in with the quality material. Eventually, finished goods (clean and valid data) arrive at the end of the conveyor belt (data warehouse).
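To make the conveyor belt picture concrete, here is a minimal Python sketch of rows moving through a pipeline in buffers. It is only an analogy, not how the SSIS engine is implemented: the function names, the `buffer_size` parameter and the sample rows are all made up for illustration.

```python
from itertools import islice

def source(rows):
    """Source: yield raw rows one at a time (stands in for a table or file)."""
    yield from rows

def into_buffers(rows, buffer_size):
    """Group rows into fixed-size buffers, the unit that moves down the belt."""
    iterator = iter(rows)
    while True:
        buffer = list(islice(iterator, buffer_size))
        if not buffer:
            return
        yield buffer

def transform(buffers, rejected):
    """Transformation: clean each buffer; rows that fail the check are set aside."""
    for buffer in buffers:
        clean = []
        for row in buffer:
            if row["amount"] is None:
                rejected.append(row)  # scrapped (logged) material
            else:
                clean.append({**row, "name": row["name"].strip().upper()})
        if clean:
            yield clean

def load(buffers):
    """Destination: collect the finished rows (stands in for a warehouse table)."""
    warehouse = []
    for buffer in buffers:
        warehouse.extend(buffer)
    return warehouse

# Hypothetical sample data: one row is missing its amount and gets rejected.
raw_rows = [
    {"name": " alice ", "amount": 10},
    {"name": "bob", "amount": None},
    {"name": "carol ", "amount": 7},
]
rejected_rows = []
warehouse_rows = load(
    transform(into_buffers(source(raw_rows), buffer_size=2), rejected_rows)
)
print(warehouse_rows)
print(rejected_rows)
```

The rejected list plays the role of the scrap bin: rows that fail the quality check leave the main belt but are still kept for logging.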
The first step to implement a Data flow in a package is to add a data flow task to the Control flow of a package. Once the data flow task is included in the control flow of a package, we can start building the data flow of that package.
NOTE: The arrows that connect data flow components into a pipeline are known as data paths, whereas the arrows that connect components in the control flow are known as precedence constraints. At design time, data viewers can also be attached to the data paths to inspect the data passing through them.
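As a rough analogy for a data viewer, the Python sketch below places a pass-through step on the connection between two components. It records every row it sees without changing what flows downstream. The names `data_viewer`, `double` and `captured` are invented for this illustration and are not part of SSIS.

```python
def data_viewer(rows, label, seen):
    """Pass rows through unchanged while recording a copy for inspection."""
    for row in rows:
        seen.append((label, row))
        yield row

def double(rows):
    """A trivial downstream transformation."""
    for row in rows:
        yield row * 2

captured = []
source_rows = [1, 2, 3]
result = list(double(data_viewer(source_rows, "source -> double", captured)))
print(result)
print(captured)
```

Removing the viewer from the chain leaves the result unchanged, which is the point: a viewer observes the flow but takes no part in it.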
STEP 1.  Creating a Data Flow involves:
  1. Adding source(s) to extract data from the databases.
  2. Adding connection managers to connect to the data sources.
  3. Adding transformations to manipulate or modify the data according to the business need.
  4. Connecting data flow components by connecting the output of the source to a transformation and the output of the transformation to a destination.
  5. Adding destination(s) to load the data into data stores.
  6. Configuring the error outputs of the components.
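The pieces listed above can be sketched end to end in Python. This is a hedged illustration rather than SSIS itself: an in-memory SQLite database stands in for the data store, and the table names `staging` and `target` and all function names are assumptions made for the example. The error output is modelled as a second list that receives rows which fail conversion, so they do not stop the whole flow.

```python
import sqlite3

# Connection manager: one shared connection to the data store (in-memory SQLite here).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE staging (id INTEGER, price TEXT)")
connection.executemany(
    "INSERT INTO staging VALUES (?, ?)", [(1, "9.50"), (2, "n/a"), (3, "12")]
)
connection.execute("CREATE TABLE target (id INTEGER, price REAL)")

def read_source(conn):
    """Source: extract rows from the staging table."""
    return conn.execute("SELECT id, price FROM staging ORDER BY id").fetchall()

def convert_price(rows):
    """Transformation with an error output: bad rows are redirected, not fatal."""
    good, errors = [], []
    for row_id, price in rows:
        try:
            good.append((row_id, float(price)))
        except ValueError as exc:
            errors.append((row_id, price, str(exc)))
    return good, errors

def write_destination(conn, rows):
    """Destination: load converted rows into the target table."""
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    conn.commit()

# Connect the components: source output -> transformation -> destination.
good_rows, error_rows = convert_price(read_source(connection))
write_destination(connection, good_rows)

loaded = connection.execute("SELECT id, price FROM target ORDER BY id").fetchall()
print(loaded)
print(error_rows)
```

Redirecting the bad row is only one possible policy. A component could equally be set up to fail outright or to ignore the failure, depending on how strict the load needs to be.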
STEP 2.  What are the components of a Data flow?
Components include –
  1. Data source(s).
  2. Transformations.
  3. Destination(s).
Component 1 –  Data Flow Sources
Data Flow Source | Description
OLE DB Source | Connects to an OLE DB data source such as SQL Server, Access, Oracle, or DB2.
Excel Source | Receives data from Excel spreadsheets.
Flat File Source | Connects to a delimited or fixed-width file.
Raw File Source | Does not use a connection manager. Reads a specialized binary file format used for data that is in transit.
XML Source | Does not use a connection manager. Retrieves data from an XML document.
ADO.NET Source | Works just like the OLE DB Source, but only for ADO.NET-based sources.
CDC Source | Reads data out of a table that has change data capture (CDC) enabled. Used to retrieve only the rows that have changed over a period of time.
ODBC Source | Reads data out of a table by using an ODBC provider instead of OLE DB.
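To illustrate the difference between the two layouts a Flat File Source can read, here is a small Python sketch that parses the same records from delimited text and from fixed-width text. The sample data, the `layout` offsets and the function names are assumptions for this example only. In SSIS these settings are configured on the flat file connection manager rather than written as code.

```python
import csv
import io

def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def read_fixed_width(text, columns):
    """Parse fixed-width text; columns is a list of (name, start, end) offsets."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(
                {name: line[start:end].strip() for name, start, end in columns}
            )
    return rows

# Hypothetical sample files holding the same two records.
delimited_text = "id,city\n1,Pune\n2,Delhi\n"
fixed_text = "1   Pune      \n2   Delhi     \n"
layout = [("id", 0, 4), ("city", 4, 14)]  # column name, start offset, end offset

delimited_rows = read_delimited(delimited_text)
fixed_rows = read_fixed_width(fixed_text, layout)
print(delimited_rows)
print(fixed_rows)
```

A delimited file carries its structure in the separators, while a fixed-width file relies on an external column layout, which is why the second reader needs the offsets passed in.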
Component 2 –  Data Flow Transformations
Transformation Category | Transformations
Row Transformations | Character Map, Copy Column, Data Conversion, Derived Column, OLE DB Command
Rowset Transformations | Aggregate, Sort, Pivot, Unpivot, Percentage Sampling, Row Sampling
Split and Join Transformations | Conditional Split, Lookup, Merge, Merge Join, Multicast, Union All
Business Intelligence Transformations | Data Mining Query, Fuzzy Lookup, Fuzzy Grouping, Term Extraction, Term Lookup
Script Transformations | Script Component
Other Transformations | Audit, Cache Transform, Export Column, Import Column, Row Count, Slowly Changing Dimension
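To give a feel for how the categories differ, the Python sketch below imitates one transformation from several of them: Derived Column (row), Conditional Split and Union All (split and join), and Aggregate (rowset). These are plain-Python analogies with made-up data and function names, not the SSIS components themselves.

```python
from collections import defaultdict

# Hypothetical input rows.
orders = [
    {"id": 1, "country": "IN", "qty": 2, "unit_price": 5.0},
    {"id": 2, "country": "US", "qty": 1, "unit_price": 20.0},
    {"id": 3, "country": "IN", "qty": 4, "unit_price": 2.5},
]

def derived_column(rows):
    """Row transformation: add a computed column to every row."""
    return [{**row, "total": row["qty"] * row["unit_price"]} for row in rows]

def conditional_split(rows, predicate):
    """Split transformation: route each row to one of two outputs."""
    matched, default = [], []
    for row in rows:
        (matched if predicate(row) else default).append(row)
    return matched, default

def union_all(*inputs):
    """Join transformation: append several inputs into one output, unsorted."""
    return [row for rows in inputs for row in rows]

def aggregate(rows, key, value):
    """Rowset transformation: group by `key` and sum `value`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

with_totals = derived_column(orders)
domestic, foreign = conditional_split(with_totals, lambda row: row["country"] == "IN")
recombined = union_all(foreign, domestic)
totals_by_country = aggregate(recombined, "country", "total")
print(totals_by_country)
```

Note the difference in shape: the row transformation works on one row at a time, whereas the aggregate has to see every row before it can produce its output.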
Component 3 –  Data Flow Destinations
Data Flow Destination | Description
ADO.NET Destination | Writes data to a table or view in an ADO.NET-compliant database.
Data Reader Destination | Exposes the data through the ADO.NET Data Reader interface so that external processes, such as a .NET application, can consume it.
OLE DB Destination | Outputs data to an OLE DB data connection such as SQL Server, Oracle or Access.
Excel Destination | Outputs data from the Data Flow to an Excel spreadsheet.
Flat File Destination | Enables you to write data to a comma-delimited or fixed-width file.
Raw File Destination | Outputs data in a binary format that can be used later as a Raw File Source. It’s usually used as an intermediate persistence mechanism.
ODBC Destination | Outputs data to a database by using an ODBC provider instead of OLE DB.
Recordset Destination | Writes the records to an ADO recordset stored in an object variable. Once written, the recordset can be looped over in a variety of ways in SSIS, for example with a Script Task or a Foreach Loop Container.
SQL Server Destination | Writes data to SQL Server. This destination has significant limitations: for example, it can write only to the SQL Server instance on which the SSIS package is executing. If you’re running a package to copy data from Server 1 to Server 2, then the package must run on Server 2. This destination exists largely for backwards compatibility and should not be used.
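The Recordset Destination pattern of landing rows in an object variable and then looping over them can be pictured with the following Python sketch. Everything here is a stand-in: the `RecordsetDestination` class, the `package_variables` dictionary and the variable name `User::FileList` are hypothetical and only mimic the idea.

```python
class RecordsetDestination:
    """Stand-in destination that keeps incoming rows in memory."""

    def __init__(self):
        self.rows = []

    def write(self, buffer):
        """Append one buffer of rows to the in-memory recordset."""
        self.rows.extend(buffer)

# Hypothetical stand-in for the package's variable collection.
package_variables = {}

destination = RecordsetDestination()
destination.write([{"file": "a.csv"}, {"file": "b.csv"}])
destination.write([{"file": "c.csv"}])
package_variables["User::FileList"] = destination.rows

# Later in the package: loop over the stored rows, as a Foreach Loop would.
processed = []
for record in package_variables["User::FileList"]:
    processed.append(record["file"].upper())
print(processed)
```

Because the rows live only in memory, this pattern suits small result sets that drive later steps, not bulk loads.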
This completes the introduction to Data flow and its components in SSIS. In our next tutorial we will discuss the various transformation categories and their functionality in more detail. I hope this sparks your interest in Data Flow in SSIS. Your comments are welcome.