Microsoft Technologies: Data Flow Transformation categories

Aim – In this post we will learn abot Data Flow Transformation Categories in SSIS. Transformations are defined as a core component in the Data flow of a package in SSIS. It is that part of the data flow to which we apply our business logic to manipulate and modify the input data into the required formatbefore loading it to the destination. All the Data Flow Transformations are broadly classified into 2 types:-

Type 1 – Synchronous Transformations.

Type 2 – Asynchronous Transformations.

What is the difference between Synchronous and Asynchronous transformations?

Synchronous Transformations	Asynchronous Transformations
Processes each incoming row, modifies according to the required format and forward it.	Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
No. of input rows = No. of output rows.	No. of input rows != No. of output rows
Output rows are in sync with Input rows i.e. 1:1 relationship.	Output rows are not in sync with Input rows
Less memory is required as they work on row by row basis.	More memory is required to store the whole data set as input and output buffers do not use the same memory.
Does not block the data flow in the pipeline.	Are also known as “Blocking Transformations” as they block the data flow in the pipeline until all the input rows are read into the memory.
Runs quite faster due to less memory required.	Runs generally slow as memory requirement is very high.
E.g. – Data Conversion Transformation- Input rows flow into the memory buffers and the same buffers come out in the required data format as Output.	E.g. – Sort Transformation- where the component has to process the complete set of rows in a single operation.

Further Asynchronous Transformations are divided into 2 categories:-

Partially blocking Transformations creates new memory buffers for the output of the transformation such as the Union All Transformation.
Fully blocking Transformations performs the same operation but cause a full block of the data such as the Sort and Aggregate Transformations.

Data Flow Transformation Categories are as follows:-

1. Row Transformations –

This transformation is used to update column values or create new columns.
It transforms each row present in the pipeline (Input).

Transformation Name	Description
Character Map	Modifies strings, typically for changes involving code pages.
Copy Column	Copies columns to new output columns.
Data Conversion	Performs data casting.
Derived Column	Allows the definition of new columns, or the overriding of values in existing columns, based on expressions.
OLE DB Command	Executes a command against a connection manager for each row. This transformation can behave as a destination.

2. Rowset Transformations –

These transformations are also called Asynchronous as they “dam the flow” of data i.e. Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
As a result, a block is caused in the pipeline of data until the operation is completed.

Transformation Name	Description
Aggregate	Aggregates (summarizes) numeric columns
Percentage Sampling	Outputs a configured percentage of rows
Row Sampling	Outputs a configured number of rows
Sort	Sorts the data, and can be configured to remove duplicates.
Pivot	Pivots the data
Unpivot	Unpivots the data

3. Split and join Transformations –

Distribute rows to different outputs.
Create copies of the transformation inputs.
Join multiple inputs into one output.

Transformation Name	Description
Condition Split	Uses conditions to allocate rows to multiple outputs.
Look up	Performs a look up against a reference set of data. Typically used in the fact table for loading packages.
Merge	Unions two sorted inputs and retains sort order in the output.
Merge Join	Joins two sorted inputs, and can be configured as Inner, Left Outer or Full Outer.
Multicast	Broadcasts (duplicates) the rows to multiple outputs.
Union All	Unions two or more inputs to produce a single output.

4. Business Intelligence Transformations –

These are used to introduce data mining capabilities and data cleansing.
Cleaning data includes identification and removal of duplicate rows based on approximate matches.
These are only available with the Enterprise Edition

5. Script Transformations –

Extends the capabilities of the data flow.
Delivers optimized performance because it is precompiled.
Similar to the Script Task, it delivers the ability to introduce custom logic into the data flow using VB.NET or C#.NET.
It can be configured to behave as a Source, a Destination, or any type of Transformation.

6. Other Transformations –

Add Audit Information such as when the package was run and by whom.
Export and Import Data.
Stores the row count from the Data Flow into a variable

Transformation Name	Description
Audit	Adds audit information as columns to the output.
Cache Transform	Prepares caches for use by the Lookup transformation.
Export and Import Column	Extracts or load data from/to the file system.
Row Count	Stores the number of rows that have passed through the transformation into a variable.
Slowly Changing Dimensions	Produces SCD logic for type 1 and 2 changes.

This completes the basic of Data Flow Transformation Categories in SSIS. You can have as manytransformations as you need according to your business requirement with the use of Script Transformation. This feature of creating custom transformations via VB.NET or C# programming language, makes it a valuable tool in this competeting market of Business Intelligence. Further we will do the implementation of each Data Flow Transformation. We will see how it is configured to work according to our requirements. I hope you like my article on Data Flow Transformation Categories in SSIS. Get some time to post your queries and comments below.

Microsoft Technologies

Pages

Friday, July 17, 2015

Data Flow Transformation categories

About Me