Friday, July 17, 2015

Data Flow Transformation categories

Aim – In this post we will learn abot Data Flow Transformation Categories in SSIS. Transformations are defined as a core component in the Data flow of a package in SSIS. It is that part of the data flow to which we apply our business logic to manipulate and modify the input data into the required formatbefore loading it to the destination. All the Data Flow Transformations are broadly classified into 2 types:-
Type 1 – Synchronous Transformations.
Type 2 – Asynchronous Transformations.
What is the difference between Synchronous and Asynchronous transformations?
Synchronous TransformationsAsynchronous Transformations
Processes each incoming row, modifies   according to the required format and forward it.Stores all the rows into the memory   before it begins the process of modifying input data to the required output   format.
No. of input rows = No. of output   rows.No. of input rows != No. of output   rows
Output rows are in sync with Input   rows i.e. 1:1 relationship.Output rows are not in sync with Input   rows
Less memory is required as they work   on row by row basis.More memory is required to store the   whole data set as input and output buffers do not use the same memory.
Does not block the data flow in the   pipeline.Are also known as “Blocking Transformations”   as they block the data flow in the pipeline until all the input rows are read   into the memory.
Runs quite faster due to less memory   required.Runs generally slow as memory requirement   is very high.
E.g. – Data Conversion Transformation-   Input rows flow into the memory buffers and the same buffers come out in the required   data format as Output.E.g. – Sort Transformation- where the   component has to process the complete set of rows in a single operation.
Further Asynchronous Transformations are divided into 2 categories:-
  1. Partially blocking Transformations creates new memory buffers for the output of the transformation such as the Union All Transformation.
  2. Fully blocking Transformations performs the same operation but cause a full block of the data such as the Sort and Aggregate Transformations.
Data Flow Transformation Categories are as follows:-
1. Row Transformations –Row Transformations
  • This transformation is used to update column values or create new columns.
  • It transforms each row present in the pipeline (Input).
Transformation NameDescription
Character MapModifies strings, typically for changes involving code pages.
Copy ColumnCopies columns to new output columns.
Data ConversionPerforms data casting.
Derived ColumnAllows the definition of new columns, or the overriding of values in existing columns, based on expressions.
OLE DB CommandExecutes a command against a connection manager for each row. This transformation can behave as a destination.
2. Rowset Transformations –
  • Rowset transformationsThese transformations are also called Asynchronous as they “dam the flow” of data i.e. Stores all the rows into the memory before it begins the process of modifying input data to the required output format.
  • As a result, a block is caused in the pipeline of data until the operation is completed.
Transformation   NameDescription
AggregateAggregates   (summarizes) numeric columns
Percentage   SamplingOutputs   a configured percentage of rows
Row   SamplingOutputs   a configured number of rows
SortSorts   the data, and can be configured to remove duplicates.
PivotPivots   the data
UnpivotUnpivots   the data
3. Split and join Transformations –
Split and Join Transformations
  • Distribute rows to different outputs.
  • Create copies of the transformation inputs.
  • Join multiple inputs into one output.
Transformation   NameDescription
Condition SplitUses conditions to allocate rows to multiple outputs.
Look upPerforms a look up against a reference   set of data. Typically used in the fact table for loading packages.
MergeUnions two sorted inputs and retains   sort order in the output.
Merge JoinJoins two sorted inputs, and can be   configured as Inner, Left Outer or Full Outer.
MulticastBroadcasts (duplicates) the rows to   multiple outputs.
Union AllUnions two or more inputs to produce a   single output.
4. Business Intelligence Transformations –
  • Business Intelligence TransformationsThese are used to introduce data mining capabilities and data cleansing.
  • Cleaning data includes identification and removal of duplicate rows based on approximate matches.
  • These are only available with the Enterprise Edition



5. Script Transformations –
  • Script TransformationsExtends the capabilities of the data flow.
  • Delivers optimized performance because it is precompiled.
  • Similar to the Script Task, it delivers the ability to introduce custom logic into the data flow using VB.NET or C#.NET.
  • It can be configured to behave as a Source, a Destination, or any type of Transformation.
6. Other Transformations –
  • Other TransformationsAdd Audit Information such as when the package was run and by whom.
  • Export and Import Data.
  • Stores the row count from the Data Flow into a variable
Transformation   NameDescription
AuditAdds audit information as columns to   the output.
Cache TransformPrepares caches for use by the Lookup   transformation.
Export and Import ColumnExtracts or load data from/to the file   system.
Row CountStores the number of rows that have   passed through the transformation into a variable.
Slowly Changing DimensionsProduces SCD logic for type 1 and 2   changes.
This completes the basic of  Data Flow Transformation Categories in SSIS. You can have as manytransformations as you need according to your business requirement with the use of Script Transformation. This feature of creating custom transformations via VB.NET or C# programming language, makes it a valuable tool in this competeting market of Business Intelligence. Further we will do the implementation of each Data Flow Transformation. We will see how it is configured to work according to our requirements. I hope you like my article on Data Flow Transformation Categories in SSIS. Get some time to post your queries and comments below.