Monday, July 27, 2015

ETL naming conventions - SSIS



During the developing of ETL for the Data Warehouse and the ETL Meta Data (EMD) framework a range of objects will need to be named and recorded in the EMD repository. To ensure that EMD objects are given meaningful and consistent names , the following set of naming conventions is provided. Below are the SSIS Naming Conventions we always use to append and prepend to the visual controls provided in BIDS. Well a truth to be said is that ETL using SSIS tends to become spaghetti or Amazonia. Worst that’s hard to debug once done deployed. Hence we used to stick to the naming conventions in all of the projects we worked on in order for us to better troubleshoot.

The acronyms below should be used at the beginning of the names of tasks to identify what type of task it is:

1. Control
Task
 Prefix
For Loop Container
FRLC
Foreach Loop Container
FELC
Sequence Container
SEQC
ActiveX Script
AXSC
Analysis Services Execute DDL
ASED
Analysis Services Processing
ASSP
Bulk Insert
BLKI
Data Flow
DTFL
Data Mining Query
DMQR
Execute DTS 2000 Package
EDPK
Execute Package
EPKG
Execute Processs
EPRC
Execute SQL
ESQL
File System
FSYS
FTP
FTP
Message Queue
MSMQ
Script
SCRP
Send Mail
SNDM
Transfer Database
TRDB
Transfer Error Messages
TREM
Transfer Jobs
TRJB
Transfer Logins
TRLI
Transfer Master Stored Procedures
TMSP
Transfer SQL Server Objects
TRSO
Web Service
WSRV
WMI Data Reader
WMID
WMI Event Watcher
WMIE
XML
XML
2. Maintenance
Task
Prefix
Back Up Database Task
BUDB_TSK
Check Database Integrity Task
CDBI_TSK
Execute SQL Server Agent Job Task
SSAJ_TSK
Execute T-SQL Statement Task
TSQL_TSK
History Cleanup Task
HCLN_TSK
Maintenance Clean Task
MCLN_TSK
Notify Operator Task
NTFO_TSK
Rebuild Index Task
RBDI_TSK
Reorganize Index Task
RORI_TSK
Shrink Database Task
SHDB_TSK
Update Statistics Task
USTA_TSK
3. Data Flow Components - Source
Component
Prefix
ADO NET Source
ADO_SRC
Excel Source
EX_SRC
Flat File Source
FF_SRC
OLE DB Source
OLE_SRC
Raw File Source
RF_SRC
XML Source
XML_SRC
4. Data Flow Components - Transformation
Component
Prefix
Aggregate
AGRG
Audit
AUDT
Cache Transform
CFRM
Character Map
CHRM
Conditional Split
CSPL
Copy Column
CPYC
Data Conversion
DCNV
Data Mining Query
DMQR
Derived Column
DERC
Export Column
EXPC
Fuzzy Grouping
FZGR
Fuzzy Lookup
FZLK
Import Column
IMPC
Lookup
LKUP
Merge
MRGE
Merge Join
MRGJ
Multicast
MLTC
OLE DB Command
OLEC
Percentage Sampling
PSMP
Pivot
PIVT
Row Count
RCNT
Raw Sampling
RSMP
Script Component
SCRC
Slowly Changing Dimension
SLCD
Sort
SORT
Term Extraction
TEXT
Term Lookup
TLKP
Union All
UALL
Unpivot
UPVT
5. Data Flow Components - Destination
Component
Prefix
ADO Net Destination
ADO_DST
Data Mining Model Training
DMMT_DST
DataReader Destination
DR_DST
Dimension Processing
DP_DST
Excel Destination
EXL_DST
Flat File Destination
FF_DST
OLE DB Destination
OLE_DST
Partition Processing
PP_DST
Raw File Destination
RF_DST
Recordset Destination
RS_DST
SQL Server Compact Destination
SC_DST
SQL Server Destination
SS_DST
SQL Server Mobile Destination
SM_DST

Nonblocking, Partial Blocking and Blocking Transformations In SSIS



Classifying Data Flow Transformations on performance considerations:

The three classifications of Data Flow transformations that are based mainly on
performance considerations.

Nonblocking Synchronous Row-Based Transformations:

These transformations work on a row-by-row basis and modify the data by changing
the data in columns, adding new columns, or removing columns from the data rows,
but these components do not add new rows to the data flow. The rows that arrive at the
input are those that leave at the output of these transformations. These transformations
have synchronous outputs and make data rows available to the downstream components
straightaway. In fact, these transformations do not cause data to move from one buffer
to another; instead, they traverse over the data buffer and make the changes to the data
columns. So these transformations are efficient and lightweight from the process point
of view. The transformations that readily pass the data to the downstream components are classified as Row Transformations.

The following Row Transformations are examples of Nonblocking synchronous row-based transformations:

Ø  Audit transformation
Ø  Character Map transformation
Ø  Conditional Split transformation
Ø  Copy Column transformation
Ø  Data Conversion transformation
Ø  Derived Column transformation
Ø  Export Column transformation
Ø  Import Column transformation
Ø  Lookup transformation
Ø  Multicast transformation
Ø  OLE DB Command transformation
Ø  Percentage Sampling transformation
Ø  Row Count transformation
Ø  Row Sampling transformation
Ø  Script Component transformation configured as Nonblocking Synchronous
Row–based transformation
Ø  Slowly Changing Dimension transformation

Partially Blocking Asynchronous Row-Set-Based Transformations:

Integration Services provides some transformations that essentially add new rows in the
data flow and hold on to the data buffers before they can perform the transformation.
The nature of such transformations is defined as Partially Blocking because these
transformations hold on to data for a while before they start releasing buffers. As
these transformations add new rows in the data flow, they are asynchronous in
nature. For example, a Merge transformation combines two sorted data sets that may
be the same data, but it essentially adds new rows in the data flow. The time taken
by this transformation before starting to release the buffers depends on when these
components receive matching data from both inputs.

Following is a list of Partially Blocking asynchronous row set-based transformations:

Ø  Data Mining Query transformation
Ø  Merge Join transformation
Ø  Merge transformation
Ø  Pivot transformation
Ø  Term Lookup transformation
Ø  UnPivot transformation
Ø  Union All transformation
Ø  Script Component transformation configured as Partially Blocking Asynchronous
Row-set-based transformation

Blocking Asynchronous Full Row-Set-Based Transformations:

These transformations require all the rows to be assembled before they can perform
their operation. These transformations can also change the number of rows in the
data flow and have asynchronous outputs. For example, the Aggregate transformation
needs to see each and every row to perform aggregations such as summation or
finding an average. Similarly, the Sort transformation needs to see all the rows to
sort them in proper order. This requirement of collecting all rows before performing
a transformation operation puts a heavy load on both processor and memory of the
server. You can understand from the nature of the function they perform that these
transformations block data until they have seen all the rows and do not pass the data to
the downstream component until they have finished their operation.

The following are Blocking asynchronous full row-set-based transformations:

Ø  Aggregate transformation
Ø  Fuzzy Grouping transformation
Ø  Fuzzy Lookup transformation
Ø  Sort transformation
Ø  Term Extraction transformation



Resources:
For more detailed information regarding this study, please see
Ø  Microsoft SQL Server 2008 Integration Services by Ashwani Nanda
Ø  Microsoft Books Online