Monday, August 17, 2015

SSIS Interview Questions and Answers - Part1

Q1. What is SSIS? How is it related to SQL Server?

SQL Server Integration Services (SSIS) is a component of SQL Server which can be used to perform a wide range of data migration and ETL operations. SSIS is the integration component of the Microsoft Business Intelligence (MSBI) stack of SQL Server.

It is a platform for integration and workflow applications, known as a fast and flexible tool for data extraction, transformation, and loading (ETL). The tool may also be used to automate the maintenance of SQL Server databases and multidimensional data sets.


Q2: What are the tools associated with SSIS?

We use Business Intelligence Development Studio (BIDS), or SQL Server Data Tools (SSDT) in later versions, to develop SSIS projects.
We use SQL Server Management Studio (SSMS) to manage SSIS packages and projects.


Q3: What is the difference between BIDS and SSDT?

They are two names for the same Visual Studio-based development environment. Business Intelligence Development Studio (BIDS) shipped with SQL Server 2005, 2008, and 2008 R2; starting with SQL Server 2012 it was replaced by SQL Server Data Tools (SSDT).

Q4: What are the differences between DTS and SSIS?

DTS (Data Transformation Services) vs. SSIS (SQL Server Integration Services):
- Error handling: DTS has limited error handling; SSIS provides complex and powerful error handling.
- Scripting: DTS uses message boxes in ActiveX scripts; SSIS uses message boxes in .NET scripting.
- Deployment: DTS has no deployment wizard; SSIS provides an interactive Deployment Wizard.
- Transformations: DTS has a limited set of transformations; SSIS ships a good number of transformations.
- BI: DTS has no BI functionality; SSIS offers complete BI integration.


Q5. Explain architecture of SSIS?

SSIS architecture consists of four key parts:

a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: a managed API for accessing Integration Services from tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.


Q6: What is a workflow in SSIS 2014?

A workflow is a set of instructions that tells the run-time engine how to execute the tasks and containers within an SSIS package.


Q7. What is an SSIS Package?

An SSIS package is a collection of tasks assembled to extract, transform, and load data. Though the primary function of SSIS packages is ETL, they can also be used for maintenance tasks such as database backups, index rebuilds, deleting old backups, etc.


Q8. How do you create SSIS Packages? OR What tools do you use for creating SSIS Packages?

You can create SSIS packages using Business Intelligence Development Studio, "BIDS" for short (or SQL Server Data Tools in later versions). BIDS can be installed from the same SQL Server installation media used to install SQL Server. If all the features are selected at installation time, BIDS is installed along with Management Studio, Books Online, and the other client tools.


Q9: What is the control flow?

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.


Q10: What is the data flow?

A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package. Data sources, transformations, and data destinations are the three important categories in the data flow.


Q11. What is the Data Flow Engine?

The data flow engine, also called the SSIS pipeline engine, is responsible for managing the flow of data from source to destination and performing transformations (lookups, data cleansing, etc.). The data flow uses a memory-oriented, buffer-based architecture, which allows it to execute extremely fast. This means the SSIS pipeline engine pulls data from the source, stores it in in-memory buffers, performs the requested transformations in the buffers, and writes to the destination. The benefit is that the transformation happens in memory, so it is very fast, and in most cases we do not need to stage the data for transformations.


Q12. What is a Transformation?

A transformation simply means bringing the data into a desired format. For example, you are pulling data from the source and want to ensure only distinct records are written to the destination, so duplicates are removed. Another example: if you have master/reference data and want to pull only related data from the source, you need some sort of lookup. There are around 30 transformations available, and this can be extended further with custom-built components if needed.


Q13. What is a Task?

A task is very much like a method in a programming language, representing or carrying out an individual unit of work. There are broadly two categories of tasks in SSIS: control flow tasks and database maintenance tasks. All control flow tasks are operational in nature, except Data Flow tasks. Although there are around 30 control flow tasks you can use in your package, you can also develop your own custom tasks in the .NET programming language of your choice.


Q14. What is a Precedence Constraint and what types of Precedence Constraint are there?

SSIS allows you to place as many tasks as you want in the control flow. You connect these tasks using connectors called precedence constraints. Precedence constraints allow you to define the logical sequence of tasks in the order they should be executed. You can also specify a condition to be evaluated before the next task in the flow is executed.
The evaluation can be based on a constraint, an expression, or both, and the constraint takes one of these types:
Success (the next task executes only when the last task completed successfully), or
Failure (the next task executes only when the last task failed), or
Completion (the next task executes no matter whether the last task completed or failed).


Q15. What is a container and how many types of containers are there?

A container is a logical grouping of tasks which allows you to manage the scope of the tasks together.

These are the types of containers in SSIS:

Sequence Container - Used for grouping logically related tasks together
For Loop Container - Used when you want to have repeating flow in package
For Each Loop Container - Used for enumerating each object in a collection; for example a record set or a list of files.
Apart from the above mentioned containers, there is one more container called the Task Host Container which is not visible from the IDE, but every task is contained in it (the default container for all the tasks). 


Q16. What are variables and what is variable scope? 

A variable is used to store values. There are basically two types of variables, System Variable (like ErrorCode, ErrorDescription, PackageName etc) whose values you can use but cannot change and User Variable which you create, assign values and read as needed. A variable can hold a value of the data type you have chosen when you defined the variable.
Variables can have a different scope depending on where it was defined. For example you can have package level variables which are accessible to all the tasks in the package and there could also be container level variables which are accessible only to those tasks that are within the container.


Q17. What is the transaction support feature in SSIS?

When you execute a package, every task of the package executes in its own transaction. What if you want to execute two or more tasks in a single transaction? This is where the transaction support feature helps. You can group all your logically related tasks in a single group, then set the transaction property appropriately to enable a transaction, so that all the grouped tasks run in a single transaction. This way you can ensure that either all of the tasks complete successfully, or, if any of them fails, the transaction gets rolled back.


Q18. What properties do you need to configure in order to use the transaction feature in SSIS? 

Suppose you want to execute five tasks in a single transaction. In this case you can place all five tasks in a Sequence Container and set the TransactionOption and IsolationLevel properties appropriately.

The TransactionOption property expects one of these three values:
Supported - the container/task does not create a separate transaction, but if the parent object has already initiated a transaction, then it participates in it.
Required - the container/task joins the transaction initiated by the parent object; if no such transaction exists, it creates a new one.
NotSupported - the container/task neither creates a transaction nor participates in any transaction initiated by the parent object.
The isolation level dictates how two or more transactions maintain consistency and concurrency when they are running in parallel.

Q19. When I enabled transactions in an SSIS package, it failed with this exception: "The Transaction Manager is not available. The DTC transaction failed to start." What caused this exception and how can it be fixed?

SSIS uses the MS DTC (Microsoft Distributed Transaction Coordinator) Windows Service for transaction support. As such, you need to ensure this service is running on the machine where you are actually executing the SSIS packages or the package execution will fail with the exception message as indicated in this question. 


Q20. What is event handling in SSIS?

Like many other programming languages, SSIS and its components raise different events during the execution of the code. You can write an event handler to capture an event and handle it in a few different ways. For example, consider that you have a data flow task, and before it executes you want to make some environmental changes, such as creating a table to write data into or deleting/truncating a table you want to write to. Along the same lines, after execution of the data flow task you might want to clean up some staging tables. In this circumstance you can write an event handler for the OnPreExecute event of the data flow task, which gets executed before the actual execution of the data flow. Similarly, you can write an event handler for the OnPostExecute event of the data flow task, which gets executed after the actual data flow task. Please note, not all tasks raise the same events; there might be events specific to one task that you can use with that object and not with others.


Q21. How do you write an event handler?

First, open your SSIS package in Business Intelligence Development Studio (BIDS) and click on the Event Handlers tab. Next, select the executable/task in the left-side combo box, and then select the event you want to handle in the right-side combo box. Finally, click on the hyperlink to create the event handler. So far you have only created the event handler; you have not specified any action. For that, simply drag the required task from the toolbox onto the event handler designer surface and configure it appropriately.

Q22. What is the DisableEventHandlers property used for?

Consider you have a task or package with several event handlers, but for some reason you do not want event handlers to be called. One simple solution is to delete all of the event handlers, but that would not be viable if you want to use them in the future. This is where you can use the DisableEventHandlers property. You can set this property to TRUE and all event handlers will be disabled. Please note with this property you simply disable the event handlers and you are not actually removing them. This means you can set this value to FALSE and the event handlers will once again be executed. 

Q23. What is SSIS validation?

SSIS validates the package and all of its tasks to ensure they have been configured correctly, so that, with the given set of configurations and values, the package and all its tasks will execute successfully. In other words, during the validation process SSIS checks whether the source and destination locations are accessible, and whether the metadata about the source and destination tables stored with the package is correct, so that the task will not fail when executed. The validation process reports warnings and errors depending on the validation failure detected. For example, if the source/destination tables/columns have been changed or dropped, it will show as an error; whereas if you are reading more columns than you write to the destination object, this will be flagged as a warning.

Q24. Define design time validation versus run time validation.

Design time validation is performed when you are opening your package in BIDS whereas run time validation is performed when you are actually executing the package. 


Q25. Define early validation (package level validation) versus late validation (component level validation).

When a package is executed, the package goes through the validation process. All of the components/tasks of package are validated before actually starting the package execution. This is called early validation or package level validation. During execution of a package, SSIS validates the component/task again before executing that particular component/task. This is called late validation or component level validation. 


Q26. What is DelayValidation and what is the significance?

As I said before, during early validation all of the components of the package are validated along with the package itself. If any component/task fails to validate, SSIS will not start the package execution. In most cases this is fine, but what if the second task is dependent on the first task? For example, say you are creating a table in the first task and referring to the same table in the second task. When early validation starts, it will not be able to validate the second task, as the dependent table has not been created yet; keep in mind that early validation is performed before the package execution starts. So what should we do in this case? How can we ensure the package executes successfully and that the logical flow of the package is correct? This is where the DelayValidation property comes in. In the above scenario you should set the DelayValidation property of the second task to TRUE, in which case early validation, i.e. package-level validation, is skipped for that task, and the task is validated only during late validation, i.e. component-level validation. Please note that using the DelayValidation property you can only skip early validation for that specific task; there is no way to skip late or component-level validation.
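A minimal T-SQL sketch of the scenario (the table names are hypothetical): Task 1 is an Execute SQL Task that creates a staging table, and Task 2 references it, so Task 2 needs DelayValidation = True.

-- Task 1 (Execute SQL Task): creates the staging table at run time.
CREATE TABLE dbo.StagingOrders (OrderID INT NOT NULL, Amount DECIMAL(10, 2) NOT NULL);

-- Task 2 (Execute SQL Task): early validation of this statement fails,
-- because dbo.StagingOrders does not exist until Task 1 has actually run.
INSERT INTO dbo.StagingOrders (OrderID, Amount)
SELECT OrderID, Amount FROM dbo.Orders;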


Q27. How would you do Logging in SSIS?

The logging configuration provides an inbuilt feature which can log the details of various events, like OnError, OnWarning, etc., to various destinations: a flat file, a SQL Server table, an XML file, or SQL Profiler.


Q28. How would you do Error Handling?

An SSIS package can mainly have two types of errors:
a) Procedure errors: handled in the control flow through precedence constraints and by redirecting the execution flow.
b) Data errors: handled in the Data Flow task by redirecting the data flow using the error output of a component.


Q29. How to pass property value at Run time? How do you implement Package Configuration?

A property value like connection string for a Connection Manager can be passed to the package using package configurations. Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value or Parent package variable.


Q30. How would you deploy a SSIS Package on production?

1. Create a deployment utility by setting the project's CreateDeploymentUtility property to True.
2. The utility is created in the bin folder of the solution as soon as the project is built.
3. Copy all the files in the utility folder and use the manifest file to deploy the package to production.


Q31. How would you pass a variable value to Child Package?

This is a very frequent question, and it looks more complicated to programmers than it is.

Actually, passing a variable value to a child package is a trivial task. We can pass the value by configuring a parent package variable in the package configuration, but there is an even easier way, and it rests on the fundamental principle of variable scope.

When you call a child package, the child behaves like a container, so all variables defined higher in the hierarchy are accessible in the child package.

Here is an example in which I declare a variable "ParentVar" in my parent package and call a child package which accesses "ParentVar" and displays it in a message box.

1) Parent: Create the parent package and declare a variable "ParentVar".

2) Child: Create the child package, add a Script Task, and list ParentVar as a read-only variable.

3) Child: Inside the script, use ParentVar like any other variable, e.g. to display it in a message box. I would suggest creating a separate child package variable, assigning the parent package variable's value to it, and using the child variable throughout the package.

4) Parent: In the parent package, call the child package through an Execute Package Task.

5) Result: Execute the parent package; it will in turn call the child package, which displays the message box.

That is a simple method to use a parent package variable in a child package.


Q32. What are new features in SSIS 2008?


1) Improved Parallelism of Execution Trees:

The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine was redesigned to utilize multiple threads and take advantage of dynamic scheduling, executing multiple components in parallel, including components within the same execution tree.

2) Any .NET language for Scripting:

SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to write scripts in any .NET language.

3) New ADO.NET Source and Destination Component:

SSIS 2008 adds new source and destination components for ADO.NET record sets.

4) Improved Lookup Transformation:

In SSIS 2008, the Lookup transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition, the same cache can be shared between multiple Lookup transformations.

5) New Data Profiling Task:

SSIS 2008 has a new debugging aid, the Data Profiling task, that can help users analyze the data flowing through the package. In many cases, execution errors are caused by unexpected variations in the data being transferred. The Data Profiling task can help users discover the cause of these errors by giving better visibility into the data.

6) New Connections Project Wizard:

One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard, which guides the user through the steps required to create sources and destinations.


Q33. What are SSIS Connection Managers?

When we talk of integrating data, we are actually pulling data from different sources and writing it to a destination. But how do you get connected to the source and destination systems? This is where connection managers come into the picture. A connection manager represents a connection to a system and includes the data provider information, server name, database name, authentication mechanism, etc.


Q34. What are source and destination adapters?

A source adapter indicates a source in the data flow to pull data from. The source adapter uses a connection manager to connect to the source, and along with it you can also specify the query method and the query used to pull the data.
Similarly, a destination adapter indicates a destination in the data flow to write data to. Like the source adapter, the destination adapter uses a connection manager to connect to the target system, and along with that you also specify the target table and the writing mode, i.e. write one row at a time or do a bulk insert, as well as several other properties.
Please note, the source and destination adapters can both use the same connection manager if you are reading and writing to the same database.


Q35. What is the Data Path and how is it different from a Precedence Constraint?

The data path is used in a Data Flow task to connect the components of a data flow and shows the transition of data from one component to another. A data path contains the meta information of the data flowing through it, such as the columns, data types, size, etc. As for the difference between a data path and a precedence constraint: the data path is used in the data flow and shows the flow of data, whereas the precedence constraint is used in the control flow and shows the transition of control from one task to another.


Q36. What is a Data Viewer utility and what it is used for?

The data viewer utility is used in Business Intelligence Development Studio during development or when troubleshooting an SSIS Package. The data viewer utility is placed on a data path to see what data is flowing through that specific data path during execution. The data viewer utility displays rows from a single buffer at a time, so you can click on the next or previous icons to go forward and backward to display data. Check out the Data Viewer enhancements in SQL Server 2012.


Q37. What is an SSIS breakpoint? How do you configure it? How do you disable or delete it?

A breakpoint allows you to pause the execution of the package in Business Intelligence Development Studio during development or when troubleshooting an SSIS package. You can right-click on a task in the control flow, click on the Edit Breakpoints menu, and in the Set Breakpoints window specify when you want execution to be halted/paused, for example on the OnPreExecute, OnPostExecute, or OnError events. To toggle a breakpoint, or to delete or disable all breakpoints, go to the Debug menu and click the respective menu item. You can even specify different conditions for hitting the breakpoint.


Q38. What is SSIS event logging?

Like any other modern programming language, SSIS raises different events during the package execution life cycle. You can write these events to a log to trace the execution of your SSIS package and its tasks, and you can also write your own custom messages as custom log entries. You can enable event logging at the package level as well as at the task level, and you can choose which specific events of a task or package to log. This is essential when you are troubleshooting your package and trying to understand a performance problem or the root cause of a failure.


Q39. What are the different SSIS log providers?

There are several places where SSIS can write the execution data generated by its event logs:
SSIS log provider for Text files
SSIS log provider for Windows Event Log
SSIS log provider for XML files
SSIS log provider for SQL Profiler
SSIS log provider for SQL Server, which writes the data to the msdb..sysdtslog90 or msdb..sysssislog table depending on the SQL Server version. 
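For example, with the SQL Server log provider the recent entries can be inspected with a simple query; a sketch, assuming SQL Server 2008 or later (on 2005 the table is msdb.dbo.sysdtslog90):

-- Show the 50 most recent SSIS log entries.
SELECT TOP (50) event, source, starttime, endtime, message
FROM msdb.dbo.sysssislog
ORDER BY starttime DESC;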


Q40. How do you enable SSIS event logging? 

SSIS provides a granular level of control in deciding what to log and where to log it. To enable event logging for an SSIS package, right-click in the control flow area of the package and click Logging. In the Configure SSIS Logs window you will notice all the tasks of the package listed in the tree view on the left side; you can choose specifically which tasks to enable logging for. On the right side there are two tabs: on the Providers and Logs tab you specify where to write the logs (you can write to one or more log providers together), and on the Details tab you specify which events to log for the selected task.
Please note, enabling event logging is immensely helpful when you are troubleshooting a package, but it also incurs additional overhead on SSIS to log the events and information. Hence you should only enable event logging when needed, and only choose the events you want to log; avoid logging all the events unnecessarily.


Q41. What is the LoggingMode property?

SSIS packages and all of the associated tasks or components have a property called LoggingMode. This property accepts three possible values: Enabled - to enable logging of that component, Disabled - to disable logging of that component and UseParentSetting - to use parent's setting of that component to decide whether or not to log the data.


Q42. How is SSIS runtime engine different from the SSIS dataflow pipeline engine? 

The SSIS Runtime Engine manages the workflow of the packages during runtime, which means its role is to execute the tasks in a defined sequence. As you know, you can define the sequence using precedence constraints. This engine is also responsible for providing support for event logging, breakpoints in the BIDS designer, package configuration, transactions and connections. The SSIS Runtime engine has been designed to support concurrent/parallel execution of tasks in the package.
The Dataflow Pipeline Engine is responsible for executing the data flow tasks of the package. It creates a dataflow pipeline by allocating in-memory structure for storing data in-transit. This means, the engine pulls data from source, stores it in memory, executes the required transformation in the data stored in memory and finally loads the data to the destination. Like the SSIS runtime engine, the Dataflow pipeline has been designed to do its work in parallel by creating multiple threads and enabling them to run multiple execution trees/units in parallel. 


Q43. How is a synchronous (non-blocking) transformation different from an asynchronous (blocking) transformation in SQL Server Integration Services? 

A transformation changes the data into the required format before loading it to the destination or passing it down the path. Transformations can be categorized as synchronous or asynchronous.
A transformation is called synchronous when it processes each incoming row in place (modifying the data so that the layout of the result set remains the same) and passes it down the hierarchy/path. The output rows are synchronous with the input rows (a 1:1 relationship between input and output rows), so the transformation reuses the same allocated buffer set and does not require additional memory. These kinds of transformations have lower memory requirements, as they work on a row-by-row basis (and hence run quite fast), and do not block the data flow in the pipeline. Some examples are: Lookup, Derived Column, Data Conversion, Copy Column, Multicast, and Row Count transformations.
A transformation is called asynchronous when it requires all incoming rows to be stored locally in memory before it can start producing output rows. For example, an Aggregate transformation requires all the rows to be loaded and stored in memory before it can aggregate and produce the output rows. Input rows are not in sync with output rows, and more memory is required to store the whole data set (no memory reuse) for both the input and the output. These kinds of transformations have higher memory requirements (and there is a high chance of buffers spooling to disk if insufficient memory is available) and generally run slower. Asynchronous transformations are also called "blocking transformations" because they block the output rows until all input rows are read into memory.


Q44. What is the difference between a partially blocking transformations versus a fully blocking transformation in SQL Server Integration Services? 

Asynchronous transformations, as discussed in the last question, can be further divided into two categories depending on their blocking behavior:
Partially blocking transformations do not block the output until a full read of the input occurs. However, they require new buffers/memory to be allocated to store the newly created result set, because the output from these kinds of transformations differs from the input set. For example, the Merge Join transformation joins two sorted inputs and produces a merged output. The data flow pipeline engine creates two sets of input buffers, but the merged output from the transformation requires another set of output buffers, because the structure of the output rows differs from that of the input rows. This means the memory requirement for this type of transformation is higher than for synchronous transformations, where the transformation completes in place.
Fully blocking transformations, apart from requiring an additional set of output buffers, also block the output completely until the whole input set is read. For example, the Sort transformation requires all input rows to be available before it can start sorting and pass the rows down the output path. These kinds of transformations are the most expensive and should be used only when needed. For example, if you can get sorted data from the source system, use that instead of a Sort transformation to sort the data in transit/memory.
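For example, instead of a Sort transformation you can push the ORDER BY into the source query (a sketch with a hypothetical table); note that you must then also mark the source output as sorted (IsSorted = True and SortKeyPosition on the output columns in the Advanced Editor) so that downstream components such as Merge Join recognize the order:

-- Source query for the OLE DB Source: the database engine sorts the rows,
-- so no fully blocking Sort transformation is needed in the data flow.
SELECT CustomerID, OrderID, OrderDate
FROM dbo.Orders
ORDER BY CustomerID;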


Q45. What is an SSIS execution tree and how can I analyze the execution trees of a data flow task? 

The work to be done in the data flow task is divided by the data flow pipeline engine into multiple chunks, called execution units. Each individual execution unit is called an execution tree, and it can be executed by a separate thread, in parallel with other execution trees. The memory structure, also called a data buffer, is created by the data flow pipeline engine and is scoped to each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the first asynchronous transformation or a destination. During execution of the execution tree, the source reads the data, stores it in a buffer, executes the transformations in the buffer, and passes the buffer to the next execution tree in the path by passing pointers to the buffers.
To see how many execution trees are created and how many rows are stored in each buffer for an individual data flow task, you can enable logging of these data flow task events: PipelineExecutionTrees, PipelineComponentTime, PipelineInitialization, BufferSizeTuning, etc.


Q46. How can an SSIS package be scheduled to execute at a defined time or at a defined interval per day?

You can configure a SQL Server Agent job with a job step type of SQL Server Integration Services Package; the job invokes the dtexec command-line utility internally to execute the package. You can run the job (and in turn the SSIS package) on demand, or you can create a schedule for a one-time need or on a recurring basis.
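A minimal T-SQL sketch of creating such a job (the job name, package path, and schedule are hypothetical; the same can be done through the SSMS UI):

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Load Sales Package';
EXEC dbo.sp_add_jobstep
    @job_name  = N'Load Sales Package',
    @step_name = N'Run SSIS package',
    @subsystem = N'SSIS',
    @command   = N'/FILE "C:\Packages\LoadSales.dtsx" /REPORTING E';
EXEC dbo.sp_add_jobschedule
    @job_name          = N'Load Sales Package',
    @name              = N'Daily 2 AM',
    @freq_type         = 4,      -- daily
    @freq_interval     = 1,
    @active_start_time = 20000;  -- 02:00:00
EXEC dbo.sp_add_jobserver @job_name = N'Load Sales Package';
GO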


Q47. What is an SSIS Proxy account and why would you create it?

When we try to execute an SSIS package from a SQL Server Agent job, it can fail with the message "Non-SysAdmins have been denied permission to run DTS Execution job steps without a proxy account". This error message is generated if the account under which the SQL Server Agent service is running (and the job owner) is not a sysadmin on the instance, and the job step is not set to run under a proxy account associated with the SSIS subsystem.


Q48. How can you configure your SSIS package to run in 32-bit mode on 64-bit machine when using some data providers which are not available on the 64-bit platform?

In order to run an SSIS package in 32-bit mode the SSIS project property Run64BitRuntime needs to be set to "False". The default configuration for this property is "True". This configuration is an instruction to load the 32-bit runtime environment rather than 64-bit, and your packages will still run without any additional changes. The property can be found under SSIS Project Property Pages -> Configuration Properties -> Debugging.


Q49. What do you mean by Microsoft Business Intelligence and what components of SQL Server supports this?

Microsoft defines its BI solution as a platform that provides better and more accurate information, in an easily understandable format, for quicker and better decision making. It consists of the SQL Server BI tools: SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS), along with Microsoft SharePoint and the Office products.


Q50. What is Breakpoint and Checkpoint in SSIS Package?

§ Breakpoints in SSIS packages enable us to pause execution and review the values of variables and the state of other components of an SSIS package.

§ Checkpoints in SSIS packages enable us to rerun an SSIS package from the point of failure, so that you do not have to rerun the portion of the package that ran successfully.


Q51. What are the various tabs available in an SSIS project at design time?

§ Control Flow.
§ Data Flow.
§ Event Handlers.
§ Package Explorer.


Q52. What are some of the events on which you can add an Event Handler in an SSIS Package?

§ OnError
§ OnPostExecute
§ OnProgress
§ OnTaskFailed
§ OnWarning


Q53. I need to have more than 1 destination in a Data Flow task, how can that be achieved?

Using Multicast Data Flow Transformation, it is possible to direct the output to more than 1 destination.


Q54. What are the different destinations where SSIS packages can be saved/stored for deployment?

SSIS packages can be saved inside SQL Server or to a file system destination.

Q55. Where inside SQL Server are SSIS packages stored?

SSIS packages are stored inside the MSDB database.
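A quick sketch for listing the stored packages, assuming SQL Server 2008 or later (on 2005 the table is msdb.dbo.sysdtspackages90):

-- List the SSIS packages deployed to msdb, newest first.
SELECT name, description, createdate
FROM msdb.dbo.sysssispackages
ORDER BY createdate DESC;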


Q56. How does Error-Handling work in SSIS?

When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.
Typical types of errors in SSIS:
- Data connection errors, which occur in case the connection manager cannot be initialized with the connection string. This applies to both data sources and data destinations, along with control flow tasks that use connection strings.
- Data transformation errors, which occur while data is being transformed over a data pipeline from source to destination.
- Expression evaluation errors, which occur if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.


Q57. What is an environment variable in SSIS?

An environment variable configuration sets a package property equal to the value in an environment variable.
Environment variable configurations are useful for configuring properties that depend on the computer that is executing the package.


Q58: What are the Transformations available in SSIS?

AGGREGATE - Applies aggregate functions to record sets to produce new output records from aggregated values.
AUDIT - Adds package and task level metadata, such as machine name, execution instance, package name, package ID, etc.
CHARACTER MAP - Performs column-level string operations, such as changing data from lower case to upper case.
CONDITIONAL SPLIT - Separates the input into separate output pipelines based on Boolean expressions configured for each output.
COPY COLUMN - Adds a copy of a column to the output; we can later transform the copy, keeping the original for auditing.
DATA CONVERSION - Converts column data types from one type to another (explicit column conversion).
DATA MINING QUERY - Used to perform data mining queries against Analysis Services and manage predictions, graphs, and controls.
DERIVED COLUMN - Creates a new (computed) column from given expressions.
EXPORT COLUMN - Used to export an image-specific (BLOB) column from the database to a file.
FUZZY GROUPING - Used for data cleansing by finding rows that are likely duplicates.
FUZZY LOOKUP - Used for pattern matching and ranking based on fuzzy logic.
IMPORT COLUMN - Reads data from files into an image-specific (BLOB) column.
LOOKUP - Performs the lookup (searching) of a given reference data set against a data source. It is used for exact matches only.
MERGE - Merges two sorted data sets into a single data flow.
MERGE JOIN - Merges two data sets into a single dataset using a join.
MULTICAST - Sends a copy of the supplied data source to multiple destinations.
ROW COUNT - Stores the resulting row count from the data flow/transformation into a variable.
ROW SAMPLING - Captures sample data by specifying a number or percentage of the total rows in the data flow.
UNION ALL - Merges multiple data sets into a single data set.
PIVOT - Used for normalization of data sources to reduce anomalies by converting rows into columns.
UNPIVOT - Used for denormalizing the data structure by converting columns into rows, e.g. when building data warehouses.
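Many of these transformations have close T-SQL analogues, which helps in remembering what they do. A sketch against hypothetical tables:

-- AGGREGATE corresponds to GROUP BY with aggregate functions:
SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.Orders
GROUP BY CustomerID;

-- CONDITIONAL SPLIT corresponds to routing rows by a predicate:
SELECT * FROM dbo.Orders WHERE Amount >= 1000;   -- output path 1
SELECT * FROM dbo.Orders WHERE Amount < 1000;    -- default output

-- UNION ALL corresponds to its T-SQL namesake:
SELECT OrderID FROM dbo.OnlineOrders
UNION ALL
SELECT OrderID FROM dbo.StoreOrders;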


Q59. How to log SSIS executions?

SSIS includes logging features that write log entries when run-time events occur and can also write custom messages. This is not enabled by default. Integration Services supports a diverse set of log providers, and gives you the ability to create custom log providers. The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files. Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.


Q60. How do you deploy SSIS packages?

An SSIS project BUILD produces a deployment manifest file. We run the manifest file and decide whether to deploy onto the file system or onto SQL Server (msdb). SQL Server deployment is faster and more secure than file system deployment. Alternatively, we can also import the package in SSMS from the file system or from SQL Server.
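Another option is the dtutil command-line utility, which can copy a package into msdb directly; a sketch with a hypothetical path and package name (run from a command prompt):

dtutil /FILE "C:\Packages\LoadSales.dtsx" /COPY SQL;LoadSales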


Q61: What are variables and what is variable scope?

Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time. The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions. Integration Services supports two types of variables: user-defined variables and system variables. User-defined variables are defined by package developers, and system variables are defined by Integration Services. You can create as many user-defined variables as a package requires, but you cannot create additional system variables.


Q62. What are the different types of Data flow components in SSIS?

There are 3 data flow components in SSIS.
1. Sources
2. Transformations
3. Destinations


Q63. Explain Audit Transformation?

It allows you to add auditing information, as required by auditing standards such as HIPAA and Sarbanes-Oxley (SOX). The auditing options you can add to the transformed data through this transformation are:
1. Execution Instance GUID: GUID of the execution instance of the package
2. PackageID: ID of the package
3. PackageName: name of the package
4. VersionID: version GUID of the package
5. ExecutionStartTime
6. MachineName
7. UserName
8. TaskName
9. TaskID: unique identifier of the data flow task that contains the Audit transformation


Q64. Explain Copy Column Transformation?

This component simply copies a column to a new column, just like an aliased column in T-SQL.


Q65. Explain Derived Column Transformation?

The Derived Column transformation creates a new column, or replaces an existing one, from expressions that can combine one or more input columns.


Q66. Explain Multicast Transformation?

This transformation sends its output to multiple output paths with no conditions, unlike Conditional Split. It takes one input, copies the data, and passes the same data through many outputs. In short: give it one input and take many outputs of the same data.


Q67. What is the Control Flow?

When you start working with SSIS, you first create a package, which is nothing but a collection of tasks or package components. The control flow allows you to order the workflow, so you can ensure tasks/components get executed in the appropriate order.


Q68. How many different sources and destinations have you used?

It is very common to get all kinds of sources, so the more the candidate has worked with, the better. Common ones are SQL Server, CSV/TXT flat files, Excel, Access, Oracle, and MySQL, but also Salesforce and web data scraping.


Q69. Difference between Control Flow and Data Flow

Control Flow:
1. The control flow consists of one or more tasks and containers that execute when the package runs. We use precedence constraints to connect the tasks and containers in a package. SSIS provides three different types of control flow elements:
containers that provide structures in packages,
tasks that provide functionality, and
precedence constraints that connect the executables, containers, and tasks into an ordered control flow.
2. The control flow does not move data from task to task.
3. Tasks run in series if connected with precedence constraints, or in parallel.

Data Flow:
1. A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. Data sources, transformations, and data destinations are the three important categories in the data flow.
2. The data flow moves data, but the Data Flow task is also a task in the control flow, so its success or failure affects how the control flow operates.
3. Data is moved and manipulated through transformations.
4. Data is passed between each component in the data flow.


Q70. What are the different types of Transformations you have worked with?

AGGREGATE: The Aggregate transformation applies aggregate functions to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
The Aggregate Transformation supports following operations:
Group By, Sum, Average,
Count, Count Distinct, Minimum, Maximum

AUDIT: Adds Package and Task level Metadata - such as Machine Name, Execution Instance, Package Name, Package ID, etc.

CHARACTER MAP: When it comes to string formatting in SSIS, the Character Map transformation is very useful, e.g. for converting data to lower case or upper case.

CONDITIONAL SPLIT: used to split the input source data based on condition.

COPY COLUMN: Adds a copy of a column to the output; we can later transform the copy, keeping the original for auditing.

DATA CONVERSION: Converts column data types from one type to another (explicit column conversion).
DATA MINING QUERY: Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVED COLUMN: Create a new (computed) column from given expressions.
EXPORT COLUMN: Used to export an image-specific (BLOB) column from the database to a file.

FUZZY GROUPING: Groups the rows in the dataset that contain similar values.
FUZZY LOOKUP: Used for Pattern Matching and Ranking based on fuzzy logic.
IMPORT COLUMN: Reads data from files into an image-specific (BLOB) column.

LOOKUP: Performs the lookup (searching) of a given reference data set against a data source. It is used to find exact matches only.
MERGE: Merges two sorted data sets of the same column structure into a single output.

MERGE JOIN: Merges two sorted data sets into a single dataset using a join.

MULTICAST: Used to create/distribute exact copies of the source dataset to one or more destination datasets.


ROW COUNT: Stores the resulting row count from the data flow / transformation into a variable.

ROW SAMPLING: Captures sample data by specifying a number or percentage of the total rows in the data flow.

UNION ALL: Merge multiple data sets into a single dataset.

PIVOT: Used for normalization of data sources to reduce anomalies by converting rows into columns.

UNPIVOT: Used for denormalizing the data structure by converting columns into rows, e.g. when building data warehouses.


Q71. What are Row Transformations, Partially Blocking Transformations, and Fully Blocking Transformations, with examples?

In a row transformation, each value is manipulated individually. The buffers it uses can be reused by other components, such as: an OLE DB source, an OLE DB destination, other row transformations within the package, and other partially blocking transformations within the package. Examples: Copy Column, Audit, Character Map.

Partially Blocking Transformation:
These can reuse the buffer space allocated for earlier row transformations, and also get new buffer space allocated exclusively for the transformation's output.
Example: Merge, Conditional Split, Multicast, Lookup, Import Column, Export Column

Fully Blocking Transformation:
These make use of their own reserved buffers and do not share buffer space with other transformations or connection managers.
Example: Sort, Aggregate, Cache Transformation


Q72. Difference between Merge and Union All Transformations?

The Union All transformation combines multiple inputs into one output. The transformation inputs are added to the transformation output one after the other; no reordering of rows occurs.
The Merge transformation combines two sorted data sets of the same column structure into a single output. The rows from each dataset are inserted into the output based on the values in their key columns.

The Merge transformation is similar to the Union All transformation. Use Union All instead of Merge in the following situations:
- The source input rows do not need to be sorted.
- The combined output does not need to be sorted.
- You have more than two source inputs.


Q73. Multicast, Conditional Split, and Bulk Insert Tasks

The Multicast transformation is used to take the output of a single source and place it onto multiple destinations.
The Conditional Split transformation is used to split the input data based on specific conditions, written in the SSIS expression language.

Multicast generates exact copies of the source data, so each recipient gets the same number of records as the source, whereas Conditional Split divides the source data based on the defined conditions, and rows that match none of the conditions are put on the default output.

The Bulk Insert task is used to copy large volumes of data from a text file into a SQL Server destination.


Q74. What is the RetainSameConnection property and what is its impact?

Whenever a task uses a connection manager to connect to a source or destination database, a connection is opened and closed with the execution of that task. Sometimes you might need to open a connection, execute multiple tasks, and close it at the end of the execution. This is where the RetainSameConnection property of the connection manager helps: when you set this property to TRUE, the connection is opened the first time it is used and remains open until the package execution completes.


Q75. Difference between Synchronous and Asynchronous Transformations

A synchronous transformation processes the input rows and passes them down the data flow one row at a time; it cannot create new buffers.
A transformation is asynchronous when it creates new output buffers: the output buffers or output rows are not in sync with the input buffers.


Q76. Explain the Web Service task in SSIS?

The Web Service task lets us execute web services.
· First we configure an HTTP connection manager, which points to the WSDL of a web service.
· The Web Service task uses this HTTP connection manager and lets us invoke the methods the service exposes.
· The return value of a method can be stored in a variable and used as input for other tasks.

    
Q77. Can you explain the Transfer SQL Server Objects task?

It allows us to transfer SQL Server objects between different instances of SQL Server.
Objects include tables, stored procedures, user-defined functions, etc.


Q78. In SSIS, is it possible to communicate with MSMQ?

Yes, for that we have the Message Queue task. It lets us send messages to and receive messages from MSMQ.


Q79. What's the advantage of using MSMQ?

MSMQ lets two applications communicate with each other asynchronously. The two applications may even be built with different technologies, and messaging works offline: the sender stores messages in the queue and the reader picks them up whenever required.

Q80. The data in the flat file is as follows:

"132","Ramesh"," " ,"Hyderabad"
"132","Radhika","17","Vangara"
How do you remove the double quotes from the file in order to process the data?
In the Flat File Connection Manager Editor, enter a double quote (") in the Text Qualifier text box.


Q81. Will triggers fire when inserting data through an SSIS package?

By default, bulk/fast load does not fire INSERT triggers, so FIRE_TRIGGERS must be specified as an additional option in one of these places:
1. In the data flow task, go to the Advanced Editor of the OLE DB Destination; in the FastLoadOptions property, specify FIRE_TRIGGERS as an additional option.
2. In the SQL Server Destination Editor, enable the fire triggers option in the advanced properties.
3. In the Bulk Insert Task Editor, enable the Fire triggers option.
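For comparison, the same behavior in plain T-SQL (the table and file path are hypothetical):

BULK INSERT dbo.Customer
FROM 'C:\Data\customer.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRE_TRIGGERS              -- without this, INSERT triggers do not fire
);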

 
Q82. What are the different types of Transaction Options?

Required: If a transaction already exists at the upper level, the current executable joins it; if there is no transaction at the upper level, a new transaction is created automatically.
Supported: If there is a transaction at the upper level, the executable joins it; otherwise it does not create a new transaction.
NotSupported: The executable does not honor any transaction, i.e. it neither joins an existing transaction nor creates a new one.

Q83. Explain Checkpoints and their properties.
 

A checkpoint is used to restart package execution from the point of failure rather than from the initial start. Set the following properties:
CheckpointFileName: specifies the name of the checkpoint file.
CheckpointUsage: Never, IfExists, or Always.
SaveCheckpoints: indicates whether the package saves checkpoints. This property must be set to True to restart a package from a point of failure.
FailPackageOnFailure: must be set to True on each task or container that should participate in the checkpoint.
The checkpoint mechanism uses a file to mark the point of package failure. The checkpoint file is automatically created at the given location upon package failure and automatically deleted once the package ends with success.

Q84. How to execute an SSIS package from a stored procedure?

By using the xp_cmdshell command to invoke the dtexec command-line utility.
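A minimal sketch (the package path is hypothetical; for a package stored in msdb, use the /SQL switch instead of /FILE):

-- Run a file-system package through dtexec from T-SQL.
DECLARE @cmd VARCHAR(1000);
SET @cmd = 'dtexec /FILE "C:\Packages\LoadSales.dtsx"';
EXEC master..xp_cmdshell @cmd;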

Q85. How to enable xp_cmdshell in SQL Server?

We can enable it through either T-SQL or the SQL Server Surface Area Configuration tool:

-- To allow advanced options to be changed.
EXEC sp_configure 'show advanced options', 1
GO
-- To update the currently configured value for advanced options.
RECONFIGURE
GO
-- To enable the feature.
EXEC sp_configure 'xp_cmdshell', 1
GO
-- To update the currently configured value for this feature.
RECONFIGURE
GO

Q86. What is package configuration? What are the different types of configurations?

A package may be transferred across various environments, e.g. development, unit testing, system testing, UAT, and production. Most packages have environment-specific values, like the connection string to a database, the path to a flat file, or user-defined variables, that are impacted when moving the package across environments as part of the deployment process. Hence, it is mandatory to change these environment-dependent values when the package is transferred across environments. Package configurations help manage such changes without actually opening and editing the SSIS package in BIDS.

After deploying the package to a different machine (using SQL Server or file system deployment mode), it is mandatory to copy the related package configuration files onto that machine. If the package is scheduled to run through a SQL Agent job, the configuration file should be added while creating the job, so that the package reads the information from the configuration file. While executing the job, SQL Agent will take the design-time values for connection strings if the package configuration file is not supplied.
There are five configuration types available with package configurations: XML configuration file, environment variable, registry entry, parent package variable, and SQL Server table.
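For the SQL Server configuration type, the configuration wizard stores the values in a table; by default it creates one shaped like the following sketch (the filter name and example row are illustrative):

CREATE TABLE [dbo].[SSIS Configurations]
(
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue     NVARCHAR(255) NULL,
    PackagePath         NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20)  NOT NULL
);

-- Example row: overrides a connection string property of the package.
INSERT INTO [dbo].[SSIS Configurations]
VALUES (N'ProdConfig',
        N'Data Source=PRODSQL01;Initial Catalog=Sales;Integrated Security=SSPI;',
        N'\Package.Connections[SalesDB].Properties[ConnectionString]',
        N'String');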


Q87. What is logging? What are the different types of log providers?

Logging is used to record information during the execution of a package.
A log provider can be a text file, SQL Server Profiler, a SQL Server relational database, a Windows event log, or an XML file. If necessary, you can define a custom log provider (e.g. a proprietary file format).


Q88. How to debug a package

For debugging a package, we have 3 options:
-Setting breakpoints in a package, container or Task
-Using Data Viewer in the Data flow task
-Setting Error redirection in the Data Flow task

Q89. Error handling in an SSIS package

I have created a package as below: select the 'Load to SQL Table' data flow task and navigate to the 'Event Handlers' tab.

Drag and drop an 'Execute SQL Task'.

Open the Execute SQL Task Editor and, in the 'Parameter Mapping' section, map the system variables. Create a table in the SQL Server database with the columns: PackageID, PackageName,
TaskID, TaskName,
ErrorCode, ErrorDescription.

When the package fails during execution, the error information is inserted into the table.
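A sketch of the log table and of the parameterized statement the Execute SQL Task would run (with an OLE DB connection the parameter markers are question marks, mapped in order to System::PackageID, System::PackageName, System::SourceID, System::SourceName, System::ErrorCode, and System::ErrorDescription in the OnError event handler):

CREATE TABLE dbo.PackageErrorLog
(
    PackageID        UNIQUEIDENTIFIER,
    PackageName      NVARCHAR(260),
    TaskID           UNIQUEIDENTIFIER,
    TaskName         NVARCHAR(260),
    ErrorCode        INT,
    ErrorDescription NVARCHAR(2000)
);

-- SQLStatement of the Execute SQL Task in the OnError event handler.
INSERT INTO dbo.PackageErrorLog
    (PackageID, PackageName, TaskID, TaskName, ErrorCode, ErrorDescription)
VALUES (?, ?, ?, ?, ?, ?);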

Q90. How to configure Error Output in SSIS

We have three options to configure error output in SSIS:
1. Ignore Failure
2. Redirect Row
3. Fail Component
For example: I have comma-separated values in a flat file with two columns (Code, Name). Code is an integer value and Name is a varchar(20) data type configured in the flat file connection manager. Some of the Code values in the flat file are characters, so the flat file source will fail to read them. With the Redirect Row option, the error rows can be sent to a separate table.

Q94. What is Ignore Failure option in SSIS? 

With the Ignore Failure option, the error is ignored and the data row is passed on to the next transformation. By contrast, when JUNK data (values of the wrong type) flows from the source, the Redirect Row option routes the bad records to another output instead of failing the package; only valid data moves to the destination, and the junk can be captured in a separate file.

Q95. Incremental Load in SSIS 

Using Slowly Changing Dimension
Using Lookup and Cache Transformation


Q96. How to migrate SQL Server 2005 packages to the 2008 version

1. In BIDS, right-click the "SSIS Packages" folder of an SSIS project and select "Upgrade All Packages".
2. Run "SSISUpgrade.exe" from the command line (default physical location: C:\Program Files\Microsoft SQL Server\100\DTS\Bin).
3. If you open an SSIS 2005 project in BIDS 2008, it automatically launches the SSIS package upgrade wizard.

Q97. Difference between File System and SQL Server deployment

File System Deployment: the package is saved at a physical location on the hard drive or on any shared folder, and we must provide the fully qualified path to the stored package when using the FileSystem option.
SQL Server Deployment: SSIS packages are stored in the sysssispackages table of the MSDB database.


Q98. Difference between Lookup and Fuzzy Lookup transformation
 

The Lookup transformation finds exact matches only.
The Fuzzy Lookup transformation matches the input table with the reference table, finds the closest match, and indicates the quality of the match.


Q99. Difference between Full Cache and Partial Cache 

Partial Cache: the lookup cache starts off empty at the beginning of the data flow. When a new row comes in, the Lookup transform checks its cache for matching values; if no match is found, it queries the database. If a match is found in the database, the values are cached so they can be used the next time a matching row comes in.
Full Cache: the default cache mode for Lookup. The database is queried once during the pre-execute phase of the data flow, and the entire reference set is pulled into memory, so this mode uses the most memory. Caching takes place before any rows are read from the data flow source, which makes lookup operations very fast during execution.

Q100. Cache Transformation 

Cache Transformation: used to cache the data used by the Lookup transform.
When to use Full cache mode:
- When you're accessing a large portion of your reference set
- When you have a small reference table
- When your database is remote or under heavy load, and you want to reduce the number of queries sent to the server
When to use Partial cache mode:
- When you're processing a small number of rows and it's not worth the time to populate the full cache
- When you have a large reference table
- When your data flow is adding new rows to your reference table
- When you want to limit the size of your reference table by modifying the query with parameters from the data flow

Q101. Explain Slowly Changing Dimension

 
The SCD concept is about how data modifications are absorbed and maintained in a dimension table.
The new (modified) record and the old record(s) are identified using some kind of flag, such as IsActive or IsDeleted, or using Start and End Date fields that indicate the validity of the record. Types:
Type 1: update the columns in the dimension row without preserving any change history.
Type 2: preserve the change history in the dimension table by creating a new row when there are changes.
Type 3: a combination of Type 1 and Type 2, usually maintaining multiple instances of a column in the dimension row, e.g. a current value and one or more previous values.

Q102. Different types of File Enumerators


Foreach ADO: enumerates rows in a table, for example the rows in an ADO recordset. The variable must be of the Object data type.
Foreach ADO.NET Schema Rowset: enumerates schema information, for example the list of tables in a database.
Foreach File: enumerates files in a folder, for example all files with the *.txt extension in a Windows folder and its subfolders.
Foreach From Variable: enumerates the objects that a specified variable contains, such as an array or a data table.
Foreach Item: enumerates collections, for example the names of executables and working directories that an Execute Process task uses.
Foreach NodeList: enumerates the result of an XPath expression.
Foreach SMO: enumerates SQL Server Management Objects (SMO), for example the list of functions or views in a SQL Server database.

Q103. How to execute the package from .NET? 

We need a reference to Microsoft.SqlServer.ManagedDTS.dll to call a package.

using Microsoft.SqlServer.Dts.Runtime;

Application app = new Application();
Package package = app.LoadPackage(
    @"C:\Program Files\Microsoft SQL Server\100\DTS\Packages\Integration Services Project2\Package.dtsx",
    null);
DTSExecResult results = package.Execute();
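A hedged extension of the same sketch: the returned DTSExecResult can be inspected, and any run-time errors read from the package's Errors collection:

if (results == DTSExecResult.Failure)
{
    // Each DtsError carries the description reported during execution.
    foreach (DtsError error in package.Errors)
        Console.WriteLine(error.Description);
}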


Q104. How to schedule a package (Role of Sql Server Agent) 

For the job to run successfully, the SQL Server Agent service should be running on the target machine. We can start the SQL Server Agent service in several ways:
• From the command line
• From the Services.msc console
• Using SQL Server Configuration Manager
• Using SQL Server Management Studio (SSMS)


Q105. What are containers? (For loop, Sequence Container) 

SSIS containers are controls that provide structure to SSIS packages. Containers support repeating control flows in packages, and they group tasks and containers into meaningful units of work. Containers can include other containers in addition to tasks.
Foreach Loop Container: runs a control flow repeatedly using an enumerator; use it to repeat tasks for each element in a collection, for example retrieving files from a folder, running T-SQL statements that reside in multiple files, or running a command for multiple objects.
For Loop Container: runs a control flow repeatedly by checking a conditional expression (same as a for loop in a programming language); use it to repeat tasks until the expression evaluates to false. For example, a package can send a different e-mail message seven times, one time for every day of the week.
Sequence Container: groups tasks and containers into control flows that are subsets of the package control flow; use it to group tasks and containers that must succeed or fail as a unit. For example, a package can group tasks that delete and add rows in a database table, and then commit or roll back all the tasks when one fails.
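As an aside, the For Loop container's InitExpression, EvalExpression and AssignExpression map directly onto the three parts of a C# for loop; a rough analogy for the e-mail example above:

// InitExpression    EvalExpression    AssignExpression
for (int day = 1;    day <= 7;         day++)
{
    // Tasks placed inside the container run once per iteration,
    // e.g. sending one e-mail message per day of the week.
}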


Q106. What are precedence constraints? 

A task will execute only if the condition set by the precedence constraint preceding it is met. Using these constraints, the package can take different execution paths depending on the success or failure of other tasks.

Success: Workflow will proceed when the preceding container executes successfully. Indicated in control flow by a solid green line.

Failure: Workflow will proceed when the preceding container’s execution results in a failure. Indicated in control flow by a solid red line.

Completion: Workflow will proceed when the preceding container’s execution completes, regardless of success or failure. Indicated in control flow by a solid blue line.

Expression/Constraint with Logical AND: Workflow will proceed when specified expression and constraints evaluate to true. Indicated in control flow by a solid color line along with a small ‘fx’ icon next to it. Color of line depends on logical constraint chosen (e.g. success=green, completion=blue).

Expression/Constraint with Logical OR: Workflow will proceed when either the specified expression or the logical constraint (success/failure/completion) evaluates to true. Indicated in control flow by a dotted color line along with a small ‘fx’ icon next to it. Color of line depends on logical constraint chosen (e.g. success=green, completion=blue).


Q107. Performance Optimization in SSIS
 

1. Avoid asynchronous transformations (e.g., the Sort T/F) wherever possible.
The Sort transformation requires all incoming rows to arrive before it can start processing, so it blocks the data flow. Instead of using a Sort transformation, retrieve already-sorted rows from the data source with an ORDER BY clause (and mark the source output as sorted by setting its IsSorted and SortKeyPosition properties).
2. While pulling high volumes of data, drop all non-clustered indexes and, if one exists, the clustered index; then transfer and load the data into the destination table, and re-create the clustered and non-clustered indexes afterwards.

3. Avoid SELECT *
The Data Flow task uses a buffer-oriented architecture for data transfer and transformation. When data moves from source to destination, it first enters a buffer; the required transformations are performed in the buffer itself, and the data is then written to the destination.
The size of the buffer depends on the estimated row size, which is the sum of the maximum sizes of all the columns in the row. More columns per row therefore means fewer rows per buffer. Hence, select only the columns that are required at the destination. Even if we need all the columns from the source, we should name the columns explicitly in the SELECT statement.

4. Effect of OLEDB Destination settings:
A couple of settings on the OLEDB destination can impact the performance of data transfer.
Data Access Mode: this setting provides the 'fast load' option, which internally uses a BULK INSERT statement for uploading data into the destination table instead of a simple INSERT statement per row as the other options do. Unless you have a reason to change it, keep the default of fast load. If you select the 'fast load' option, there are also a couple of other settings you can use, as discussed below.

Keep Identity: by default this setting is unchecked, which means a destination table with an identity column will generate identity values on its own. If you check this setting, the data flow engine preserves the source identity values and inserts them into the destination table.
Keep Nulls: again, by default this setting is unchecked, which means that if a NULL value comes from the source for a column that has a default constraint on the destination, the default value is inserted. If you check this option, the default constraint on the destination column is ignored and the NULL from the source is preserved and inserted.

Table Lock: by default this setting is checked, and the recommendation is to leave it checked unless the same table is being used by another process at the same time. It specifies that a table lock will be acquired on the destination table instead of multiple row-level locks, which could otherwise lead to lock escalation problems.

Check Constraints: again, by default this setting is checked, and the recommendation is to uncheck it if you are sure the incoming data will not violate the destination table's constraints. The setting specifies that the data flow pipeline engine will validate incoming data against the constraints of the target table; unchecking it improves the performance of the data load.

5. Effect of Rows Per Batch and Maximum Insert Commit Size settings:
Rows per batch: the default value of -1 specifies that all incoming rows are treated as a single batch. You can change this behavior to break the incoming rows into multiple batches; the only allowed value is a positive integer specifying the maximum number of rows per batch.
Maximum insert commit size: the default value of 2147483647 (the largest 4-byte integer) specifies that all incoming rows are committed once, on successful completion. You can specify a positive value instead, so that a commit is issued after that many records. Committing several times puts some overhead on the data flow engine, but it relieves the pressure on the transaction log and tempdb, especially during high-volume data transfers.
These two settings matter for the health of tempdb and the transaction log. For example, if you leave Maximum insert commit size at its default, the transaction log and tempdb keep growing during the extraction process; with a high volume of data, tempdb can run out of space and the extraction will fail. So it is recommended to set these values to an optimum level based on your environment.

6. DefaultBufferSize and DefaultBufferMaxRows:
The execution tree creates buffers for storing incoming rows and performing transformations. How many buffers are created depends on how many rows fit into a buffer, which in turn depends on a few factors.
The first is the estimated row size, which is the sum of the maximum sizes of all the columns in the incoming records.
The second is the DefaultBufferSize property of the Data Flow task, which specifies the default maximum size of a buffer. The default value is 10 MB, with upper and lower bounds constrained by two internal SSIS properties, MaxBufferSize (100 MB) and MinBufferSize (64 KB); that is, a buffer can be as small as 64 KB and as large as 100 MB.
The third is DefaultBufferMaxRows, also a property of the Data Flow task, which specifies the default number of rows in a buffer; its default value is 10,000. If that many rows would exceed DefaultBufferSize, the engine reduces the row count per buffer. For example, with an estimated row size of 2 KB, 10,000 rows would need about 20 MB, so the engine would cut the rows per buffer to roughly 5,000 to stay within the 10 MB default.
For better buffer performance you can do two things.
First, remove unwanted columns from the source and set the data type of each column appropriately, especially if your source is a flat file; this lets you accommodate as many rows as possible in the buffer.
Second, if your system has sufficient memory available, tune these properties to have a small number of large buffers, which can improve performance. Beware of changing these properties to the point where page spooling begins, as that adversely impacts performance. So before setting values for these properties, test thoroughly in your environment and then set them appropriately.

7. How the DelayValidation property can help you:
SSIS uses two types of validation.
First is package validation (early validation), which validates the package and all its components before the package execution starts.
Second is component validation (late validation), which validates the components of the package once execution has started.
Consider a scenario where the first component of the package creates an object, say a temporary table, that is referenced by the second component. During package validation the first component has not yet executed, so the object does not exist yet, causing a validation failure on the second component: SSIS throws a validation exception and does not start the package execution. How do you get this package running in this common scenario? Every component has a DelayValidation property (default: False). If you set it to True, early validation is skipped for that component, and the component is validated only at the component level (late validation), during package execution.

8. Better performance with parallel execution.
9. Decide when to use event logging and when to avoid it; excessive logging adds overhead.
10. Monitoring SSIS performance with performance counters.
Launch Performance Monitor:
1. Start -> All Programs -> Administrative Tools -> Performance
2. Load the SSIS related Counters
In the Performance Object list, select SQL Server:SSIS Pipeline and SQL Server:SSIS Service. SSIS provides a set of performance counters; among them, the following are helpful when you tune or debug your package:
Buffers in use
Flat buffers in use
Private buffers in use
Buffers spooled
Rows read
Rows written

“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful for discovering leaks. During package execution these counters fluctuate, but once the package finishes executing, their values should return to what they were before execution; otherwise, buffers have been leaked.

“Buffers spooled” is the number of buffers currently written to disk: if the data flow engine runs low on physical memory, buffers not currently in use are written to disk and reloaded when needed. The counter starts at 0, and a value above 0 indicates that the engine has started memory swapping. In such a case, set the Data Flow task properties BLOBTempStoragePath and BufferTempStoragePath to locations with maximal I/O bandwidth.

“Rows read” and “Rows written” show how many rows the entire data flow has processed.

11. FastParse property: the Fast Parse option in SSIS can be used for very fast loading of flat file data. It speeds up the parsing of integer, date and time types when the conversion does not have to be locale-sensitive. This option is set on a per-column basis using the Advanced Editor for the Flat File source.
12. The Checkpoint feature helps in restarting packages from the point of failure.

Q108. Upgrade DTS package to SSIS 

1. In BIDS, from the Project menu, select 'Migrate DTS 2000 Package'.
2. In the Package Migration Wizard, choose the source, the SQL Server 2000 server name, and the destination folder.
3. Select the list of packages that need to be upgraded to SSIS.
4. Specify the log file for the package migration.


Q109. Events in SSIS


OnError: runs when a task or container reports an error.
OnExecStatusChanged: runs for all tasks and containers when the execution status changes to In Process, Success, or Failed.
OnInformation: runs when SSIS outputs information messages during the validation and execution of a task or container.
OnPostExecute: runs after a container or task successfully completes.
OnPostValidate: executes after a container or task has successfully been validated.
OnPreExecute: runs just before a container or task is executed.
OnPreValidate: runs before the component is validated by the engine.
OnProgress: executes when a progress message is sent by the SSIS engine, indicating tangible advancement of the task or container.
OnQueryCancel: invoked when an Execute SQL Task is cancelled through manual intervention, such as stopping the package.
OnTaskFailed: similar to OnError, but runs when a task fails rather than each time an error occurs.
OnVariableValueChanged: runs when the value changes in a variable for which the RaiseChangedEvent property is set to True.
OnWarning: runs when a task returns a warning event, such as a column not being used in a data flow.

Q110. Different ways to execute SSIS package 

1. Using the Execute Package Utility (DTEXECUI.EXE) graphical interface, one can execute an SSIS package stored in the file system, in SQL Server, or in the SSIS Package Store. DTEXECUI provides a graphical user interface that can be used to specify the various options to be set when executing an SSIS package. You can launch DTEXECUI by double-clicking an SSIS package file (.dtsx), or launch it from a command prompt and then specify the package to execute.
2. Using the DTEXEC.EXE command-line utility, one can execute an SSIS package stored in the file system, in SQL Server, or in the SSIS Package Store. The syntax to execute an SSIS package stored in the file system is: DTEXEC.EXE /F "C:\BulkInsert\BulkInsertTask.dtsx"

3. Running the package from BIDS to test its execution:
- In Solution Explorer, right-click the SSIS project folder that contains the package you want to run, and then click Properties.
- In the SSIS Property Pages dialog box, select the Build option under the Configuration Properties node and, in the right-hand panel, provide the folder where you want the SSIS package to be deployed in the OutputPath. Click OK to save the changes in the property page.
- Right-click the package within Solution Explorer and select the Execute Package option from the drop-down menu.
4. SQL Server Agent:
- Drill down to the SQL Server Agent node in Object Explorer.
- Right-click the Jobs node and select New Job from the popup menu.
- To set up a proxy for the job, first create a credential (or use an existing one): navigate to Security > Credentials in SSMS Object Explorer and right-click to create a new credential.
- Then navigate to SQL Server Agent > Proxies in SSMS Object Explorer and right-click to create a new proxy.

Q111. How to execute a Stored Procedure from SSIS? 


Use the Execute SQL Task: put the stored procedure call (e.g. EXEC dbo.MyProc ?) in its SQLStatement and map any parameters on the Parameter Mapping page (parameter markers vary by connection type).


Q112. How to deploy packages from one server to another server
 

1. Copy the deployment bundle: locate the deployment bundle on the first server. If you used the default location, the deployment bundle is the Bin\Deployment folder. Right-click the Deployment folder and click Copy, then locate the public share to which you want to copy the folder on the target computer and click Paste.
2. Run the Package Installation Wizard:
1. On the destination computer, locate the deployment bundle.
2. In the Deployment folder, double-click the manifest file, Project1.SSISDeploymentManifest.
3. On the Welcome page of the Package Installation Wizard, click Next.
4. On the Deploy SSIS Packages page, select either the File system or SQL Server deployment option, select the "Validate packages after installation" check box, and then click Next.
5. On the Specify Target SQL Server page, specify (local) in the Server name box.
6. If the instance of SQL Server supports Windows Authentication, select Use Windows Authentication; otherwise, select Use SQL Server Authentication and provide a user name and a password.
7. Verify that the "Rely on server storage for encryption" check box is cleared. Click Next.
8. On the Select Installation Folder page, click Browse.
9. On the Confirm Installation page, click Next.
10. The wizard installs the packages. After installation is completed, the Configure Packages page opens.


Q113. How to deploy a package 

Right-click the solution in Solution Explorer and choose Properties from the menu; when the build/rebuild succeeds, navigate to the directory referred to in the DeploymentOutputPath property.
Deploying the package: double-click the manifest file to start the deployment. The Package Installation Wizard begins, and the Deploy SSIS Packages step is the first screen presented. This screen lets you select where the packages should be deployed; as mentioned in the dialog box, deploying to SQL Server is more secure, since SQL Server stores the packages internally, whereas with the file system additional security measures need to be taken to secure the physical files.


Q114. What is the use of Percentage Sampling transformation in SSIS? 

The Percentage Sampling transformation is generally used for data mining. It builds a random sample set of output rows by selecting a specified percentage of input rows. For example, if the input has 1000 rows and I specify 10 as the sampling percentage, the transformation returns 10% of the input rows chosen at random.


Q115. What is the use of Term Extraction transformation in SSIS?
 

The Term Extraction transformation is used to extract nouns, noun phrases, or both, from English text only. It extracts terms from text in a transformation input column and writes the terms to a transformation output column. It can also be used to discover the content of a dataset.


Q116. What is Data Viewer and what are the different types of Data Viewers in SSIS? 

A Data Viewer allows viewing data at a point in time at runtime.
The different types of data viewers are:
1. Grid
2. Histogram
3. Scatter Plot
4. Column Chart


Q117. What are the possible locations to save an SSIS package?

1. File System: the package is saved at a physical location on the hard drive or on any shared folder, and we must provide the fully qualified path to the stored package when using the FileSystem option.
2. SQL Server: SSIS packages are stored in the sysssispackages table of the MSDB database.
3. SSIS Package Store: nothing but a combination of SQL Server and file system deployment, as you can see when you connect to SSIS through SSMS: it looks like a store that has categorized its contents (packages) into different folders based on the taste of its manager (which is you, the package developer). So don't mistake it for something different from the two deployment types above.
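For illustration, packages can be saved to either location through the runtime API from Q103; a hedged sketch in C# (the server name and paths are hypothetical):

Application app = new Application();
Package pkg = app.LoadPackage(@"C:\Packages\Sample.dtsx", null);   // hypothetical path
// SQL Server storage (msdb..sysssispackages), here with Windows Authentication:
app.SaveToSqlServer(pkg, null, "localhost", null, null);
// File system storage is simply the .dtsx file itself:
app.SaveToXml(@"C:\Deploy\Sample.dtsx", pkg, null);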


Q118. How to provide security to packages? 

We can provide security to packages in 2 ways:
1. Package encryption
2. Password protection
The ProtectionLevel property supports the following options:
1. DontSaveSensitive: sensitive information is simply not written to the package XML file when you save the package.
2. EncryptSensitiveWithUserKey: encrypts sensitive information based on the credentials of the user who created the package; this is the default value of the ProtectionLevel property.
3. EncryptSensitiveWithPassword: requires you to specify a password for the package, which is used to encrypt and decrypt the sensitive information in it.
4. EncryptAllWithPassword: encrypts the entire contents of the SSIS package with the password you specify.
5. EncryptAllWithUserKey: encrypts the entire contents of the SSIS package using the user key.
6. ServerStorage: relies on SQL Server to protect the package, retaining all sensitive information when the package is saved to the MSDB database of SQL Server. You can change the protection level of deployed packages by using the DTUTIL utility.


Q119. How to track a variable in SSIS?

OnVariableValueChanged: this event is raised when the value of the variable changes. To track a variable:
1. Set the EvaluateAsExpression property of the variable to True.
2. Set the RaiseChangedEvent property of the variable to True.
3. Create an event handler for the OnVariableValueChanged event on the container in which the variable is scoped (see the sketch below).
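A minimal hedged sketch of the trigger side in C#: any assignment to the variable (here from a Script Task; the variable name User::Counter is hypothetical) raises OnVariableValueChanged once RaiseChangedEvent is True:

public void Main()
{
    // Assumes User::Counter is listed under the task's ReadWriteVariables
    // and has its RaiseChangedEvent property set to True.
    Dts.Variables["User::Counter"].Value = 42;     // this write raises the event
    Dts.TaskResult = (int)ScriptResults.Success;
}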


Q120. FTP Task:
 

The FTP task downloads and uploads data files and manages directories on servers. For example, a package can download data files from a remote server. Use the FTP task for the following purposes:
1. Copying directories and data files from one directory to another, before or after moving data, and applying transformations to the data.
2. Downloading files from an FTP location and applying transformations to column data before loading the data into a database.
At run time, the FTP task connects to a server by using an FTP connection manager. The FTP connection manager includes the server settings, the credentials for accessing the FTP server, and options such as the timeout and the number of retries for connecting to the server. The FTP connection manager supports only anonymous authentication and basic authentication; it does not support Windows Authentication.
Predefined FTP operations: Send Files, Receive Files, Create Local Directory, Remove Local Directory, Create Remote Directory, Remove Remote Directory, Delete Local Files, Delete Remote Files.
Custom log entries available on the FTP task: FTPConnectingToServer, FTPOperation.


Q121. New features in SSIS 2012 

1. GUI improvements:
- Sort packages by name
- Package visualization
- Zoom
- Data flow source/destination wizard
- Grouping in data flow

2. CDC (Change Data Capture) Task and Components:
- CDC is nothing but incremental load: it loads only the rows that have changed since the last load.
- CDC needs to keep track of which changes have already been processed; the CDC Control task does this by storing LSNs in a tracking table.
- The CDC source component reads from the CDC table function, based on the LSN range it gets from the CDC Control task.
- The CDC splitter transformation splits the records into new rows, updated rows and deleted rows.
3. Flat File Connection Manager changes:
- The Flat File connection manager now supports parsing files with embedded qualifiers. The connection manager also, by default, always checks for row delimiters to enable the correct parsing of files with rows that are missing column fields. The Flat File source now supports a varying number of columns as well as embedded qualifiers.

4. Offline Connection Managers: Integration Services now validates all connections before validating all of the data flow components when a package is opened and sets any connections that are slow or unavailable to work offline. This helps to reduce the delay in validating the package data flow. After a package is opened, you can also turn off a connection by right-clicking the connection manager in the Connection Managers area and then clicking Work Offline. This can speed up operations in the SSIS Designer.

5. New Functions/Expressions in SSIS 2012:
LEFT: you can now easily return the leftmost portion of a string rather than use the SUBSTRING function. LEFT's syntax is the same as in T-SQL: LEFT(character_expression, number)
REPLACENULL: You can use this function to replace NULL values in the first argument with the expression specified in the second argument. This is equivalent to ISNULL in T-SQL: REPLACENULL(expression, expression)
TOKEN: This function allows you to return a substring by using delimiters to separate a string into tokens and then specifying which occurrence to return: TOKEN(character_expression, delimiter_string, occurrence)
TOKENCOUNT: This function uses delimiters to separate a string into tokens and then returns the count of tokens found within the string:
TOKENCOUNT(character_expression, delimiter_string)
6. Easy Column Remapping in Data Flow (Mapping Data Flow Columns):
- When modifying a data flow, column remapping is sometimes needed.
- SSIS 2012 maps columns by name instead of by ID.
- It also has an improved remapping dialog.

7. Shared Connection Managers: you can create connection managers at the project level that can be shared by multiple packages in the project. A connection manager created at the project level is automatically visible in the Connection Managers tab of the SSIS Designer window for all packages. When shared connection managers are converted back to regular (package) connection managers, they disappear from all other packages.
8. Scripting enhancements: the Script task and Script component now support .NET Framework 4.0, and breakpoints are supported in the Script component.

9. ODBC Source and Destination:
- ODBC was not natively supported in SSIS 2008 (it could be accessed via ADO.NET).
- SSIS 2012 has native ODBC source and destination components.
10. Reduced memory usage by the Merge and Merge Join transformations: the old SSIS Merge and Merge Join transformations, although helpful, used a lot of system resources and could be memory hogs. In 2012 these transformations are much more robust and reliable; most importantly, they do not consume excessive memory when the multiple inputs produce data at uneven rates.
11. Undo/Redo: one thing that annoyed users of SSIS before 2012 was the lack of Undo and Redo support; once you performed an operation, you couldn't undo it. SSIS 2012 adds Undo/Redo support.


Q122. Difference between Script Task and Script Component in SSIS. 

Control Flow / Data Flow:
- Script Task: configured on the Control Flow tab of the designer and runs outside the data flow of the package.
- Script Component: configured on the Data Flow page of the designer and represents a source, transformation, or destination in the Data Flow task.

Purpose:
- Script Task: can accomplish almost any general-purpose task.
- Script Component: you must specify whether you want to create a source, transformation, or destination.

Raising Results:
- Script Task: uses both the TaskResult property and the optional ExecutionValue property of the Dts object to notify the runtime of its results.
- Script Component: runs as a part of the Data Flow task and does not report results using either of these properties.

Raising Events:
- Script Task: uses the Events property of the Dts object to raise events. For example (VB):
Dts.Events.FireError(0, "Event Snippet", ex.Message & ControlChars.CrLf & ex.StackTrace, "", 0)
- Script Component: raises errors, warnings, and informational messages by using the methods of the IDTSComponentMetaData100 interface returned by the ComponentMetaData property. For example (VB):
Dim myMetaData As IDTSComponentMetaData100
myMetaData = Me.ComponentMetaData
myMetaData.FireError(...)

Execution:
- Script Task: runs custom code at some point in the package workflow; unless you put it in a loop container or an event handler, it only runs once.
- Script Component: also runs once, but typically runs its main processing routine once for each row of data in the data flow.

Editor:
- Script Task: the Script Task Editor has three pages: General, Script, and Expressions. Only the ReadOnlyVariables, ReadWriteVariables, and ScriptLanguage properties directly affect the code that you can write.
- Script Component: the Script Transformation Editor has up to four pages: Input Columns, Inputs and Outputs, Script, and Connection Managers. The metadata and properties that you configure on each of these pages determine the members of the base classes that are autogenerated for your use in coding.

Interaction with the Package:
- Script Task: you use the Dts property to access other features of the package; the Dts property is a member of the ScriptMain class.
- Script Component: you use typed accessor properties to access certain package features such as variables and connection managers. The PreExecute method can access only read-only variables; the PostExecute method can access both read-only and read/write variables.

Using Variables:
- Script Task: uses the Variables property of the Dts object to access variables made available through the task's ReadOnlyVariables and ReadWriteVariables properties. For example:
string myVar;
myVar = Dts.Variables["MyStringVariable"].Value.ToString();
- Script Component: uses typed accessor properties of the autogenerated base class, created from the component's ReadOnlyVariables and ReadWriteVariables properties. For example:
string myVar;
myVar = this.Variables.MyStringVariable;

Using Connections:
- Script Task: uses the Connections property of the Dts object to access connection managers defined in the package. For example:
string myFlatFileConnection;
myFlatFileConnection = (Dts.Connections["Test Flat File Connection"].AcquireConnection(Dts.Transaction) as String);
- Script Component: uses typed accessor properties of the autogenerated base class, created from the list of connection managers entered on the Connection Managers page of the editor. For example:
IDTSConnectionManager100 connMgr;
connMgr = this.Connections.MyADONETConnection;


Q123. How to execute package from command line 

1. To execute an SSIS package saved to SQL Server using Windows Authentication: dtexec /sq pkgOne /ser productionServer
2. To execute an SSIS package that is saved in the file system: dtexec /f "c:\pkgOne.dtsx"
3. To execute an SSIS package saved to the File System folder in the SSIS Package Store: dtexec /dts "\File System\MyPackage"
4. To execute an SSIS package that is saved in the file system and configured externally: dtexec /f "c:\pkgOne.dtsx" /conf "c:\pkgOneConfig.cfg"


Q124. How to unzip a File in SSIS? 

Use the Execute Process task in the Control Flow.
From BIDS, drag and drop an Execute Process task onto the control flow and configure it. In the Execute Process task, set the following:
Executable: the path of the application to run (e.g., an unzip utility).
Arguments: the arguments needed to extract the zipped files.
Working Directory: the current directory for the process.


Q125. Which service is required to run a scheduled job?


SQL Server Agent Service


Q126. Difference between OLEDB Destination, SQL Destination, and Bulk Insert

1. The OLEDB destination loads records in batches, whereas the SQL Server destination loads all the records in one go.
2. The OLEDB destination can use the 'Fast Load' data access mode. The SQL Server destination uses shared memory for maximum loading speed and must execute on the same server as the database engine; prefer the OLEDB destination simply because it gives better flexibility on where you execute the package.
3. The Bulk Insert task uses the T-SQL BULK INSERT statement for speed when loading large amounts of data.


Q127. Which services are installed during SQL Server installation?

SSIS
SSAS
SSRS
SQL Server (MSSQLSERVER)
SQL Server Agent Service
SQL Server Browser
SQL Full-Text


Q128. How to run dynamic T-SQL in SSIS?


Option #1: Using a Script component as a destination.
Option #2: Building the statement in a variable and running the T-SQL with an Execute SQL Task (one common pattern is sketched below).
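A hedged sketch of Option #2 in C#: a Script Task assembles the statement into a string variable (both variable names below are hypothetical), and a downstream Execute SQL Task runs it with its SQLSourceType property set to Variable:

public void Main()
{
    // Build the dynamic statement from package variables (names are hypothetical).
    // Beware of SQL injection when concatenating identifiers like this.
    string tableName = Dts.Variables["User::TableName"].Value.ToString();
    Dts.Variables["User::DynamicSql"].Value =
        "SELECT COUNT(*) FROM [" + tableName + "];";
    Dts.TaskResult = (int)ScriptResults.Success;
}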


Q129. What is the use of the Recordset Destination?

The Recordset destination does not save data to an external data source. Instead, it saves data in memory in a recordset that is stored in a variable of the Object data type. After the Recordset destination saves the data, we use a Foreach Loop container with the Foreach ADO enumerator to process one row of the recordset at a time. The Foreach ADO enumerator saves the value from each column of the current row into a separate package variable. Then, the tasks configured inside the Foreach Loop container read those values from the variables and perform some action with them.



Q130. Delay Validation, Forced Execution

Delay Validation: validation takes place during package execution (late validation), instead of early validation, which takes place just before package execution begins; setting DelayValidation to True skips early validation for that component. Forced Execution: the ForceExecutionResult property makes a task or container report a fixed result (Success, Failure, or Completion) regardless of its actual outcome (see the sketch below).
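A hedged sketch of both knobs via the runtime API from Q103, with 'package' already loaded (the DTSForcedExecResult enum name is my recollection of the API's spelling, so treat this as a sketch rather than verified code):

// Defer validation of this executable until execution time.
package.DelayValidation = true;
// Force the executable to report Success regardless of its actual outcome
// (the enum also offers Failure and Completion).
package.ForceExecutionResult = DTSForcedExecResult.Success;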


Q131. Transfer Database Task

Used to move a database to another SQL Server instance or to create a copy on the same instance (with a different database name). The task works in two modes: Offline and Online.
Offline: the source database is detached from the source server after putting it in single-user mode, and copies of the .mdf, .ndf and .ldf files are moved to a specified network location. On the destination server, the copies are taken from the network location, and finally both databases are attached on the source and destination servers. This mode is faster, but the disadvantage is that the source database is not available during the copy-and-move operation. Also, the person executing the package in this mode must be a sysadmin on both the source and destination instances.
Online: the task uses SMO to transfer the database objects to the destination server. The database stays online during the copy-and-move operation, but the transfer takes longer because each object in the database is copied individually. The person executing the package in this mode must be either a sysadmin or the database owner of the specified databases.


Q132. Transfer SQL Server Objects Task

Used to transfer one or more SQL Server objects to a different database, either on the same or another SQL Server instance. You can transfer tables, views, indexes, stored procedures, user-defined functions, users, roles, etc.


Q133. How to generate an auto-incremental number in an SSIS package?

A Script component can be used for this task. The steps are as follows:
1. Drag and drop a Script component onto the data flow and select Transformation as the Script component type.
2. Double-click the Script component.
3. On the Input Columns tab, select the columns to pass through the Script component.
4. On the Inputs and Outputs tab, add a new output column with an integer data type (the sketch below assigns the running number to this column).
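A minimal hedged sketch of the Script component body in C#, assuming the new output column from step 4 is named RowNumber:

int rowCounter = 0;   // persists across rows for the lifetime of the data flow

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    rowCounter++;                 // one increment per incoming row
    Row.RowNumber = rowCounter;   // write the running number to the new column
}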


Q134. How to create a temporary table using SSIS?

1. On the connection manager, set the property RetainSameConnection = True so that a temporary table created in one Control Flow task can be used in another task; RetainSameConnection means the temp table will not be dropped when the task completes.
2. Create a Data Flow task that consumes your global temp table in an OLE DB Source component.
3. Set DelayValidation = True on the Data Flow task, so that the task does not try to validate that the table exists until run time.


Q135. How to lock a variable in a Script Task?

public void Main()
{
    Variables vars = null;
    // Lock the variable for reading before accessing its value.
    Dts.VariableDispenser.LockOneForRead("varName", ref vars);
    // Do something with the value, e.g. object value = vars["varName"].Value;
    vars.Unlock();   // always release the lock when done
    Dts.TaskResult = (int)ScriptResults.Success;
}
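Writing follows the same pattern with the write-lock counterpart:

Variables vars = null;
Dts.VariableDispenser.LockOneForWrite("varName", ref vars);
vars["varName"].Value = "new value";   // modify the variable while holding the lock
vars.Unlock();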


Q136. How to pass a property value at run time?

A property value, such as a Connection Manager's connection string, can be passed to the package using package configurations (or overridden at run time with the DTEXEC /SET option).


Q137. How to skip the first 5 lines in each input flat file?

In the Flat File connection manager editor, set the 'Header rows to skip' property to 5.


Q138. Parallel processing in SSIS

To support parallel execution of different tasks in a package, SSIS uses 2 properties (see the sketch after this list):
1. MaxConcurrentExecutables: a package-level property that defines how many tasks can run simultaneously, by specifying the maximum number of SSIS threads that can execute in parallel per package. The default is -1, which equates to the number of logical processors plus 2.
2. EngineThreads: a property of each Data Flow task that defines how many threads the data flow engine can create and run in parallel. EngineThreads applies equally to the source threads the engine creates for sources and the worker threads it creates for transformations and destinations, so setting EngineThreads to 10 means the engine can create up to ten source threads and up to ten worker threads.
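For illustration, MaxConcurrentExecutables can also be set through the runtime API from Q103; a hedged sketch in C# (the package path is hypothetical). EngineThreads, by contrast, is set per Data Flow task, typically in the designer's Properties window:

Application app = new Application();
Package pkg = app.LoadPackage(@"C:\Packages\Sample.dtsx", null);   // hypothetical path
pkg.MaxConcurrentExecutables = 4;   // allow up to four executables to run in parallel
DTSExecResult result = pkg.Execute();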


Q139. How do we convert data types in SSIS?

The Data Conversion Transformation in SSIS converts the data type of an input column to a different data type.


Q140. One Excel file contains 10 rows and a second Excel file contains 10 rows, with 5 matching rows in both. How do you find the non-matched rows from both files and store them in an output Excel file?

One approach: sort both sources on the matching key, combine them with a Merge Join transformation using a full outer join, use a Conditional Split to keep the rows where either side's key is NULL (the non-matches), and send those rows to an Excel destination.

Q141. Difference between Copy Column and Derived Column transformations

Both transformations can add new columns. Copy Column can add new columns only from existing columns, whereas Derived Column can create new columns with or without the help of existing columns. Derived Column also supports an error output, whereas Copy Column does not.

Q142. Let’s say we have some reference data in Excel and we want to use that Excel data in a Lookup transformation. How can we achieve that without loading it into a staging or temp table?

Use the Cache Transformation: load the Excel data into a Cache connection manager and point the Lookup transformation at it (full cache mode).


Q143. I have a source file that contains 1000 records, and I want to insert 15% of the records into TableA and the remainder into TableB. Which transformation can I use?

Percentage Sampling: it splits the input into a sampled output (the 15%) and an unselected output (the remaining rows).

Q144. There is no UNION transformation in SSIS; how do you perform a UNION operation using built-in transformations?

Use the Union All transformation. To get UNION semantics (distinct rows), follow it with a Sort transformation with the 'Remove rows with duplicate sort values' option checked, or with an Aggregate transformation grouping on all columns.

Q145. Before you create your SSIS package and load data into the destination, you want to analyze your data. Which task will help you achieve that?


Data Profiling Task


Q146. In the Merge Join transformation we can use an inner join, left join or full outer join. Which transformation is used to perform a cross join?

There is no built-in cross join. The workaround is to add a Derived Column transformation to each of the two sources you want to join, create a new column in each (call them [JoinCol]) containing the same literal value (e.g. "1") for every single row, and then join on the two [JoinCol] columns in the Merge Join transformation.


Q147. Direct vs. Indirect Configuration in SSIS
Direct configuration Pros:
-Doesn't need environment variables creation or maintenance
-Scales well when multiple databases (Test or DEV) are used on the same server
-Changes can be made to the configurations files (.dtsconfig) when deployment is made using SSIS deployment utility
Cons:
-You need to specify the configuration file to use when the package is triggered with DTExec (/conf switch).
-If multiple layers of packages are used (parent/child packages), configured values must be transferred from the parent to the child package using parent package variables, which can be tricky (if one parent variable is missing, the rest of the parent package configurations will not be transferred).
Indirect configuration Pros:
-All packages can reference the configuration file(s) via environment variable
-Packages can be deployed simply using copy/paste or xcopy, no need to mess with SSIS deployment utility
-Packages or application is not dependent of configuration switches when triggered with DTExec utility (command line is much simpler)
Cons:
-Requires environment variables to be created
-Does not easily support multiple databases (TEST and Pre-Prod) on the same server


Q148. We get data from a flat file; how do we remove leading zeros, trailing zeros, or both before inserting into the destination?

Use the Derived Column transformation to remove leading/trailing (or both) zeros from the string. After removing the zeros you can cast to any data type you want, such as Numeric, Int or Float.
Leading zeros: (DT_WSTR,50)(DT_I8)[YourInputColumn]
Trailing zeros: REVERSE((DT_WSTR,50)(DT_I8)REVERSE([YourInputColumn]))
Leading and trailing zeros: REVERSE((DT_WSTR,50)(DT_I8)REVERSE((DT_WSTR,50)(DT_I8)[YourInputColumn]))


Q149. Import Data in SSMS:

We can't apply transformations on the source data with 'Import Data' / 'Export Data' in SSMS.
Import and Export Wizard in SSIS: we can apply transformations on the source data.


Q150. How do I troubleshoot an SSIS package's failed execution in a SQL Agent job?

When an SSIS package fails in a SQL Agent job, first consider the following conditions:
1. The user account used to run the package under SQL Server Agent differs from the original package author.
2. The user account does not have the required permissions to make connections or to access resources outside the SSIS package.
The following 4 issues are commonly encountered in the SSIS forum:
1. The package's protection level is set to EncryptSensitiveWithUserKey, but the SQL Server Agent service account differs from the SSIS package creator.
2. Data source connection issues.
3. File or registry access permission issues.
4. Missing 64-bit driver issues.

Package protection level issue: for the 1st issue, follow these steps:
1. Check what the protection level is in your SSIS package.
2. If the protection level is set to EncryptSensitiveWithUserKey, check the creator in your SSIS package and compare it with the SQL Server Agent service account.
3. If the creator differs from the SQL Server Agent service account, the sensitive data of the SSIS package cannot be correctly decrypted, which leads to the failure.
A common solution is to create a proxy account for SSIS in SQL Server Agent and then specify that proxy as the "Run as" account in the job step; the proxy account must be the same as the SSIS package creator. You can also change the SSIS package protection level to EncryptSensitiveWithPassword and specify the password on the command line in the job step.

Data source connection issue: this happens when you are using Windows Authentication for your data source. In this case, you need to make sure that the SQL Server Agent service account or your proxy account has permission to access your database.


Q151. How to run SSIS packages using 32-bit drivers on a 64-bit machine?

On a 64-bit operating system, installing Integration Services installs both 32-bit and 64-bit versions of the DTExec command-line tool used to execute SSIS packages. If your SSIS package references any 32-bit DLLs or 32-bit drivers, you must use the 32-bit version of DTExec to execute the package. Likewise, before debugging such a package in BIDS, make sure you change the project property Run64BitRuntime to False; otherwise, the package will try to load 64-bit DLLs instead of 32-bit ones.
To change this setting: right-click the project node and, under the Debugging options, set Run64BitRuntime = False.