Tuesday, August 18, 2015

Lookup, Cache Transformation & Cache Connection

Lookup, Cache Transformation & Cache Connection

Cache Connection Manager
- This connection manager is to define the cache
- Cache will be stored in a memory, however we can create cache file to hold the data.
- Once we feed the data into that file (*.caw) , later in all data flow tasks, we can use connection manager and get the data from that cache file

Cache Transformation
- This is one of the Data flow transformation
- It uses memory cache / file cache as configured in Cache Connection Manager and pulls data from there
- This is a very helpful transformation in terms of performance improvement

Lookup Transformation
- Basically it is being used to lookup the data
- It is using equi-join.
- All the values from source table (joining column) SHOULD exist in Reference table. If not, it will throw an
   error
- NULL value also will be considered as non-matching row
- This is CASE-SENSITIVE.
- It can internally use Cache Connection / OLEDB Connection.
- There can be 3 cache types
         (A) Full Cache
                  - Cache connection manager can be used only with this type of cache
                  - In this case, when package starts, data will be pulled and kept into memory cache /  file cache
                     and later only cache will be used and not database
                  - We might not get the latest data from database
         (B) Partial Cache
                  - Cache connection manager can't be used with this
                  - "Advance" tab of Lookup transformation will be enabled only in case of Partial cache where one                                  
                     can configure cache size
                  - This is bit different than Full cache
                  - Initially cache will be empty. For each value, first it checks in Cache, if not found then goes to             
                    database, if found from database, then stores that value in cache so it can be used in later stage.
                  - Startup time will be less than Full cache but processing time will be longer
                  - Also lot of available memory should be there.
         (C) No Cache
                  - Every time, it will get it from database
- There are 3 outputs
         (A) Matching Row
         (B) No Matching Row
         (C) Error

EXAMPLE

Pre-requisite

Execute the following query in database


CREATE TABLE LookupDemo (EID int, EName varchar(10), CountryVARCHAR(10))
CREATE TABLE LookupDemoReference (Country VARCHAR(10), RegionVARCHAR(15))

CREATE TABLE LookupDemoOutput (EID int, EName varchar(10), CountryVARCHAR(10), Region VARCHAR(15))

INSERT INTO LookupDemo (EID,EName,Country) VALUES
(1,'Nisarg','India'),
(2,'Megha','INDIA'),
(3,'Swara','China'),
(4,'Nidhi','USA'),
(5,'Lalu','Japan')

INSERT INTO LookupDemoReference (Country, Region) VALUES
('India','ASIA'),
('India','ASIA-Pacific'),
('China','ASIA-Pacific'),
('USA','North America'),
('Japan','ASIA-Pacific')




Steps

1. Right click in Connection Tray and click on "New Connection"
2. select "CACHE" and click on Add
3. Check "Use File Cache" checkbox to store the cache data in a file.
4. select the file destination and give some name
5. Click on "columns" tab and start adding columns.
6. Add 2 columns (Country , Region) with IndexPosition 1 and 0 respectively.
    Index Position 1 : this will be a joining key on which we will use joining conditions
    Index Position 0 : Other columns

7. Drag data flow task and rename as "Build Cache"
8. Go inside DFT and drag OLEDB source which should refer to LookupDemoReference table.
9. Drag Cache Transformation and connect it with source
10. In Connection Manager, select created cache connection manager
11. click on mapping and make sure, you have proper mapping between columns.
12. Go back to control flow and drag one more data flow task and rename it to "Lookup Data"
13. Connect 1st DFT to 2nd DFT. It should look like this.
14. Go to Lookup Data and add OLEDB source which should point to LookupDemo table
15. Drag Lookup Transformation and connect it with source
16. Select "Full Cache" in General tab of Lookup Transformation. Select "Cache Connection"

17. Select created connection manager and in Connection tab
18. Do column mappings as shown below.
19. Drag OLEDB destination and connect it with Lookup Transformation. make sure to choose "Lookup Match output"
20. OLEDB destination should point to LookupDemoOutput table
20. Column mapping should be like this.

21. Now execute the package.
22. Package will be failed at lookup transformation
23. Reason of failure.
      - Source table has one record with INDIA which is not available in Reference table, this signifies that
         Lookup transformation is CASE-SENSITIVE
      - Lookup Transformation is different then Merge Join here. Merge join will ignore the rows if they are not
         matching, but Lookup will throw an error.
24a. Open Lookup Transformation and change settings like this.

24b. Now drag Derived Column Transformation and connect it with Lookup Transformation. Make sure Lookup No Match Output gets displayed on the arrow.
25. In Derived Column, add a new column like this. Make sure to cast it into DT_STR

26. Remove the connection between OLEDB Destination and Lookup Transformation.
26. Now drag "Union All Transformation" and connect it with Lookup Transformation. Make sure Lookup Match Output gets displayed on the arrow.
27. Select "Invalid Column" in last row, last column

28. Connect Union All transformation to OLEDB destination
29. Whole DFT will look like this.
30. Execute the package.
31. We can see that 1 rows has been redirected as "No match output" and finally using Union All it is reaching to destination.
32. Result table is looking like this.
33. This signifies following things.
      - Lookup Transformation is Case-sensitive
      - If 2 reference records are existing then Lookup Trans. picks up only 1st record.
      - Lookup Trans. fails if joining key column is not in reference table in case we haven't configure No
         Match output.




SSIS Lookup Transform

Introduction

SSIS data flow control has a very important transform called the Lookup Transform. So I decide to write this article related to it.


What is the use of Lookup Transform
The main functionality of the lookup join is to make a Join with other source with current source and fetch the result in a desired format.

[Current Source] + [Other Source] à [Fetch Result] à[Desired Format]


The source can be anyone of the following Chased object, Table, Destination file source, a result from query etc.
The Lookup transform be available for data source like SQL, ORACLE and DB2.


Case Study
Here we have a flat file (FlatFile-A.txt) which contains the product information. There is a SQL table object named tbl_LOOKUPMATCH which contain the match records of product information. The SSIS package use the Lookup Transform and segregate the matched records from flat file to a table objects called tbl_LOOKUPMATCHDESTINATION and unmatched records to another table objects called tbl_LOOKUPUNMATCHDESTINATION.
Please follow the diagram to understand the case study.



Configuring the Lookup Transform

Step-1 [ Flat file source, Lookup Match and Destination Table Objects ]



Creating Lookup Match and Destination Table Objects

-- Lookup Match Table
IF OBJECT_ID(N'tbl_LOOKUPMATCH', N'U') IS NOT NULL
   BEGIN
      DROP TABLE tbl_LOOKUPMATCH;
   END
GO

CREATE TABLE  tbl_LOOKUPMATCH
       (
          PRODCD   VARCHAR(50) NOT NULL PRIMARY KEY,
          PRODNAME VARCHAR(50) NOT NULL
       );
GO
-- Inserting Records in Lookup Match Table
INSERT INTO  tbl_LOOKUPMATCH 
       (PRODCD, PRODNAME)
VALUES ('1', 'RICE'),
       ('2', 'OIL');   


Lookup Match tables Records

-- Records of Lookup match table      
SELECT * FROM tbl_LOOKUPMATCH; 


PRODCD               PRODNAME
1                              RICE
2                              OIL

-- Creating Match Lookup Destination
IF OBJECT_ID(N'tbl_LOOKUPMATCHDESTINATION', N'U') IS NOT NULL
   BEGIN
      DROP TABLE tbl_LOOKUPMATCHDESTINATION;
   END
GO
CREATE TABLE  tbl_LOOKUPMATCHDESTINATION
       (
          PRODCD   VARCHAR(50) NOT NULL PRIMARY KEY,
          PRODNAME VARCHAR(50) NOT NULL
       );
GO 

-- Creating Un Match Lookup Destination
IF OBJECT_ID(N'tbl_LOOKUPUNMATCHDESTINATION', N'U') IS NOT NULL
   BEGIN
      DROP TABLE tbl_LOOKUPUNMATCHDESTINATION;
   END
GO
CREATE TABLE  tbl_LOOKUPUNMATCHDESTINATION
       (
          PRODCD   VARCHAR(50) NOT NULL PRIMARY KEY,
          PRODNAME VARCHAR(50) NOT NULL
       );
GO           

Step-2 [ SSIS Data flow ]



Step-3 [ Configuring Flat file source Editor ]



Step-4 [ Configuring Lookup Transform Editor ]







Step-5 [ Configuring OLE DB Destination Editor ]



Step-6 [ Run the SSIS Package ]






Hope you like it.