SSIS Slowly Changing Dimension (SCD) Part –II
Introduction
In my previous article we try to demonstrate the concept of SSIS Slowly Changing Diagram with example of Changing Attributes (Type-1) example.
In this article we are trying to demonstrate the Type -2 that is Historical Attributes.
What is Historical Attributes
Recalling from our previous article:
Here we need to maintain the history of the records, where some particular value changes. Here we take an example to understand the Type-2.
Here the employee gets promoted, so the job title of the employee changes. In this case we need to maintain the history of the employee. For this type of changes there will be multiple records for the same employee with different job title. Then to indentify the current records, we can either add a column as current flag, which will be 'Y' for the current or latest records, Or else we can add two columns as START_DATE and END_DATE, through which we can maintain history of employee's records. This SCD directs these rows to two outputs: Historical Attribute Inserts Output and New Output.
EMPID
|
FIRST_NAME
|
JOB_TITLE
|
START_DATE
|
END_DATE
|
CURRENT
|
101
|
SUBAL
|
GRADE-C
|
01-01-2010
|
01-01-2011
|
N
|
101
|
SUBAL
|
GRADE-B
|
01-01-2011
|
01-01-2012
|
N
|
101
|
SUBAL
|
GRADE-A
|
01-01-2012
|
01-01-2099
|
Y
|
How to configure the Type-2 (Historical Attributes)
Step -1 [ Creating Tables ]
Here I am creating two tables with same structure. The table one is tbl_Employee_Details wich contains the original employee records and tbl_Employee_Details_Change which contains the records need to be changed to the tbl_Employee_Details table objects.
Here the three columns are important CURRENT_FLAG, STARTDATE, ENDDATE. In the tbl_Employee_Details_Change contains the Job titles changes of the employee.
-- Base Table
IF OBJECT_ID(N'dbo.tbl_Employee_Details', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbl_Employee_Details;
END
GO
CREATE TABLE dbo.tbl_Employee_Details
(EMPID INT,
FIRST_NAME VARCHAR(50),
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
DOB DATETIME,
NATIONALITY VARCHAR(50),
JOB_TITLE VARCHAR(20),
MARITAL_STATUS CHAR(1),
EMPLOYEE_GENDER CHAR(1),
CURRENT_FLAG INT,
STARTDATE DATETIME,
ENDDATE DATETIME DEFAULT('01-01-2019'));
GO
-- Table for Changing Records
IF OBJECT_ID(N'dbo.tbl_Employee_Details_Change', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbl_Employee_Details_Change;
END
GO
CREATE TABLE dbo.tbl_Employee_Details_Change
(EMPID INT,
FIRST_NAME VARCHAR(50),
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
DOB DATETIME,
NATIONALITY VARCHAR(50),
JOB_TITLE VARCHAR(20),
MARITAL_STATUS CHAR(1),
EMPLOYEE_GENDER CHAR(1),
CURRENT_FLAG INT,
STARTDATE DATETIME,
ENDDATE DATETIME DEFAULT('01-01-2019'));
GO
Step – 2 [ Inserting Records ]
-- Inserting Recors in Base Table
INSERT INTO dbo.tbl_Employee_Details
(EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,
DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
EMPLOYEE_GENDER, CURRENT_FLAG, STARTDATE)
VALUES (101, 'SUBAL', 'CHANDRA', 'DAS',
'12-07-1945', 'INDIAN', 'GRADE-B', 'M',
'M', 0, '01-01-2010'),
(102, 'JOYDEEP', NULL, 'DAS',
'12-17-1974', 'INDIAN', 'GRADE-B', 'M',
'M', 0, '07-01-2010');
-- Inserting Recors for Change
INSERT INTO dbo.tbl_Employee_Details_Change
(EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,
DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
EMPLOYEE_GENDER, CURRENT_FLAG, STARTDATE)
VALUES (101, 'SUBAL', 'CHANDRA', 'DAS',
'12-07-1945', 'INDIAN', 'GRADE-A', 'M',
'M', 1, '08-01-2012');
Step – 3 [ SSIS Data Flow ]
Step – 4 [ OLE DB Source Configuration ]
Step – 5 [ Slowly Changing Dimension Wizard Configuration ]
Step – 6 [ Run the SSIS Package ]
Step – 7 [ Observation ]
SELECT * FROM tbl_Employee_Details;
Hope you like it.
SSIS Slowly Changing Dimension (SCD) Part- I
Introduction
Slowly Changing Dimension (SCD) as the name suggest that a dimension which is changing slowly. To understand it properly we take an example of Employee Table object in our database.
Object name: Tbl_Employee_Details
Attributes Name
|
EMPID
|
FIRST_NAME
|
MIDDLE_NAME
|
LAST_NAME
|
DOB
|
NATIONALITY
|
JOB_TITLE
|
MARITAL_STATUS
|
EMPLOYEE_GENDER
|
If we observe this table objects the records of the employee is not change frequently. But we cannot tell that the records never change. It may change occasionally.
Here we are providing some situation when the records can be changes in the employee table.
1. If the name of employee is wrongly entered in the table (may be the spelling mistake).
2. The employee gets married so the marital status is going to change.
3. The Last name of the employee is going to change.
4. The job title of the employee may be changed after getting promotion.
Some of the attributes of the Employee objects not change like DOB, Employee Gender, and Nationality if we are not going to enter wrong data mistakenly.
Others attribute changes may occurs but may happened after certain period of time.
Type of Changes supported by SCD
SCD supports four types of changes
1. Changing Attributes
2. Historical Attributes
3. Fixed Attributes
4. Inferred numbers
Type-1 [ Changing Attributes ]
When we are going to overwrite the column existing records. Here we are taking an example to understand the Type-1.
Suppose we have a situation where the first name of the Employee is misspelled and the wrong spelling is stored in the first name of the employee. For making the first name correct we need to add one more records for same employee so that we can overwrite the first name of the employee. This SCD transformation directs these rows to an output named Changing Attributes Updates Output.
EMPID
|
FIRST_NAME
|
FIRST_NAME
|
MIDDLE_NAME
|
101
|
SUBOOOOL
|
CHANDRA
|
DAS
|
101
|
SUBAL
|
CHANDRA
|
DAS
|
Type-2 [ Historical Attributes ]
Here we need to maintain the history of the records, where some particular value changes. Here we take an example to understand the Type-2.
Here the employee gets promoted, so the job title of the employee changes. In this case we need to maintain the history of the employee. For this type of changes there will be multiple records for the same employee with different job title. Then to indentify the current records, we can either add a column as current flag, which will be 'Y' for the current or latest records, Or else we can add two columns as START_DATE and END_DATE, through which we can maintain history of employee's records. This SCD directs these rows to two outputs: Historical Attribute Inserts Output and New Output.
EMPID
|
FIRST_NAME
|
JOB_TITLE
|
START_DATE
|
END_DATE
|
CURRENT
|
101
|
SUBAL
|
GRADE-C
|
01-01-2010
|
01-01-2011
|
N
|
101
|
SUBAL
|
GRADE-B
|
01-01-2011
|
01-01-2012
|
N
|
101
|
SUBAL
|
GRADE-A
|
01-01-2012
|
01-01-2099
|
Y
|
Type-3 [ Fixed Attributes ]
Here in this example DOB, Employee Gender, and Nationality should never be changed. If any changes occur these columns value then either it should throw exception (error) or changes can be saved in some other destination. These changes not are applied in columns.
This SCD transformation detects changes and directs the rows with changes to an output named Fixed Attribute Output.
Type-4 [ Inferred Number ]
This are the records of the dimension, which are found missing during fact load. If we take an example, say there is a fact table which contains employee and department information. While generating the fact table from employee table and department table , sometimes happens that employee table contains some departments name which has no records in department table, and during fact table generation those records are found missing from department table, these kind of member of dimension department are called inferred member. It's like 'Fact arriving earlier than dimensions'.
This SCD transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, we can update the existing record rather than create a new one.
TYPE-1 Changing Attributes Example of SCD
Step-1 [ Creating Tables ]
Here I am creating two table with same structure. The table one is tbl_Employee_Details wich contains the original employee records and tbl_Employee_Details_Change which contains the records need to be changed to the tbl_Employee_Details table objects.
Here we observe that the First name of the Employee is wrongly entered in the table tbl_Employee_Details. So the correct value of the same records is put to the tbl_Employee_Details_Change table.
-- Base Table
IF OBJECT_ID(N'dbo.tbl_Employee_Details', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbl_Employee_Details;
END
GO
CREATE TABLE dbo.tbl_Employee_Details
(EMPID INT,
FIRST_NAME VARCHAR(50),
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
DOB DATETIME,
NATIONALITY VARCHAR(50),
JOB_TITLE VARCHAR(20),
MARITAL_STATUS CHAR(1),
EMPLOYEE_GENDER CHAR(1));
GO
-- Table for Changing Records
IF OBJECT_ID(N'dbo.tbl_Employee_Details_Change', N'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbl_Employee_Details_Change;
END
GO
CREATE TABLE dbo.tbl_Employee_Details_Change
(EMPID INT,
FIRST_NAME VARCHAR(50),
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
DOB DATETIME,
NATIONALITY VARCHAR(50),
JOB_TITLE VARCHAR(20),
MARITAL_STATUS CHAR(1),
EMPLOYEE_GENDER CHAR(1));
GO
Step-2 [ Inserting Records into Tables ]
-- Inserting Recors in Base Table
INSERT INTO dbo.tbl_Employee_Details
(EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,
DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
EMPLOYEE_GENDER)
VALUES (101, 'SUBOOOOL', 'CHANDRA', 'DAS',
'12-07-1945', 'INDIAN', 'GRADE-A', 'M',
'M'),
(102, 'JOYDEEP', NULL, 'DAS',
'12-17-1974', 'INDIAN', 'GRADE-B', 'M',
'M');
-- Inserting Recors for Change
INSERT INTO dbo.tbl_Employee_Details_Change
(EMPID, FIRST_NAME, MIDDLE_NAME, LAST_NAME,
DOB, NATIONALITY, JOB_TITLE, MARITAL_STATUS,
EMPLOYEE_GENDER)
VALUES (101, 'SUBAL', 'CHANDRA', 'DAS',
'12-07-1945', 'INDIAN', 'GRADE-A', 'M',
'M');
Step - 3 [ SSIS Data Flow Tasks ]
Step - 4 [ OLE DB Source Editor ]
Step - 5 [ Slowly Changing Dimension Wizard ]
Step – 6 [ OLE DB Command ]
It configured automatically
In SQL Command
UPDATE [dbo].[tbl_Employee_Details_Change]
SET [FIRST_NAME] = ?,[LAST_NAME] = ?,[MIDDLE_NAME] = ?
WHERE [EMPID] = ?
Step – 7 [ OLE DB Destination ]
Step – 8 [ Run the SSIS Package ]
Step – 9 [ Observation ]
SELECT * FROM tbl_Employee_Details;
In my next article I am showing the example of TYPE-2 (Historical Attributes).
Hope you like it.