Wednesday, September 2, 2015

Add or Remove Tables from Data Source View in SSAS



If you are working with fresh project then you have to create your Data Source View as per the company requirements. Please refer SSAS Data Source View to create new Data Source View. In real time, We mostly work on existing projects so, someone already designed Data Source View.


When we are working with predesigned Data Source View and if we require few more additional tables to meet our requirements or if there are some unwanted tables to remove in Data source view then ADD or REMOVE Tables option will help to achieve the same. This option not only add or remove tables but also views.

Add or Remove Tables from Data Source View in SSAS Example

We have the simple Data Source View with 1 [Fact Internet Sales] and 5 dimension tables such as Dim Customer, Dim Product, Dim Sales Territory, Dim Currency and Dim Date.
SSAS Add or Remove Tables From Data Source View 1
For instance, We have the situation to check the sales amount and tax amount of the product subcategory and product category but we don’t have these tables in the Data Source View so no data is available. In these situations, Right click on the empty space in data source view and select Add or Remove Tables option from the context menu.
SSAS Add or Remove Tables From Data Source View 2
It will open the Add or Remove tables window like below
SSAS Add or Remove Tables From Data Source View 3
For adding, select the required tables in Available Objects and click on > button. This will add those tables to Included objects
For deleting the unwanted tables from Included Objects, select the table and click on <button from the below GUI form.
<< and >> buttons are used to add or remove all the tables from Included Objects.
There is one more button called Add related Tables. If we select one table in Included Objects and want to add all the tables which are related to that table using primary and foreign key relationship then this button do the trick for you.
SSAS Add or Remove Tables From Data Source View 4
From the above screenshot, you can observe that we added Dim Product Subcategory, Dim Product Category tables to the Included Objects.
Click ok and see the Data Source View to check the newly added tables from Add or Remove Tables option.
SSAS Add or Remove Tables From Data Source View 5
Thank you for Visiting Our Blog

Create New Project in SSAS




In this article we will show you, How create New Project in SQL Server Analysis Services with example.
Double click on the BIDS application to open the Business Intelligence Development Studio. To create New Analysis Services Project, Please click on the File Menu and then select theProject option as shown in the below screenshot.
Create New SSAS Project 1
Once you click on the Project option then, New Project window will be opened to select the Analysis Service project. Please select Analysis Services Multidimensional and Data MiningProject Template from the available Business Intelligence Templates
Create New SSAS Project 2
Please provide the Project Name and change the location as per your requirement. We just changes the Solution Name and Name only but you can try. Click ok to finish creating new Analysis Services Project.

Folders Seen in SSAS Project

Once we created a New SSAS Project. If you observe the solution Explorer we have the following folders.
Folders seen in SSAS New Project
  • Data Sources: Connection string and credentials. Please refer SSAS Data Sources article to understand, How to create New Data Source in SQL Server Analysis Services.
  • Data Source Views: Same like data set. Please refer SSAS Data Source View article to understand, How to create New Data Set in SQL Server Analysis Services.
  • Cubes: Helps to create the cube, include measures and measure groups. Please referCreate OLAP Cube in SSAS article to understand, How to create New Cube in SQL Server Analysis Service
  • Dimensions: Helps to add database level dimensions, these can be applicable to all the cubes, to access them every cube has to add to their own dimension folders (project level). Please refer Create Dimensions in SSAS article to understand, How to create New Dimensions in SQL Server Analysis Service
  • Mining Structures: For the data mining purpose
  • Roles: Provides security to the cube like assigning new roles etc.
  • Assemblies: If we require any external assemblies to work then add them to this folder
  • Miscellaneous: If the cube require any images or documents then add all of them to this folder

SSAS - Attribute relationship and Hierarchy of dimension

Most of the scenarios, there will be relation between different attributes of the same dimension and in this case we have to explictly define the relationship using dimenison designer. It improves the performance as well as affects calculations that are aggregated across these attributes. There is a chance of performance degrade if the relationship is not set properly. In this post we can see the steps to set attributes relationship and also important properties of the same.


We can use the same DSV(AdventureworkdDW2008) that I used to demonstrate named query where I have replaced the below query with the actual DimProduct table.

select a.ProductKey
         ,a.EnglishProductName
         ,b.ProductSubcategoryKey
         ,EnglishProductSubcategoryName
         ,c.ProductCategoryKey
         ,EnglishProductCategoryName
from dimProduct a
inner join DimProductSubcategory b on a.ProductSubcategoryKey = b.ProductSubcategoryKey
inner join DimProductCategory c on b.ProductCategoryKey = c.ProductCategoryKey

For each product sub category in the result set of this query is having multiple product and each product category is having multiple product sub category so there is one to many relationship exists between product sub category and product also produt category and product sub category.

Step 1 : Add DimProduct dimension from DSV to the dimension designer
Fig 1

















Step 2 :  Drag and drop English product category name from attributes window to heirarchy window and also English product sub category name and English product name in the same sequence as shown in the figure 2.

Fig 2










Step 3 : Click on the attribute relationships tab of the dimension designer and you can see the diagram looks like Fig 3

Fig 3









Step 4 : We can set the relationship between these attributes by two method. first method is just drag and drop the many relation field to one relation field ie. drag and drop English product name to English product subcategory name. Second method is right click on the relationship arrow and select edit which will open the windown as shown in Fig 4 and we can choose the many field from the name drop down of soure attribute.

Fig 4














Once you set the right relationship the diagram looks like figure 5

Fig 5



Two different types of relationship types are available in dimension design  ie. flexible and rigid. if the relationship between the attributes are non changeable over time, we should set its type as rigid and vice versa. For e.g birthdate of the advertiser is fixed which is rigid type. You can set the relationship type by right click on the relationship arrow and select relationship type as shown in the fig 6.


Fig 6








Setp 5 : Once you set the attribute relationship and the hierarchy, it is important to give proper key columns and name for each attribute. For identifying product uniquely we need to add product sub category key and product category key into the key columns of product along with product key as shown in figure 6. Same way we need to add productkey along with product subcategory key to identify it uniquely.
Fig 7
















Step 6 : We need to define column name since the key column is composite key. give the desription field into name column.
Fig 8









Now you can see in dimension browser that all fields along with newly created heirarchy is there for browsing which may lead the user in confused state.
Fig 9








To avoid this situation we need to set attributehierarchyvisible property of attribute to false to make sure that only hierarchy is available for browsing to avoid any kind of confusion for the user.

Fig 10












Once you set this property in dimension structure, browser window looks like the figure 10.


Fig 11


What is Dimension & Types of Dimensions with Examples

Dimension Table

                      It is the Master Table of Cube. It contains textual information more and more. Generally
A Specific business information (or) A particular task information is stored in this table. It is a collection
Of Hierarchies, Categories and logics. This can be used for a user to traverse in hierarchical nodes.

Types of Dimensions

Confirmed Dimension
The Dimension which is shared by two or more Fact tables (if it is used in multiple projects) is called as confirmed Dimension.
(Or)
The Dimension which is created only once and it is used in many schemas then it is called Confirmed Dimension.

Example: Customer Dimension can be used across the saving & current Fact Tables in Banking Environment.


Role Playing Dimension
A database Dimension that acts as multiple dimensions with in a cube is called as Role Playing Dimension. From the same table if we have multiple foreign keys in fact tables then the table acts differently for each key attribute.

Example: 
Time Dimension is one of the best Example of Role playing Dimension, you can have one Time Dimension called Date and then you can add ShipDate, DueDate and OrderDate as Cube Dimensions.


Junk Dimension 

It is a group of flags which gives true or false, yes or no, type of information. The attributes in the junk dimension do not belongs to the fact table. It contains a Unique key for all possible combinations of flags and use that unique key in the fact table.
This is not relegated to any Data warehouse schema and is used only for reference.

Example: 



Degenerated Dimension
It is a Dimension which is derived from fact tables and does not have its Entry in Dimension Tables.

Example: 


In above table contains Fact Internet Sales information directly taken from OLTP Database. We will create dimension table depends on Fact Table like this



Note: IF number of Rows in Fact Table = Number of Rows in Dimension Table (That time De-Generated Dimension is not possible)

Slowly Changing Dimension 

Slowly Changing Dimension (SCD) & Types of SCD's

Slowly Changing Dimension

Dimension source data may change over time, rather than changing on regular schedule, time base. In Datawarehouse there is need to track changes in dimension attributes in order to report historical data.

Slowly Changing Dimensions are often categorized into three types namely Type1, Type2 and Type3. The Following Example clearly explains the types of SCD’s.



Type 1: Updates existing record with modifications. (Does not Maintains History)

In the year of 2012, if the salary of the person to 19000, then the old values of the columns “Year” and “Salary” have to be updated with the new values. In this Type1, there is no way to find out the old salary of person Person Id – 1 in year 2012 since the table now contains only the new salary and year information.



Type 2: Creating an additional record (Does Maintains History)

In this Type 2, the old values will not be replaced but new values are stored in to another row of table. So at any point of time, the difference the old values and new values can be retrieved from database and early be compared. This is very useful for Real Time for reporting purposes.




Type 3: Creating New Fields. Keep old and new values in the existing row (Requires a design Change)

In this Type 3, the latest update to changed values can be seen. Example mentioned below how to add new columns and maintains old columns of that column changes as shown below.


The problem with the Type3 used in Real Time, if the salary of person1 changes, then added new columns to the same row that type u can change structure of cube every time.


Create a Dimension in SSAS










In last 2 articles, we created Data Source and Data Source View for SSAS. You can find these articles on below link :
–> Create a Data Source in SSAS.
–> Create a Data Source View in SSAS.
In this article, I will show you how to Create a Dimension in SSAS.
Before starting this, let’s have a look on What is Dimension in SSAS?
Dimensions are business objects or entities which contains a list of attributes that describe the object (dimension itself) and they are usually derived from some code tables. These attributes appear as attribute hierarchies and can be organized into user-defined hierarchies, or can be defined as parent-child hierarchies based on columns in the underlying dimension table.
Go through below steps to Create a Dimension in SSAS.
1. First of all open Visual Studio 2010 and open any SSAS project
2. In solution explorer, right click on Dimensions and select New Dimension.
2-Create a Dimension in SSAS
3. A Dimension Wizard window opens. In that check Don’t show this page again and click on Next button.
3-Create a Dimension in SSAS
4. A Select Creation Method page appears. In that select Use an existing table. Then click on Next Button.
4-Create a Dimension in SSAS
5. Now a Specify Source Information page appears. In that first select Data Source View and then select Main Table. Here we selectAdventureWorksDW2012 as Data Source View and DimProduct as Main Table. Make sure that ProductKey has been selected from the list of Name Column. Then click on Next Button.
5-Create a Dimension in SSAS
6. It will analyze the foreign key relationships in data source view and list related tables in next window. Here select DimProductSubcategory andDimProductCategory. Then click on Next Button.
6-Create a Dimension in SSAS
7. From the Available Attributes List, select English Product Name, Color, List Price, Size and Status. Also rename attributes if needed. So your screen should look like following.
7-Create a Dimension in SSAS
Then click on Next Button.
8. In next screen, you will see the summary of Dimension.Here give name of Dimension i.e. ProductDimension and then click on Finish button.
8-Create a Dimension in SSAS
9. Now your visual studio screen look like below screen shot :
9-Create a Dimension in SSAS
10. You can see newly created dimension under the Dimensions tab in Solution Explorer.
10-Create a Dimension in SSAS
Congratulations! We successfully created a Dimension in SSAS.



Create Dimension in SSAS









In SQL Server Analysis Services, Dimensions are the group of attributes (nothing but columns) based on the dimension tables in the data Source View. All these attributes are shown as attribute Hierarchies and Analysis services provides us an option to create User Defined Hierarchies as well.
In SSAS dimensions are of two types such as Database Dimension and Cube Dimension. Please refer Difference between Database Dimension and Cube Dimension in SSAS 2014article to know the Difference between Database Dimension and Cube Dimension in SQL Server Analysis Services.

Creating Dimensions in SSAS 2014 Example

Within the Solution Explorer, Right-click the Dimensions folder and select New Dimensionfrom the Context Menu to create new one.
Create Dimension in SSAS 1
By clicking on the New Dimension option will open the Dimension Wizard.
First page of the Dimension wizard is a Welcome page. If you don’t want to see this page again then Don’t show this page again option as shown in the below screenshot and click on the next
Create Dimension in SSAS 2
Next page is Select Creation Method and this page gives 4 options to create the Dimension.
  • Use an Existing Table: It will use the existing tables present in the Data Source View.
  • Generate a Time Table in the Data Source: This option will generate the time-table and saves it in the data source.
  • Generate a Time Table on the Server: This option will generate the time-table and saves the table directly in the Server.
  • Generate a Non-Time Table in the Data Source: This option will generate the normal tables and saves it in the data source. It has the template to select whether you want to create customer, Employee, Department, Geography etc.
In general, we use the first option only (Use an existing table) because it will use the required table from the Data Source View. However, if we don’t know how to create particular table then use the templates provided here to design it.
Create Dimension in SSAS 3
Click on the Next button will open the Specify Source Information page to configure the Data Source View, Table and Key columns.
For this Example, Select the Data Source View we created earlier in the SSAS Data Source View article. We intend to create a dimension from the DimCustomer table, so select the same table from the Main table drop down list. Every dimension table needs to have a key attribute and CustomerKey is the primary key column for this table.
Create Dimension in SSAS 4
Click on the next button will take the Dimension Wizard to Select Related Tables page. Here it will display the dependent tables (tables that are connected with Dim Customers using foreign Key relationship). For now we are selecting them.
Create Dimension in SSAS 5
Click on the Next Button will open the Select Dimension Attributes page. In this page we have 3 sections to understand.
  • Available Attributes: This section will display all the available columns (attributes) of the Dim Customer table present in the data Source View. We can select all the columns or else we can select required columns. It is always advisable to select the required columns.
  • Enable Browsing: If you check mark this option then that column will be available for browsing in cube. If you uncheck this option then they won’t be visible to client applications when they browse the dimension. Sometimes we may need id column for reference purpose but not required to slice the data then you can uncheck them from browsing.
  • Attribute Types: SQL Server Analysis Services provides many Attribute Types while creating Dimensions. Select the appropriate one and if you don’t know which attribute type to choose then select regular type. For example, For Calendar year column select the calendar year attribute type.
Create Dimension in SSAS 6
For the time being we selected few columns from the Dim Customer and also we haven’t changed the Attribute Types from regular to appropriate one.
NOTE: It is always a good practice to select the specific attribute type rather than the regular type. Especially for the time and account dimensions we must specify attribute type because it will help when working with the MDX.
Now the next step is to give specific or anything appropriate name to the dimension. In this example we are naming it as Dim Customer.
Create Dimension in SSAS 7
Click on the finish button to finish creating Dimension in SSAS.
From the below screenshot you can observe that,
  • In solution explorer we have Dim Customer dimension under the Dimensions folder.
  • Attributes pane have only the selected attributes (Columns). These are the ones we selected while creating Dimension using Dimension Wizard.
  • Hierarchies pane is empty because we haven’t created any User Defined Hierarchies yet.
  • Data Source View will show all the available columns in Dim Customer table.
Create Dimension in SSAS 8
Well we successfully created our First Dimension in SQL Server Analysis Services



Create Cube Dimension in SSAS








A cube dimension is an instance of a database dimension within a cube. Cube dimension are accessible inside that particular cube, We can’t access one cube dimension in another cube. Please refer Create Dimension in SSAS article to understand, How to create Database Dimension in SQL Server Analysis Services.

Creating Cube Dimension in SSAS 2014

In this example we are going to show you, How to Create Cube Dimension in SQL Server Analysis Services 2014.
STEP 1: Within the Dimension Pane, Right click on the Cube Name will open the Context Menu to choose the options. Since we are adding the Cube dimensions click on the Add cube dimension option
Cube Dimension in SSAS 1
Clicking on the Add cube dimension option will open the Add Cube Dimension window and the windows will show us all the available Dimension Tables in the Data Source View.
From this windows we have to select the required dimension as per your organization requirements.
Cube Dimension in SSAS 2
From the above screenshot you can observe that we have selected all the tables present in the Data Source View. Click ok to close the window.
Cube Dimension in SSAS 3
From the above screenshot we have to understand that, All the Dimensions present in the Dimensions Pane are called as Cube Dimensions and all the Dimensions present in the solution explorer are called as Database Dimensions.


Difference between Database Dimension and Cube Dimension in SSAS 2014








In SQL Server Analysis Services, there are Two types of Dimensions
  • Database Dimension
  • Cube Dimension

Database Dimension in SSAS

All the dimensions that are created using the Dimension Wizard of the Solution Explorer are treated as database dimensions. In other words, the dimensions which are at Database level are called Database Dimensions.
Database dimensions are independent to the cubes so single Database dimension can be used in multiple cubes. Please refer Create Dimensions in SSAS article to understand, How to create Database Dimensions in SQL Server Analysis Services
SSAS Database Dimension
From the above figure, all the dimensions such as Dim Customer, Dim Product etc.., which are placed under the Dimensions folder are Database Dimensions.

Cube Dimension in SSAS

A cube dimension is an instance of a database dimension within a cube. Cube dimension are accessible inside that particular cube, We can’t access cube dimension in another cube. Please refer Create Cube Dimension in SSAS article to understand, How to create Cube Dimensions in SQL Server Analysis Services.
Cube Dimension in SSAS 3
From the above screenshot we have to understand that, All the Dimensions present in the Dimensions Pane are called as Cube Dimensions and all the Dimensions present in the solution explorer are called as Database Dimensions.

Difference Between Database Dimension and Cube Dimension in SSAS

  • The Database dimension has only Name and ID properties, whereas a Cube dimension has several more properties.
  • Database dimensions are created once and we can use them in multiple cubes.
  • Cube dimensions are just a point of referenced to the Database dimension. We can’t access the cube dimensions outside the Cube
  • Cube dimensions can be created more than ones (we called it them as Role Playing Dimensions).
Thank you for Visiting Our Blog



Dimension

What is Dimension?

  • Something that quantifies a measure
  • For an example, consider this: If I just say… “20kg”, it does not mean anything. But if I say, "20kg of Rice (Product) is sold to Ramesh (customer) on 5th April (date)", then that gives a meaningful sense. These product, customer and dates are some dimension that qualified the measure - 20kg
  • Dimensions create a sense to identify a number

Dimension Column Types

  • Dimension table contains columns of different types
  • Keys
    • These are surrogate keys which are used to identify the rows uniquely
    • e.g. CustomerKey
  • Name Columns
    • Used for human names for easy identification of the entity
    • e.g CustomerFullName, CustomerNickName
  • Attributes
    • Used for pivoting in analysis
    • Will be used to analyse the business based on different perspectives
    • e.g. Gender, Marital Status, Age, etc..
  • Member Properties
    • Used for labels in the report
    • Generally we don't do pivoting or analysis on these columns but they are used to provide extra info on a report
    • e.g. Address, PhoneNumber, Email, etc..
  • Lineage Columns
    • Used for auditing, never exposed to end users
    • LastCreatedDate, LastModifiedBy, etc..

Types of Dimension

  • Conformed Dimension
  • Junk Dimension
  • Degenerated Dimension
  • Role Playing Dimension

Based on how frequently the data inside a dimension changes, we can further classify dimension as
  • Unchanging or static dimension (UCD)
  • Slowly changing dimension (SCD)
  • Rapidly changing Dimension (RCD)

Conformed Dimension

  • A dimension which is shared across multiple subject areas
  • e.g. "Customer" dimension used in both Marketting & Sales departments
  • Similarly Time & Date dimension

Degenerated Dimension

  • A Key in the Fact table, which does not have its own dimension
  • e.g. In InvoiceFact, we can have columns like TransactionAmount, InvoiceAmount, CustomerKey, CreatedDateKey, GeographyKey, InvoiceNumber, TransactionNumber
  • So we can see here InvoiceNumber & TransactionNumber don't have their own Dimensions like Customer, Date & Geography, but they are required to do other business activities

Junk Dimension

  • grouping of typically low-cardinality attributes (flags, indicators etc.) so that those can be removed from other tables and can be junked into an abstract dimension table
  • These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise
  • e.g. Gender & Marital Status, data looks like this
  • Gender
    Male
    Female
    Marital Status
    Single
    Married
    Divorced
  • So both these dimension have very low cardinality means they have very less number of distinct values
  • So rather than having Gender & MaritalStatus separately, we can apply cross join and generate one single structure like this
GM_id
Gender
MaritalStatus
1
Male
Single
2
Male
Married
3
Male
Divorced
4
Female
Single
5
Female
Married
6
Female
Divorced
  • Now we can refer GM_id in our Facts/Dimensions, this new dimension is called Junk Dimension
  • Improves manageability and improves SQL query performance

Role playing Dimension

  • A dimension which is being used at multiple places with different contextual meaning but having the same data
  • e.g. "Date"… for a particular business, we can have "Date of Sale", "Date of Delivery", "Date of Shipping", "Date of Hire", etc… basically all of them have the basic entity i.e. Date, but then context is different
  • So we create only 1 basic table (i.e. DateDimension) and then we refer it with different names, which are called Role playing Dimensions
  • They can be created separately in cube and in database, they can be created in terms of Views. e.g. On top of DateDimension, we create vwHireDateDimension, vwSalesDateDimension, etc…

Slowly Changing Dimension

Rapidly Changing Dimension

Rapidly Changing Dimension

  • Dimension where data changes are happening very rapidly
  • "Slowly Changing Dimension" blog tells how we can maintain the history of changes
  • But if we really implement the same technic for RCD, it will end up creating N number of Type-2 records for same entity and hence we need to come up with a smart design
  • We can handle this by creating Junk Dimension
  • Let's take an example of Customer Dimension, assume we have following columns
    • CUSTOMER_KEY
    • CUSTOMER_NAME
    • CUSTOMER_GENDER
    • CUSTOMER_MARITAL_STATUS
    • CUSTOMER_TIER
    • CUSTOMER_STATUS
  • Now, Name, Gender, MaritalStatus gets a change very rare, once in a while but Tier & Status get the change very frequently, assume that we are not interested in historical changes of Name, Gender & Marital Status, holding only latest value should be fine, but we need to track the history changes of Tier & Status
  • Now create a junk dimension by removing Tier & Status from Original Dimension
  • So our Original Customer Dimension will be like this
    • CUSTOMER_KEY
    • CUSTOMER_NAME
    • CUSTOMER_GENDER
    • CUSTOMER_MARITAL_STATUS
  • Create a new junk dimension for Tier, Status and create another mapping dimension, so overall structure will look like this



  • By doing so, new type-2 record generation will happen only in a mapping table and not in the main table, so that way we can have easy mainteanance.
  • Only disadvantage is - it will introduce extra joins to get the full picture of history changes of a particular customer
  • But maintenance and storage vise, we will get benefits


Slowly Changing Dimension

Slowly Changing Dimension

  • Often called as SCD, where the data of a dimension is changing slowly
  • Available in different types (SCD Type 0, Type 1, Type 2, Type 3, Type 6)
  • 1, 2 & 6 are most common

Type - 0

  • Where dimension changes are not considered
  • Even if the value gets changed in real scenario. 
  • It does not carry to the dimension and dimension still holds the old data

Type - 1

  • Where history is not maintained and table always show the latest data
  • Always updated with recent values
  • e.g. Before Change

Supplier_Key
Supplier_Code
Supplier_Name
Supplier_State
123
ABC
Acme Supply Co
CA

After Change

Supplier_Key
Supplier_Code
Supplier_Name
Supplier_State
123
ABC
Acme Supply Co
IL


Type - 2

  • Tracks historical updates by creating separate rows
  • Tracking can be done in 2 ways
  • 1st method is via Version

Supplier_Key
Supplier_Code
Supplier_Name
Supplier_State
Version.
123
ABC
Acme Supply Co
CA
0
124
ABC
Acme Supply Co
IL
1

So latest version # is the latest row, and that's how we preserve the history

  • 2nd method is via StartDate & EndDate

Supplier_Key
Supplier_Code
Supplier_Name
Supplier_State
Start_Date
End_Date
123
ABC
Acme Supply Co
CA
01-Jan-2000
21-Dec-2004
124
ABC
Acme Supply Co
IL
22-Dec-2004
NULL

Row with NULL EndDate is the latest row.

Type - 3

  • Tracks historical updates by creating separate columns
  • Type-2 is horizontal growth and Type-3 is vertical growth

Supplier_Key
Supplier_Code
Supplier_Name
Original_Supplier_State
Current_Supplier_State
123
ABC
Acme Supply Co
CA
IL

  • Due to difficult maintenance of vertical growth, only 1 level history is being tracked generally.
  • Let's say now state gets changed from IL to WA, then this is how the data will look

Supplier_Key
Supplier_Code
Supplier_Name
Original_Supplier_State
Current_Supplier_State
123
ABC
Acme Supply Co
IL
WA

So we lose the prior history.

Type - 4

  • Tracks history in a separate table
  • "Supplier" table
Supplier_key
Supplier_Code
Supplier_Name
Supplier_State
123
ABC
Acme Supply Co
IL
  • "SupplierHistory" table
Supplier_key
Supplier_Code
Supplier_Name
Supplier_State
Create_Date
123
ABC
Acme Supply Co
CA
22-Dec-2004

  • One table will maintain the latest data 
  • Other will keep on maintaining historical updates with CreationDate

Type -6

  • It is hybrid of Type-1,2 & 3
Supplier_Key
Supplier_Code
Supplier_Name
Current_State
Historical_State
Start_Date
End_Date
Current_Flag
123
ABC
Acme Supply Co
IL
CA
01-Jan-2000
21-Dec-2004
N
124
ABC
Acme Supply Co
IL
IL
22-Dec-2004
31-Dec-9999
Y

  • StartDate, EndDate will identify during which period that record was active
  • CurrentFlag will identify which is the latest record and which is the historical record
  • CurrentXYZ column will always hold the latest value for all the records



Role playing dimension

A fact table keeps the facts of a business process. It is built of two types of columns: measure columns and dimension key columns. Measure are the quantitative business data about the business process and dimension keys reference to the dimension tables which hold the textual attributes of the business process.
Typically a dimension key column of a fact table refers a dimension table but its also very common where multiple columns of a fact table refer to a single dimension table. When a single dimension table is linked from multiple dimension key columns of a fact table, that dimension table is known as role playing dimension. As multiple columns of a fact table can be associated with a single dimension table, a single database dimension table can play different role as cube dimensions for each association.
Lets have an example of FactInternetSales table from AdventureWorksDW2014database. Below is a customized look of FactInternetSales table;
Multiple date keys in FactInternetSales table
Multiple date keys in FactInternetSales table
Below is the DimDate dimension table from AdventureWorksDW2014  database;
DimDate
DimDate
We have multiple date columns in the above fact table, which are OrderDateKey,DueDateKey and ShipDateKey, and a single Date dimension table, which is DimDate. We don’t need to duplicate the DimDate dimension table for 3 times when we design the cube for internet sales data, we just need to refer the same DimDate dimension with different name for all three date key columns of the FactInternetSales table.
Role playing dimension
Role playing dimension

In above image, we can see that DimDate dimension is playing different role for each of its reference. We have only one DimDate dimension as the database table but there are three date dimensions as cube dimensions (red squared in image). A single date dimension has three different views; Order DateDue Date and Ship Date as cube dimensions in above example.
There could be “n” number of references to the same dimension table from a fact table, and for each reference, we refer the same database dimension table with different name as cube dimension. Role playing dimension not only reduces the space of the cube by removing the redundant data but it also improves the cube processing time as we need to process only one database dimension.


Degenerate Dimension

A high cardinality attribute column in the fact table which does not have any other content except its natural key and is required as a dimension for analysis or drill-down purpose, is called a degenerate dimension. As this degenerate dimension is constructed from a fact table item and is placed in the fact table, it is also known asfact dimension. It helps to reduce duplicate data by placing high cardinality dimension key in the fact table.
Degenerate dimension keys are significantly unique and are nearly as large as fact table. Moving this attribute in a single column dimension table requires a considerably large dimension table, and you would also need to join the surrogate key of the fact table to the dimension table which leads to performance issues, specially when fact table is reasonably large.
Degenerate dimension does not have its associated dimension table, but still it can be used to group related rows of fact table. For example, order number in purchase order fact table, dealing with multiple line items, can be a good candidate for degenerate dimension. Purchase order fact table can be grouped with order number to get the list of all associated items.
Another example could be a bug identification number in a defect data system which records the details of bugs occurred throughout various systems. Bug id can be used to find all related fact table rows, if needed. A free form comment text field can also be a good candidate for degenerate dimension or fact dimension.
Below is an example from AdventureWorksDW2014 database table. This is a customized view from FactInternetSales table to show an example of degenerate dimension along with other analytic dimensions and measures. In below image, we can see that attribute SalesOrderNumber is marked as degenerate dimension (fact dimension), as it does not have any other attribute except its natural key and the values in the column is almost unique throughout the table. SalesOrderNumber attribute can be a good candidate which can be degenerated as a fact dimension.

Degenerate dimension
Degenerate dimension or Fact dimension
Degenerate dimensions mostly occur in transaction level fact tables. If we have a dimension table which is growing along with fact table in the system, it can be reviewed for a degenerate dimension candidate.


Junk dimension 2


What is a Junk Dimension? Have you ever come across a scenario where many small dimension tables connect (using foreign key) to the fact table? In data warehouse, we have two types of tables, Fact and Dimension. Fact table contains business facts as measures and references to the dimension tables. Dimension tables have attributes which contains textual information of business facts and are used to filter and label the data.
Typically, fact tables are deeper whereas dimension tables are wider. Sometimes it happens that we have many narrow and low cardinality (low cardinality : few number of rows in the table) dimensions in the system; like flags and indicators. To keep all these information in fact table, we need to connect all these dimension tables (available in system or created from flag and indicator attributes) with fact table. Connecting all these small dimension tables converts fact table in a centipede fact with possible performance degradation by increasing the number of referenced tables with each row. Also, having many small dimensions increases the total number of dimensions in data warehouse which needs an extra amount of effort for maintenance.
In this post we are going to explore junk dimension and its usage in data warehouse designing with the help of an example. Lets start with the definition of Junk Dimension.

What is a Junk Dimension?

To avoid a centipede fact table in our data warehouse, we create a single dimension table by combining attributes from miscellaneous low cardinality dimensions. This combined dimension table covers all possible set of values which might occur in fact table and is known as junk dimension. The Junk dimension is not a Cartesian product of all miscellaneous dimensions, but it must cover all possible set of values from combined dimension tables which are present or might appear in fact table. Having a single dimension table for such type of indicator and flag dimensions or attributes would not only decrease the number of dimensions, but also require less number of tables to be referred by fact table. We refer the key of the junk dimension instead of individual low cardinality dimension tables in fact table.

Junk dimension example

To demonstrate the junk dimension, have a look on below Fact_Product table which is a dummy fact table. Fact_Product has many wide and high cardinality dimensions attached with their dimension table surrogate key. This table also has few low cardinality small dimension tables which are highlighted in red rectangle in below image.
Junk dimension - Fact_Product
Fact_Product
Below is the attribute level detail of these low cardinality dimension tables;
Junk dimension - Low cardinality dimensions
Low cardinality dimension tables
We can create a single dimension for all above small and low cardinality dimensions in such way that it covers all possible values present or values that may appear later in fact table. Junk dimension created from low cardinality dimensions would multiply the number of rows significantly which is quite better than having billions of records in fact table.
Below is the junk dimension from above miscellaneous dimensions;
Junk Dimension
Junk Dimension – Sample data
Above junk dimension has {(Dim_ProductType *  Dim_TaxInformation * Dim_ProductQuality * Dim_Expiry * Dim_StorageType * Dim_DeliveryMode) – (number of non relevant business rows)} which is (6 x 3 x 3 x 3 x 3 x 3) = 1458 – 0 (might vary as per the business requirement) = 1458.