Q: What is Analysis Services? List out its features.
Microsoft SQL Server 2014 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle-tier server for analytical processing, OLAP, and data mining. It manages multidimensional cubes of data and provides access to large amounts of information, including aggregations of data. One can create data mining models from data sources and use them for business intelligence, including reporting.
Analysis Services provides a combined view of the data used in OLAP or data mining. Services here refer to OLAP and data mining. Analysis Services assists in creating, designing, and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
§ Ease of use, with many wizards and designers
§ Flexible data model creation and management
§ Scalable architecture to handle OLAP
§ Integration of administration tools, data sources, security, caching, reporting, etc.
§ Extensive support for custom applications
Q: What is UDM? What is its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following
benefits:
• Allows the user model to be greatly enriched.
• Provides high performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for the SSAS component?
§ Analysis Services is the only component in SQL Server with which we can perform analysis and forecasting operations.
§ SSAS is very easy to use and interactive.
§ Faster Analysis and Troubleshooting.
§ Ability to create and manage Data warehouses.
§ Apply efficient Security Principles.
Q: Explain the two-tier architecture of SSAS?
§ SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
§ The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
§ Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
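To illustrate the protocol, a minimal XMLA Execute request carrying an MDX statement might look like the following sketch (the catalog name and query are placeholders for illustration):

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command>
        <Statement>
          SELECT [Measures].DefaultMember ON COLUMNS FROM [Adventure Works]
        </Statement>
      </Command>
      <Properties>
        <PropertyList>
          <Catalog>Adventure Works DW</Catalog>
        </PropertyList>
      </Properties>
    </Execute>
  </soap:Body>
</soap:Envelope>
```

The server answers with a SOAP response containing the result set, which is why any client that can speak XMLA over HTTP or TCP can query Analysis Services.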
Q: What are the components of SSAS?
§ An OLAP Engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
§ Drilling refers to the process of exploring details
of the data.
§ Slicing refers to the process of placing data in
rows and columns.
§ Pivoting refers to switching categories of data
between rows and columns.
§ In OLAP, we will be using what are called dimensional databases.
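As a sketch of slicing in practice (assuming the Adventure Works sample cube; the member key is illustrative), the following MDX places years on rows, a measure on columns, and slices by a single product category; swapping the ROWS and COLUMNS sets is a pivot:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       [Date].[Calendar].[Calendar Year].Members ON ROWS
FROM [Adventure Works]
WHERE ( [Product].[Category].&[1] );  -- slicer: one product category
```

Drilling would replace the year set with, for example, the descendants of one year at the month level.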
Q: What is FASMI?
A database is called an OLAP database if it satisfies the FASMI rules:
§ Fast Analysis – queries should be answered in five seconds or less.
§ Shared – must support access to data by many users, addressing data sensitivity and write-backs.
§ Multidimensional – the data inside the OLAP database must be multidimensional in structure.
§ Information – the OLAP database must support large volumes of data.
Q: What languages are used in SSAS ?
§ Structured Query Language (SQL)
§ Multidimensional Expressions (MDX) - an industry standard query language oriented toward analysis.
§ Data Mining Extensions (DMX) - an industry standard
query language oriented toward data mining.
§ Analysis Services Scripting Language (ASSL) - used
to manage Analysis Services database objects.
Q: How Cubes are implemented in SSAS ?
§ Cubes are multidimensional models that store data
from one or more sources.
§ Cubes can also store aggregations
§ SSAS Cubes are created using the Cube Wizard.
§ We also build Dimensions when creating Cubes.
§ Cubes can see only the DSV (logical view).
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
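A query-scoped calculated measure can be sketched as follows (measure names assume the Adventure Works sample cube); note that its values are computed at query time rather than stored in the cube:

```mdx
WITH MEMBER [Measures].[Gross Profit Margin] AS
    ( [Measures].[Internet Sales Amount]
      - [Measures].[Internet Total Product Cost] )
    / [Measures].[Internet Sales Amount],
    FORMAT_STRING = 'Percent'
SELECT { [Measures].[Internet Sales Amount],
         [Measures].[Gross Profit Margin] } ON COLUMNS
FROM [Adventure Works];
```

A derived measure, by contrast, would be computed in the data source view or the measure's source column expression before processing, so its values are aggregated and stored like any regular measure.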
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated member in a cube, what is the use of the property called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
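In a cube's calculation script the property appears as NON_EMPTY_BEHAVIOR on the calculated member; a sketch (measure names assume the Adventure Works sample cube):

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Sales per Order] AS
    [Measures].[Internet Sales Amount] / [Measures].[Internet Order Count],
    NON_EMPTY_BEHAVIOR = { [Measures].[Internet Order Count] },  -- skip cells where the denominator is empty
    VISIBLE = 1;
```

When [Internet Order Count] is empty for a cell, the ratio is treated as empty instead of raising a division error.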
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis
Services Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through Usage-Based Optimization (UBO). Prefer the UBO method when troubleshooting real-world query performance.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
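Once writeback is enabled on a measure group, cell values can be written back with the MDX UPDATE CUBE statement; a sketch with hypothetical cube and member names (these are not from the source, only for illustration):

```mdx
UPDATE CUBE [Budget]
SET ( [Measures].[Amount],
      [Scenario].[Scenario].[Budget],
      [Date].[Fiscal Year].&[2014] ) = 500000
USE_EQUAL_ALLOCATION;  -- spread the value equally across the leaf cells
```

The written deltas land in the writeback table/partition, which is why disabling writeback keeps that table to avoid unanticipated data loss.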
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
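Processing can also be requested through an XMLA command; a sketch of a full process of one cube (the object IDs are placeholders for illustration):

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```

Replacing the Object element with a dimension or partition reference scopes the processing to that object, and the Type element selects the processing option (ProcessFull, ProcessUpdate, ProcessData, and so on).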
Q: Name a few Business Intelligence enhancements in SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
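For example (assuming the Adventure Works sample cube), CrossJoin pairs members from two hierarchies on one axis, and NON EMPTY trims the empty combinations:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY CROSSJOIN(
    [Product].[Category].[Category].Members,
    [Date].[Calendar].[Calendar Year].Members ) ON ROWS
FROM [Adventure Works];
```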
Q: Where do you put calculated members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
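A sketch of such a member (hierarchy names and member keys assume the Adventure Works sample cube): a calculated member added to the Date dimension that aggregates two years and therefore intersects every measure:

```mdx
WITH MEMBER [Date].[Calendar Year].[All Periods].[CY 2003-2004] AS
    AGGREGATE( { [Date].[Calendar Year].&[2003],
                 [Date].[Calendar Year].&[2004] } )
SELECT { [Measures].[Internet Sales Amount],
         [Measures].[Internet Order Count] } ON COLUMNS,
       { [Date].[Calendar Year].&[2003],
         [Date].[Calendar Year].&[2004],
         [Date].[Calendar Year].[All Periods].[CY 2003-2004] } ON ROWS
FROM [Adventure Works];
```

Because the member lives in the Date dimension rather than Measures, it can be queried against any measure in the cube.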
Q: How do I find the bottom 10
customers with the lowest sales in 2003 that were not null?
A: Simply using BOTTOMCOUNT will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
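The FILTER-based variant mentioned above can be sketched like this (same Adventure Works names as the query before it):

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
BOTTOMCOUNT(
    FILTER( [Customer].[Customer Geography].[Customer].Members,
            NOT ISEMPTY( [Measures].[Internet Sales Amount] ) ),  -- drop null-sales customers first
    10,
    [Measures].[Internet Sales Amount] ) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] );
```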
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default, Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works, the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
The same query restricted to the top 3 years by order quantity, using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
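To keep all years but rank them by order quantity instead of the designed attribute order, the ORDER function can be used (BDESC sorts descending while breaking the hierarchy):

```mdx
SELECT { [Measures].[Reseller Order Quantity] } ON 0,
ORDER( [Date].[Calendar].[Calendar Year].Members,
       [Measures].[Reseller Order Quantity], BDESC ) ON 1
FROM [Adventure Works];
```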
Q: How do you extract first tuple from
the set?
You could use the function Set.Item(0).
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Q: How can I setup default dimension
member in Calculation script?
You can use the ALTER CUBE statement. Syntax (placeholders in angle brackets):
ALTER CUBE CurrentCube | <YourCubeName> UPDATE DIMENSION <Dimension>, DEFAULT_MEMBER = '<DefaultMember>';
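A concrete sketch (the hierarchy name and member key assume the Adventure Works sample cube):

```mdx
ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Date].[Calendar Year],
DEFAULT_MEMBER = [Date].[Calendar Year].&[2004];
```

Placed in the calculation script, this makes 2004 the default Calendar Year for any query that does not mention that hierarchy.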
Q: What are Database Dimension and Cube Dimension? What is the difference between them?
A Database Dimension is a Dimension
which exists independent of a Cube. You can see a Database Dimension as a separate
entity/file in the Solution Explorer/Project Folder. An analogy, this is pretty
similar to a Class in a Programming Language.
A Cube Dimension is an Instance of a
Database Dimension. An analogy, this is pretty similar to an Object (Instance
of a Class) in a Programming Language.
Here are some of the
highlights/differences of Database and Cube Dimensions:
- A Database Dimension is independent of a Cube, but a Cube Dimension
is always a part of a Cube.
- One Database Dimension can be added to a Cube more than once (Role-Playing Dimension) as different instances. In this scenario, each Instance is
treated as a separate Cube Dimension.
- A Database Dimension can be added to more than one Cube, whereas a
Cube Dimension can belong to a Single Cube at any given point.
Q: What is a Linked Dimension? In what scenarios do you use it?
A Linked Dimension is a Dimension
which is based on (Linked To) another Database Dimension which might be either
located on the same Analysis Services server as the Linked Dimension or on a
different Analysis Services server.
Linked Dimensions can be used when the
exact same dimension can be used across multiple Cubes within an Organization
like a Time Dimension, Geography Dimension etc.
Here are some of the highlights of a
Linked Dimension:
- More than one Linked Dimension can be created from a Single
Database Dimension.
- These can be used to implement the concept of Conformed Dimensions.
- For an end user, a Linked Dimension appears like any other
Dimension.
Q: What are the different ways to create a Time Dimension in Analysis Services?
The Time Dimension is one of the most important and most common types of dimensions, as pretty much every metric is analyzed over time. Analysis Services offers the following ways to create a Time Dimension:
- Create Using an Existing Table: This is one of the common approaches for building a Time
Dimension. In this approach, a table is created in the underlying data
source and pre-populated with data and it is then used to create the Time
Dimension in SSAS.
- Generate a Time Table in the Data Source: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table in the underlying data source (relational database) and populates it
with the required data using the date range, attributes, and calendars
etc. which are specified at the time of creation of Time Dimension in
Business Intelligence Development Studio. This option requires permissions
to create a table in the underlying data source.
- Generate a Time Table on the Server: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table on the server and this table is used to generate the Time Dimension
by the wizard. The Dimension created using this approach is called a
Server Time Dimension.
Q: What is the Type property of a Dimension? What is the purpose of setting this property?
The Type property of a Dimension is used to specify the type of information that the Analysis Services Dimension contains, such as a Time Dimension (years, quarters, months, dates, and so on), a Geography Dimension (geographical information), or an Accounts Dimension (accounts-related information). This property is set to Regular by default.
There are basically two important uses
of setting this property:
- This property is used by the Business Intelligence Wizard to assign/generate standard classifications and MDX expressions.
- This property is used by the Analysis Services Client Applications
like Excel etc. to adjust the user interface/rendering format
appropriately. For example, when a dimension is added to a Pivot Table in Excel, certain types of dimensions are automatically added across the
columns axis (like Time Dimension) and a few other types of dimensions are
added to the rows axis (Like Product Dimension) automatically.
Q: What is a Storage Mode? What are the different storage modes applicable to Dimensions?
A Storage Mode defines the location in
which the Dimension data will be stored and the format (Relational or
Multidimensional) in which the data will be stored.
Following are the two Storage Modes
supported by Dimensions:
- ROLAP (Relational On-Line Analytical Processing):
- When the Storage Mode is set to ROLAP for a Dimension, then the
Dimension Data will be stored in the relational database tables.
- This storage Mode offers effective memory usage by avoiding
duplication of data compared to MOLAP Storage Mode.
- Using this Storage Mode will result in a slowdown in the query
performance compared to MOLAP Storage Mode.
- MOLAP (Multidimensional On-Line Analytical Processing):
- When the Storage Mode is set to MOLAP for a Dimension, then the
Dimension Data will be stored in a multidimensional format in the
Analysis Services/OLAP Server.
- This Storage Mode offers poor memory usage as it involves
duplication of data (first copy is the data in the underlying dimensional
tables and the second copy is the data in the Dimension in the OLAP
Server).
- Using this Storage Mode will result in best query performance
compared to any other Storage Modes available in SSAS.
Q: What is the difference between an Attribute Hierarchy and a User Defined Hierarchy?
An Attribute Hierarchy is a Hierarchy
created by SQL Server Analysis Services for every Attribute in a Dimension by
default. An Attribute by default contains only two levels - An "All"
level and a "Detail" level which is nothing but the Dimension
Members.
A User Defined Hierarchy is a Hierarchy
defined explicitly by the user/developer and often contains multiple levels.
For example, a Calendar Hierarchy contains Year, Quarter, Month, and Date as
its levels.
Here are some of the
highlights/differences of Attribute and User Defined Hierarchies:
- Attribute Hierarchies are always Two-Level (Unless All Level is
suppressed) whereas User Defined Hierarchies are often Multi-Level.
- By default, Every Attribute in a Dimension has an Attribute
Hierarchy whereas User Defined Hierarchies have to be explicitly defined
by the user/developer.
- Every Dimension has at least one Attribute Hierarchy by default
whereas every Dimension does not necessarily contain a User Defined
Hierarchy. In essence, a Dimension can contain zero, one, or more User
Defined Hierarchies.
- Attribute Hierarchies can be enabled or disabled. Disable the
Attribute Hierarchy for those attributes which are commonly not used to
slice and dice the data during analysis, like Address, Phone Number, and
Unit Price etc. Doing this will improve the cube processing performance
and also reduces the size of the cube as those attributes are not considered
for performing aggregations.
- Attribute Hierarchies can be made visible or hidden. When an
Attribute Hierarchy is hidden, it will not be visible to the client
application while browsing the Dimension/Cube. Attribute Hierarchies for
those attributes which are part of the User Defined Hierarchies, like Day,
Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be
hidden, since the attribute is available to the end users through the User
Defined Hierarchy and helps eliminate the confusion/redundancy for end
users.
Q: What is an Attribute Relationship? What are the different types of Attribute Relationships?
An Attribute Relationship is
a relationship between various attributes within a Dimension. By default, every
Attribute in a Dimension is related to the Key Attribute. Quite often these
default Attribute Relationships need to be modified to suit the User Defined
Hierarchies and other end user requirements.
There are basically two types of
Attribute Relationships:
- Rigid: Attribute Relationship should be set to
Rigid when the relationship between those attributes is not going to
change over time. For example, relationship between a Month and a Date is
Rigid since a particular Date always belongs to a particular Month like
1st Feb 2012 always belongs to Feb Month of 2012. Try to set the
relationship to Rigid wherever possible.
- Flexible: Attribute Relationship
should be set to Flexible when the relationship between those attributes
is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work
under one manager during this year (time period) and under a different
manager during next year (another time period).
Q: What are the KeyColumns and NameColumn properties of an Attribute? What is the difference between them?
KeyColumns is a property of an SSAS
Dimension Attribute and it forms the Key (Unique) for the attribute. It can be
bound to one or more columns in the underlying database table. When User
Defined Hierarchies are created in the dimension (Attribute Relationships
defined), setting this property becomes very critical and often requires
setting this to a combination of more than one column from the Data Source
View. For Example, say you have a Date Dimension and a hierarchy called
Calendar Hierarchy (Year -> Quarter -> Month). Here, Month repeats across different quarters and Quarter repeats across different years, making the attribute non-unique (January can belong to Q1 of any year, and Q1 can belong to any year). To make the attribute unique, KeyColumns for Month should be set to the combination of Year and Month, and for Quarter to the combination of Year and Quarter.
A NameColumn is a property of an SSAS
Dimension Attribute and it is used to identify the column from the underlying
Data Source View which provides the name of the attribute which is displayed to
the end user by making it more user friendly instead of displaying the Key
Column value. For Example, you might have ProductCategoryKey as 1, 2, 3, &
4, and ProductCategoryName as Bikes, Components, Clothing, & Accessories
respectively. Now, NameColumn will be set to ProductCategoryName so that user
sees them as Bikes, Components etc. even though the data in the background is
processed/retrieved using the Key Column values as 1, 2 etc.
Here are some of the
highlights/differences of KeyColumns and NameColumn properties:
- KeyColumns property is defaulted to the Attribute itself, and the
NameColumn property is defaulted to Key Column (when the KeyColumns is set
to only one column).
- Column(s) provided in the KeyColumns should be able to uniquely
identify all the values of the respective attribute, whereas NameColumn
need not be unique.
- KeyColumns can contain one or more columns whereas NameColumn can
contain only one column.
Q: What is an Unknown Member? What is its significance?
An Unknown Member is a built-in member
provided by SQL Server Analysis Services. It represents a Missing or Null
value. Basically when a Dimension is Processed, Analysis Services populates
each of the attributes with distinct values from the underlying data source and
in this process, if it encounters a Null value it converts it appropriately (to 0 in the case of numeric columns and to an empty string in the case of string columns) and marks it as the Unknown Member for easy interpretation by the end user.
One of the important uses of an Unknown
Member is to handle Early Arriving Facts: a scenario in which the transactional record (fact) has arrived in the system but the corresponding dimension/contextual data has not yet arrived, due to reasons such as an improperly designed data load process, a failure in the ETL process, or a technical glitch in the transactional system delaying the dimension data.
Unknown Member can be enabled or
disabled (set to None) or set to Visible or Hidden based on the end user
requirements.
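The early-arriving-fact scenario above can be spotted at the relational layer with a query like the following sketch (FactSales/DimProduct are illustrative names, not from a specific sample database):

```sql
-- Fact rows whose dimension key has no matching dimension member yet;
-- during processing such rows surface under the Unknown Member.
SELECT f.*
FROM   FactSales f
LEFT JOIN DimProduct p ON p.ProductKey = f.ProductKey
WHERE  p.ProductKey IS NULL;
```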
What
are Dimension Translations? In what scenarios do you use them?
Translation in SSAS is a mechanism to
support Localization, in which the labels, names, and captions associated with
any SSAS Object (pretty much every SSAS Object supports Translations) are
translated from one language to another based on the country/language of the
user accessing the data. A Dimension Translation is the same as a Translation
in SSAS, except that the labels, names, and captions being translated are those
associated with Dimensions, Attributes, Hierarchies, and/or Dimension Members.
Translations are very useful in achieving a
higher level of adoption of the BI/Analytics system (SSAS). They eliminate
language barriers among users from different locations and present the same
information in different languages, making a single version of truth available
to users across different geographical locations.
Here are some of the highlights of
Dimension Translations:
- There can be multiple Translations associated with a single
Dimension, Attribute, Hierarchy, and Dimension Member etc.
- The collation and language settings/information from the client
application/computer is used to determine and provide the analysis
services metadata/data to the client application.
What
is SQL Server Analysis Services (SSAS)?
- SQL Server Analysis Services (SSAS) is the On-Line Analytical
Processing (OLAP) Component of SQL Server. SSAS allows you to build
multidimensional structures called Cubes to pre-calculate and
store complex aggregations, and also to build mining models to perform
data analysis to identify valuable information like trends, patterns,
relationships etc. within the data using Data Mining capabilities
of SSAS, which otherwise could be really difficult to determine without
Data Mining capabilities.
- SSAS comes bundled with SQL Server and you get to choose whether or
not to install this component as part of the SQL Server Installation.
What
is OLAP? How is it different from OLTP?
- OLAP stands for On-Line Analytical Processing. It is a capability
or a set of tools which enables the end users to easily and effectively
access the data warehouse data using a wide range of tools like Microsoft Excel, Reporting Services, and
many other 3rd party business intelligence tools.
- OLAP is used for analysis purposes to support business decision-making
and is characterized by less frequent data updates and historical data.
OLTP (On-Line Transactional Processing), by contrast, is used to support
day-to-day business operations and is characterized by frequent data
updates and the most recent data, along with limited historical data
based on the retention policy driven by business needs.
What is a Cube?
A Cube acts as an OLAP database for the subscribers who need
to query data from an OLAP data store. A Cube is the main object of an SSAS
solution, where the majority of fine tuning, calculations, aggregation
design, storage design, relationship definitions, and a lot of other
configurations are developed. We will create a cube using our dimension
and fact tables.
What
is a Data Source? What are the different data sources supported by SSAS?
- A Data Source contains
the connection information used by SSAS to connect to the underlying
database to load the data into SSAS during processing. A Data Source
primarily contains the following information (apart from various other
properties like Query timeout, Isolation etc.):
- Provider
- Server Name
- Database Name
- Impersonation Information
- SSAS Supports both .Net and OLE DB Providers. Following are some of
the major sources supported by SSAS: SQL Server, MS Access, Oracle,
Teradata, IBM DB2, and other relational databases with the appropriate OLE
DB provider.
What
is Impersonation? What are the different impersonation options available in
SSAS?
- Impersonation allows SSAS to assume a specific identity/security
context, which SSAS then uses to perform server-side data operations
like source data access, processing, etc.
As part of impersonation, the following options are available in SSAS:
- Use a specific Windows user name and password: This option lets you specify Windows account credentials
which will be used by SSAS to perform operations like source data access,
processing etc.
- Use the service account: When
this option is selected, SSAS uses the credentials of the service account
under which the Analysis Services service is configured/running for
source data access, processing etc.
- Use the credentials of the current user: When this option is set, SSAS uses the credentials of the
current user for performing operations like DMX Open Queries, Local cubes
etc. This option cannot be used for performing server side operations
like source data access, processing etc.
- Inherit: This option lets the SSAS server decide
which impersonation mode is suitable for each type of operation. When
this option is set, by default SSAS will use the service account for
operations like processing and the credentials of the current user for
operations like Local cubes, querying the data mining models, etc.
What
is a Data Source View?
- A Data Source View (DSV) is
a logical view of the underlying database schema and offers a layer of
abstraction for the underlying database schema. This layer acts as a
source for SSAS and captures the schema related information from the
underlying database. The schematic information present in DSV includes the
following:
- Underlying database Table(s)/View(s) metadata
- Primary Key & Foreign Key relationships between the underlying database Table(s)
- Additional columns in the form of Named Calculations
- Complex logic on the underlying Table(s)/View(s) in the form of
Named Queries
- SSAS can only see the schematic information present in the DSV and
it cannot see the schematic information from the underlying database.
What
is a Named Calculation? In what scenarios do you use it?
- A Named Calculation is a new column added to a Table in DSV and is
based on an expression. This capability allows you to add an extra column
into your DSV which is based on one or more columns from underlying data
source Table(s)/View(s) combined using an expression without requiring the
addition of a physical column in the underlying database Table(s)/View(s).
- The expression used in the Named Calculation should conform to the
underlying data source dialect. For example, if the underlying data source
is SQL Server, then it should conform to T-SQL; if it is Oracle, then it
should conform to PL/SQL; etc.
- Named Calculations can be used in many scenarios, following are
some of the common scenarios:
- For creating Derived Columns. Say you have First Name and Last
Name in the underlying data source Table/View and you want to get the
Full Name as "First Name + space + Last Name". Such things can
be added as a Named Calculation.
- For performing Lookup Operations. Say you have an Employee table
which has AddressID and an Address table in which AddressID is the
Primary Key. Now, to get the address information (say Country) into the
Employee table in DSV, a Named Calculation can be added to the Employee
table with the following expression:
(
    SELECT Country
    FROM   Address
    WHERE  AddressID = Employee.AddressID
)
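The derived-column scenario above can likewise be sketched as a Named Calculation expression, assuming a SQL Server source and a table with FirstName and LastName columns (illustrative names):

```sql
-- Hypothetical Named Calculation expression (T-SQL) for a FullName
-- column added to the Employee table in the DSV.
FirstName + ' ' + LastName
```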
What
is a Named Query? In what scenarios do you use it?
- A Named Query is a SQL query/expression in your DSV which acts as a
Table. It is used to combine data from one or more Table(s)/View(s) from
the underlying data source without requiring any schematic changes to the
underlying data source Table(s)/View(s).
- The SQL Query used in the Named Query should conform to the
underlying data source dialect. For example, if the underlying data source
is SQL Server, then it should conform to T-SQL; if it is Oracle, then it
should conform to PL/SQL; etc.
- Named Queries are used in various scenarios, following are some of
the common scenarios:
- Combining data from multiple Tables/Views from the underlying data
source by using either simple or complex join operations.
- Adding filter conditions for filtering out unwanted data or selecting
the required data (limiting the data).
- Pretty much everything that can be done using a Named Calculation
can also be done using a Named Query.
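A minimal sketch of such a Named Query, assuming a SQL Server source and illustrative Employee/Address tables (the names and the IsActive filter are assumptions, not from a specific sample database):

```sql
-- Combines two underlying tables and limits the rows, without any
-- schematic change to the source database.
SELECT  e.EmployeeID,
        e.FirstName,
        e.LastName,
        a.City,
        a.Country
FROM    Employee e
JOIN    Address  a ON a.AddressID = e.AddressID
WHERE   e.IsActive = 1;
```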
What
are the pros and cons of using Tables and Named Queries in DSV?
- Following are some of the pros and cons of using Tables and Named
Queries in DSV. Tables in the below comparison refers to the Table in DSV
which references a single Table or a View in the underlying source
database.
Tables vs. Named Queries:
- Named Calculations can be added to Tables in DSV, but cannot be
added to Named Queries in DSV.
- For Tables, Keys and Relationships are automatically set (by the
wizard) based on the Keys and Relationships in the underlying database
schema; for Named Queries, Keys and Relationships have to be set
explicitly in the DSV.
- A Table can reference only one Table/View from the underlying data
source; a Named Query can reference more than one Table/View using a
SQL expression in the DSV.
- Filter/limiting conditions cannot be applied on a Table in DSV;
they can be applied as part of the SQL expression in a Named Query in
the DSV.
- Although Named Calculations and Named Queries can be used to extend
the functionality of SSAS to address the evolving business needs, it is
always a good practice to first build a good Dimensional
Model in the beginning of a Data
Warehousing/SSAS project.
What
is the purpose of setting Logical Keys and Relationships in DSV?
- Many of the user interfaces/designers/wizards in BIDS which are
part of a SSAS project depend on the Primary Key and Relationships between
Fact and Dimension tables. Hence it is important to define the Primary Key
and Relationships in DSV.
- By default, the Data Source View Wizard detects the Physical
Primary Keys and Relationships between the tables in the underlying source
database and applies the same Keys and Relationships in DSV layer.
However, Logical Keys and Relationships need to be defined explicitly in
the following scenarios:
- If the DSV table is referring to an underlying database View.
- If the DSV table is created as a Named Query.
- If any additional relationships need to be defined in the DSV
layer apart from the ones that are physically defined in the underlying
source database.
Is
it possible to combine data from multiple data sources in SSAS? If yes, how do
you accomplish it?
- SSAS allows combining data from multiple underlying data sources
into a single DSV. To be able to add Table(s)/View(s) from multiple data
sources, first you need to create a DSV using your first source and this
source acts as the primary data source. Now after the initial DSV is
created, you can add one or more data sources into DSV which will act as
secondary data sources and you can choose additional Table(s)/View(s) from
the secondary data sources which you want to include in your DSV.
- The key thing while combining data from multiple data sources is
that the Primary Data Source must support OPENROWSET queries. Hence in
most cases, SQL Server is used as the Primary Data Source.
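For illustration only, the kind of pass-through query a SQL Server primary data source can issue to reach a secondary source looks like the following sketch; the provider, connection string, and table names are placeholders, not a working configuration:

```sql
-- OPENROWSET lets the primary source run a remote query against a
-- secondary source on behalf of SSAS.
SELECT *
FROM OPENROWSET('MSDASQL',
                'DSN=SecondarySource;UID=user;PWD=password',
                'SELECT CustomerID, CustomerName FROM dbo.Customer');
```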
What
is a SQL Server Analysis Services Dimension?
A Dimension represents a set of
contextual data about the transactional data that is stored in the fact
table(s). It basically forms the story line around the data by providing
context and thereby helping the users to understand and make sense out of the
data.
Example: Let us say that there is a retail chain and the
CFO of the chain announces that the "Total Sales is $1,000,000". In
this case, $1,000,000 is merely a number; it does not provide any valuable
information and does not make any sense to the user.
Now let us say that the CFO of the
chain makes a slight change in the above announcement and says "Total
Sales for FY 2012 is $1,000,000". This makes some sense and provides some
context: the sales amount mentioned is for Financial Year 2012 (FY2012).
FY2012 represents a date (time) component and hence a Date (Time) Dimension.
Similarly, we can go on and add
additional Dimensions to this data (Fact Table/Cube) to provide more context
about the data (in this scenario $1,000,000 is the data we are talking about).
Say we change the above statement to "Total Sales of Facial
Tissues for FY 2012 in California is $1,000,000". This gives much better
context compared to the previous two statements. In this statement, Facial
Tissues represents a Product, which introduces the Product Dimension, and
California represents a Geographical Location, which introduces the Geography
Dimension. In this manner, we can add as many dimensions as we want to give
better context to the data, so that end users can analyze the data from
different dimensions and make more effective decisions.
Here are some highlights of
Dimension(s):
- It represents contextual information which adds context/meaning to
the data being analyzed.
- Helps in viewing/analyzing the data from different
dimensions/angles to get a better understanding of the data.
- A dimension is organized in the form of Attributes and Hierarchies.
- Here are some examples of a Dimension:
- Time Dimension
- Product Dimension
- Geography Dimension
- Customer Dimension
- Employee Dimension
- Here are some examples of a Dimension Attribute
- Year, Quarter, Month etc. in case of a Time Dimension
- Color, Size etc. in case of a Product Dimension
What
are Natural Key and Surrogate Key in SQL Server Analysis Services?
A Natural Key is a type of key in a
table which uniquely identifies each record and has a business meaning attached
to it.
Example: Incident/Ticket Number, SSN, Employee Email
Alias, etc.
Here are some highlights of a Natural
Key:
- This can be numeric, string, or a combination of both, depending
on the type of data you are dealing with. In the case of non-numeric
data, JOINs are slower and the key occupies more space than an integer.
- History cannot be maintained if used as a Primary Key.
- Merging data from different source systems can be difficult when
there is a conflict in the type of value coming from each source system.
A Surrogate Key is a type of key in a
table which uniquely identifies each record, but has no business meaning
attached to it. It is merely a value used to uniquely identify a record in a
table.
Example: Identity Columns, GUID (Unique identifier), etc.
Here are some highlights of a Surrogate
Key:
- Most of the time this is a numeric value (like INT, BIGINT, etc.),
and hence JOINs are faster and the key occupies less space than
character-based data.
- Allows for maintenance of history data (SCD Type 2) when used as a
Primary Key.
- Data coming from multiple source systems can be easily integrated
(especially in the scenarios like Mergers, Acquisitions etc. in the
industry).
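A minimal sketch of a dimension table illustrating both key types (table and column names are assumptions):

```sql
-- CustomerKey is an IDENTITY column with no business meaning (surrogate
-- key), while CustomerAltKey (e.g. an account number) is the natural key.
CREATE TABLE DimCustomer
(
    CustomerKey     INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerAltKey  NVARCHAR(20) NOT NULL,          -- natural/business key
    CustomerName    NVARCHAR(100) NOT NULL
);
```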
What
is a Hierarchy in SQL Server Analysis Services? What are the different types of
Hierarchies? Explain each one of them.
A Hierarchy is a collection of one or
more related Attributes which are organized in a Parent-Child fashion. An
Attribute at a higher level is a parent of an Attribute at the next level and
so on. They provide a very convenient way to perform roll up/drill down analysis
and helps in rolling up and drilling down the numeric data in a very effective
manner.
As an analogy, it is like Bing Maps.
If you want to locate a particular place manually (without doing a text
search), you usually first locate the Country, then the State within the
Country, then the City within the State, and finally the place you are
looking for within the City.
Example: A Calendar Hierarchy might contain Attributes
like Year, Quarter, Month, and Day which are organized as a Hierarchy with Year
as a parent of Quarter, Quarter as a parent of Month, and Month as a parent of
Day.
Here are the common types of
Hierarchies:
- Natural Hierarchy: A
Natural Hierarchy is a Hierarchy in which Every Child has a Single
Parent. Example: A Calendar Hierarchy.
- Balanced Hierarchy: A
Balanced Hierarchy is a Hierarchy in which no matter through what path we
traverse the Hierarchy, there is a Member at every level and every path
has the same number of Levels. Example: A Calendar
Hierarchy.
- Unbalanced Hierarchy: An
Unbalanced Hierarchy is a Hierarchy in which the number of members and the
number of Levels varies across different branches of the Hierarchy. In this
type of Hierarchy, Leaf Level Members might belong to different Levels. Example:
An Employee/Organization Hierarchy in which one manager (Manager 1) has
reports who belong to different Levels, say L1 & L2, while another
manager at the same level as Manager 1 has reports who all belong to the
same Level, say L1.
- Ragged Hierarchy: A
Ragged Hierarchy is a Hierarchy in which irrespective of the path you use
to traverse, every path has the same number of Levels but not every level
is guaranteed to have members except for the Top most Level and Bottom
most Level (Leaf Level). Example: A Geography
Hierarchy.
What
is a Slowly Changing Dimension in SQL Server Analysis Services? What are the
different types of Slowly Changing Dimensions?
The Slowly Changing Dimension (SCD)
concept is basically about how the data modifications are absorbed and
maintained in a Dimension Table. In an SCD the contents/members change over a
period of time. Apart from the existing members which are modified, new records
are added to the dimension similar to any other dimension.
Here are the major 3 types of Slowly
Changing Dimensions:
- Type 1: In this type of SCD, modifications to the
dimension members are absorbed by overwriting the existing member data
present in the Dimension Table. Example: Say you have
an Address Dimension which contains addresses of all the employees in an
organization. When the address of an employee changes, then the old data
(address) in the Dimension is overwritten with the new data (address) and
there is no way to get the old address from the table.
- Type 2: In this type of SCD, modifications to the
dimension members are absorbed by adding it as a new record into the
Dimension Table. Meaning, the history of all the changes happening to the
dimension members is maintained by creating a new record each time a dimension
member is modified. The new (modified) record and the old record(s) are
identified using some kind of a flag like say IsActive, IsDeleted etc. or
using Start and End Date fields to indicate the validity of the record.
New dimension members are inserted into the Dimension Table with
appropriate flag value which indicates that this is the active/current
record. Unlimited amount of history can be maintained in this type of
SCD. Example: Considering the above example of an
Address Dimension. When the address of an employee changes, then the old
data (address) in the Dimension Table is retained as it is and the flag is
updated to indicate that this is a historical record. The modified data
(record) is inserted as a new record into the table and the flag is
updated to indicate that this is the current record.
- Type 3: In this type of SCD, modifications to the
dimension members are absorbed with the help of additional columns in the
Dimension Table. In this SCD type, the amount of history data (number of
modifications) that can be maintained is limited by the number of columns
(fixed) in the Dimension Table, which are used for tracking the history
details. This is decided at the time of designing the Data Mart. Example:
Considering the above example of an Address Dimension. When the address of
an employee changes, then the old data (address) in the Dimension is
retained as it is and the new address is updated in a separate column to
indicate that this is the latest address.
Apart from the above listed 3 types of
SCDs, there are other types like Type 0, Type 4, Type
5, & Type 6, which are rarely used.
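The SCD Type 2 flow described above can be sketched in T-SQL as an expire-then-insert pair; the table, key value, and StartDate/EndDate/IsActive columns are assumptions for illustration:

```sql
-- 1) Expire the current record for the changed member:
UPDATE DimEmployeeAddress
SET    EndDate  = GETDATE(),
       IsActive = 0
WHERE  EmployeeAltKey = 1001 AND IsActive = 1;

-- 2) Insert the modified data as the new current record:
INSERT INTO DimEmployeeAddress
        (EmployeeAltKey, Address, StartDate, EndDate, IsActive)
VALUES  (1001, 'New address', GETDATE(), NULL, 1);
```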
What
is a Parent-Child Dimension in SQL Server Analysis Services? Explain with an
example.
A Parent-Child Dimension is a Dimension
in which two attributes in the same dimension are related to each other and
together define the lineage relationships among the dimension members. The
first attribute, which uniquely identifies each dimension member, is called
the Member Key Attribute, and the second attribute, which identifies the
parent of a dimension member, is called the Parent Key Attribute.
Example: Consider an Employee Dimension which has
EmployeeID as a Primary/Unique Key which uniquely identifies each Employee in
the organization. The same Employee Dimension contains another attribute,
ManagerID, which identifies the Manager of an Employee; ManagerID contains
one of the values present in EmployeeID, since a Manager is also an Employee
and hence also has a record of his/her own in the Employee Table. In this
scenario, EmployeeID is the Member Key Column and ManagerID is the Parent
Key Column.
Here are some highlights of a
Parent-Child Dimension:
- Both the Parent and the Child members exist in the same dimension.
- There is a Parent-Child relationship between different members
(rows) of the same dimension.
- Parent Key Column in the dimension table is a Foreign Key Column
(Can be physical or logical) which refers to the Member Key Column which
is a Primary/Unique Key Column in the same dimension table.
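The Employee/Manager example above can be sketched as a self-referencing table plus a recursive query that walks the hierarchy (names are illustrative):

```sql
-- ManagerID is a foreign key back into the same table.
CREATE TABLE DimEmployee
(
    EmployeeID   INT PRIMARY KEY,        -- member key column
    ManagerID    INT NULL                -- parent key column
        REFERENCES DimEmployee(EmployeeID),
    EmployeeName NVARCHAR(100) NOT NULL
);

-- Recursive CTE that traverses the parent-child relationship.
WITH OrgChart AS
(
    SELECT EmployeeID, ManagerID, EmployeeName, 0 AS Level
    FROM   DimEmployee
    WHERE  ManagerID IS NULL             -- top of the hierarchy
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, e.EmployeeName, o.Level + 1
    FROM   DimEmployee e
    JOIN   OrgChart o ON e.ManagerID = o.EmployeeID
)
SELECT * FROM OrgChart;
```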
What
is a Role-Playing Dimension in SQL Server Analysis Services? Explain with an
example.
A Role-Playing Dimension is a Dimension
which is connected to the same Fact Table multiple times using different
Foreign Keys. This helps the users to visualize the same cube data in different
contexts/angles to get a better understanding and make better decisions.
Example: Consider a Time Dimension which is joined to the
same Fact Table (Say FactSales) multiple times, each time using a different
Foreign Key in the Fact Table like Order Date, Due Date, Ship Date, Delivery
Date, etc. Essentially there is only one single physical dimension called Date
Dimension. However, it is joined multiple times to the Fact Table to help the
users to visualize the cube data in the context of different dates.
Here are some highlights of a
Role-Playing Dimension:
- It is a single physical Dimension Table.
- Same Dimension Table connects to the same Fact Table multiple times
using different Foreign Keys from the Fact Table.
- When a Role-Playing Dimension is added to the cube, it appears as a
different dimension (one instance for each Foreign Key to which it is
joined) to the end users and hence playing multiple roles.
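Relationally, the role-playing idea above amounts to joining one physical DimDate to the fact table once per date role, as in this sketch (table and key names assumed):

```sql
-- One physical DimDate plays two roles: Order Date and Ship Date.
SELECT  f.SalesAmount,
        od.CalendarDate AS OrderDate,
        sd.CalendarDate AS ShipDate
FROM    FactSales f
JOIN    DimDate od ON od.DateKey = f.OrderDateKey
JOIN    DimDate sd ON sd.DateKey = f.ShipDateKey;
```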
What
is a Conformed Dimension in SQL Server Analysis Services? Explain with an
example.
A Conformed Dimension is a Dimension
which connects to multiple Fact Tables across one or more Data Marts (cubes). A
Conformed Dimension is physically implemented across multiple Data Marts with
exactly the same structure, attributes, values (dimension members), meaning, and
definition. Each instance means exactly the same thing and provides exactly the
same context, irrespective of the Fact Table to which it is connected or the
Data Mart to which it belongs.
Example: A Date Dimension has exactly the same set of attributes,
same members and same meaning irrespective of which Fact Table it is connected
to or to which Data Mart it belongs to as long as it belongs to the same
organization. For instance, a Fiscal Calendar is exactly the same with same
start and end dates across all the departments within an organization.
Here are some highlights of Conformed
Dimensions:
- Same Dimension joins to multiple Fact Tables or is used across
multiple Data Marts.
- It is a master dimension and is used across multiple dimensional models.
- Each instance of a Conformed Dimension is exactly the same in every
aspect, including the attribute names, definitions, etc.
What
is a Degenerate Dimension in SQL Server Analysis Services? In what scenarios do
you use it?
A Degenerate Dimension is a Dimension
which is derived out of a Fact Table; it appears to the end user as a
separate/distinct Dimension, but its data is actually stored in the Fact
Table. It is a Dimension which does not have an underlying physical table
of its own.
Degenerate Dimensions are commonly used
when the Fact Table contains/represents Transactional data like Order Details,
etc. and each Order has an Order Number associated with it, which forms the
unique value in the Degenerate Dimension.
Example: Degenerate Dimensions having unique Order Numbers
can be used to identify the various items sold as part of a particular order.
Here are some highlights of Degenerate
Dimension:
- It is derived from the Fact Table and does not have an underlying
physical Dimension Table of its own.
- It is also called a Fact Dimension.
- Since these dimensions are built on top of Fact Table, these are
usually very large dimensions.
- The attribute of a Degenerate Dimension is not a Foreign Key in the
Fact Table.
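As a sketch, the Order Number example above lives directly in the fact table; there is no separate dimension table to join (FactSales and its columns are assumptions):

```sql
-- SalesOrderNumber acts as a degenerate dimension: grouping by it
-- identifies the items sold as part of each order.
SELECT  SalesOrderNumber,
        SUM(SalesAmount) AS OrderTotal
FROM    FactSales
GROUP BY SalesOrderNumber;
```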
What
is a Junk Dimension in SQL Server Analysis Services? In what scenarios do you
use it?
A Junk Dimension is often a collection
of unrelated Attributes like indicators, flags, codes, etc. It usually
contains data which cannot be created as separate Dimensions, as each
attribute tends to be too small and not worth having as a Dimension of
its own.
These Dimensions can be used in various
scenarios, but one of the common scenarios is when a Fact Table contains a lot
of Attributes which are like indicators, flags, etc. Using Junk Dimensions,
such Attributes can be removed/cleaned up from a Fact Table.
Example: It contains values like Yes/No, Pending/In
Progress/Completed, Open/Resolved/Closed, etc.
Here are some highlights of a Junk
Dimension:
- It is also called a Garbage Dimension.
- Junk Dimensions are usually small in size.
- It is a convenient way to consolidate many Smaller Dimensions which
are not really worth having as a separate Dimension.
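One hedged sketch of building such a dimension is to take the distinct combinations of the low-cardinality flags from the source (table and column names are illustrative):

```sql
-- Each distinct combination of flags becomes one junk-dimension row,
-- keyed by a generated surrogate key.
SELECT  ROW_NUMBER() OVER (ORDER BY IsGift, OrderStatus) AS JunkKey,
        IsGift,       -- e.g. Yes/No
        OrderStatus   -- e.g. Open/Resolved/Closed
INTO    DimOrderFlags
FROM    (SELECT DISTINCT IsGift, OrderStatus FROM StagingOrders) AS d;
```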
What
are Database Dimension and Cube Dimension? What is the difference between them?
A Database Dimension is a Dimension
which exists independent of a Cube. You can see a Database Dimension as a
separate entity/file in the Solution Explorer/Project Folder. As an analogy,
it is pretty similar to a Class in a programming language.
A Cube Dimension is an instance of a
Database Dimension. As an analogy, it is pretty similar to an Object (an
instance of a Class) in a programming language.
Here are some of the highlights/differences
of Database and Cube Dimensions:
- A Database Dimension is independent of a Cube, but a Cube Dimension
is always a part of a Cube.
- One Database Dimension can be added to a Cube more than once (Role-Playing Dimension) as
different instances. In this scenario, each Instance is treated as a
separate Cube Dimension.
- A Database Dimension can be added to more than one Cube, whereas a
Cube Dimension can belong to a Single Cube at any given point.
What
is a Linked Dimension? In what scenarios do you use it?
A Linked Dimension is a Dimension
which is based on (Linked To) another Database Dimension which might be either
located on the same Analysis Services server as the Linked Dimension or on a
different Analysis Services server.
Linked Dimensions can be used when the
exact same dimension can be used across multiple Cubes within an Organization
like a Time Dimension, Geography Dimension etc.
Here are some of the highlights of a
Linked Dimension:
- More than one Linked Dimension can be created from a Single
Database Dimension.
- These can be used to implement the concept of Conformed Dimensions.
- For an end user, a Linked Dimension appears like any other
Dimension.
What
are the different ways to create a Time Dimension in Analysis Services?
Time Dimension is one of the most
important and most common types of dimensions, as pretty much every metric is
analyzed over time. Analysis Services offers the following ways to create
a Time Dimension:
- Create Using an Existing Table: This is one of the common approaches for building a Time
Dimension. In this approach, a table is created in the underlying data
source and pre-populated with data and it is then used to create the Time
Dimension in SSAS.
- Generate a Time Table in the Data Source: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table in the underlying data source (relational database) and populates it
with the required data using the date range, attributes, and calendars
etc. which are specified at the time of creation of Time Dimension in
Business Intelligence Development Studio. This option requires permissions
to create a table in the underlying data source.
- Generate a Time Table on the Server: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table on the server and this table is used to generate the Time Dimension
by the wizard. The Dimension created using this approach is called a
Server Time Dimension.
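For the first approach, the underlying date table is pre-populated by a one-off script along these lines (the DimDate table, its columns, and the date range are assumptions for illustration):

```sql
-- Populate one row per calendar day for the chosen range.
DECLARE @d DATE = '2010-01-01';
WHILE @d <= '2014-12-31'
BEGIN
    INSERT INTO DimDate (DateKey, CalendarDate, [Year], [Quarter], [Month])
    VALUES (CONVERT(INT, FORMAT(@d, 'yyyyMMdd')),  -- e.g. 20100101
            @d, YEAR(@d), DATEPART(QUARTER, @d), MONTH(@d));
    SET @d = DATEADD(DAY, 1, @d);
END;
```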
What
is Type property of a Dimension? What is the purpose of setting this property?
Type property of a Dimension is used to
specify the type of information that the Analysis Services Dimension contains.
Like a Time Dimension (Contains Years, Quarters, Months, Dates, and so on),
Geography Dimension (Contains Geographical Information), and Accounts Dimension
(Contains Accounts related information) etc. This property is set to Regular by
default.
There are basically two important uses
of setting this property:
- This property is used by the Business Intelligence Wizard
to assign/generate standard classifications and MDX expressions.
- This property is used by the Analysis Services Client Applications
like Excel etc. to adjust the user interface/rendering format
appropriately. For example, when a dimension is added to a Pivot Table in Excel,
certain types of dimensions are automatically added across the columns
axis (like Time Dimension) and a few other types of dimensions are added
to the rows axis (Like Product Dimension) automatically.
What
is a Storage Mode? What are the different storage modes applicable to
Dimensions?
A Storage Mode defines the location in
which the Dimension data will be stored and the format (Relational or
Multidimensional) in which the data will be stored.
Following are the two Storage Modes
supported by Dimensions:
- ROLAP (Relational On-Line Analytical Processing):
- When the Storage Mode is set to ROLAP for a Dimension, then the
Dimension Data will be stored in the relational database tables.
- This Storage Mode offers effective memory usage by avoiding the
duplication of data that occurs with MOLAP Storage Mode.
- Using this Storage Mode results in slower query performance
compared to MOLAP Storage Mode.
- MOLAP (Multidimensional On-Line Analytical Processing):
- When the Storage Mode is set to MOLAP for a Dimension, then the
Dimension Data will be stored in a multidimensional format in the
Analysis Services/OLAP Server.
- This Storage Mode uses more storage as it involves
duplication of data (the first copy is the data in the underlying
dimension tables and the second copy is the data stored in the Dimension
on the OLAP Server).
- Using this Storage Mode results in the best query performance
of any Storage Mode available in SSAS.
What
is the difference between Attribute Hierarchy and User Defined Hierarchy?
An Attribute Hierarchy is a Hierarchy
created by SQL Server Analysis Services for every Attribute in a Dimension by
default. An Attribute Hierarchy by default contains only two levels - an
"All" level and a "Detail" level, which is nothing but the
Dimension Members.
A User Defined Hierarchy is a Hierarchy
defined explicitly by the user/developer and often contains multiple levels.
For example, a Calendar Hierarchy contains Year, Quarter, Month, and Date as
its levels.
Here are some of the
highlights/differences of Attribute and User Defined Hierarchies:
- Attribute Hierarchies are always Two-Level (Unless All Level is
suppressed) whereas User Defined Hierarchies are often Multi-Level.
- By default, Every Attribute in a Dimension has an Attribute
Hierarchy whereas User Defined Hierarchies have to be explicitly defined
by the user/developer.
- Every Dimension has at least one Attribute Hierarchy by default
whereas every Dimension does not necessarily contain a User Defined
Hierarchy. In essence, a Dimension can contain zero, one, or more User
Defined Hierarchies.
- Attribute Hierarchies can be enabled or disabled. Disable the
Attribute Hierarchy for those attributes which are commonly not used to
slice and dice the data during analysis, like Address, Phone Number, and
Unit Price etc. Doing this will improve the cube processing performance
and also reduces the size of the cube as those attributes are not
considered for performing aggregations.
- Attribute Hierarchies can be made visible or hidden. When an
Attribute Hierarchy is hidden, it will not be visible to the client
application while browsing the Dimension/Cube. Attribute Hierarchies for
those attributes which are part of the User Defined Hierarchies, like Day,
Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be
hidden, since the attribute is available to the end users through the User
Defined Hierarchy and helps eliminate the confusion/redundancy for end
users.
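The two-level versus multi-level distinction above can be illustrated with a minimal Python sketch (this is not an SSAS API; the hierarchy names and the dictionary layout are illustrative assumptions):

```python
# Conceptual sketch (not SSAS API): an attribute hierarchy has only an
# "(All)" level plus the detail members, while a user-defined hierarchy
# is explicitly designed with multiple levels.

attribute_hierarchy = {
    "name": "Month",
    "levels": ["(All)", "Month"],  # All level + detail level only
}

calendar_hierarchy = {
    "name": "Calendar",
    "levels": ["(All)", "Year", "Quarter", "Month", "Date"],
}

def user_levels(hierarchy):
    """Levels below the (All) level."""
    return [lvl for lvl in hierarchy["levels"] if lvl != "(All)"]
```

With this sketch, `user_levels(attribute_hierarchy)` yields a single detail level, while the Calendar hierarchy exposes four user-visible levels.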
What
is an Attribute Relationship? What are the different types of Attribute
Relationships?
An Attribute Relationship is
a relationship between various attributes within a Dimension. By default, every
Attribute in a Dimension is related to the Key Attribute. Quite often these
default Attribute Relationships need to be modified to suit the User Defined
Hierarchies and other end user requirements.
There are basically two types of
Attribute Relationships:
- Rigid: Attribute Relationship should be set to
Rigid when the relationship between those attributes is not going to
change over time. For example, relationship between a Month and a Date is
Rigid since a particular Date always belongs to a particular Month like 1st
Feb 2012 always belongs to Feb Month of 2012. Try to set the relationship
to Rigid wherever possible.
- Flexible: Attribute Relationship
should be set to Flexible when the relationship between those attributes
is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work
under one manager during this year (time period) and under a different
manager during next year (another time period).
What
are KeyColumns and NameColumn properties of an Attribute? What is the difference
between them?
KeyColumns is a property of an SSAS
Dimension Attribute and it forms the Key (Unique) for the attribute. It can be
bound to one or more columns in the underlying database table. When User
Defined Hierarchies are created in the dimension (Attribute Relationships
defined), setting this property becomes very critical and often requires
setting this to a combination of more than one column from the Data Source
View. For Example, say you have a Date Dimension and a hierarchy called
Calendar Hierarchy (Year -> Quarter -> Month). What happens is that Month
gets repeated across different quarters and quarters get repeated across
different years, making the attribute non-unique (January can belong to Q1
of any year and, similarly, Q1 can belong to any year). So to make the attribute
unique, KeyColumns for Month should be set to something like Year and Month, and
similarly for Quarter it should be set to Year and Quarter.
NameColumn is a property of an SSAS Dimension Attribute and it identifies
the column from the underlying Data Source View which provides the name of
the attribute displayed to the end user, making it more user friendly than
showing the Key Column value. For example, you might have
ProductCategoryKey as 1, 2, 3, &
4, and ProductCategoryName as Bikes, Components, Clothing, & Accessories
respectively. Now, NameColumn will be set to ProductCategoryName so that user
sees them as Bikes, Components etc. even though the data in the background is
processed/retrieved using the Key Column values as 1, 2 etc.
Here are some of the
highlights/differences of KeyColumns and NameColumn properties:
- KeyColumns property is defaulted to the Attribute itself, and the
NameColumn property is defaulted to Key Column (when the KeyColumns is set
to only one column).
- Column(s) provided in the KeyColumns should be able to uniquely
identify all the values of the respective attribute, whereas NameColumn
need not be unique.
- KeyColumns can contain one or more columns whereas NameColumn can
contain only one column.
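The composite KeyColumns requirement can be shown with a small Python sketch (not an SSAS API; the sample rows are illustrative assumptions): month names repeat across years, so only the (Year, Month) pair uniquely identifies each member.

```python
# Conceptual sketch (not SSAS API): why KeyColumns for Month often needs
# Year + Month. Month names alone repeat across years, so they cannot
# serve as a unique key; the (Year, Month) pair can.

rows = [
    {"Year": 2011, "Month": "January"},
    {"Year": 2011, "Month": "February"},
    {"Year": 2012, "Month": "January"},   # "January" repeats across years
    {"Year": 2012, "Month": "February"},
]

name_only_keys = {r["Month"] for r in rows}
composite_keys = {(r["Year"], r["Month"]) for r in rows}

is_unique_by_name = len(name_only_keys) == len(rows)       # False
is_unique_by_composite = len(composite_keys) == len(rows)  # True
```

The same reasoning applies to Quarter, where (Year, Quarter) forms the unique key.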
What
is an Unknown Member? What is its significance?
An Unknown Member is a built-in member
provided by SQL Server Analysis Services. It represents a Missing or Null
value. Basically, when a Dimension is processed, Analysis Services populates
each of the attributes with distinct values from the underlying data source
and, in this process, if it encounters a Null value, it converts it
appropriately (to 0 in the case of numeric columns and to an empty string in
the case of string columns) and marks it as the Unknown Member for easy
interpretation by the end user.
One of the important uses of an Unknown
Member is to handle Early Arriving Facts. This is a scenario
in which the transactional record (Fact) has come into the system but
the corresponding Dimension/Contextual data is yet to come in, which could be
due to various reasons like an improperly designed data load process, a failure
in the ETL process, or a technical glitch in the transactional system causing
a delay in pushing the Dimension data.
Unknown Member can be enabled or
disabled (set to None) or set to Visible or Hidden based on the end user
requirements.
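The null-to-Unknown-Member mapping during processing can be sketched in a few lines of Python (not an SSAS API; the member names and the helper function are illustrative assumptions):

```python
# Conceptual sketch (not SSAS API): how nulls coming from the source can
# be mapped to an Unknown Member during dimension processing, so that
# early-arriving facts still have a member to roll up under.

UNKNOWN_MEMBER = "Unknown"

def to_member(value):
    """Map a null/missing source value to the Unknown Member."""
    return UNKNOWN_MEMBER if value is None else value

source_product_keys = ["Bikes", None, "Clothing", None]
members = [to_member(v) for v in source_product_keys]
```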
What
are Dimension Translations? In what scenarios do you use them?
Translation in SSAS is a mechanism to
support Localization, in which the labels, names, and captions associated with
any SSAS Object (pretty much every SSAS Object supports Translations) are
translated from one language to another language based on the country/language
of the user accessing the data. A Dimension Translation is the same as a
Translation in SSAS, but in this case the labels, names, and captions
associated with Dimensions, Attributes, Hierarchies, and/or Dimension Members
are translated from one language to another.
Translations are very useful in
achieving a higher level of adoption of the BI/Analytics system (SSAS). They
eliminate language barriers among users from different locations and present
the same information in different languages, making a single version of truth
available to users across different geographical locations.
Here are some of the highlights of
Dimension Translations:
- There can be multiple Translations associated with a single
Dimension, Attribute, Hierarchy, and Dimension Member etc.
- The collation and language settings/information from the client
application/computer is used to determine and provide the analysis
services metadata/data to the client application.
What
are Measures and Measure Groups? What is the difference between them?
A Measure is any numeric quantity/value
that represents a metric aligned to an organization's goals. This is the value
which the business users are interested in, and are interested in viewing these
values from different angles and different granularity levels. A measure is
also commonly called a fact. The terms "measures" and
"facts" are used interchangeably.
A Measure Group is a collection/group
of measures which belong to the same underlying fact table. In SSAS, typically
each Measure Group is tied to each one of the underlying fact tables.
A Measure is a single numeric value
whereas a Measure Group is a collection of measures.
What
are the different types of Measures? Explain each one of them with an example.
Below are the most common types of
measures/facts:
- Fully Additive Facts: These
are facts which can be added across all the associated dimensions. For
example, sales amount is a fact which can be summed across different
dimensions like customer, geography, date, product, and so on.
- Semi-Additive Facts: These
are facts which can be added across only a few dimensions rather than all dimensions.
For example, bank balance is a fact which can be summed across the
customer dimension (i.e. the total balance of all the customers in a bank
at the end of a particular quarter). However, the same fact cannot
be added across the date dimension (i.e. the total balance at the end of
quarter 1 is $X million and $Y million at the end of quarter 2, so at the
end of quarter 2, the total balance is only $Y million and not $X+$Y).
- Non-Additive Facts: These
are facts which cannot be added across any of the dimensions in the cube.
For example, profit margin is a fact which cannot be added across any of
the dimensions. For example, if product P1 has a 10% profit and product P2
has a 10% profit then your net profit is still 10% and not 20%. We
cannot add profit margins across product dimensions. Similarly, if your
profit margin is 10% on Day1 and 10% on Day2, then your net Profit Margin
at the end of Day2 is still 10% and not 20%.
- Derived Facts: Derived
facts are the facts which are calculated from one or more base facts,
often by applying additional criteria. Often these are not stored in the
cube and are calculated on the fly at the time of accessing them. For
example, profit margin.
- Factless Facts: A
factless fact table is one which only has references (Foreign Keys) to the
dimensions and it does not contain any measures. These types of fact
tables are often used to capture events (valid transactions without a net
change in a measure value). For example, a balance enquiry at an automated
teller machine (ATM). Though there is no change in the account balance,
this transaction is still important for analysis purposes.
- Textual Facts: Textual
facts refer to the textual data present in the fact table, which is not
measurable (non-additive), but is important for analysis purposes. For
example, codes (i.e. product codes), flags (i.e. status flag), etc.
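The non-additive behavior of profit margin described above can be verified with a short Python sketch (the product figures are illustrative assumptions): summing the per-product margins gives a wrong answer, while recomputing the margin from the additive base facts gives the right one.

```python
# Conceptual sketch: profit margin is non-additive. Summing margins across
# products is wrong; the correct total margin must be recomputed from the
# underlying additive facts (profit and sales).

products = [
    {"name": "P1", "sales": 100.0, "profit": 10.0},  # 10% margin
    {"name": "P2", "sales": 300.0, "profit": 30.0},  # 10% margin
]

# Naively adding the two 10% margins yields 20%, which is incorrect.
wrong_total_margin = sum(p["profit"] / p["sales"] for p in products)

# The correct approach: aggregate the additive facts first, then divide.
total_sales = sum(p["sales"] for p in products)
total_profit = sum(p["profit"] for p in products)
correct_total_margin = total_profit / total_sales  # still 10%
```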
What
is the purpose of Dimension Usage settings? Explain different types of
relationships between Facts and Dimensions.
The Dimension Usage tab in the Cube
Designer in SQL Server Business Intelligence Development Studio defines the relationship between a Cube
Dimension and Measure Group(s). A Dimension which is related to one or more
Measure Groups, directly or indirectly, is called a Cube Dimension. A Cube
Dimension is an instance of a database Dimension, as explained in the previous tip.
Following are the four different types
of relationships between a Cube Dimension and a Measure Group:
- Regular: In a Regular relationship,
primary key column of a dimension is directly connected to the fact table.
This type of relationship is similar to the relationship between a
dimension and a fact in a Star Schema, and
it can be based on either the physical primary key-foreign key
relationship in the underlying relational database or the logical primary
key-foreign key relationship defined in the Data Source View.
- Referenced: In a
Referenced relationship, the primary key column of a dimension is indirectly
connected to the fact table through a key column in the intermediate
dimension table. This type of relationship is similar to the indirect
relationship between a dimension and a fact, through an intermediate
dimension, in a Snowflake Schema.
- Fact: In a Fact relationship, the
dimension table and the fact table are one and the same. Basically, a Fact Dimension or Degenerate Dimension is created using one or more columns from the fact table and this
degenerate dimension is used while defining/establishing the relationship
in the case of a fact relationship.
- Many-to-Many: In a
Many-to-Many relationship, a dimension is indirectly connected to a
Measure Group through an intermediate fact table which joins with the
dimension table. It is analogous to a scenario, where one project can have
multiple project managers and one project manager can manage multiple projects.
What
are Calculated Members? How do they differ from Measures?
Calculated Members are members of
a measure group and are defined based on a combination of one or more base
measures, arithmetic/conditional operators, numeric values, functions, etc.
For example, profit is a calculated member/calculated measure, which is defined
based on various base measures like selling price, cost price, tax amount,
freight amount, etc.
The value of a measure (base measure)
is stored in a cube as part of the cube processing process. Whereas the value
of a calculated member/measure is calculated on the fly in response to a user
request and only the definition is stored in the cube.
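The stored-value versus stored-definition distinction can be sketched in Python (not MDX or an SSAS API; the measure names and values are illustrative assumptions):

```python
# Conceptual sketch (not MDX): a base measure's values are stored in the
# cube, while a calculated member keeps only its definition and is
# evaluated on the fly when a user requests it.

stored_measures = {
    "Sales Amount": 500.0,  # materialized during cube processing
    "Total Cost":   380.0,
}

# Only the definition is kept (a function here; an MDX expression in SSAS).
calculated_members = {
    "Profit": lambda m: m["Sales Amount"] - m["Total Cost"],
}

def get_value(name):
    """Return a stored value, or evaluate a calculated member on demand."""
    if name in stored_measures:
        return stored_measures[name]
    return calculated_members[name](stored_measures)
```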
What
are Named Sets? What are the two types of Named Sets?
A Named Set is a set of dimension members
(usually a subset of dimension members) and is defined using MDX
(Multidimensional Expressions). Often Named Sets are defined for improved
usability by the end users and client applications. Apart from that, they can also
be used for various calculations at the cube level. Similar to calculated
members/measures, named sets are defined using a combination of cube/dimension
data, arithmetic operators, numeric values, functions, etc. Some of the
examples of Named Sets are top 50 customers, top 10 products, top 5 students,
etc.
Named Sets are of two types: Static
Named Sets and Dynamic Named Sets.
Static Named Sets, when defined in a
cube, are evaluated during the cube processing process. Dynamic Named Sets
are evaluated each time a query invokes them.
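The static/dynamic difference can be illustrated with a Python sketch (not MDX; the customer data and the "top 2" definition are illustrative assumptions):

```python
# Conceptual sketch (not MDX): a named set is a reusable, named subset of
# members. A static set is materialized once; a dynamic set re-runs its
# definition per query, so later data changes are reflected.

sales_by_customer = {"A": 120.0, "B": 340.0, "C": 90.0, "D": 210.0}

def top_customers(data, n=2):
    """Set definition: top-n customers by sales."""
    return sorted(data, key=data.get, reverse=True)[:n]

# Static: evaluated once (e.g. at processing time) and reused as-is.
static_top2 = top_customers(sales_by_customer)

# The data changes after the static set was materialized...
sales_by_customer["C"] = 500.0

# Dynamic: the definition is re-evaluated, picking up the change.
dynamic_top2 = top_customers(sales_by_customer)
```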
What
are KPIs? What are the different properties associated with a KPI?
KPI stands for Key Performance Indicator.
A KPI is a measure of an organization's performance in a pre-defined area of
interest. KPIs are defined to align with the pre-defined organizational goals
and help the business decision makers gain insights into their business
performance.
Often KPIs have the following five
commonly used properties:
- Name: Indicates the name of the
Key Performance Indicator.
- Actual/Value: Indicates
the actual value of a measure pre-defined to align with organizational
goals.
- Target/Goal: Indicates
the target value (i.e. goal) of a measure pre-defined to align with
organizational goals.
- Status: It is a numeric value and
indicates the status of the KPI like performance is better than expected,
performance is as expected, performance is not as expected, performance is
much lower than expected, etc.
- Trend: It is a numeric value and
indicates the KPIs trend like performance is constant over a period of
time, performance is improving over a period of time, performance is
degrading over a period of time, etc.
Apart from the above listed properties,
most of the time KPIs contain the following two optional properties:
- Status Indicator: It is a
graphical Indicator used to visually display the status of a KPI. Usually
colors like red, yellow, and green are used or even other graphics like
smiley or unhappy faces.
- Trend Indicator: It is a
graphical indicator used to visually display the trend of a KPI. Usually
up arrow, right arrow, and down arrow are used.
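A KPI's Status property can be sketched as a small Python function (not an SSAS API; the thresholds and the -1/0/1 encoding are illustrative assumptions, though a small numeric status range like this is typical):

```python
# Conceptual sketch: a KPI Status is a small numeric value derived from
# Actual vs Target. The 0.9 tolerance threshold here is an assumption.

def kpi_status(actual, target, tolerance=0.9):
    ratio = actual / target
    if ratio >= 1.0:
        return 1    # performance meets or beats the goal
    if ratio >= tolerance:
        return 0    # performance is close to the goal
    return -1       # performance is well below the goal

revenue_kpi = {"name": "Revenue", "actual": 9.5, "target": 10.0}
status = kpi_status(revenue_kpi["actual"], revenue_kpi["target"])
```

A client application would then map the numeric status to a Status Indicator such as a green, yellow, or red graphic.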
What
are Actions in SSAS? What are the different types of Actions in SQL Server
Analysis Services?
Actions in SSAS allow us to extend the
cube functionality and enable the users to interact with the cube. An Action in
simple terms is basically an event, which can be initiated by a
user/application and it can take various forms depending upon the type of
Action defined.
Actions are primarily of following
three types:
- Drillthrough Actions: A Drillthrough Action retrieves
the detail level information associated with the cube data based on which
the Drillthrough Action is defined.
- Reporting Actions: A
Reporting Action retrieves an SSRS report which is associated with the
cube data. The command which invokes the SSRS report contains the report
URL along with the report parameters.
- Standard Actions: A
Standard Action retrieves the action element associated with the cube
data. Standard actions are further categorized into 5 different
subcategories and the action element varies for each of these
subcategories. The following are the types of Standard Actions:
- Dataset Action: Returns
a dataset to the client application and the action content is an MDX
expression.
- Proprietary Action: Performs
an operation as defined by the client application. The action content for
this type of action is specific to the calling client application and the
client application is responsible for interpreting the meaning of the
Action.
- Rowset Action: A
Rowset Action returns a Rowset to the client application. The action
content is a command to retrieve the data.
- Statement Action: The
action content for this type of Action is an OLE DB command and it
returns a command string to the client application.
- URL Action: The
Action Content for this type of action is a URL and it returns a URL to
the client application, which can usually be opened in a web browser. This
is the default action.
What
are partitions in cubes? How do they differ from table partitions at a SQL
Server database level?
A partition is physical storage space
which contains either all or a portion of measure group data. Each measure
group in SSAS has one partition by default.
A partition can be bound either to a
table in the underlying relational database or to a query that points to the
table(s) in the underlying database and filters the data.
In terms of storage, cube partitions in
SSAS and table partitions in a database
are similar. Both these types of partitions are used to improve the
performance. However, partitions in SSAS offer additional benefits including:
- Each partition can be processed separately (i.e. a measure group
can be split across multiple partitions, for example one partition for
each year). Only the partitions in which data has been modified can be
processed thereby improving the processing time of the cube.
- Partitions provide improved manageability by allowing us to define
storage mode, aggregation design, etc. at the partition level and these settings
can vary between different partitions belonging to the same measure group.
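The per-partition processing benefit described above can be sketched in Python (not an SSAS API; the one-partition-per-year layout and status strings are illustrative assumptions):

```python
# Conceptual sketch: with one partition per year, only partitions whose
# source data changed need reprocessing, which shortens cube processing.

partitions = {2011: "processed", 2012: "processed", 2013: "processed"}
changed_years = {2013}  # only the current year's fact rows changed

reprocessed = []
for year in partitions:
    if year in changed_years:
        partitions[year] = "reprocessed"
        reprocessed.append(year)
```

Historical partitions stay untouched, so the processing window scales with the volume of changed data rather than the whole measure group.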
What
are the different Storage Modes supported by Cube Partitions?
There are primarily two types of data
in SSAS: summary and detail data. Based on the approach used to store each of
these two types of data, there are three standard storage modes supported by
partitions:
- ROLAP: ROLAP stands for Relational
Online Analytical Processing. In this storage mode, both summary and detail
data are stored in the relational database/data warehouse. This storage mode
offers low latency, but it requires
large storage space as well as slower processing and query response times.
- MOLAP: MOLAP stands for
Multidimensional Online Analytical Processing. In this storage mode, both
summary and detail data is stored on the OLAP server (multidimensional
storage). This storage mode offers faster query response and processing
times, but offers higher latency and requires an average amount of storage
space. This storage mode leads to duplication of data, as the detail data
is present in both the relational and the multidimensional storage.
- HOLAP: HOLAP stands for Hybrid
Online Analytical Processing. This storage mode is a combination of ROLAP
and MOLAP storage modes. In this storage mode, summary data is stored in
OLAP server (Multidimensional storage) and detail data is stored in the
relational data warehouse. This storage mode offers a balance of storage
space, query response time, latency, and processing time.
There are different variations of these
Standard Storage Modes.
What
is proactive caching in SQL Server Analysis Services?
Proactive caching is an advanced
feature in SSAS and it enables a cube to reflect the most recent data present
in the underlying database by automatically refreshing the cube based on the
predefined settings. This feature allows the users to view the data in near real-time.
Proactive caching can be configured to
refresh the cache (MOLAP cache) either on a pre-defined schedule or in response
to an event (change in the data) from the underlying relational database.
Proactive caching settings also determine whether the data is queried from the
underlying relational database (ROLAP) or is read from the outdated MOLAP
cache, while the MOLAP cache is rebuilt.
Proactive caching helps minimize
latency and achieve high performance.
What
are aggregates? What is the purpose of defining an aggregation design in
Analysis Services?
Aggregates are summarized values, each
of which corresponds to a combination of an attribute from each dimension and a
measure group. An aggregate in SSAS is the differentiating factor between OLAP and
OLTP, and is the fundamental principle of SSAS/OLAP, which offers blazing fast
performance.
In theory, many people believe that
SSAS stores aggregated values for every combination of each attribute from each
dimension and in each measure group. However, in reality, SSAS stores only a
part of all combinations and not all the possible combinations. In some
scenarios, it might be helpful to create a certain percentage of aggregations
every time the cube is processed, without leaving the decision to SSAS. This
is achieved by defining an aggregation design.
The Aggregation Design Wizard is
used to design aggregations in SSAS and it provides the following options as
part of the aggregation design process:
- Design aggregations until estimated storage reaches "X"
MB.
- Design aggregations until performance gain reaches "X"
percentage.
- Design aggregations until the person designing the aggregations,
clicks the "Stop" button.
- Do not design aggregations at all (0% aggregation).
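The reason aggregations make OLAP queries fast can be sketched in Python (not an SSAS API; the fact rows and the (Year, Category) grain are illustrative assumptions): a query at a pre-aggregated grain becomes a lookup instead of a scan over detail rows.

```python
# Conceptual sketch: an aggregation is a precomputed summary at some
# combination of attributes, built once at processing time.

detail_rows = [
    ("2012", "Bikes", 100.0), ("2012", "Bikes", 150.0),
    ("2012", "Clothing", 80.0), ("2013", "Bikes", 200.0),
]

# Designed aggregation: Sales by (Year, Category).
aggregation = {}
for year, category, amount in detail_rows:
    key = (year, category)
    aggregation[key] = aggregation.get(key, 0.0) + amount

def query_sales(year, category):
    """Answer a query at the aggregated grain via a direct lookup."""
    return aggregation[(year, category)]
```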
What
is Usage Based Optimization in SSAS? How is it performed?
Usage Based Optimization is the process
of defining aggregation design based on the actual queries executed by the
users/applications accessing the cube, which is captured using a log.
SSAS offers a wizard called Usage Based Optimization Wizard to perform Usage Based Optimization by
defining aggregation design. The Usage Based Optimization Wizard provides the
following options as part of the Aggregation Design Process:
- Design Aggregations until estimated storage reaches "X"
MB.
- Design Aggregations until performance gain reaches "X"
percentage.
- Design Aggregations until the person designing the aggregations
clicks the "Stop" button.
As we can see, the above options
offered by the Usage Based Optimization Wizard are similar to the ones offered
by the Aggregation Design Wizard. However, in the case of the Aggregation
Design Wizard, all possible queries are given equal importance, whereas in
the case of the Usage Based Optimization Wizard, the importance is based on
the actual queries being fired.
To use the Usage Based Optimization
Wizard, Query Log Properties need to be configured at the SSAS Server Level for
the tool to be able to log the queries fired by users for accessing the cube.
What
are the different aggregation functions available in SSAS? Can you explain each
one of them?
SSAS supports the following types of
aggregation functions for Measures:
- Fully Additive
- Sum: This
is the default aggregation type and calculates the sum of all the child
members of a member at any level in the hierarchy, except for leaf level.
- Count: Calculates
the count of all the child members of a member at any level in the
hierarchy except for the leaf level.
- Semi Additive
- Min: Returns
the lowest value among all the child members.
- Max: Returns
the highest value among all the child members.
- FirstChild: Returns
the value of the first child of a member.
- LastChild: Returns
the value of the last child of a member.
- FirstNonEmpty: Returns
the value of the first non-empty child of a member.
- LastNonEmpty: Returns
the value of the last non-empty child of a member.
- ByAccount: This
aggregation type is applicable only if there is an Account Dimension in
the cube. A Dimension is treated as an Account Dimension by SSAS, only
when the Type property of the dimension is set to "Account".
This function aggregates the data based on the aggregation function set
for members of an Account Dimension. If an Account Dimension does not
exist, then setting this aggregation type is equivalent to No Aggregation
(None).
- AverageOfChildren: Calculates
the average of values of all the non-empty child members of a member at
any level of a hierarchy, except at the leaf level.
- Non Additive
- None: When
an aggregation function is set to None, no aggregation is performed and
the value of the member (both leaf and non-leaf) is returned directly.
- DistinctCount: Returns
the count of unique/distinct child members of a member at any level of a
hierarchy, except for the leaf level.
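The contrast between Sum and a semi-additive function like LastNonEmpty can be sketched in Python (not an SSAS API; the quarterly balance figures are illustrative assumptions, with None marking an empty period):

```python
# Conceptual sketch: Sum vs LastNonEmpty over the time dimension for a
# semi-additive measure such as account balance.

quarter_balances = [100.0, 120.0, None, 130.0]  # Q1..Q4; Q3 is empty

# Summing balances across time gives a meaningless total.
total_if_summed = sum(b for b in quarter_balances if b is not None)

def last_non_empty(values):
    """Return the last non-empty value, as LastNonEmpty would."""
    for v in reversed(values):
        if v is not None:
            return v
    return None

year_end_balance = last_non_empty(quarter_balances)  # correct for balances
```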
What
is Time Intelligence? How is it implemented in SSAS?
Time Intelligence is a technique, which
allows us to align the Time Dimension in SSAS with our actual Calendar, thereby
making time calculations like Period to Date, Comparison across Parallel Time Periods, Cumulative aggregates,
etc. very simple without the need for us to write explicit MDX
queries/expressions.
For implementing Time Intelligence, the Type property
of the Time Dimension should be explicitly set to "Time" and the
attributes of this dimension should be set appropriately like year, month, etc.
SSAS offers a wizard called the Business
Intelligence Wizard which allows us to add different types of Intelligence to
SSAS like Time Intelligence, Account Intelligence, and Dimension Intelligence,
etc.
What
are translations? What is its significance in SSAS?
Translations in SSAS allow us to bind
labels/properties of those objects in SSAS which can be represented in multiple
languages. In simple terms, Translations allow us to display the Labels and
Captions of various SSAS objects in different languages. In SSAS both metadata
and data can be translated. The objects which support Translations in SSAS
include databases, cubes, dimensions, attributes, hierarchies, measure groups,
calculated members, KPIs, and various other objects.
In today's world, businesses are growing
and expanding to a very large extent and tend to have an international
presence. In such situations, it is essential that SSAS objects support
localization and that people from different geographical locations are able
to see the information in their local language.
What
are perspectives? How they can be used to implement security in SSAS?
A perspective is a visual layer on top
of SSAS and is used to display a subset of the cube/dimension based on either a
specific subject area or based on the target audience, or any other scenario
which might require exposing a subset of cube/dimension to the
users/applications to simplify the view of data.
Perspectives are a very useful feature,
especially when cubes are very large, containing many dimensions and
measure groups. Such large cubes can be overwhelming for users if the
data is not exposed through suitable perspectives.
Perspectives are not a security
mechanism. They cannot be used to implement security in SSAS. The security
settings/restrictions applied at the cube/dimension/data level apply at the
perspective level and no separate security settings can be defined at the
perspective level.
What
are the different levels where security can be implemented in SSAS?
Security is one of the most common and
most important needs for any application especially when working with sensitive
data. In SSAS, security can be implemented at various levels:
- Server Level: A user
is either allowed or denied access to the OLAP Server.
- Database Level: A user
is either allowed or denied access to specific databases on the SSAS
Server.
- Cube Level: Users
can be allowed or denied access to specific cubes within a database
especially when a database contains multiple cubes.
- Dimension Level: Users are either allowed or denied access to specific dimensions,
thereby enabling or restricting them from browsing the data using
those specific dimensions.
- Measure Level: Either
one or more measures can be exposed or restricted from the user.
- Cell Level: This
can be of two types: Security for dimension members and security for data
cells. With dimension members, users can be allowed or restricted to see
specific members of a dimension. With data cells, users can be allowed or
restricted to see a specific range of cells. In both of these scenarios,
users can be given read, read/write, etc. permissions.
What
is writeback functionality in SSAS? In what scenarios is it useful?
Writeback is a functionality in SSAS
which allows the users to write the data back into the cube. Meaning, while
browsing the cube, they can make changes to the data and those changes are
written back into the cube.
Writeback can be enabled primarily at
two levels: the dimension and/or the partition. For dimension level
writeback, the users can modify the members of a dimension. For partition
writeback, users are allowed to modify the cells of data and hence it is
commonly referred to as cell level writeback.
Users require special permissions to be
able to use the writeback functionality to write the data back into the cube.
For dimension writeback, users should have the read/write permissions and the
"enable dimension writeback" property should be set. In the case of
cell/partition level writeback users need to have read/write permissions and
the partition should be enabled for writeback.
Writeback functionality can be useful
in many scenarios, such as performing What-If analysis. Say a business user
wants to understand how much impact a change in freight and other taxes can
have on profitability; they can achieve this simply by setting a different
set of values for freight and taxes and seeing how the profitability changes.
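The What-If scenario above can be sketched in Python (not an SSAS API; the cell names and amounts are illustrative assumptions): write a new freight value back, then observe the recomputed profit.

```python
# Conceptual sketch: What-If analysis in the spirit of cell writeback.

cube_cells = {"Sales": 1000.0, "Cost": 700.0, "Freight": 50.0, "Taxes": 80.0}

def profit(cells):
    """Derived profit from the current cell values."""
    return cells["Sales"] - cells["Cost"] - cells["Freight"] - cells["Taxes"]

baseline = profit(cube_cells)

cube_cells["Freight"] = 30.0   # "write back" a lower freight cost
what_if = profit(cube_cells)   # profit recomputed under the new value
```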
What
are the deployment options available in SSAS?
Below are the different options to
deploy an SSAS solution to an OLAP Server:
- Business Intelligence Development Studio (BIDS): This is one of the primary tools used for developing SQL Server
BI solutions and is the most frequently used IDE by the developer
community in SQL Server 2008. BIDS can be used to deploy the SSAS Objects/Solutions. BIDS was replaced by the SQL Server Data Tools in SQL
Server 2012, but has much of the same functionality.
- Analysis Services Deployment Wizard: Analysis Services Deployment Wizard is a tool included as part of the SSAS installation and can
be used to deploy the SSAS objects/solutions.
- XML for Analysis Services (XMLA): It is
used for deploying the SSAS objects to the OLAP Server. Apart from
deployment purposes, XMLA scripts can
be used for various other purposes like managing SSAS objects,
creating/modifying SSAS objects and processing SSAS objects, etc. It is
also known as the Analysis Services Scripting Language (ASSL).
What
is processing in SSAS?
Processing loads data
into SSAS objects such as Cubes, Partitions, and Dimensions on the OLAP
Server from the underlying relational data warehouse. As part of
the processing step(s), aggregations are computed, data is loaded into one or
more pre-defined partitions, and any other actions that involve data
modification are carried out. The processed data is stored in the
respective locations (the OLAP Server, the relational data warehouse, etc.)
depending on the Storage Mode defined for the different
objects.
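As an illustration, processing can be triggered by sending an XMLA Process command to the server from an XMLA query window in SSMS; the database and cube IDs below are hypothetical:

```xml
<!-- Minimal processing sketch; SalesDW and Sales are hypothetical IDs. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>SalesDW</DatabaseID>
      <CubeID>Sales</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```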
What
is database synchronization in SSAS? What is its advantage?
Database synchronization is the process
of updating the data and metadata of a database on the target SSAS Server
based on the data and metadata of the same database on the source SSAS Server.
Analysis Services offers a tool called the "Synchronize Database Wizard" to synchronize
databases between two different SSAS instances.
The primary advantage of a database
synchronization via the Synchronize Database Wizard is that, while the
databases are being synchronized, the destination database
remains available and users can access it as usual. Once
the synchronization is complete, Analysis Services drops the outdated copy
(the old database) and automatically switches to the newly
updated database.
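Besides the wizard, synchronization can be scripted as an XMLA Synchronize command executed on the target server; the server name and database ID below are hypothetical:

```xml
<!-- Minimal synchronization sketch; SourceServer and SalesDW are
     hypothetical names. Run this against the target instance. -->
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=SourceServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>SalesDW</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
```

Scripting the command this way also lets you schedule synchronization, for example from a SQL Server Agent job.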
What
are the different processing modes available in SSAS? Explain each one of them.
Below are the processing options
available in SSAS:
- Process Add: This
processing mode is available for Dimensions, Measure Groups and
Partitions. When this processing mode is applied to Dimensions, it adds
new members into the dimensions from the underlying database and updates
the descriptions of the attributes. When this processing mode is applied
to Measure Groups and Partitions, it adds the incremental fact data into
the cube and processes only the affected partitions.
- Process Clear: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Cubes, Mining Models and Structures as well as OLAP (SSAS) databases. This
processing mode is used to remove the data from an Analysis Services
object. Data is removed from an object and all the subsequent child level
objects.
- Process Clear Structure: This
processing mode is available only for Mining Structures as part of the
Data Mining component of SSAS. This processing mode is used to clear the
training data previously loaded into a mining structure.
- Process Data: This
processing mode is available for Dimensions, Measure Groups, Partitions,
and Cubes. This processing mode is used to process the data without
actually building or creating the aggregations and indexes.
- Process Default: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Mining Models and Structures, Cubes, and OLAP (SSAS) databases. When this
process is applied to any of these objects, it first determines the current
state of the object from a processing standpoint and then does the
necessary processing to bring the object to a fully processed state.
- Process Full: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Mining Models and Structures, Cubes, and OLAP (SSAS) Databases. When this
processing mode is applied to an object, the object itself and all the
subsequent lower level objects are fully processed. The data is cleared
from an object if it already has some previously processed data.
- Process Index: This
processing mode is available for Dimensions, Measure Groups, Partitions,
and Cubes. This option works only on processed objects; running it on
unprocessed objects results in an error. When applied to a processed
object, it creates aggregations and indexes if they don't already exist,
and recreates them if they do.
- Process Structure: This
processing mode is available for Mining Structures and Cubes. When this
processing mode is applied to a cube, it creates the cube structure and it
also processes the cube dimensions if they are not already processed. This
processing mode loads the data into the Mining Structure, when applied on
it.
- Process Update: This
processing mode is available only for Dimensions. When this processing
mode is used, it freshly reads all the dimension data from the database
and updates the data in the dimension.
What
are the different ways to process objects in SSAS?
SSAS objects can be processed using
various techniques. The following are the most popular tools and techniques
used to process objects in SSAS:
- SQL Server Integration Services (SSIS): Analysis Service Processing Task in SSIS can be used to process SSAS objects.
- Business Intelligence Development Studio (BIDS): SSAS Objects can be processed via Business Intelligence Development Studio from within the SSAS Project. BIDS was replaced by the
SQL Server Data Tools in SQL Server 2012, but has much of the same
functionality.
- SQL Server Management Studio (SSMS): SQL Server Management Studio can be used in multiple ways to
process SSAS objects.
- Object Explorer:
Using SSMS, one can directly connect to Analysis Services Server and
process the objects through the Object Explorer (GUI).
- XMLA: An XML for Analysis Services (XMLA) script can be executed directly in SSMS to process the SSAS
objects.
- SQL Server Agent: An
XMLA script can be executed using a SQL Server Agent Job
which also helps in scheduling the script to perform the processing
during off hours.
- Command Line: SSAS offers the following
two command line utilities which can be used to process SSAS objects:
- PowerShell: PowerShell is
a well-known and very powerful command-line shell and scripting language
for which SQL Server ships a provider. It can be used to process SSAS
objects and, beyond that, to perform various other operations within SQL Server.
- ascmd:
The ascmd utility
can be used to execute XMLA scripts as well as MDX and DMX queries
against an instance of SSAS. Using this utility, one can execute the
relevant scripts to process the SSAS objects.
- Programmatically:
Analysis Management Objects (AMO) is a collection of SSAS management
classes, which can be used for processing SSAS objects. Apart from this,
AMO can be used for various management activities in SSAS.
What
are the programmability options available in SSAS?
AMO (Analysis Management Objects) is a
set of classes used for managing an Analysis Services instance and can be used
to perform various management activities related to SSAS. Using AMO, the
following operations can be performed on the SSAS objects:
- Creation
- Modification
- Deletion
- Processing
With AMO, one cannot query the SSAS
objects. To query data from SSAS programmatically, one can use ADOMD.NET,
a .NET-based data provider. It is primarily used to communicate with
SSAS and issues commands in Multidimensional Expressions (MDX), Data Mining
Extensions (DMX), and the Analysis Services Scripting Language (ASSL) to query
the data.
What
are the different querying and programming languages supported by SSAS?
The following languages are supported
by SSAS:
- T-SQL: Transact-SQL queries
can be used to query certain aspects of SSAS.
- MDX: It stands for Multidimensional Expressions. MDX queries are the primary means of querying data in SSAS.
- DMX: It stands for Data Mining
Extensions. DMX is used for queries related to Data Mining in SSAS.
- AMO: It stands for Analysis
Management Objects. It is .NET based and can be used to manage SSAS
objects, creating/modifying/deleting SSAS objects and implementing
security in certain scenarios.
What
are the different tools used for accessing cubes?
Here is a list of different tools which
are used for accessing Cubes in Analysis Services:
- Business Intelligence Development Studio (BIDS): BIDS is used to access the cube and preview the data at the
design time. This option is useful only for the developer community and is
not applicable to business analysts and decision makers. BIDS was
replaced by the SQL Server Data Tools in SQL Server 2012, but has much of
the same functionality.
- SQL Server Management Studio (SSMS): SSMS is used to access the cube by people in different roles.
Development and testing teams use it to validate the data, and the
(relatively few) business analysts and decision makers who are
familiar with SSMS also use it for accessing the cubes.
- Microsoft Office Excel: Excel,
being one of the most widely used data analysis tools, has the capability
to connect to cubes and perform different kinds of reporting such as pivot
tables, charts, etc. It is used by many decision makers, business
analysts, and even developers for reporting purposes.
- SQL Server Reporting Services (SSRS): SSRS allows Analysis Services databases or cubes to be used as a data source
for building reports. This option is available both in BIDS/SQL Server Data
Tools when building canned reports and in Report Builder when doing
ad-hoc reporting.
- PerformancePoint Services (PPS): PPS, which is now a part of SharePoint 2010, allows building KPIs,
scorecards, reports, and dashboards with Analysis Services cubes as the data
source.
These are the major tools in the
Microsoft BI stack which support Analysis Services data sources. Apart from
these, there are various third party tools which support Analysis Services as a
data source.
How
do you capture which users are connecting to SSAS, which objects of SSAS are
being used, which queries are being executed, etc.?
Analysis Services offers Dynamic Management Views (DMV),
which are along the same lines as the DMVs for the relational database engine.
Using the DMVs is the most common approach to capturing these statistics and to
monitor the current status of the SSAS server. With the SSAS DMVs, one can
capture various statistics including:
- Utilization of resources such as CPU, memory, etc.
- Usage of aggregations, hierarchies, and other components of SSAS.
- Who is connecting to the OLAP database and what are the different
objects being used.
- Queries being executed by the users, time taken for executing
different user queries, and so on.
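As an illustration, a few of the commonly used SSAS DMVs can be queried with a SQL-like SELECT from an MDX query window in SSMS (DMV queries support only a restricted syntax, e.g. no joins):

```sql
-- Who is connected, and from where
SELECT * FROM $SYSTEM.DISCOVER_CONNECTIONS;

-- Active sessions, including the last command per session
SELECT * FROM $SYSTEM.DISCOVER_SESSIONS;

-- Commands currently executing and their elapsed time
SELECT * FROM $SYSTEM.DISCOVER_COMMANDS;

-- CPU and read statistics per object (cube, dimension, partition)
SELECT * FROM $SYSTEM.DISCOVER_OBJECT_ACTIVITY;
```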
1.
What do you mean by Microsoft Business Intelligence and what components of SQL
Server support it?
Microsoft defines its BI solution
as a platform to provide better and more accurate information in an
easily understandable format for quicker and better decision making. It
consists of BI tools from SQL Server: SQL Server Integration Services (SSIS),
SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS),
Microsoft SharePoint, and the Office products.
There are 5 key areas
to test:
1.
Cube design: the stages of
creating a cube in BIDS, i.e. data source, DSV, dimension, cube.
2.
Performance
tuning: troubleshooting query performance as well as processing
performance.
3.
MDX: the multidimensional expression
language that we all love.
4.
Client tools: creating reports and
dashboards on Strategy Companion, Excel, Tableau, Panorama, etc.
5.
Administration: installing and
upgrading SSAS servers, configuring SSAS server settings, security, processing,
deployment, migration, backing up and restoring the cubes.
Beyond these 5, I
think it is very important for an SSAS developer (and an ETL developer) to
understand the concept of dimensional modelling well. Hence I like to throw in one
or two Kimball questions. Usually candidates understand dimensional modelling,
at least the basics, but we need to make sure.
1.
Cube design
Out of the 5 things
above, Cube Design is the most important one. It is the bread and butter of the
job. We need to establish 2 things here:
a) Do the candidates
know basic cube design (the things your company uses now)?
b) Do they know
advanced cube design (the features your company doesn’t use now)?
Because it depends on
the SSAS features your company uses, what constitutes “basic” differs from
company to company. But it’s probably not far from this list:
- Data source: connection string, impersonation
- DSV: table, view, named query, relationships
between entities
- Dimension: hierarchy, sort order, attribute
relationships, OrderByAttribute, display folder, default member
- Cube: referenced dimension, many-to-many,
partitioning, aggregation, measure expression, format string, calculated
members, error configuration, display folder
- Role: membership, cube access
Advanced cube design:
- DS & DSV: multiple DS, multiple DSVs,
logical primary key, friendly name, retrieve relationship, schema
restriction, number of connections
- Dimension: parent child dim, translations,
date calculation dimension, multiple KeyColumns, ragged dim, type 2 dim,
custom rollup, unary operator, write enabled, ROLAP.
- Cube: actions (URL & drill through),
translations, perspectives, HOLAP, proactive caching, input dimension
(dimension which is not connected to any MG, but used in calculation),
IsAggregatable, KPI.
- Role: dimensional security, cell security
- AMO: doing all of the above programmatically
using C#.
I usually like to
establish first whether the candidate can do the job, i.e. the basic stuff above /
the things we use in the company. I do this by picking 3 from the basic stuff and seeing
if they are comfortable. If they are, I pick 3 from the advanced stuff. If they
are not comfy with the basics, the question I need to answer is whether the
candidate would be able to sort it out by themselves on the job. I do that by
giving them some hints.
Whenever possible, I
prefer to wrap the question in a business context, rather than bare theory. For
example, question a) below has a business context, whereas question b) is
theory:
a) Explain how you
handle currency conversion.
They can use measure expression, many-to-many, but we know immediately if they have done this stuff or not.
b) Explain different
modes of impersonation.
That is just theory. I prefer to give a context i.e. when would we want to use service account and when to use a specific user name? Or: if the connection string already defines how to connect to the database, then what’s impersonation for?
2.
Performance tuning
I think what I wrote
here [Optimising
Cube Query and Processing Performance]
is quite comprehensive to be used as a base for our questions. As per its
title, it covers the troubleshooting of both query performance and processing
performance.
For query performance
we can ask about MDX queries, aggregation, partitioning, and server settings. For
processing performance: dimensions, materialized views, partitioning and, most
important of all, incremental processing. I explain more about
processing performance here.
Again I like to wrap
them in a case, for example: a cube takes 5 hours to process and we need to get
it down to 1 hour. How can we do that? Many candidates never optimise dimension
processing, but any good SSAS developer will at least understand partitioning.
An expert developer may even give you a lecture about why aligning table
partitioning with incremental partition processing is so important.
Another favourite
question of mine is what to do if a) a query is slow, and b) a certain measure
is slow (other measures are fine). For a) trap the MDX, run it in MDX Studio
(thanks Mosha) and find out which part makes it slow by altering the query bit
by bit. For b) establish first if it’s a calculated measure or base measure. If
it’s calculated, look at the MDX and analyse on MDX Studio. If it’s a base
measure, look at the storage mode (ROLAP?), the dimension usage tab
(relationship type = fact or many to many? Try removing some relationships
& reprocess), measure expression (is it a simple A/B formula?), simplify
the attributes involved in a query (perhaps the issue is with the attributes,
not with the measures?).
3.
MDX
This is always
tricky. Not about figuring out what to ask, but about what level of MDX you need
in your company/project. Any SSAS developer can do basic MDX, right? Not quite.
Yes, they can write a basic MDX select statement, but they might not know even
basic things like filtering and sorting. So we need to test those. I’d
recommend the following list:
Count, YTD, parallel
period, filter, existing, order, first/last child, ancestors/descendants,
currentmember, if/iif/case, head & tail, crossjoin, nonempty,
except, scope, top/bottomcount.
These are considered
everyday use. They separate the SSAS developers who only ever query their cubes
through client tools from those who write MDX to query the cubes themselves.
Obviously we don’t need to ask all of them; just pick 3 or 4.
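A short query exercising a few of these functions makes a good whiteboard test, e.g. "give me the top 10 products by year-to-date sales". All object names below are hypothetical:

```mdx
-- TopCount + NonEmpty + YTD in one query; [Sales], [Product], [Date]
-- and [Sales Amount] are hypothetical names.
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    TOPCOUNT(
        NONEMPTY([Product].[Products].[Product].Members,
                 [Measures].[Sales Amount]),
        10,
        SUM(YTD([Date].[Calendar].CurrentMember),
            [Measures].[Sales Amount])
    ) ON ROWS
FROM [Sales]
WHERE ([Date].[Calendar].[Month].&[201406]);
```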
Before we test those
functions, it’s worth establishing if they grasp the concepts of
multidimensional database (OLAP) or not. For example: tuple, set, cube space,
axis, level, member. If they can’t differentiate a tuple from a set, it’s
probably not worth asking them any MDX function.
You might argue:
“Oh, but we don’t use MDX here.” But it is essential that they understand the
concepts and the basic functions. You can’t effectively compose SSRS reports
displaying data from your cubes if your SSAS developer can’t write MDX. You
can’t effectively build a cube if you don’t understand multidimensional
database concepts.
XMLA (ASSL) is a bit
different. It is not as important as MDX because you can script it on
SSMS. Whether it is to create or to alter SSAS objects, or to process them, you
can generate the script from SSMS. Besides, we have AMO for creating/altering SSAS
objects programmatically. With AMO you get a proper development environment, with
looping and branching, so you have more control. And it’s .NET. True, you
can use tools like Excel, Panorama & ProClarity to get the MDX, or use
Profiler to trap Strategy Companion & Tableau’s MDX statements, but you
can’t optimise the calculated members if your SSAS developer doesn’t know MDX.
Another thing to
consider is whether you need to cover DAX. This depends on whether you are
using (or will be using) PowerPivot or not. But DAX is becoming more and more
central in MS OLAP technology so in the long run (like 2 years from now) you
will need to cover DAX. But for now it’s still optional, depending on the use
of PowerPivot.
4.
Client Tools
Specific features differ
from one client tool to the next, but the basic ideas are similar:
creating reports, charts and dashboards. Some of the specific features to test
are: creating a custom set, creating a calculated member, time calculation functions
such as YTD and QTD, hyperlinks to different reports, and drilldown. One of my
favourite questions is whether a specific calculation should be done in the
client tool or in SSAS.
The administration of
the client tool is usually done by somebody else, not the SSAS developer, so we
shouldn’t worry about it in this interview. But if you are a small company then
usually one person does it all, in which case you will need to ask the
candidate about installation, migration, user security, and other aspects of
the client tool.
5.
Administration
Administration tasks
are not the responsibility of an SSAS developer. They are the responsibility of
the SQL DBA. These days, in large corporations, SQL Server DBAs are asked to
manage all 4 aspects of SQL Server. They manage not only the relational engine,
but also SSAS server, SSIS packages and SSRS servers. But in smaller companies
SSAS developers are involved in administering the servers. So I include this
section (section 5, Admin) in the interview.
The
Questions
Enough with the
background, let’s get on with the questions. I’m going to label each question
with 1-5 indicating the 5 areas above, and LMH (low, medium, high) indicating
the difficulty. I’ll put the label in curly brackets: {}. Hopefully the labels
help you compose the right mix of interview questions.
1.
How do you ensure that January,
February, March, etc. will be in the correct sequence when the user browses the
cube? {1-L}
2.
In SSAS, how do you design currency
conversion for a) from many transaction currencies to 1 reporting currency, b)
from 1 transaction currency to several reporting currencies and c) from many
transaction currencies to many reporting currencies? {1-M}
3.
A user reported that a Panorama report
that she uses every day is very slow today. Normally it opens in 1 or 2
seconds, but today it didn’t open at all (she has waited for 4 minutes).
Explain how you would handle this case systematically.
{4-L}
Note: you can still ask this question even if you are not using Panorama.
4.
We have a risk analysis cube. This cube
is 500 GB in size, containing 20 billion fact rows. Explain how we: a) process
this cube efficiently; b) ensure a good query performance. {2-M}
5.
Explain (step by step) how you build a
cube from a data mart. {1-L}
6.
Explain how you migrate a cube from a
dev AS server to production. {5-L}
Follow up with advantage/disadvantage of each approach. {5-M}
7.
A cube has 300 partitions and we want
to process only the partitions where the fact table row changed. We don’t want
to process the entire measure group. How do we do that? {2-M}
8.
We have a cube with 1 measure and 1
dimension. That dimension has only 1 attribute. Write an MDX to list the
members of that attribute of which measure1 is bigger than 10. We want to sort
it in descending order, i.e. largest amount first. {3-L}
9.
Like above but sort on the attribute,
not on the measure. {3-L}
10. People say that we shouldn’t use cell security
because it’s slow. Explain how we should use it then? How do you combine it
with dimensional security? {2-H}
11. What’s the difference between a tuple and a set?
What is an axis? {3-L}
12. You need to grant access to 200 users to access a
cube. Each user can only access certain data within the cube. The access is by
product and region, i.e. user1-20 can only access product1-50 and region1-3,
user21-40 can only access product30-80 and region2-5. How would you do that?
{5-M}
13. Some users need to see the sales figures in USD,
other users in GBP. How would you do that? {1-H}
14. You need to upgrade 4 SSAS servers (dev, UAT, prod,
DR) from 2005 to 2008. Explain how you would approach this. {5-L}
15. People say that we can have a real time cube using
ROLAP or ProActive Caching. Explain how, and the plus/minus of each approach.
{1-H}
16. Cube processed OK but all measures return no value.
How do you troubleshoot this issue systematically? {1-M}
17. How do you do Slowly Changing Dimension type 2 in
SSAS? {1-M}
18. The data mart only has 1 measure: amount. This
measure could be asset, expense, income, etc depending on an attribute called
“Account”. How do you implement this in SSAS cube? {1-H}
19. The value of measure “price” is only valid if the
product dimension is at the leaf level and the location dimension is at the
“store” level. How do you set it? {2-M}
20. Attribute relationship: In what case would we want
to set: a) the cardinality to “one to one”? b) The relationship type to rigid?
What’s the risk (of doing so)? {1-M}
21. Why do you need to design aggregations? How do you
design aggregations (usage-based and non usage-based)? Explain how you test it.
{1-H}
22. People say that perspective is not a security
measure. What do they mean by that? {1-M}
23. Star vs snowflake: what’s the benefit of having a
snowflake data mart in SSAS? {1-M}
24. We have a dimension which is as big as the fact
table (in terms of the number of rows). What do you suggest we do with this
dimension? And with regards to SSAS, what would you do differently? {1-H}
25. Excel: how do you change language from English to
French when browsing a cube?
Answers:
1.
Order by key and put month number in
the key column. If they can’t answer this you should worry.
2.
It’s best if the candidate can explain
about the currency conversion BI wizard. But if he can explain the “traditional
way” it’s quite good too: using calculated measure and many-to-many
relationship.
3.
As with any support call, we need to
find out which report. Try to reproduce what the user did in NovaView. Find
what the MDX is (Tools menu, Direct MDX, ctrl-alt-V), execute in SSMS. If this
is slow then find out which part of the MDX makes it slow by building the query
step-by-step. Then consider: a) aggregation, b) block computation, c) convert
to base measure (do calculation in the relational/ETL), d) improve the MDX if
it’s a calculated measure, e) partitioning. Most of the time, when a user says
that usually it is 2 seconds but today it timed out, it is not the MDX or the
cube structure. It’s usually either the server or the connection. So check that
a) the server is up and running and b) from Panorama you can “open” the cube in
that server.
4.
Incremental processing (IP). If you
have huge cubes and the candidate has never done IP before, end the interview.
If you have huge cubes in your company (100 GB), your questions should be
focused on performance (processing & query). The job spec should clearly
say: wanted: SSAS performance tuning specialist (not a “normal” AS developer)
and prepare to pay 20% premium.
5.
It’s an easy question; every single
candidate should be able to answer this question as it is bread and butter. But
their answer shows their class. One of the important things I’m looking for
here is requirement analysis. An SSAS developer who “just builds it” without
analysing the requirements is usually less helpful than one who analyses the
requirements. The other thing I’m looking for here is performance, e.g. when
building the DSV, do they ensure that the SQL is efficient? When creating a
calculated measure, do they ensure that the MDX is efficient? The last thing
I’m looking for here is how they do unit testing.
6.
a) backup-restore, b) script
(XMLA)-execute-process, c) deployment wizard.
7.
First detect (in the relational database) which
partitions changed (you need some kind of flag/timestamp column on the fact
table). In the metadata table, update the status of each partition whose
data has changed. Then, based on this metadata table, process the changed partitions
using either a) AMO or b) the “Analysis Services Execute DDL” task in SSIS.
8.
See below.
9.
See below.
11. A tuple is a single coordinate in cube space (one member from each of one
or more hierarchies); a set is a collection of tuples (it could be just 1
tuple). A tuple is enclosed in parentheses: (); a set is enclosed in curly
brackets: {}. An axis is an edge of a multidimensional result set returned by
an MDX query (columns, rows, pages, sections, chapters)
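In MDX syntax the difference looks like this (hypothetical hierarchy and member names):

```mdx
-- A tuple: one coordinate, enclosed in parentheses
([Date].[Calendar].[2014], [Product].[Category].[Bikes])

-- A set: a collection of tuples of the same dimensionality, enclosed in
-- curly brackets (a set containing a single tuple is also valid)
{ ([Date].[Calendar].[2014], [Product].[Category].[Bikes]),
  ([Date].[Calendar].[2014], [Product].[Category].[Clothing]) }
```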
12. Put users into AD groups and add these groups as
the members of the cube roles. For each role, define the dimensional security
(dimension data tab) on product and region dims.
13. Create 2 roles: USD and GBP and add users into
these roles via AD groups. In each role, set the default member in the currency
dimension using dimensional security (dimension data tab in the role).
14. Upgrade Dev first (document the process), test the
cubes by running the reports (Panorama/Strategy Companion/Excel, etc). Use the
documented process to upgrade UAT, and test the UAT cubes. Then upgrade prod
and DR at the same time (using the documented process). Or DR first, test then
Prod. Before upgrading Dev, you might want to copy a Prod cube to Dev and
export some reports into Excel. After upgrading Dev, run the same reports and
compare with the results before upgrade.
15. There are 2 things here: dims and MGs. In reality, a
real-time cube doesn’t mean that all dims and all MGs are real time; usually
only 1 or 2 MGs are, and usually all dims are MOLAP, perhaps with the exception
of 1 or 2 dims. To make an MG real time, we can define it with ROLAP storage
mode. For performance reasons it is best not to define the whole MG as ROLAP, but
only 1 partition (usually the last one, if partitioned by month/period), unless
it’s a small MG (<100,000 rows). For dims, to make one ROLAP, set the
StorageMode property. ProActive Caching (PC) provides automatic management of
MOLAP storage: the cube incorporates the changes in the relational tables
based on a notification mechanism. You can do PC on partitions and dims. If the
fact table (or a partition of it) is small, and data is changing
frequently (like every minute), it is best to use ROLAP. If the fact table (or a
partition of it) is large (like 8m rows in a partition), and the data
is rarely changed, it is best to use PC.
16. Do “explore data” on the fact table in the DSV. Run the
SQL if it’s a named query. If this doesn’t return anything, check the fact
table. If it returns rows, check the dimension usage tab. Focus on 1 MG:
remove all the relationships (keep the cube dims), reprocess the cube and see if you
get a measure at the top level. If you do, add the relationships back to the dims one
by one until you find out which dim is causing the issue.
17. No special settings; just create the dim as normal
from the SCD2 table. The only difference is that it has 3 extra attributes:
IsCurrent, Valid From date & Valid To date.
18. Parent child dim, unary operator column, custom
rollup.
19. Scope statement
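A sketch of such a scope assignment in the cube's MDX script, assuming hypothetical hierarchy and level names; it nulls out Price everywhere except at product leaves and the store level:

```mdx
-- Hypothetical names throughout: [Product].[Products].[Product] is the
-- leaf level of the product hierarchy, [Location].[Locations].[Store]
-- the store level of the location hierarchy.
SCOPE ([Measures].[Price]);
    -- null out Price above (or below) the store level
    SCOPE (Except([Location].[Locations].Members,
                  [Location].[Locations].[Store].Members));
        THIS = NULL;
    END SCOPE;
    -- and above the leaf (product) level
    SCOPE (Except([Product].[Products].Members,
                  [Product].[Products].[Product].Members));
        THIS = NULL;
    END SCOPE;
END SCOPE;
```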
20. a) If for each member of attribute1 there is only 1
member of attribute2, b) if “member mappings” are fixed. Risk: if a member
changes its mapping.
21. Why: to increase query performance. How (usage
based): OLAP query log. How (non usage based): based on attributes which are
queried/used most. Refer to SSAS2008PerfGuide section 3.4.3.1. Test in Mosha’s
MDX Studio, before and after, 2 different cubes, same query.
22. You can’t prevent access to a dim/attribute/measure
using a perspective. Users just can’t see it, but if they know the name of the
dim/attribute/measure they can still access it.
23. Flexibility in building a dimension, speed of
processing. No difference in query performance.
24. Dimensional modelling: break it into smaller
dimensions and/or snowflake the dim. SSAS: limit the number of attributes
(don’t just take all attributes; be selective), index those attributes
(relational), limit the number of rows going into the cube by using a where
clause in the DSV/view, and put attributes with lower grain (a smaller number of
distinct values) into a referenced dim, linked via the main dim to the MG.
Number 8 (note the sort expression is [Measures].[Measure1], matching the question):

select order
    ( filter
      (
        {[dim1].[attribute1].[All].Children},
        [Measures].[Measure1] > 10
      ),
      [Measures].[Measure1],
      desc
    ) on rows,
    [Measures].[Measure1] on columns
from [Cube];
Number 9:

select order
    ( filter
      (
        {[dim1].[attribute1].[All].Children},
        [Measures].[Measure1] > 10
      ),
      [dim1].[attribute1].CurrentMember.Name,
      desc
    ) on rows,
    [Measures].[Measure1] on columns
from [Cube];
Learn More About
SSAS
Before discussing
any further questions, it is crucial to understand the basic concepts as well
as their applications. To start, Analysis Services is generally used for
analytical processing, Online Analytical Processing (OLAP), and data mining. SQL
Server Analysis Services enables the integration and management of data from various
data sources to cater for business intelligence applications. It is an
integral part of SQL Server, enabling the creation and management of highly
complex, multidimensional data structures in addition to the identification
of vital data through data mining. There are specific questions, listed in
the following paragraphs, that will certainly come up in SSAS interviews
regarding these core concepts.
Describe the Significance
of SSAS
In order to perform
analysis and forecasting operations, you need SSAS in SQL Server. The
benefits of SSAS are many; it:
- Is convenient with user-friendly interactive features.
- Enables creation and management of
data warehouses.
- Offers high end security
principles for enhanced efficiency.
- Provides speedy analysis and
troubleshooting options.
How Many Languages
does SSAS Use?
Another basic
question that will examine your knowledge regarding SSAS will refer to the
languages used. Here are the four main languages:
- SQL (Structured Query Language).
- MDX (Multidimensional
Expressions) for analysis.
- DMX (Data Mining Extensions) for
data mining operations.
- ASSL (Analysis Services Scripting
Language) for managing Analysis Services databases.
What Types of Data
Sources does SSAS Support?
Data sources are the bridge between SSAS and the database from which data is loaded during processing. A data source will invariably have a provider, database name, server name, and impersonation information. Both .NET and OLE DB providers are supported by SSAS. Here is a list of supported sources:
- SQL Server
- MS Access
- Oracle
- Teradata
- IBM DB2
- Any other relational database with an appropriate OLE DB provider
What is Your Role
as an Analysis Services Information Worker?
This question speaks to your role in the organization; there should be a clear understanding of what is expected from you and how you are going to accomplish company goals. The role of the modern-day Analysis Services information worker has its origin in the domain expert. In addition, you may be assigned anything from business analysis and technical training to help desk/operations management and network administration.
Definition of
Certain Terms and their Significance
You will more than likely come across many technical terms related to SSAS. It is important to know and understand them in order to gain an edge in interviews.
How Do You Define a
Partition and How Does it Affect Operations?
A partition is the physical location of stored cube data, and it is an effective and highly flexible technique for managing large cubes. It enhances the efficiency of Analysis Services because partitioning, combined with techniques such as aggregation, makes queries run faster by letting them scan only the measure-group data that provides the answers.
What is the
Difference Between OLAP And OLTP?
Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) function in different ways.
- OLAP is designed for daily business decisions, whereas OLTP is used for daily business operations.
- The frequency of data updates is very high in OLTP, with the most recent data integrated continually. In OLAP, data update frequency is relatively low.
- OLAP uses a large amount of historical data, while OLTP uses only a very limited amount of historical data.
What are some UDM
Benefits?
The Unified Dimensional Model (UDM) bridges the gap between the user and the data sources. It ensures high performance by effectively managing queries involving huge volumes of data and complicated processes such as interactive analysis.
Questions Related
to the Functional Aspects
How Much Do You
Know About Impersonation?
Impersonation is the means SSAS uses, during operations such as processing and data access, to present credentials on behalf of the client application or ascertain its security context. There are four main impersonation options:
- Use the service account.
- Use the current user's credentials.
- Use a specific Windows user name and password.
- Default: let SSAS select the impersonation method appropriate to the type of operation.
How Do You Use
Named Calculation and Named Query in SSAS?
A named calculation is essentially a column with an expression added to a table in the DSV. It should always conform to the language of the underlying data source. A named query, on the other hand, combines data from one or more tables of the underlying data source to form a new table based on a SQL query.
How Do You Combine
Multiple Data Sources in SSAS?
To combine multiple data sources, you create a DSV from a source that serves as the primary data source throughout operations. Generally, SQL Server is the best source for creating the initial DSV. Afterwards, additional data sources can be added to the DSV as secondary sources, and you can incorporate as many tables as you want into the DSV from these secondary sources.
The Microsoft Business Intelligence (BI) suite
provides a very powerful technology called SQL Server Analysis Services
(SSAS) to deliver OnLine Analytical Processing (OLAP) and data mining for
business intelligence applications.
The purpose of analysis services is to turn data into information and to provide quick and easy access to that information for decision makers. SSAS provides OLAP by letting you design, create and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases and provides many data mining algorithms for mining data from data sources. So for delivering OLAP and data mining it uses client and server technologies.
The main idea of SSAS is to provide fast results from data sources when we apply a query, because in order to make a decision we need data across various dimensions. OLAP does this by pre-calculating and storing aggregates. This requires a lot of disk space, but it provides the ability to explore the data very quickly because the data has already been processed.
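To make the idea of pre-calculated aggregates concrete, here is a minimal sketch in Python (the product names and numbers are invented for illustration): at "processing" time the fact rows are rolled up by dimension, so at query time the answer is a dictionary lookup rather than a scan of the fact table.

```python
# Toy fact rows: (product, year, units) -- invented sample data.
facts = [
    ("Shirts", 2011, 3000),
    ("Shirts", 2012, 1000),
    ("Pants", 2011, 4500),
    ("Pants", 2012, 200),
]

# "Processing" step: pre-calculate aggregates per (product, year) cell,
# the way an OLAP engine materializes aggregations up front.
aggregates = {}
for product, year, units in facts:
    key = (product, year)
    aggregates[key] = aggregates.get(key, 0) + units

# Query time: answering "units of Shirts in 2011" is a constant-time
# lookup instead of a scan over the fact table.
print(aggregates[("Shirts", 2011)])  # 3000
```

This is of course only the trade-off the paragraph describes: extra storage for the materialized aggregates, in exchange for fast exploration.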
Now there are two basic questions that come into your mind:
- Why
- How
Am I right? So now I will answer both the
questions as in the following:
- Why: Most of us use relational databases. In a relational database, data is stored and you can fetch it by applying queries, generating reports and so on; but with queries and reports we can only access data about certain things, whereas by utilizing Analysis Services we can play with the data, slicing and dicing it with very little effort.
The most important thing is that you can give the power of data analysis to someone who is not a DBA, database professional, or developer.
- How: This is the most important question: how to use Analysis Services for OLAP and data mining. I can't say it is very easy to work with Analysis Services; it's a challenge in itself. There are many tutorials available, so you can use them. This is my first article on SSAS, and as soon as I get time I will write a full series of articles on it.
Now I am starting the core part of analysis services. Here I am not jumping directly into the hands-on of how to create a cube at all. I am first providing an overview of the internal structure of SSAS.
SSAS uses both client and server components to provide OLAP and data mining functionality:
- The
Server component is implemented as a Microsoft Windows service. SSAS
supports multiple instances on the same computer, so each instance is
implemented as a separate instance of a Windows service.
- Clients
communicate with analysis services using XMLA (XML for Analysis), a
SOAP-based protocol for issuing commands and receiving responses, exposed
as a web service.
- Query
commands can be issued using the following languages:
- SQL: This is the structured query language; most of us know it very well, so I will not describe it here.
- MDX
(Multidimensional Expression): MDX is a query language to retrieve
multidimensional data from SSAS. MDX is based on the XMLA specification.
- DMX (Data Mining Extensions): This language works with data mining models. You can use DMX to create new data mining model structures, to train these models, and to browse, manage, and predict against them.
- ASSL (Analysis Services Scripting Language): Most of the communication of the SSAS client happens using SOAP messages. ASSL is the XML dialect used for these messages, made up of two parts: the first is a DDL, or object definition language, that defines and describes an instance of Analysis Services; the second is a command language that sends action commands, such as Create, Alter, or Process, to an instance of Analysis Services.
Since I have already said that SSAS is based on
client and server components, I will now explain both components of the
Architecture in detail.
- Server Architecture: This runs as a Windows service. The Msmdsrv.exe application is the server component. This application consists of security, an XMLA listener, a query processor, and other components that perform the core server tasks, such as parsing client statements, managing metadata, and serving query results.
- Client Architecture: SSAS has a thin-client architecture. All queries and calculations are resolved by the server, so a server connection is required for each request. Several providers ship with SSAS to support various programming languages; these providers communicate using SOAP packets.
SSAS also has a web architecture with a fully scalable middle tier that suits deployment by both small and large organizations, and it provides middle-tier support for web services.
Client and server can also communicate without using any provider, by sending XMLA in SOAP packets directly.
I know many of you are feeling bored after reading so much theory; thanks for your patience. I will now describe the very important keywords, the core concepts that are the building blocks of SSAS. So if you have started to feel sleepy, now is the time to wake up and read these keywords very carefully, because without them we cannot understand SSAS.
Now without wasting any time I am explaining the following keywords:
- Cube: This is the basic unit of storage and analysis in SSAS. It is a multidimensional data structure that holds data that has been aggregated so it can be returned quickly when a query fires. A cube is a related set of measures and dimensions used to analyze data; a cube is essentially synonymous with a Unified Dimensional Model (UDM).
- Dimension: Dimensions are groups of attributes based on columns of tables or views. Dimensions are independent of any cube, so they can be used in multiple cubes. Dimensions are the criteria by which business data is analyzed, such as time or geography. For example, time can be broken into years, quarters, months, weeks and so on.
- Dimension Table: This table contains the dimension attributes by which you want to summarize your data; in the case of a time dimension, this table can contain Year, Month, Quarter, Week and so on.
- Fact
Table: A
fact table contains the data that measures the organization's business
operations. A key characteristic of a fact table is that it contains
numerical data (facts) that can be summarized to provide information about
the history of the operation of the organization. Any table that you are
using for a function like sum or average can be called a fact table.
- Measures: A cube is based on one or more measures, which are simply columns of the fact table that you want to analyze.
- Schema: A fact table and a dimension table are always related, and this relationship inside the cube forms a schema. There are multiple types of schema, but the two most common are the star and snowflake schemas. I will write a separate article explaining the various kinds of schema, because I don't want to cut the topic short here.
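The keywords above can be tied together in a small sketch (all member names and numbers are made up): the fact table holds the measure, the dimension tables give it meaning, and the "cube" is just that measure aggregated over every combination of dimension members.

```python
from itertools import product as cartesian

# Hypothetical dimension tables: the members of each dimension.
time_dim = [2011, 2012]
geo_dim = ["India", "Nepal"]

# Hypothetical fact table: (year, country) foreign keys plus a numeric measure.
fact_sales = [
    (2011, "India", 3000),
    (2012, "India", 1000),
    (2011, "Nepal", 3100),
    (2012, "Nepal", 1200),
]

# Build the cube: aggregate the measure for every (year, country) cell.
cube = {}
for year, country in cartesian(time_dim, geo_dim):
    cube[(year, country)] = sum(m for y, c, m in fact_sales
                                if y == year and c == country)

# "Slicing" fixes one dimension member: total units for 2011 across countries.
slice_2011 = sum(v for (year, _), v in cube.items() if year == 2011)
print(slice_2011)  # 6100
```

A real SSAS cube does the same thing at scale, with the aggregation design deciding which of these cells are materialized in advance.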
Here I have described all the basic concepts as thoroughly as possible in this article. Now I will list the topics, an index of sorts, for the hands-on articles that I will cover next. If required between hands-on pieces, I will post theoretical articles like this one, so that alongside the hands-on work we also understand the concepts.
- Create
SSAS Project in BIDS
- Create
a data source
- Create
Data Source View
- Create
Dimension
- Create
Cube
These are the basic topics that I will cover in future articles, and surely this list will grow as I write more.
Business intelligence is a collection of skills, technologies and practices by which we gather data, analyze it, and provide it for forecasting and better decision making.
A BI system has three important phases: Collect, Analyze and View.
- Collect data: Data in an enterprise can be stored in various formats, ranging from normalized, structured RDBMS data to Excel sheets or unstructured file formats. So the first step in BI is to collect all this unstructured and scattered data and bring it into one uniform format. This can be achieved in three steps:
1. Extract: In this step we read unorganized data from the sources by understanding their data structures.
2. Transform: In this step we transform the data into a standard format.
3. Load: Finally we load the standard-format data into a data warehouse or data mart (the next part explains what a data warehouse, data mart and OLAP are).
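The Extract / Transform / Load steps above can be sketched in a few lines of Python (the source formats, field names and the in-memory "warehouse" are all invented for illustration):

```python
import csv
import io

# Extract: read raw rows from two differently shaped sources.
crm_csv = io.StringIO("name,amount\nAlice,100\nBob,250\n")  # a CSV export
legacy_rows = [("CAROL", "300 USD")]                        # a legacy file

extracted = list(csv.DictReader(crm_csv))
extracted += [{"name": n, "amount": a} for n, a in legacy_rows]

# Transform: bring every row into one standard format.
def transform(row):
    return {
        "customer": row["name"].title(),
        "amount": int(str(row["amount"]).replace("USD", "").strip()),
    }

standardized = [transform(r) for r in extracted]

# Load: append the standardized rows into the "warehouse" (a plain list here).
warehouse = []
warehouse.extend(standardized)

print(len(warehouse), sum(r["amount"] for r in warehouse))  # 3 650
```

In practice this is what an ETL tool such as SSIS automates: connectors do the extract, transformation components standardize the rows, and destinations load the warehouse tables.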
- Analyze data: Once the data is loaded into the data warehouse, you run tools and algorithms so that you can analyze and forecast information.
- View data: Once you have analyzed the data, you will want to view it. How people want to view data can vary from a simple tabular format to complex graphical charts, so in this section we need good reporting tools.
A data warehouse is a database used to store data for reporting and data analysis. Data in a data warehouse can come from disparate sources such as structured RDBMSs, files, or any other source. The ETL process (explained in the previous question) fetches data from these sources and loads it into the data warehouse.
A data warehouse can be further divided into data marts. A data warehouse focuses on enterprise-wide data, while a data mart focuses on specific processes.
Data warehouses are also termed OLAP systems (OLAP is explained in the next question). The database designs for these systems do not follow conventional normalization (1st, 2nd or 3rd normal form). Most of them use denormalized designs like the star schema and snowflake design, and they normally store data in fact and dimension tables.
Both OLTP and OLAP are types of IT systems. OLTP (online transaction processing) deals with transactions (insert, update, delete and simple search), while OLAP (online analytical processing) deals with analyzing historical data, forecasting, and so on.
Below is a simple table that chalks out the differences.
| | OLTP | OLAP |
|---|---|---|
| Design | Normalized (1st, 2nd and 3rd normal form). | Denormalized (dimension and fact design). |
| Source | Daily transactions. | OLTP. |
| Motive | Faster inserts, updates and deletes; improved data quality by reducing redundancy. | Faster analysis and search by combining tables. |
| SQL complexity | Simple and medium. | Highly complex due to analysis and forecasting. |
The most important goal of an OLAP application is analysis of data, and the most important things in any analysis are numbers. With an OLAP application we want to get those numbers, forecast them, and analyze them for better business growth. These numbers can be total sales, number of customers, etc.
These numbers are termed "measures", and measures are mostly stored in fact tables.
A "dimension" describes what these measures actually mean. For example, in the table below there are two measures, 3000 units and 1500 $; one dimension is "ProductWiseSales" and the other is "AgeWiseSalary".
Dimensions are stored in dimension tables.
| Dimension | Measures |
|---|---|
| ProductWiseSales | 3000 units |
| AgeWiseSalary | 1500 $ |
A cube gives us a multidimensional view of the data using dimensions and measures. For instance, in the table below we have "Units" as the measure and three dimensions: product, year and country.
| Product Name | Units | Year | Country |
|---|---|---|---|
| Shirts | 3000 | 2011 | India |
| Shirts | 1000 | 2012 | India |
| Pants | 4500 | 2011 | India |
| Pants | 200 | 2012 | India |
| Shirts | 3100 | 2011 | Nepal |
| Shirts | 1200 | 2012 | Nepal |
| Pants | 600 | 2011 | Nepal |
| Pants | 1200 | 2012 | Nepal |
If we arrange the dimensions and measures in cube format we get a better picture. In other words, a cube is the intersection of multiple measures and their dimensions; visualized graphically, the same data forms a cube.
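To see the intersection idea in code, here is a small Python sketch that pivots the table above into (product, year) cells, summing units across the country dimension (the rows are taken from the table; the pivot itself is just for illustration):

```python
# Rows from the table above: (product, units, year, country).
rows = [
    ("Shirts", 3000, 2011, "India"), ("Shirts", 1000, 2012, "India"),
    ("Pants", 4500, 2011, "India"),  ("Pants", 200, 2012, "India"),
    ("Shirts", 3100, 2011, "Nepal"), ("Shirts", 1200, 2012, "Nepal"),
    ("Pants", 600, 2011, "Nepal"),   ("Pants", 1200, 2012, "Nepal"),
]

# Pivot: collapse the country dimension, leaving product x year cells.
by_product_year = {}
for prod, units, year, country in rows:
    cell = (prod, year)
    by_product_year[cell] = by_product_year.get(cell, 0) + units

# Shirts sold in 2011 across all countries: 3000 (India) + 3100 (Nepal).
print(by_product_year[("Shirts", 2011)])  # 6100
```

Changing which dimensions you keep and which you collapse is exactly the "slice and dice" that a cube makes cheap.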
A star schema consists of fact and dimension tables. The fact tables hold the measures, and the dimension tables give context to the fact tables.
In the figure "Star design" below, there are four dimension tables, each referencing the fact table for measure values. The references between dimension and fact tables are simple foreign key relationships.
Figure: Star design
The snowflake design is very similar to the star design. The exception is the dimension tables: in a snowflake schema, dimension tables are normalized, as shown in the figure "Snowflake design" below. The design is very similar to the star design shown previously, but the products and vendor tables are separate tables, related in a more normalized format. Summing up: the star design is a purely denormalized design, while the snowflake design can have normalized dimension tables.
Figure: Snowflake design
| | Snowflake Schema | Star Schema |
|---|---|---|
| Normalization | Can have normalized dimension tables. | Purely denormalized dimension tables. |
| Maintenance | Less redundancy, so less maintenance. | More redundancy due to the denormalized format, so more maintenance. |
| Query | Complex queries due to normalized dimension tables. | Simple queries due to the purely denormalized design. |
| Joins | More joins due to normalization. | Fewer joins. |
| Usage guidelines | When you are concerned about integrity and duplication. | When speed and performance matter more than data integrity. |
SSIS (SQL Server Integration Services) helps to collect data; in other words, it does ETL (extract, transform and load) as explained in the previous question.
SSAS (SQL Server Analysis Services) helps us analyze data by creating cubes, facts, dimensions and measures.
SSRS (SQL Server Reporting Services) helps us view this analyzed data in different formats, such as graphical and tabular.
- Data flow defines the flow of data from a source to a destination; in other words, it defines an individual ETL.
- Control flow defines workflow (iterations, conditional checks, etc.) and component execution (for example: send email, copy files, invoke web services, FTP). Control flow has nothing to do with data; only the data flow deals with data, i.e. movement, transformation and loading.
- A package is a collection of control flows.
SSAS -
Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services
What is Tabular Model Designer (SSAS Tabular)
The tabular model designer is part of SQL Server 2012 SQL Server Data Tools (SSDT), integrated with Microsoft Visual Studio 2010 or later, with additional project type templates specifically for developing professional tabular model solutions.
Benefits:
When
you install SQL Server Data Tools (SSDT), new project templates for creating
tabular models are added to the available project types. After creating a new
tabular model project by using one of the templates, you can begin model
authoring by using the tabular model designer tools and wizards.
In
addition to new templates and tools for authoring professional multidimensional
and tabular model solutions, the Visual Studio environment provides debugging
and project lifecycle capabilities that ensure you create the most powerful BI
solutions for your organization.
Comparing Tabular and Multidimensional Solutions (SSAS)
http://technet.microsoft.com/en-us/library/hh212940.aspx
Tabular models hosted in SQL Server 2012 Analysis Services provide a comparatively lightweight, easy-to-build-and-deploy solution for business intelligence. However, as you increase the data load, add more users, or run complex queries, you need a strategy for maintaining and tuning performance. This paper describes strategies and specific techniques for getting the best performance from your tabular models, including processing and partitioning strategies, DAX query tuning, and server tuning for specific workloads.
Download "Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services" case study with detailed analysis:
http://msdn.microsoft.com/en-us/library/dn393915.aspx
- Types of Dimensions
- Types of Measures
- Types of relationships between dimensions and
measuregroups: None (IgnoreUnrelatedDimensions), Fact, Regular, Reference,
Many to Many, Data Mining
- Star Vs Snowflake schema and Dimensional
modeling
- Data storage modes - MOLAP, ROLAP, HOLAP
- MDX Query syntax
- Functions commonly used in MDX, like Filter and Descendants, sort flags like BASC, and others
- Difference between EXISTS AND EXISTING, NON
EMPTY keyword and function, NON_EMPTY_BEHAVIOR, ParallelPeriod, AUTOEXISTS
- Difference between static and dynamic set
- Difference between natural and unnatural
hierarchy, attribute relationships
- Difference between rigid and flexible
relationships
- Difference between attribute hierarchy and
user hierarchy
- Dimension, Hierarchy, Level, and Members
- Difference between database dimension and cube
dimension
- Importance of CALCULATE keyword in MDX script,
data pass and limiting cube space
- Effect of materialize
- Partition processing and Aggregation Usage
Wizard
- Perspectives, Translations, Linked Object
Wizard
- Handling late arriving dimensions / early
arriving facts
- Proactive caching, Lazy aggregations
- Partition processing options
- Role playing Dimensions, Junk Dimensions,
Conformed Dimensions, SCD and other types of dimensions
- Parent Child Hierarchy, NamingTemplate
property, MemberWithLeafLevelData property
- Cube performance, MDX performance
- How to pass parameter in MDX
- SSAS 2005 vs SSAS 2008
- Dimension security vs Cell security
- SCOPE statement, THIS keyword, SUBCUBE
- CASE (CASE, WHEN, THEN, ELSE, END) statement,
IF THEN END IF, IS keyword, HAVING clause
- CELL CALCULATION and CONDITION clause
- RECURSION and FREEZE statement
- Common types of errors encountered while
processing a dimension / measure groups / cube
- Logging and monitoring MDX scripts and cube
performance
- What is the difference between SSAS 2005 and SSAS 2008?
- In 2005 it's not possible to create an empty cube, but in 2008 we can create an empty cube.
- A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer; implementing attribute relationships was complex in SSAS 2005.
- We could create only 2000 partitions per measure group in SSAS 2005; that partition limit is removed in SSAS 2008.
You can answer more, but if you end with these, the interviewer will feel that you have real experience.
- What is a data warehouse (in short, DWH)?
The data warehouse is an informational environment that:
- Provides an integrated and total view of the enterprise
- Makes the enterprise's current and historical information easily available for decision making
- Makes decision-support transactions possible without hindering operational systems
- Renders the organization's information consistent
- Presents a flexible and interactive source of strategic information
Or: a warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process (Bill Inmon's definition). The terms are defined as follows:
Subject-oriented: It addresses a specific business domain, e.g. banking, retail, insurance, etc.
Integrated: It should be able to integrate data from various source systems, e.g. SQL Server, Oracle, DB2, etc.
Time-variant: It should be able to maintain the data for various time periods.
Non-volatile: Once data is inserted, it cannot be changed.
- What is a data mart?
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.
Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
There are three types of data marts:
- Dependent
- Independent
- Logical data mart
- What is the difference between a data mart and a data warehouse?
A data warehouse holds the complete data, whereas a data mart is a subset of it.
Ex: All the organization's data, whether related to the finance department, HR, or the banking department, is stored in the data warehouse, whereas a data mart stores only the finance data or the HR department data. So a data warehouse is a collection of different data marts.
- Have you ever worked on performance tuning? If yes, what are the steps involved?
We need to identify the bottlenecks to tune performance; to overcome the bottlenecks we do the following:
- Avoid named queries
- Avoid unnecessary relationships between tables
- Define proper attribute relationships
- Use a proper aggregation design
- Partition the data properly
- Use a proper dimension usage design
- Avoid unnecessary many-to-many relationships
- Avoid unnecessary measures
- Set AttributeHierarchyEnabled = FALSE for attributes that are not required
- Do not include even a single measure that is not necessary
- What are the difficulties faced in cube development?
This question is asked either to test whether you are really experienced or when the interviewer does not have any more questions to ask.
You can mention any area you find difficult to work in, but the best answers are the following:
- Defining attribute relationships
- Calculations
- Defining dimension usage (many-to-many relationships)
- Analyzing the requirements
- Explain the flow of creating a cube?
Steps to create a cube in SSAS:
- Create a data source.
- Create a data source view.
- Create dimensions.
- Create a cube.
- Deploy and process the cube.
- What is a data source or DS?
The data source is the physical connection information that Analysis Services uses to connect to the database that hosts the data. The data source contains the connection string, which specifies the server and the database hosting the data, as well as any necessary authentication credentials.
- What is a data source view or DSV?
A data source view is a persistent set of tables from a data source that supply the data for a particular cube. BIDS also includes a wizard for creating data source views, which you can invoke by right-clicking the Data Source Views folder in Solution Explorer.
- The data source view is the logical view of the data in the data source.
- The data source view is the only thing a cube can see.
- What is a named calculation?
A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves like a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.
A named calculation is used to create a new column in the DSV using hard-coded values, existing columns, or both.
- What is a named query?
A named query in a DSV is similar to a view in a database. It is used to create a virtual table in the DSV that does not affect the underlying database. A named query is mainly used to merge two or more tables in the data source view or to filter the columns of a table.
- Why do we need named queries?
A named query is used to join multiple tables or to remove unnecessary columns from a table. You can achieve the same thing in the database using views, but named queries are the best bet when you don't have access to create views in the database.
- How will you add a new column to an existing table in the data source view?
By using named calculations we can add a new column to an existing table in the data source view. Named calculations are explained above.
- What is a dimension table?
A dimension table contains hierarchical data by which you'd like to summarize. It contains specific business information; the name of each member of the dimension is called an "attribute".
The key attribute of the dimension must contain a unique value for each member of the dimension; it is called the "primary key column". The primary key column of each dimension table corresponds to one of the key columns in any related fact table.
- What is a fact table?
A fact table contains the basic information that you wish to summarize. The table that stores the detailed values for measures is called the fact table. Simply put, it is "the table which contains METRICS" used to analyze the business.
It consists of two sections:
1) Foreign keys to the dimensions
2) Measures/facts (numerical values used to monitor business activity)
- What is a factless fact table?
This is a very important interview question. A "factless fact table" is a table that is similar to a fact table except that it does not have any measures; that is, the table has only links to the dimensions. These tables enable you to track events; indeed, they exist for recording events.
Factless fact tables are used for tracking a process or collecting stats. They are called so because the fact table does not have aggregatable numeric values; it holds mere key values referring to the dimensions from which the stats can be collected.
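A quick sketch of the idea, using a hypothetical student-attendance example (a classic factless-fact illustration): the "fact" rows carry only dimension keys, yet counting rows still answers questions about events.

```python
# Hypothetical factless fact table: each row is just dimension keys
# (date_key, student_key, class_key) -- no numeric measure at all.
attendance_facts = [
    (20240101, "S1", "Math"),
    (20240101, "S2", "Math"),
    (20240102, "S1", "Math"),
    (20240102, "S1", "History"),
]

# Even without measures we can answer questions by counting rows:
# how many Math attendances were recorded?
math_attendance = sum(1 for _, _, cls in attendance_facts if cls == "Math")

# On how many distinct days did student S1 attend anything?
s1_days = len({d for d, s, _ in attendance_facts if s == "S1"})

print(math_attendance, s1_days)  # 3 2
```

In SSAS terms, the implicit count of rows plays the role of the measure for such a table.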
- What are attribute relationships, and why do we need them?
Attribute relationships are the way of telling the Analysis Services engine how the attributes within a dimension are related to each other. If proper relationships are defined, processing time decreases, and both cube processing and MDX query performance improve.
In Microsoft SQL Server Analysis Services, attributes within a
dimension are always related either directly or indirectly to the key
attribute. When you define a dimension based on a star schema, which is where
all dimension attributes are derived from the same relational table, an
attribute relationship is automatically defined between the key attribute and
each non-key attribute of the dimension. When you define a dimension based on a
snowflake schema, which is where dimension attributes are derived from multiple
related tables, an attribute relationship is automatically defined as follows:
- Between
the key attribute and each non-key attribute bound to columns in the main
dimension table.
- Between
the key attribute and the attribute bound to the foreign key in the
secondary table that links the underlying dimension tables.
- Between
the attribute bound to foreign key in the secondary table and each non-key
attribute bound to columns from the secondary table.
- How many types of attribute relationships are there?
There are two types of attribute relationships:
- Rigid
- Flexible
Rigid: In a rigid relationship, the relationship between the attributes is fixed; members will not change levels or their respective attribute relationships.
Example: the time dimension. We know that the month "January 2009" will only ever belong to the year "2009" and won't be moved to any other year.
Flexible: In a flexible relationship, the relationship between the attributes can change over time.
Example: an employee and a department. An employee can be in the accounts department today, but it is possible that the employee will be in the marketing department tomorrow.
- How many types of dimensions are there and what are they?
There are three types of dimensions:
- Conformed dimension
- Junk dimension
- Degenerate dimension
- What are conformed dimensions, junk dimensions and degenerate dimensions?
Conformed dimension: A dimension that is shared across multiple fact tables or data models, so that it has the same meaning and content wherever it is used.
Junk dimension: A number of very small dimensions may be lumped together to form a single dimension, a junk dimension, in which the attributes are not closely related. Grouping random flags and text attributes and moving them into a separate dimension is known as creating a junk dimension.
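The mechanics of building a junk dimension can be sketched in Python (the flag names and values below are hypothetical): one row per distinct combination of the low-cardinality flags, each given its own surrogate key, so the fact table stores a single key instead of several flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
payment_types = ["Cash", "Card"]
is_promo = [True, False]
order_status = ["Open", "Shipped", "Cancelled"]

# One row per distinct combination of the flags, each with a surrogate key.
junk_dimension = [
    {"junk_key": i, "payment_type": p, "is_promo": f, "order_status": s}
    for i, (p, f, s) in enumerate(product(payment_types, is_promo, order_status), start=1)
]

# The fact table now references junk_key instead of three separate columns.
print(len(junk_dimension))  # 2 * 2 * 3 = 12 combinations
```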
Degenerate dimension: A degenerate dimension stores its values in the fact table, and no corresponding dimension table exists. A degenerate dimension is a dimension key without a corresponding dimension table.
Example: In the PointOfSale Transaction fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number.
The Date dimension corresponds to Date Key, and the Product dimension corresponds to Product Key. In a traditional parent-child database, POS Transaction Number would be the key to the transaction header record that contains all the information valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this information into other dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but has no corresponding dimension table.
- What are the types of database schema?
There are three types of database schema:
- Star
- Snowflake
- Starflake
- What is star, snowflake and starflake schema?
Star schema: In a star schema the fact table is directly linked to all dimension tables. The star schema's dimensions are denormalized, with each dimension represented by a single table. A central fact table connects a number of individual dimension tables.
Snowflake: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. In a snowflake schema the fact table is linked directly to some dimension tables, while other dimension tables are reached through intermediate dimension tables.
Starflake: A hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas.
- How will you hide an attribute?
We can hide an attribute by setting "AttributeHierarchyVisible = False" in the properties of the attribute.
- How will you make an attribute not process?
By setting "AttributeHierarchyEnabled = False", we can keep an attribute from being processed.
- What is the use of the IsAggregatable property?
In Analysis Services we generally see that every dimension attribute has an All member. This comes from the IsAggregatable property of the attribute. You can set its value to False so that the All member is not shown. The All member is the default member for that attribute, so if you hide it you must set another member as the default; otherwise Analysis Services will pick some value as the default, which can confuse users browsing the data who are unaware of the changed default member.
- What are key, name and value columns of an attribute?
Key
column of any attribute: Contains the column or columns that
represent the key for the attribute, which is the column in the underlying
relational table in the data source view to which the attribute is bound. The
value of this column for each member is displayed to users unless a value is
specified for the NameColumn property.
Name
column of an attribute: Identifies the column that
provides the name of the attribute that is displayed to users, instead of the
value in the key column for the attribute. This column is used when the key
column value for an attribute member is cryptic or not otherwise useful to the
user, or when the key column is based on a composite key. The NameColumn
property is not used in parent-child hierarchies; instead, the NameColumn
property for child members is used as the member names in a parent-child
hierarchy.
Value
columns of an attribute: Identifies the column that provides the
value of the attribute. If the NameColumn element of the attribute is
specified, the same DataItem values are used as default values for the
ValueColumn element. If the NameColumn element of the attribute is not
specified and the KeyColumns collection of the attribute contains a single
KeyColumn element representing a key column with a string data type, the same
DataItem values are used as default values for the ValueColumn element.
- What is a hierarchy, what are its types and what is the difference between them?
A hierarchy is a very important part of any OLAP engine; it allows users to drill down from summary levels to detail. Hierarchies represent the way users expect to explore data at more detailed levels. A hierarchy is made up of multiple levels, creating a structure based on end-user requirements.
Year -> Quarter -> Month -> Week are all levels of a calendar hierarchy.
There are two types of hierarchies:
- Natural hierarchy
- Unnatural hierarchy
Natural hierarchy: The attributes are intuitively related to one another. There is a clear relationship from the top of the hierarchy to the bottom.
Example: Dates. Year, quarter and month follow from each other and, in part, define each other.
Unnatural hierarchy: The attributes are not clearly related.
Example: Geography. We may have country -> state -> city, but it is not clear where Province might sit.
- What is Attribute hierarchy?
An attribute hierarchy is created for every attribute in a dimension,
and each hierarchy is available for dimensioning fact data. This hierarchy
consists of an “All” level and a detail level containing all members of the
hierarchy.
You can organize attributes into user-defined hierarchies to provide navigation paths in a cube. Under certain circumstances, you may want to disable or hide some attributes and their hierarchies.
- What is the use of the AttributeHierarchyDisplayFolder property?
AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example, if I set the property value to "Test" for all the attributes of a dimension, then a folder with the name "Test" is created and all the attributes are placed into it.
- What is use of AttributeHierarchyEnabled?
AttributeHierarchyEnabled: Determines whether an
attribute hierarchy is generated by Analysis Services for the attribute. If the
attribute hierarchy is not enabled, the attribute cannot be used in a
user-defined hierarchy and the attribute hierarchy cannot be referenced in
Multidimensional Expressions (MDX) statements.
- What is use of AttributeHierarchyOptimizedState?
AttributeHierarchyOptimizedState: Determines the level
of optimization applied to the attribute hierarchy. By default, an attribute
hierarchy is FullyOptimized, which means that Analysis Services builds indexes
for the attribute hierarchy to improve query performance. The other option,
NotOptimized, means that no indexes are built for the attribute hierarchy.
Using NotOptimized is useful if the attribute hierarchy is used for
purposes other than querying, because no additional indexes are built for the
attribute. Other uses for an attribute hierarchy can be helping to order
another attribute.
- What is use of AttributeHierarchyOrdered ?
AttributeHierarchyOrdered: Determines whether
the associated attribute hierarchy is ordered. The default value is True.
However, if an attribute hierarchy will not be used for querying, you can save
processing time by changing the value of this property to False.
- What
is the use of AttributeHierarchyVisible ?
AttributeHierarchyVisible
: Determines whether the attribute hierarchy is visible to client
applications. The default value is True. However, if an attribute hierarchy
will not be used for querying, you can save processing time by changing the
value of this property to False.
- What are types of storage modes?
There are three standard storage modes in OLAP applications
- MOLAP
- ROLAP
- HOLAP
- Compare the three storage modes?
Summary and comparison:

| Basic Storage Mode | Storage Location for Detail Data | Storage Location for Summary/Aggregations | Storage Space Requirement | Query Response Time | Processing Time | Latency |
|---|---|---|---|---|---|---|
| MOLAP | Multidimensional format | Multidimensional format | Medium (detail data is stored in compressed format) | Fast | Fast | High |
| HOLAP | Relational database | Multidimensional format | Small | Medium | Fast | Medium |
| ROLAP | Relational database | Relational database | Large | Slow | Slow | Low |
- What is MOLAP and its advantages?
MOLAP (Multidimensional Online Analytical Processing): MOLAP is the most commonly used storage type. It is designed to offer maximum query performance to the users. The data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the Analysis Services database. The data inside the cube refreshes only when the cube is processed, so latency is high.
Advantages:
- Since the data is stored on the OLAP server in an optimized format, queries (even complex calculations) are faster than ROLAP.
- The data is compressed, so it takes up less space.
- Because the data is stored on the OLAP server, you don't need to keep the connection to the relational database.
- Cube browsing is fastest using MOLAP.
- What is ROLAP and its advantages?
ROLAP (Relational Online Analytical Processing): ROLAP does not have the high-latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means there is zero latency between the relational source database and the cube.
The disadvantage of this mode is performance: it gives the poorest query performance because no objects benefit from multidimensional storage.
Advantages:
- Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
- Since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
- Low latency.
- What is HOLAP and its advantages?
Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations need to be reprocessed when changes occur. With HOLAP you get medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you query only aggregated data or use a cached query, query performance is similar to MOLAP. But when you need to get at the detail data, performance is closer to ROLAP.
Advantages:
- HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
- Cubes are smaller than MOLAP since the detail data is kept in the relational database.
- Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
- Low latency, since processing takes place when changes occur and detail data is kept in the relational database.
- What are Translations and their use?
Translation: The translation feature in Analysis Services allows you to display captions and attribute names in a specific language. It helps in providing globalization for the cube.
- What is a Database dimension?
All the dimensions created using the New Dimension Wizard are database dimensions. In other words, the dimensions that exist at the database level are called database dimensions.
- What is a Cube dimension?
A cube dimension is an instance of a database dimension within a cube. A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension.
- Difference between Database dimension and Cube dimension?
- The database dimension has only Name and ID properties, whereas a cube dimension has several more properties.
- A database dimension is created once, whereas a cube dimension is a reference to a database dimension.
- A database dimension exists only once, whereas multiple cube dimensions can be created from it using the role-playing dimension concept.
- How will you add a dimension to a cube?
To add a dimension to a cube, follow these steps:
- In Solution Explorer, right-click the cube, and then click View Designer.
- In the Design tab for the cube, click the Dimension Usage tab.
- Either click the Add Cube Dimension button, or right-click anywhere on the work surface and then click Add Cube Dimension.
- In the Add Cube Dimension dialog box, use one of the following steps:
- To add an existing dimension, select the dimension, and then click OK.
- To create a new dimension to add to the cube, click New dimension, and then follow the steps in the Dimension Wizard.
- What is SCD (slowly changing dimension)?
Slowly changing dimensions (SCD) determine how historical changes in the dimension tables are handled. Implementing an SCD mechanism enables users to know which category an item belonged to on any given date.
- What are the types of SCD?
SCD is a concept for storing historical changes, and whenever someone finds a new way to store them, a new type comes into the picture. Basically there are three types of SCD:
- SCD Type 1
- SCD Type 2
- SCD Type 3
- What is Type 1, Type 2, Type 3 of SCD?
Type 1: In a Type 1 slowly changing dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | California |
Advantages: This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track of the old information.
Disadvantages: All history is lost. By applying this methodology, it is
not possible to trace back in history.
Usage: About 50% of the time.
When to use Type 1: Type 1 slowly changing dimension should be used
when it is not necessary for the data warehouse to keep track of historical
changes.
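Type 1 behavior can be sketched in a few lines of Python (the customer table is held as a hypothetical dict keyed by surrogate key):

```python
# Type 1 SCD: overwrite the attribute in place; no history is kept.
customers = {1001: {"name": "Christina", "state": "Illinois"}}

def scd_type1_update(dim, key, **changes):
    """Overwrite the existing row's attributes; the old values are lost."""
    dim[key].update(changes)

scd_type1_update(customers, 1001, state="California")
print(customers[1001]["state"])  # California; "Illinois" is gone
```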
Type 2: In a Type 2 slowly changing dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
After Christina moved from Illinois to California, we add the new information as a new row into the table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
| 1005 | Christina | California |
Advantages: This allows us to accurately keep all historical
information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage: About 50% of the time.
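A minimal Python sketch of the Type 2 mechanism (hypothetical data; real implementations usually also carry effective-date and current-flag columns, omitted here for brevity):

```python
# Type 2 SCD: insert a new row with a new surrogate key; keep the old row.
customers = [
    {"customer_key": 1001, "name": "Christina", "state": "Illinois"},
]

def scd_type2_update(dim, name, new_surrogate_key, **changes):
    """Add a new row for the changed entity; the old row stays as history."""
    current = next(r for r in dim if r["name"] == name)
    new_row = {**current, "customer_key": new_surrogate_key, **changes}
    dim.append(new_row)

scd_type2_update(customers, "Christina", 1005, state="California")
# Both the Illinois row (1001) and the California row (1005) are now present.
```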
Type 3: In a Type 3 slowly changing dimension, there are two columns to indicate the particular attribute of interest, one indicating the original value and one indicating the current value. There is also a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
To accommodate a Type 3 slowly changing dimension, we will now have the following columns: Customer Key, Name, Original State, Current State, Effective Date.
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of the change is January 15, 2003):

| Customer Key | Name | Original State | Current State | Effective Date |
|---|---|---|---|---|
| 1001 | Christina | Illinois | California | 15-JAN-2003 |
Advantages:
- This does not increase the size of the table, since new information is updated in place.
- This allows us to keep some part of history.
Disadvantages: Type 3 will not be able to keep all history where an
attribute is changed more than once. For example, if Christina later moves to
Texas on December 15, 2003, the California information will be lost.
Usage: Type 3 is rarely used in actual practice.
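A minimal Python sketch of Type 3 (hypothetical column names), showing why only one prior value survives:

```python
# Type 3 SCD: original and current values live in separate columns of one row.
customer = {
    "customer_key": 1001, "name": "Christina",
    "original_state": "Illinois", "current_state": "Illinois",
    "effective_date": None,
}

def scd_type3_update(row, new_state, effective_date):
    """Move the new value into current_state; original_state is preserved.
    A second change would overwrite current_state, losing the middle value."""
    row["current_state"] = new_state
    row["effective_date"] = effective_date

scd_type3_update(customer, "California", "2003-01-15")
# original_state is still "Illinois"; current_state is now "California".
```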
- What is a role-playing dimension, with two examples?
Role-playing dimensions: A dimension can play different roles in a fact table. You can recognize a role-playing dimension when there are multiple columns in a fact table that each have a foreign key to the same dimension table.
Ex1: There are three dimension keys in the FactInternetSales and FactResellerSales tables that all refer to the DimTime table; the same time dimension is used to track sales by each of those dates. For cubes that contain either of these fact tables, the corresponding role-playing dimensions are automatically added to the cube.
Ex2: In retail banking, for a checking account cube we could have a transaction date dimension and an effective date dimension. Both dimensions have date, month, quarter and year attributes. The formats of the attributes are the same in both dimensions; for example, the date attribute is in 'dd-mm-yyyy' format. Both dimensions have members from 1993 to 2010.
- What is a measure group, and a measure?
Measure groups: A cube can contain more than one measure group. These measure groups can contain different dimensions and be at different granularities, but as long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.
Creating multiple measure groups: To create a new measure group in the Cube Editor, go to the Cube Structure tab, right-click on the cube name in the Measures pane and select 'New Measure Group'. You then select the fact table to create the measure group from, and the new measure group is created; any columns that aren't used as foreign key columns in the DSV are automatically created as measures, and you also get an extra measure of aggregation type Count. It's a good idea to delete any measures you are not going to use at this stage.
Measures
: Measures are the numeric values that our users want to aggregate,
slice, dice and otherwise analyze, and as a result, it’s important to make sure
they behave the way we want them to. One of the fundamental reasons for using
Analysis Services is that, unlike a relational database it allows us to build
into our cube design business rules about measures: how they should be
formatted, how they should aggregate up, how they interact with specific
dimensions and so on.
- What is attribute?
An attribute is a specification that defines a property of an object,
element, or file. It may also refer to or set the specific value for a given
instance of such.
- What is a surrogate key?
A surrogate key is a system-generated key that acts like an alternate primary key for the table in the database. Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is generated by the system, not by the user. One difference between a primary key and a surrogate key in some databases is that the primary key uniquely identifies a record while a surrogate key uniquely identifies an entity.
Ex: An employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record, while the surrogate key will be generated by the system (say, a serial number), since the surrogate key is NOT derived from the data.
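The idea can be sketched in Python (hypothetical employee data): the surrogate key comes from a system counter, never from the data itself, so two employees with identical natural attributes still get distinct keys.

```python
import itertools

# A surrogate key is system-generated and carries no business meaning.
_next_key = itertools.count(1)  # stand-in for a database IDENTITY/sequence

def add_employee(dim, name, hire_year):
    row = {"employee_key": next(_next_key), "name": name, "hire_year": hire_year}
    dim.append(row)
    return row["employee_key"]

employees = []
k1 = add_employee(employees, "John Smith", 1998)
k2 = add_employee(employees, "John Smith", 2004)  # same name, distinct surrogate key
assert k1 != k2
```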
- How many types of relationships are there between a dimension and a measure group?
There are six types of relationship between a dimension and a measure group:
- No Relationship
- Regular
- Referenced
- Many to Many
- Data Mining
- Fact
- What are the regular, no-relationship, fact, referenced and many-to-many types, with examples?
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn is joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table; the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining: The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.
Fact: The dimension table is the fact table.
- What are calculated members and what is their use?
Calculations are items in the cube that are evaluated at runtime.
Calculated members: You can create customized measures or dimension members, called calculated members, by combining cube data, arithmetic operators, numbers, and/or functions.
Example: You can create a calculated member called Marks that converts dollars to marks by multiplying an existing dollar measure by a conversion rate. Marks can then be displayed to end users in a separate row or column. Calculated member definitions are stored, but their values exist only in memory. In the preceding example, values in marks are displayed to end users but are not stored as cube data.
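The "definition stored, values computed at query time" behavior can be sketched in Python (the measure names and conversion rate below are hypothetical, not real MDX):

```python
# Stored measure values per product, plus a calculated member whose values
# exist only at query time -- the definition is kept, the results are not.
DOLLARS = {"Bikes": 1200.0, "Helmets": 300.0}
MARK_RATE = 1.95  # hypothetical dollars-to-marks conversion rate

calculated_members = {
    "Marks": lambda product: DOLLARS[product] * MARK_RATE,  # stored definition
}

def query(measure, product):
    if measure in calculated_members:   # evaluated on demand, never persisted
        return calculated_members[measure](product)
    return DOLLARS[product]             # read from "stored" cube data

print(query("Marks", "Bikes"))
```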
- What are KPIs and what is their use?
In Analysis Services, a KPI is a collection of calculations associated with a measure group in a cube that are used to evaluate business success. We use a KPI to see the state of the business at a particular point; it is represented with graphical items such as traffic lights, gauges, etc.
- What are actions, how many types of actions are there? Explain with examples.
Actions are a powerful way of extending the value of SSAS cubes for the end user. Users can click on a cube, or a portion of a cube, to start an application with the selected item as a parameter, or to retrieve information about the selected item.
One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms. For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data.
Analysis Services supports three types of actions:
Report action: Returns a Reporting Services report that is associated with the cube data on which the action is based.
Drillthrough: Returns a result set that provides detailed information related to the cube data on which the action is based.
Standard: Has five action subtypes that are based on the specified cube data:
- Dataset: Returns a multidimensional dataset.
- Proprietary: Returns a string that can be interpreted by a client application.
- Rowset: Returns a tabular rowset.
- Statement: Returns a command string that can be run by a client application.
- URL: Returns a URL that can be opened by a client application, usually a browser.
- What
is partition, how will you implement it?
You can use the Partition Wizard to define partitions for a measure
group in a cube. By default, a single partition is defined for each measure
group in a cube. Access and processing performance, however, can degrade for
large partitions. By creating multiple partitions, each containing a portion of
the data for a measure group, you can improve the access and processing
performance for that measure group.
- What is the minimum and maximum number of partitions required for a measure group?
In SSAS 2005 a maximum of 2,000 partitions can be created per measure group; that limit is lifted in later versions.
In any version the minimum is ONE partition per measure group.
- What are Aggregations and their use?
Aggregations provide performance improvements by allowing Microsoft SQL Server Analysis Services (SSAS) to retrieve pre-calculated totals directly from cube storage instead of having to recalculate data from an underlying data source for each query. To design these aggregations, you can use the Aggregation Design Wizard. This wizard guides you through the following steps:
- Selecting standard or custom settings for the storage and caching options of a partition, measure group, or cube.
- Providing estimated or actual counts for objects referenced by the partition, measure group, or cube.
- Specifying aggregation options and limits to optimize the storage and query performance delivered by designed aggregations.
- Saving and optionally processing the partition, measure group, or cube to generate the defined aggregations.
After you use the Aggregation Design Wizard, you can use the Usage-Based Optimization Wizard to design aggregations based on the usage patterns of the business users and client applications that query the cube.
- What is a perspective? Have you ever created a perspective?
Perspectives are a way to reduce the complexity of cubes by hiding elements like measure groups, measures, dimensions, hierarchies etc. It's nothing but slicing of a cube; for example, if we have retail and hospital data and an end user is subscribed to see only hospital data, then we can create a perspective accordingly.
- What are deploy, process and build?
Build: Verifies the project files and creates several local files.
Deploy: Deploys the structure of the cube (the skeleton) to the server.
Process: Reads the data from the source and builds the dimensions and cube structures.
Elaborating on the same:
Build: A build is a version of a program. As a rule, a build is a pre-release version and as such is identified by a build number, rather than by a release number. Reiterative (repeated) builds are an important part of the development process. Throughout development, application components are collected and repeatedly compiled for testing purposes, to ensure a reliable final product. Build tools, such as make or Ant, enable developers to automate some programming tasks. As a verb, to build can mean either to write code or to put individual coded components of a program together.
Deployment: During development of an Analysis Services project in Business Intelligence Development Studio, you frequently deploy the project to a development server in order to create the Analysis Services database defined by the project. This is required to test the project; for example, to browse cells in the cube, browse dimension members, or verify key performance indicator (KPI) formulas.
- What is the maximum size of a dimension?
The maximum size of a dimension is 4 GB.
- What are the types of processing? Explain each.
The main processing options in SSAS are:
- Process Full
- Process Data
- Process Index
- Process Incremental
- Process Structure
- Unprocess
- Process Default
Process
Full: Processes an Analysis Services object and all the objects that it
contains. When Process Full is executed against an object that has already been
processed, Analysis Services drops all data in the object, and then processes
the object. This kind of processing is required when a structural change has
been made to an object, for example, when an attribute hierarchy is added,
deleted, or renamed. This processing option is supported for cubes, databases,
dimensions, measure groups, mining models, mining structures, and partitions.
Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it will be dropped before re-populating the partition with source data. This processing option is supported for dimensions, cubes, measure groups, and partitions.
Process
Index: Creates or rebuilds indexes and aggregations for all processed
partitions. This option causes an error on unprocessed objects. This processing
option is supported for cubes, dimensions, measure groups, and partitions.
Process Incremental: Adds newly available fact data and processes only the relevant partitions. This processing option is supported for measure groups and partitions.
Process
Structure: If the cube is unprocessed, Analysis Services will process, if it is
necessary, all the cube’s dimensions. After that, Analysis Services will create
only cube definitions. If this option is applied to a mining structure, it
populates the mining structure with source data. The difference between this
option and the Process Full option is that this option does not iterate the
processing down to the mining models themselves. This processing option is
supported for cubes and mining structures.
Unprocess
: Drops the data in the object specified and any lower-level constituent
objects. After the data is dropped, it is not reloaded. This processing option
is supported for cubes, databases, dimensions, measure groups, mining models,
mining structures, and partitions.
Process
Default: Detects the process state of an object, and performs processing
necessary to deliver unprocessed or partially processed objects to a fully
processed state. This processing option is supported for cubes, databases,
dimensions, measure groups, mining models, mining structures, and partitions.
- What
is a cube?
The basic unit of storage and analysis
in Analysis Services is the cube. A cube is a
collection of data that’s been aggregated to allow queries to return data
quickly.
For example, a cube of order data might be aggregated by time period and
by title, making the cube fast when you ask questions concerning orders by week
or orders by title.
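The pre-aggregation idea behind a cube can be sketched in Python (hypothetical order data): totals are computed once per (week, title) combination, so later queries read the aggregate instead of rescanning detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows, as they might sit in the relational source.
orders = [
    {"week": 1, "title": "Book A", "amount": 10.0},
    {"week": 1, "title": "Book A", "amount": 5.0},
    {"week": 1, "title": "Book B", "amount": 7.0},
    {"week": 2, "title": "Book A", "amount": 3.0},
]

# Pre-aggregate by (week, title) -- the core idea behind cube storage.
aggregates = defaultdict(float)
for o in orders:
    aggregates[(o["week"], o["title"])] += o["amount"]

# "Orders by week and title" now resolves from the aggregate, not the detail.
print(aggregates[(1, "Book A")])  # 15.0
```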
- What is AMO?
AMO stands for Analysis Management Objects. It is used to create or alter cubes from .NET code.
- After creating the cube, if we add a new column to the OLTP table, how do you add this new attribute to the cube?
Open the data source view and right-click; you will find the REFRESH option. Clicking REFRESH adds the new column to the table, which can then be added to the cube.
REAL TIME INTERVIEW QUESTIONS -
- What is the size of the Cube in your last Project?
The answer to this question varies from project to project and mainly depends on how BIG your database is and how COMPLEX the database design is. Generally, for a database with a TRANSACTION TABLE of 50 crore records, the cube size will be around 100 GB. So you can go with 100 GB as the answer to this question.
- What is the size of the database in your last Project?
You can expect this question immediately after you answer 100 GB to the last question. The database size will be 600 to 800 GB for a cube of about 100 GB. So go with 800 GB for this question.
- What is the size of the fact (transaction) table in your last Project?
This will be the next question if you answer 800 GB as your database size. Here the interviewer is not expecting a SIZE in GB but the NUMBER OF ROWS in the transaction table. Go with 57 crore records for this question.
- How frequently do you process the cube?
You have to be very careful here. The frequency of processing the cube depends on HOW FREQUENTLY YOU ARE GETTING NEW DATA. Once the new data comes in, the SSIS team loads it and sends a mail to the SSAS team after the load is completed successfully. Once the SSAS team receives the mail, they look for the best time to PROCESS.
Typically we get data either weekly or monthly, so you can say that the processing of the cube is done either weekly or monthly.
- How frequently do you get DATA from clients?
This answer should be based on your last answer. If you answered WEEKLY to the last question, then the answer to this question should also be WEEKLY; if MONTHLY for the last question, then this answer should also be MONTHLY.
- What
type of Processing Options you used to process the cube in your Project?
This
is the toughest question to answer. It depends on the DATA you have and the CLIENT'S
requirements. Let me explain.
- If the
database is SMALL, let’s say only 1 crore records, then people do a
FULL PROCESS as it won’t take much time.
- If the
database is MEDIUM, let’s say around 15 crore records, then people
prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL PROCESS,
as it takes a little bit of time.
- If the
database is HUGE, let’s say more than 35 to 40 crore records, then
people prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL
PROCESS, as it takes a lot of time. In this case we TRY to convince clients
to go INCREMENTAL, and if they don’t agree then we don’t have any other
option.
- An INCREMENTAL
process comes into the picture ONLY when there are no updates to the OLD
data, i.e. no changes to already existing data; otherwise there is NO OTHER
OPTION than a FULL PROCESS.
- How
do you provide security to the cube?
By
defining roles we provide security to cubes. Using roles we can restrict users
from accessing restricted data. The procedure is as follows -
- Define
Role
- Set
Permission
- Add
appropriate Users to the role
- How
do you move the cube from one server to another?
There
are many ways to do this. Let me explain four here, and cleverly you can say
“I have worked on 4 SSAS projects to date and implemented a different type in
each of the four.”
- Backup
and restore – This is the simplest way. Take the backup from the development
server and copy it to the client’s FTP folder. After doing this, drop
a mail to the client’s Admin and he will take care of the RESTORE part.
- Directly
PROCESS the cube in the PRODUCTION environment. For this you need access to
Production, which will not be given by clients unless the clients are
*********. One of the clients I worked for gave me FULL access ..
- Under
Start –> All Programs –> Sql Server –> Analysis Services you can
see the Deployment Wizard. This is one way of moving the cube. This method has
some steps to follow. First deploy your cube, and FOUR files will be
created in the BIN folder of the PROJECT folder. Copy those FOUR files and paste
them in any directory on the production server. Then OPEN the DEPLOYMENT Wizard in
production, and when it asks for the Database file, point to the location
where you copied the files. After that, NEXT, NEXT, NEXT … OK .. the cube will be
deployed and processed.
- This
way is the most beautiful one: Synchronization. In this we first deploy
and process the cube in the STAGING ENVIRONMENT, and then we go to the
production server. Connect to Analysis Services in SSMS and select
Synchronize by right-clicking the Databases folder in SSMS of Analysis
Services. Then select the STAGING SERVER as the source and click OK. The
changes in the cube present on the Staging server will be copied to the
production server.
- What
is the toughest challenge you faced in your Project?
There
are a couple of areas where we faced difficulty.
- While
working on RELATIONSHIPS between Measure Groups and Dimensions.
- Working
on Complex calculations
- Performance
tuning
- How
did you create Partitions of the cube in your last Project?
Partitions
can be created on different data. A few people do it PRODUCT NAME wise, and
many prefer to do it DATE wise; you go with DATE wise.
For
dates, we can create them MONTH wise, WEEK wise, QUARTER wise and sometimes YEAR
wise. This all depends on how much data is coming in per WEEK or MONTH or QUARTER
… If you are getting 50 lakh records per month, then say you do it MONTH wise.
- How
many dimensions in your last cube?
47
to 50.
- How
many measure groups in your last cube?
Total
10, of which 4 are Fact tables and the remaining 6 are Factless fact tables.
- What
is the Schema of your last cube?
Snowflake
- Why
not STAR Schema ?
My database design doesn’t support a
STAR Schema.
- What
are the different relationships that you used in your cube?
- Regular
- Referenced
- Many to
Many
- Fact
- No
Relationship
- Have
you created KPIs? If so, explain.
Don’t
add much to this, as the follow-up questions can be tricky. Just say that you
worked on a couple of KPIs and have basic knowledge of them. (Don’t worry,
this is not MANDATORY.)
- How
do you define Aggregations in your Project?
We
defined the aggregations for the MOST FREQUENTLY USED data in SSRS reports.
- Size
of SSAS team in your last Project?
Just 2 guys, as we guys are really in
demand and there is a lot of scarcity :)
- How
many Resources worked on the same Cube in your Project?
Only
2, one on the morning shift and another on the evening shift.
- How
much time does it take to Process the Cube?
This
is a very, very important question. This again depends on the SIZE of the
database, the complexity of the database and your server settings. For a database with
50 crore transaction records, it generally takes 3.5 hrs.
- How
many Calculations have you done in your Project?
If
you answer more than 5000 you will get caught out, unless you
are super good at MDX. The best answer for you is “Worked on 50 calculations”.
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
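For illustration, a calculated measure can be sketched as an MDX script definition. This is a hedged example: the measure names are taken from the Adventure Works sample cube and are assumptions, not something defined in this document.

```mdx
-- Calculated measure: evaluated at query time, values are NOT stored in the cube.
CREATE MEMBER CURRENTCUBE.[Measures].[Gross Profit Margin] AS
    ( [Measures].[Internet Sales Amount]
      - [Measures].[Internet Total Product Cost] )
    / [Measures].[Internet Sales Amount],
FORMAT_STRING = 'Percent';
```

A derived measure, by contrast, would be defined in the data source view as a named calculation (for example, a `SalesAmount - TotalProductCost` column), so the value is computed before aggregation and stored in the cube like any other measure.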
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated
member in a cube what is the use of property
called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
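A sketch of how this property appears in an MDX script, using assumed Adventure Works measure names (the member name and format are illustrative only):

```mdx
-- Ratio calculation guarded by NON_EMPTY_BEHAVIOR: when the selected measure
-- is empty, the whole expression is treated as empty instead of erroring.
CREATE MEMBER CURRENTCUBE.[Measures].[Avg Sale Per Order] AS
    [Measures].[Internet Sales Amount] / [Measures].[Internet Order Count],
NON_EMPTY_BEHAVIOR = { [Measures].[Internet Order Count] },
FORMAT_STRING = 'Currency';
```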
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis Services Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through UBO – Usage-Based Optimization. Always prefer the UBO method in real-time performance troubleshooting.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services 2005 offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Q: Name few Business Analysis
Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server 2005 Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
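As a sketch against the Adventure Works sample (hierarchy and measure names are assumptions), CrossJoin can combine two hierarchies on one axis:

```mdx
-- CrossJoin pairs every year with every product category, so non-contiguous
-- slices of the cube can be placed and aggregated on a single axis.
SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    NONEMPTY(
        CROSSJOIN(
            [Date].[Calendar].[Calendar Year].Members,
            [Product].[Category].[Category].Members ),
        [Measures].[Internet Sales Amount] ) ON ROWS
FROM [Adventure Works];
```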
Q: Where do you put calculated
members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
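A minimal sketch of a calculated member placed outside the Measures dimension, assuming the Adventure Works Date hierarchy (member and hierarchy names are assumptions):

```mdx
-- A calculated member on the Date dimension rather than on Measures:
-- because it lives in another dimension, it intersects every measure,
-- so any measure can be viewed as "CY 2003 vs CY 2002".
CREATE MEMBER CURRENTCUBE.[Date].[Calendar].[All Periods].[CY 2003 vs CY 2002] AS
    ( [Date].[Calendar].[Calendar Year].&[2003] )
    - ( [Date].[Calendar].[Calendar Year].&[2002] );
```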
Q: How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?
A: Simply using BOTTOMCOUNT will return customers with null sales; you have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
1.What is Analysis
Services
1.Microsoft SQL Server 2012 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications.
2.Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases.
3.For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
2.What are the
features of Analysis Services
1.Ease of use with a lot of
wizards and designers.
2.Flexible data model creation and management
3.Scalable architecture to handle OLAP
4.Provides integration of administration tools, data sources, security, caching, and reporting etc.
5.Provides extensive support for
custom applications
3.What is the need for SSAS 2012 component?
1.Analysis Services is the only
component in SQL Server using which we can perform Analysis and Forecast
operations
2.Faster Analysis and Troubleshooting.
3.Ability to create and manage Data warehouses
4.Apply efficient Security Principles
4. What languages are used in SSAS
2012
1.Structured Query Language (SQL)
2.Multidimensional Expressions (MDX) - an industry standard query language oriented toward analysis
3.Data Mining Extensions (DMX) - an industry standard query language oriented toward data mining.
4.Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database objects
5.What is the difference between a
derived measure and a calculated measure?
1. The difference between a derived measure and a calculated measure is when the calculation is performed.
2. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube.
3. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube.
4. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
6. What is FASMI ?
A
database is called an OLAP database if it satisfies the FASMI
rules:
1. Fast Analysis – queries in the OLAP scenario should be answered in
five seconds or less.
2. Shared – must support access to data by many
users, with attention to sensitivity and write-backs.
3. Multidimensional – the data inside the OLAP
database must be multidimensional in structure.
4. Information – the OLAP database must support large
volumes of data.
7.What is a partition?
1. A partition in Analysis
Services is the physical location of stored cube data.
2. Every cube has at least one partition
by default. Each time we create a measure group, another partition is
created.
3. Queries run faster against a partitioned
cube because Analysis Services only needs to read data from the partitions that
contain the answers to the queries.
4. Queries run even faster when a partition
also stores aggregations, the precalculated totals for additive measures.
Partitions are a powerful and flexible means of managing cubes, especially
large cubes.
8. What are the roles of an Analysis Services
Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
9.Explain the TWO-Tier Architecture of
SSAS?
1. SSAS uses both server and client components
to supply OLAP and data mining functionality to BI applications.
2. The server component is implemented as a
Microsoft Windows service. Each instance of Analysis Services is implemented as a
separate instance of the Windows service.
3. Clients communicate with Analysis Services
using XMLA (XML for Analysis), the standard protocol for issuing commands
and receiving responses, exposed as a web service.
10. How many types of dimension are
possible in SSAS?
1. Bill of Materials
2. Currency
3. Channel
4. Customer
5. Geography
6. Organizations
7. Products
8. Promotion
9. Regular
10. Scenario
11. Time
12. Unary
13. Accounts
1. Define Data Warehouse.
Ans. A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
2. Which are the three layers in which a
data warehouse maintains its functions?
Ans. A data warehouse maintains its functions in staging, integration and access.
3. Can you elaborate it?
Ans. Staging: It is used to store raw data for use by developers.
Integration: It is used to integrate data and to have a
level of abstraction from users.
Access: It is for accessing data out of the data
warehouse for its users.
4. What is Data Mart?
Ans. A Data Mart is the access layer of the data warehouse environment that is used to make data accessible to its users. It is a subset of the Data Warehouse.
5. What is dimensional approach for
storing data in data warehouse?
Ans. In the dimensional approach, transaction data are partitioned into either “facts”, which are generally numeric transaction data, or “dimensions”, which are the reference information that gives context to the facts.
6. What is normalization approach for
storing data in data warehouse?
Ans. In the normalization approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.).
7. How do we organize sales transaction
data in the dimensional approach?
Ans. A sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
8. What is the main advantage in
using the dimensional approach?
Ans. The main advantage of using the dimensional approach is that the data warehouse is easier for the user to understand and to use, and the retrieval of data from the data warehouse tends to operate very quickly.
9. What are the main disadvantages of
using the dimensional approach?
Ans. There are mainly two disadvantages of using the dimensional approach to storing data in a data warehouse.
1. To maintain the integrity of facts and
dimensions, the process of loading the data warehouse from different
operational systems gets complicated.
2. If an organization adopting the dimensional
approach changes the way in which it does business, it is difficult to
modify the data warehouse structure.
10. What are the advantages and
disadvantage of using normalization approach for storing data in data
warehouse?
Ans. Advantages: 1. The main advantage is that it is easy to add information to the database.
Disadvantages
1. It is difficult to join data from different
sources into meaningful information.
2. It is also difficult to access the information
without a precise understanding of the source of the data and the data structure of
the data warehouse.
11. What is dimension?
Ans. It is a data element. It categorizes each item in a data set into non-overlapping regions. It helps to “slice and dice” data in the data warehouse.
12. Can you describe different types
of dimensions?
Ans. There are three types of dimensions available in the dimensional approach.
Conformed dimension: It is a set of data attributes that is
physically implemented in multiple database tables using the same structure,
attributes, domain values, definitions and concepts in each implementation. It
cuts across many facts.
Dimensions are conformed when they are either the
same (with keys) or one is a perfect subset of the other.
Junk Dimension: It is a grouping of low-cardinality flags and
indicators. These flags and indicators are removed from the fact table and
placed into a useful dimensional framework.
Role-Playing Dimension: Often dimensions are recycled for multiple
applications within the same database, e.g. a “date” dimension can be used for
“sales date” or “hire date”. This is referred to as a role-playing dimension.
13. What is cube?
Ans. A cube is a data structure that allows fast analysis of data. It provides the capability of manipulating and analyzing data from different perspectives.
14. What does data cleansing (or data
scrubbing) do?
Ans. It will remove errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse. (It is part of the quality assurance process.)
15. What do you know about data
mining?
Ans. It is the process of researching data marts and data warehouses to detect specific patterns in the data sets. It may be performed on databases and multi-dimensional data cubes with ad-hoc query tools and OLAP.
The queries and reports are designed to answer
specific questions to uncover trends or hidden relationships in the data.
16. What do you know about drill-down
and drill-up?
Ans. Drill-down is the ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.
Drill-up is the ability of a data-mining tool to move
back up to higher, less detailed levels in a data mart, data warehouse or
multi-dimensional data cube.
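Drill-down has a direct MDX counterpart. A sketch using DRILLDOWNMEMBER against the Adventure Works sample (the hierarchy, measure and member names are assumptions):

```mdx
-- Show all calendar years, but drill the 2003 member down to its quarters.
SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    DRILLDOWNMEMBER(
        { [Date].[Calendar].[Calendar Year].Members },
        { [Date].[Calendar].[Calendar Year].&[2003] } ) ON ROWS
FROM [Adventure Works];
```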
17. What is a fact table?
Ans. It is the primary table that contains measurements (mostly numeric data like grades, wages, etc.).
18. What is OLAP(On-Line Analytical
Processing)?
Ans. Online retrieval and analysis of data to disclose business trends and statistics not directly visible in the data is known as OLAP.
19. What is OLTP(On-Line Transaction
Processing)?
Ans. It refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
20. What are advantages and
disadvantages of data warehouse?
Ans. Advantages of using a data warehouse include:
1. Integrating data from multiple
sources.
2. Performing new types of analysis and reducing cost to access historical data.
3. Improving turnaround time for reporting and analysis.
4. Supporting ad-hoc query and reporting.
Disadvantage
1. Long initial implementation time
and associated high cost.
2. Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries.
3. Data owners lose control over their data, raising ownership, privacy and security issues.
4. Adding a new data source takes time and incurs high cost.
Basic MDX Questions
1. What is MDX?
MDX is a language that articulates selections, calculations, and some metadata definitions against an Online Analytical Processing (OLAP) database, and provides some capabilities for specifying how query results are to be represented.
2. Where do you use MDX in SQL Server 2005 OLAP?
MDX queries and expressions in SQL Server 2005 Analysis Services are used to do the following:
• Return data to a client application from a SQL Server 2005 Analysis Services cube.
• Format query results.
• Perform cube design tasks, including the definition of calculated members, named sets, scoped assignments, and key performance indicators (KPIs).
• Perform administrative tasks, including dimension and cell security.
3. What are different APIs that support MDX?
• OLE DB for OLAP
• ADO MD,
• ADOMD.Net,
• XMLA(XML for Analysis),
• Hyperion Sybase C and Java APIs,
4. What is Tuple in SSAS?
• A tuple is the intersection of one (and only one) member taken from one or several of the dimensions in the cube. A tuple identifies (or has the potential to identify) a single Cell in the multi-dimensional matrix.
• To compose a tuple with more than one dimension, you must wrap the members in parentheses (Tuple).
• You can’t compose an empty tuple. () is not a valid tuple.
5. Why can't a tuple have more than one member from the same dimension?
• Well, if you do, it is inevitable that the ‘tuple’ that you create will end up pointing to more than one cell.
6. What is SET in SSAS?
• A set is a collection of tuples with the same dimensionality. It may have more than one tuple, but it can also have only one tuple, or even have zero tuples, in which case it is an empty set.
• If a set contains tuples with different dimensionality, it is not a valid set.
• Curly braces, as in { Tuple1, Tuple2 }, are used to denote a set.
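As a sketch against the Adventure Works sample cube (dimension and member names are assumed from that sample), the first expression is a tuple and the second is a set of two tuples with the same dimensionality:

```mdx
-- A tuple: one member from each of two dimensions, wrapped in parentheses
( [Date].[Calendar Year].&[2003], [Measures].[Internet Sales Amount] )

-- A set: two tuples with the same dimensionality, wrapped in curly braces
{ ( [Date].[Calendar Year].&[2003], [Measures].[Internet Sales Amount] ),
  ( [Date].[Calendar Year].&[2004], [Measures].[Internet Sales Amount] ) }
```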
7. What are query Axes in MDX?
• Query axes specify the edges of a cell set returned by a Multidimensional Expressions (MDX) SELECT statement.
• A query can have up to 128 query axes. The first five have names: Columns, Rows, Pages, Sections, and Chapters.
• However, a query that uses axis 1 must also use axis 0, and a query that uses axis 2 must also use axes 1 and 0. You cannot skip an axis in a query - you'll get an error.
• The axis designations can come in any order.
• For axes beyond axis 4, you must use the numbers, because there are no names for them.
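For illustration, a minimal query using the first two axes, sketched against the Adventure Works sample cube (measure and hierarchy names assumed):

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,  -- axis 0
       [Product].[Category].Members ON ROWS                -- axis 1
FROM [Adventure Works];
```

ON COLUMNS and ON ROWS could equally be written as ON 0 and ON 1.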
8. Explain the difference between Query Axis and Slicer Axis ?
Query Axis
- Query axes specify the edges of a cellset
returned by a Multidimensional Expressions (MDX) SELECT statement.
- The query axis is defined in the SELECT clause of the SELECT statement in MDX.
- Query axes—the set of hierarchies from which
data is retrieved for multiple members
Slicer axis
- The slicer axis filters the data returned by
the Multidimensional Expressions (MDX) SELECT statement.
- The slicer axis is defined in the WHERE clause
of the SELECT statement in MDX.
- Slicer axis—the set of hierarchies from which
data is retrieved for a single member.
9. Explain the MDX SELECT Statement?
• In MDX, the SELECT statement specifies a result set that contains a subset of multidimensional data that has been returned from a cube.
• The ON keyword is used with an axis name to specify where dimensions from your database are displayed.
• Each axis is made up of one or more tuples or a set.
• MDX uses curly braces, { and }, to enclose a set of elements from a particular dimension or set of dimensions.
• We can separate different elements with commas (,). Element names may be enclosed by [ ] characters, and may have multiple parts separated by dot (.) characters.
• The FROM clause in an MDX query names the cube from which the data is being queried.
• An optional WHERE clause determines which dimension or member to use as the slicer axis, restricting the returned data to a specific dimension or member.
10. What is the use of colon operator in MDX Select statement?
• You can specify a set as a range of members in that order by listing two members from the same level as endpoints and putting a colon between them to mean “These members and every member between them.”
• For example
SELECT
{ [Time].[Sep,2004] : [Time].[Mar,2005] } on columns,
{ [Product].[Tools] : [Product].[Home Audio] } on rows
FROM [Sales]
WHERE ([Customer].[Lubbock, TX], [Measures].[Unit Sales])
11. How to establish cube context in MDX query?
• Every MDX query runs within a specified cube context. In the SELECT statement, the FROM clause determines the cube context.
• Although the FROM clause specifies the cube context as within a single cube, this does not have to limit you from working with data from more than one cube at a time.
• You can use the MDX LookupCube function to retrieve data from cubes outside the cube context. Additionally, functions such as the Filter function, are available that allow temporary restriction of the context while evaluating the query.
12. What is use of .Members function in MDX?
• The .Members operator takes a dimension, hierarchy, or level on its left-hand side,
and returns a set of all members associated with that metadata scope. For example, [Customer].Members results in the set of all customers
13. What is the use of .Children Function in MDX?
• The Children function returns a naturally ordered set that contains the children of a specified member. If the specified member has no children, this function returns an empty set.
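A sketch assuming the Adventure Works sample cube (hierarchy and member names assumed): listing the children of one product category, i.e. its subcategories.

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- Children of the Bikes category member
       [Product].[Product Categories].[Category].[Bikes].Children ON ROWS
FROM [Adventure Works];
```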
14. What is use of Descendants() function and specify the Flags usage?
• Returns the set of descendants of a member at a specified level or distance, optionally including or excluding descendants in other levels.
• Syntax: Descendants( Member_Expression | Set_Expression [ , Level_Expression | Distance [ , Desc_Flag ] ] )
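For example, a sketch against the Adventure Works sample cube (names assumed), returning the months of one calendar year:

```mdx
-- SELF keeps only members at the target level (months of 2003)
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       DESCENDANTS( [Date].[Calendar].[Calendar Year].&[2003],
                    [Date].[Calendar].[Month], SELF ) ON ROWS
FROM [Adventure Works];
```

Desc_Flag values include SELF, BEFORE, AFTER, SELF_AND_AFTER, SELF_AND_BEFORE, SELF_BEFORE_AFTER, and LEAVES.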
15. What is use of NON EMPTY Keyword in MDX?
• The NON EMPTY Keyword is used to remove the entirely empty slices from the query result.
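As a sketch (Adventure Works names assumed), the keyword prefixes the axis set; without it every customer would appear, including those with no sales on the opposing axis:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- Drop rows that are empty across all columns
       NON EMPTY [Customer].[Customer Geography].[Customer].Members ON ROWS
FROM [Adventure Works];
```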
16. What precisely does the dimensionality of a tuple refer to?
• The "dimensionality" of a tuple refers to the set of dimensions whose members compose it. The order in which dimensions appear in a tuple is an important part of a tuple's dimensionality.
17. Explain CrossJoin() function use in MDX?
• In many cases, you will want to take the cross-product of members (or tuples) in two different sets (that is, specify all of their possible combinations). The CrossJoin() function is the most direct way of combining the two sets in this way.
• The syntax is as follows: CrossJoin (set1, set2)
• CrossJoin() only takes two sets as inputs. If you want to take the CrossJoin() of three or more sets, such as times, scenarios, and products, you can do it by nesting calls to CrossJoin().
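A sketch against the Adventure Works sample cube (names assumed), putting every (year, category) combination on the rows axis:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- All combinations of calendar years and product categories
       CROSSJOIN( [Date].[Calendar Year].Members,
                  [Product].[Category].Members ) ON ROWS
FROM [Adventure Works];
```

The * operator is shorthand for the same cross join: set1 * set2.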
18. Explain the Filter() function use in MDX?
• Filter() lets you reduce a set by including in the resulting set only those elements that meet some criteria.
• Filter() takes one set and one Boolean expression as its arguments and returns that subset where the Boolean expression is true.
• For example, the expression
Filter ( { [Product].[Product Category].Members }, [Measures].[Dollar Sales] >= 500 )
19. Explain the Order() function use in MDX?
• To put the tuples in a set into a sequence based on associated data values, we need to use the Order() function.
• The syntax for the Order() function is: Order (set1, expression [, ASC DESC BASC BDESC])
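For example, sketched against the Adventure Works sample cube (names assumed), sorting categories by order quantity, highest first:

```mdx
-- BDESC breaks (ignores) the natural hierarchy order while sorting
SELECT { [Measures].[Reseller Order Quantity] } ON COLUMNS,
       ORDER( [Product].[Category].Members,
              [Measures].[Reseller Order Quantity], BDESC ) ON ROWS
FROM [Adventure Works];
```

The B-prefixed flags (BASC, BDESC) sort across the whole set regardless of hierarchy; plain ASC and DESC preserve the hierarchy and sort within it.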
20. Explain DIMENSION PROPERTIES function usage in MDX?
• DIMENSION PROPERTIES is a keyword used to retrieve the properties of the members in the cube.
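A sketch against the Adventure Works sample cube (names assumed), returning the intrinsic caption and unique-name properties alongside each row member:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       [Customer].[Customer Geography].[Customer].Members
       -- Property list goes between the set and the ON clause
       DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works];
```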
What is Analysis Services? List out the features?
Microsoft SQL Server 2005 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting you design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets you design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle-tier server for analytical processing, OLAP, and data mining. It manages multidimensional cubes of data and provides access to heaps of information, including aggregation of data. One can create data mining models from data sources and use them for business intelligence, including reporting features.
Analysis Services provides a combined view of the data used in OLAP or data mining. Services here refer to OLAP and data mining. Analysis Services assists in creating, designing, and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
• Ease of use with a lot of wizards and designers.
• Flexible data model creation and management.
• Scalable architecture to handle OLAP.
• Provides integration of administration tools, data sources, security, caching, reporting, etc.
• Provides extensive support for custom applications.
What is UDM? Its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following benefits:
• Allows the user model to be greatly enriched.
• Provides high-performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for SSAS component?
• Analysis Services is the only component in SQL Server using which we can perform analysis and forecast operations.
• SSAS is very easy to use and interactive.
• Faster analysis and troubleshooting.
• Ability to create and manage data warehouses.
• Apply efficient security principles.
Explain the TWO-Tier Architecture of SSAS?
• SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
• The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
• Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
What are the components of SSAS?
• An OLAP engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
• Drilling refers to the process of exploring details of the data.
• Slicing refers to the process of placing data in rows and columns.
• Pivoting refers to switching categories of data between rows and columns.
• In OLAP, we will be using what are called dimensional databases.
What is FASMI?
A database is called an OLAP database if it satisfies the FASMI rules:
• Fast Analysis - defined in the OLAP scenario as five seconds or less.
• Shared - must support access to data by many users in the factors of sensitivity and write-backs.
• Multidimensional - the data inside the OLAP database must be multidimensional in structure.
• Information - the OLAP database must support large volumes of data.
What languages are used in SSAS?
• Structured Query Language (SQL)
• Multidimensional Expressions (MDX) - an industry-standard query language oriented towards analysis.
• Data Mining Extensions (DMX) - an industry-standard query language oriented toward data mining.
• Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database objects.
How are Cubes implemented in SSAS?
• Cubes are multidimensional models that store data from one or more sources.
• Cubes can also store aggregations.
• SSAS cubes are created using the Cube Wizard.
• We also build dimensions when creating cubes.
• Cubes can see only the DSV (logical view).
What is the difference between a derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren't stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
While creating a new calculated member in a cube, what is the use of the property called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we are establishing a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
What are the roles of an Analysis Services Information Worker?
The role of an Analysis Services information worker is the traditional "domain expert" role in business intelligence (BI): someone who understands the data employed by a solution and is able to translate the data into business information. The role of an Analysis Services information worker often has one of the following job titles: Business Analyst (Report Consumer), Manager (Report Consumer), Technical Trainer, Help Desk/Operation, or Network Administrator.
What are the different ways of creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through UBO (Usage-Based Optimization). Always prefer the UBO method in real-time performance troubleshooting.
What is WriteBack? What are the pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services 2005 offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Name a few Business Analysis Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server 2005 Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement                            | Type      | Applied to                          | No data source
Time Intelligence                      | Cube      | Cube                                | No
Account Intelligence                   | Dimension | Dimension or cube                   | No
Dimension Intelligence                 | Dimension | Dimension or cube                   | Yes
Custom Aggregation                     | Dimension | Dimension (unary operator) or cube  | No
Semiadditive Behavior                  | Cube      | Cube                                | Yes
Custom Member Formula                  | Dimension | Dimension or cube                   | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube                   | Yes
Dimension Writeback                    | Dimension | Dimension or cube                   | Yes
What MDX functions do you most commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me: "CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don't roll up to a natural ancestor." Indeed, CrossJoin has easily been my bread and butter.
Where do you put calculated members?
The reflexive answer is "in the Measures dimension" but this is the obvious answer. So I always follow up with another question: "If you want to create a calculated member that intersects all measures, where do you put it?" A high percentage of candidates can't answer this question, and the answer is "In a dimension other than Measures." If they can answer it, I immediately ask them why. The answer is "Because a member in a dimension cannot intersect its own relatives in that dimension."
How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?
A: Simply using bottomcount will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
How in MDX query can I get top 3
sales years based on order quantity?
By default Analysis Services returns members in an order specified during attribute design. Attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Lets say we want to see order counts for each year. In Adventure Works MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
By default Analysis Services returns members in an order specified during attribute design. Attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Lets say we want to see order counts for each year. In Adventure Works MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
How do you extract first tuple from
the set?
Use could usefunction Set.Item(0)
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Use could usefunction Set.Item(0)
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
How can I setup default dimension
member in Calculation script?
You can use ALTER CUBE statement. Syntax:
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER='';
You can use ALTER CUBE statement. Syntax:
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER='';
Q: What is Analysis Services? List out
the features?
Microsoft SQL Server 2014 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle tier server for analytical processing, OLAP, and Data mining. It manages multidimensional cubes of data and provides access to heaps of information including aggregation of data. One can create data mining models from data sources and use it for Business Intelligence also including reporting features.
Analysis service provides a combined view of the data used in OLAP or Data mining. Services here refer to OLAP, Data mining. Analysis services assists in creating, designing and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
§ Ease of use, with many wizards and designers.
§ Flexible data model creation and management.
§ Scalable architecture to handle OLAP.
§ Integration of administration tools, data sources, security, caching, reporting, and so on.
§ Extensive support for custom applications.
Q: What is UDM? Its significance in
SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following
benefits:
• Allows the user model to be greatly enriched.
• Provides high performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for the SSAS component?
§ Analysis Services is the only SQL Server component with which we can perform analysis and forecast operations.
§ SSAS is very easy to use and interactive.
§ Faster analysis and troubleshooting.
§ Ability to create and manage data warehouses.
§ Apply efficient security principles.
Q: Explain the TWO-Tier Architecture of
SSAS?
§ SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
§ The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
§ Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
Q: What are the components of SSAS?
§ An OLAP engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
§ Drilling refers to the process of exploring details of the data.
§ Slicing refers to the process of placing data in rows and columns.
§ Pivoting refers to switching categories of data between rows and columns.
§ In OLAP, we use what are called dimensional databases.
Q: What is FASMI?
A database is called an OLAP Database if the database satisfies the FASMI rules :
§ Fast – most queries should be answered in about five seconds or less.
§ Analysis – must cope with any business logic and statistical analysis relevant to the application.
§ Shared – must support access to data by many users, with attention to data sensitivity and write-backs.
§ Multidimensional – the data inside the OLAP database must be multidimensional in structure.
§ Information – the OLAP database must support large volumes of data.
Q: What languages are used in SSAS?
§ Structured Query Language (SQL).
§ Multidimensional Expressions (MDX) – an industry-standard query language oriented toward analysis.
§ Data Mining Extensions (DMX) – an industry-standard query language oriented toward data mining.
§ Analysis Services Scripting Language (ASSL) – used to manage Analysis Services database objects.
Q: How are cubes implemented in SSAS?
§ Cubes are multidimensional models that store data from one or more sources.
§ Cubes can also store aggregations.
§ SSAS cubes are created using the Cube Wizard.
§ We also build dimensions when creating cubes.
§ Cubes see only the DSV (logical view).
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
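The timing difference can be sketched as a Python toy model (hypothetical numbers; not the SSAS engine itself): a derived measure is computed per source row before aggregation and stored, while a calculated measure is computed from the aggregated totals at query time.

```python
# Toy model of derived vs. calculated measures (hypothetical data).
rows = [
    {"sales": 100.0, "cost": 60.0},
    {"sales": 200.0, "cost": 110.0},
]

# Derived measure: computed per row BEFORE aggregation, then aggregated/stored.
derived_profit = sum(r["sales"] - r["cost"] for r in rows)

# Calculated measure: computed AFTER aggregation, from the aggregated values.
total_sales = sum(r["sales"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
calculated_profit = total_sales - total_cost

# For additive arithmetic the two agree; for ratios they usually do not,
# which is why accuracy (not efficiency) drives the choice.
derived_margin = sum((r["sales"] - r["cost"]) / r["sales"] for r in rows) / len(rows)
calculated_margin = (total_sales - total_cost) / total_sales
print(derived_profit, calculated_profit)                      # 130.0 130.0
print(round(derived_margin, 3), round(calculated_margin, 3))  # 0.425 0.433
```

The ratio case shows why "when the calculation happens" matters: averaging per-row margins is not the same as taking the ratio of the totals.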
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when partition also stores aggregations, the pre calculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated member in a cube, what is the use of the property called Non-Empty Behavior?
Non-Empty Behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression returns an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If every selected measure is empty, the expression is also treated as empty and no error is returned.
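The behavior can be sketched in Python (a toy model, not SSAS itself): if the measure nominated for non-empty behavior is empty, the whole expression is treated as empty instead of being evaluated and erroring.

```python
def safe_ratio(numerator, denominator):
    """Mimic Non-Empty Behavior for a ratio calculation: if the
    nominated measure (here the denominator) is empty (None),
    treat the whole expression as empty instead of evaluating it."""
    if denominator is None:
        return None  # expression is empty; no division error is raised
    return numerator / denominator

print(safe_ratio(50.0, 200.0))  # 0.25
print(safe_ratio(50.0, None))   # None (empty), not an error
```

Without the empty check, the second call would be the equivalent of dividing by an empty cell and would raise an error.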
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis
Services Information Worker?
The role of an Analysis Services information worker is the traditional "domain expert" role in business intelligence (BI): someone who understands the data employed by a solution and is able to translate the data into business information. The role of an Analysis Services information worker often has one of the following job titles: Business Analyst (Report Consumer), Manager (Report Consumer), Technical Trainer, Help Desk/Operations, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through Usage-Based Optimization (UBO). Prefer the UBO method in real-world performance troubleshooting.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Q: Name few Business Analysis
Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
Q: Where do you put calculated members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
Q: How do I find the bottom 10
customers with the lowest sales in 2003 that were not null?
A: Simply using BottomCount will return customers with null sales. You will have to combine it with NonEmpty or Filter.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
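The interplay of NonEmpty and BottomCount can be mimicked in Python (hypothetical data; a sketch of the set semantics, not MDX itself):

```python
# (customer, internet_sales) pairs; None plays the role of an empty cell.
sales = [("Ann", 120.0), ("Bob", None), ("Cho", 15.0),
         ("Dee", None), ("Eli", 90.0), ("Fay", 5.0)]

def non_empty(tuples):
    # NonEmpty: drop members whose measure value is empty.
    return [t for t in tuples if t[1] is not None]

def bottom_count(tuples, n, key=lambda t: t[1]):
    # BottomCount: sort ascending by the measure and take the first n.
    return sorted(tuples, key=key)[:n]

# In MDX, empty cells rank lowest, so BOTTOMCOUNT alone returns the
# null-sales customers; filtering with NonEmpty first gives the lowest
# customers that actually had sales.
print(bottom_count(non_empty(sales), 3))
# [('Fay', 5.0), ('Cho', 15.0), ('Eli', 90.0)]
```

This is why the filtering step has to happen inside the BottomCount call in the MDX above, not afterward.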
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default, Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works, the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
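TopCount's semantics (order the set descending by the measure, keep the first n) can be sketched in Python with hypothetical member values:

```python
# (calendar_year, reseller_order_quantity) pairs -- hypothetical values.
order_qty = [("CY 2001", 11), ("CY 2002", 74), ("CY 2003", 153), ("CY 2004", 117)]

def top_count(members, n, key=lambda m: m[1]):
    # TopCount: sort the set descending by the measure and take the top n.
    return sorted(members, key=key, reverse=True)[:n]

print(top_count(order_qty, 3))
# [('CY 2003', 153), ('CY 2004', 117), ('CY 2002', 74)]
```

Note that, like the MDX function, the result set comes back re-ordered by the measure rather than in the attribute's design order.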
Q: How do you extract first tuple from
the set?
You could use the function Set.Item(0).
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
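Item(0) is ordinary zero-based indexing into the set. In Python terms, roughly (a toy stand-in, not MDX):

```python
# A Python list standing in for an MDX set of year members.
calendar_years = ["CY 2001", "CY 2002", "CY 2003", "CY 2004"]

def item(mdx_set, i):
    # Set.Item(i): return the i-th tuple of the set, counting from zero.
    return mdx_set[i]

print(item(calendar_years, 0))  # 'CY 2001'
```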
Q: How can I setup default dimension
member in Calculation script?
You can use the ALTER CUBE statement. Syntax (placeholders in angle brackets):
ALTER CUBE CurrentCube | <CubeName> UPDATE DIMENSION <DimensionName>, DEFAULT_MEMBER='<MemberName>';
Please go through the links below:
http://20fingers2brains.blogspot.com/2013/12/ssas-2012-interview-questions-with.html
http://www.sqlsaturday.com/downloadsessionfile.aspx?presentationid=6372
http://www.informit.com/articles/article.aspx?p=1439192
http://dwbi1.wordpress.com/category/analysis-services/
https://www.udemy.com/blog/ssas-interview-questions/
http://www.sqlserverquest.com/p/ssas-interview-questions.html
http://www.questions-interviews.com/database/sql-server-analysis-services-ssas.aspx
http://www.iqspdf.com/2014/03/msbi-ssis-ssas-ssrs-interview-questions-and-answers.html
http://stackoverflow.com/questions/18766323/ssas-adding-exisiting-cube-from-one-environment-to-another
http://help.zaptechnology.com/Zap%20CubeXpress/CurrentVersion/Content/Topics/HowTo/ImportCreatedCube.htm
http://panigorthi.blogspot.com/2012/06/how-to-get-list-of-ssas-databases.html
Part 1: An Introduction to SSAS Performance and SQL Sentry Performance Advisor for Analysis Services
Over the past couple of years, since SQL Sentry introduced Performance Advisor for Analysis Services, I’ve heard many people say they see SSAS as a “black box” on their servers with no real visibility into troubleshooting performance issues. Performance Advisor (PA) lifts the shroud from Analysis Services, but for some it may not be obvious how to interpret all the information gained from this new visibility.
There are many resources out there that cover OLAP architecture and MDX, but much less in the way of an explanation to all the various metrics that are available and how they correlate with each other. In response, I developed a presentation I’ve given several times now at SQL Saturdays that I believe serves to “bridge the gap” for DBA’s that may be new to SSAS, or possibly even developers who lack a dedicated administrator and need to be able to tell where the bottleneck might be on their SSAS server. I was also privileged to be invited to an interview on RunAs Radio where I spent some time discussing this with Richard Campbell and Greg Hughes.
Finally, I’ve had the time to put much of this same advice down here. This is the first of a couple of posts I’ll dedicate to the topic. Anyone working with SSAS should find this useful, but users of PA for SSAS should find themselves even more proficient in troubleshooting SSAS performance issues. We’ll start with a very basic introduction to the service itself and how a query is handled, and move into specific metrics and identifying bottlenecks shortly.
Basic Architecture
It’s important to understand that SSAS is its own engine. The executable is msmdsrv.exe. It’s licensed with SQL Server, but in large production environments you aren’t likely to be running it on the same machine as a SQL Server. There are two primary types of activity that occur here: querying and processing.
Types of Activity
Querying can be done by various methods, but the most common type of query is a Multidimensional Expression (MDX). This is just a standard language, similar to SQL, optimized for calculations. I won’t go into the hows and whys of MDX itself, as there are many resources already available on its mastery. While this series won’t tell you how to write MDX, by the time we’re done you should be able to tell, with a few basic pieces of information described later, whether there is room for query optimization or if the bottleneck lies elsewhere.
Processing is essentially the maintenance activity of your cubes. It’s how cubes are created and updated with new data and/or how existing data is recalculated and reorganized. Just as with SQL Server, what level of processing you perform, and when, is essential to ensuring proper performance without interfering with peak querying times. We’ll dive into those options later.
The Formula Engine
Under the covers there are two primary “engines” within SSAS: the Formula Engine (FE) and the Storage Engine (SE). The FE accepts MDX requests and parses the query to process. It sends requests to the SE when data is needed, then performs calculations on the retrieved data to provide a final result set. An interesting thing about the FE is that it is single threaded. If you are watching the CPUs of your multi-core SSAS server while a request is being handled and you only see one core spike, it’s probably because the FE was doing the work at that time.
The Storage Engine
The SE, on the other hand, is multi-threaded. The SE reads and writes data to and from the file system for the server. This can include working against the Windows file cache. This is an important distinction between Analysis Services and the relational engine in SQL Server. SQL Server’s relational engine has an internal cache and then goes to disk for any other needed IO; it does not utilize the Windows file cache for database files. SSAS, on the other hand, does. This means that when SSAS is interacting with the file system on your server, it does not necessarily result in physical disk IO. SSAS utilizes its own caches as well as the file system cache to get what it needs.
SSAS Caches
So how does SSAS cache data? Each engine within SSAS has its own set of caches. The FE caches store flat values (the Flat Cache) as well as calculated data (the Calculation Cache). It’s important to understand that these caches are often scoped, which means the data might only be available for a limited time or only for a particular user, even in cache. The flat cache in particular is restricted to a maximum of 10% of the TotalMemoryLimit property in SSAS, which we will discuss later.
So, that said, when we talk about SSAS cache optimization, warming, etc., we are more often dealing with the SE caches. I know that’s a bit of a simplification, but this is an introduction after all. ;)
So the SE consists of a Dimension Cache and a Measure Group Cache to store the data retrieved from the corresponding sources in the file system itself.
Anatomy of an MDX Query
So now that we understand the basic components under the covers in SSAS, how are they used to handle a request? First, the query is accepted and the FE parses it. If the appropriate data is already available in the proper scope in the FE cache, some or all may come from there. The FE then requests any other needed data from the SE. The SE then retrieves whatever it can from its caches. If still more data is needed, the SE queries the file system on the server. The data from the file system may, at least in part, come from the Windows file cache. Remember we mentioned this distinction earlier. Anything else results in physical disk IO. These steps may be iterated as needed while data is moved from SE caches to FE caches, etc., but you get the picture. Another graphical representation of this process can be found in Appendix A of the Microsoft Best Practices article “Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services”, authored by Carl Rabeler and Eric Jacobsen. I highly recommend reviewing this article for a very in-depth look into this topic. Even though the title mentions SQL Server 2005, it’s still very relevant and useful.
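The lookup chain in that paragraph can be sketched as a Python toy model (hypothetical names; real SSAS caching is far more involved than a dictionary lookup):

```python
def resolve(key, fe_cache, se_cache, file_cache, disk):
    """Toy model of the MDX resolution chain: FE cache -> SE cache ->
    Windows file cache -> physical disk. Each miss falls through to the
    next layer, and each layer populates the ones above it."""
    if key in fe_cache:
        return fe_cache[key], "FE cache"
    if key in se_cache:
        value = se_cache[key]
    elif key in file_cache:            # file-cache hit: no physical disk IO
        value = file_cache[key]
        se_cache[key] = value
    else:                              # only here does physical disk IO occur
        value = disk[key]
        file_cache[key] = value
        se_cache[key] = value
    fe_cache[key] = value
    return value, "lower layers"

fe, se, fc = {}, {}, {}
disk = {("Internet Sales", "CY 2003"): 153.0}
value, source = resolve(("Internet Sales", "CY 2003"), fe, se, fc, disk)
value2, source2 = resolve(("Internet Sales", "CY 2003"), fe, se, fc, disk)
print(source, source2)  # lower layers FE cache
```

The second request is answered straight from the FE cache, which is the whole point of cache warming: the expensive file-system trip happens at most once per scope.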
What can impact MDX performance?
So now that we have a good understanding of how SSAS handles MDX requests, the question we need to answer is, “What can impact MDX performance, and how can we tell?” That is the question we will answer in the next post.