Q: What is Analysis Services? List out its features.
Microsoft SQL Server 2014 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle-tier server for analytical processing, OLAP, and data mining. It manages multidimensional cubes of data and provides access to large amounts of information, including aggregations of data. One can create data mining models from data sources and use them for business intelligence, including reporting.
Analysis Services provides a combined view of the data used in OLAP or data mining. Services here refer to OLAP and data mining. Analysis Services assists in creating, designing, and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
§ Ease of use, with many wizards and designers
§ Flexible data model creation and management
§ Scalable architecture to handle OLAP
§ Integration of administration tools, data sources, security, caching, reporting, etc.
§ Extensive support for custom applications
Q: What is UDM? What is its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following
benefits:
• Allows the user model to be greatly enriched.
• Provides high performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for the SSAS component?
§ Analysis Services is the only component in SQL Server with which we can perform analysis and forecasting operations.
§ SSAS is very easy to use and interactive.
§ Faster Analysis and Troubleshooting.
§ Ability to create and manage Data warehouses.
§ Apply efficient Security Principles.
Q: Explain the two-tier architecture of SSAS?
§ SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
§ The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
§ Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
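To illustrate the protocol, a minimal XMLA Execute request carrying an MDX statement might look like the following sketch (the catalog name and query are placeholders for illustration):

```xml
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
      <Command>
        <Statement>
          SELECT [Measures].DefaultMember ON COLUMNS FROM [Adventure Works]
        </Statement>
      </Command>
      <Properties>
        <PropertyList>
          <Catalog>Adventure Works DW</Catalog>
        </PropertyList>
      </Properties>
    </Execute>
  </soap:Body>
</soap:Envelope>
```

The server answers with a SOAP response containing the result set, which is why any client that can speak XMLA over HTTP or TCP can query Analysis Services.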
Q: What are the components of SSAS?
§ An OLAP Engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
§ Drilling refers to the process of exploring details
of the data.
§ Slicing refers to the process of placing data in
rows and columns.
§ Pivoting refers to switching categories of data
between rows and columns.
§ In OLAP, we will be using what are called dimensional databases.
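As a sketch of slicing in practice (assuming the Adventure Works sample cube; the member key is illustrative), the following MDX places years on rows, a measure on columns, and slices by a single product category; swapping the ROWS and COLUMNS sets is a pivot:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       [Date].[Calendar].[Calendar Year].Members ON ROWS
FROM [Adventure Works]
WHERE ( [Product].[Category].&[1] );  -- slicer: one product category
```

Drilling would replace the year set with, for example, the descendants of one year at the month level.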
Q: What is FASMI?
A database is called an OLAP database if it satisfies the FASMI rules:
§ Fast Analysis – queries should be answered in five seconds or less.
§ Shared – must support access to data by many users, addressing data sensitivity and write-backs.
§ Multidimensional – the data inside the OLAP database must be multidimensional in structure.
§ Information – the OLAP database must support large volumes of data.
Q: What languages are used in SSAS ?
§ Structured Query Language (SQL)
§ Multidimensional Expressions (MDX) - an industry standard query language oriented toward analysis.
§ Data Mining Extensions (DMX) - an industry standard
query language oriented toward data mining.
§ Analysis Services Scripting Language (ASSL) - used
to manage Analysis Services database objects.
Q: How Cubes are implemented in SSAS ?
§ Cubes are multidimensional models that store data
from one or more sources.
§ Cubes can also store aggregations
§ SSAS Cubes are created using the Cube Wizard.
§ We also build Dimensions when creating Cubes.
§ Cubes can see only the DSV (logical view).
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
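A query-scoped calculated measure can be sketched as follows (measure names assume the Adventure Works sample cube); note that its values are computed at query time rather than stored in the cube:

```mdx
WITH MEMBER [Measures].[Gross Profit Margin] AS
    ( [Measures].[Internet Sales Amount]
      - [Measures].[Internet Total Product Cost] )
    / [Measures].[Internet Sales Amount],
    FORMAT_STRING = 'Percent'
SELECT { [Measures].[Internet Sales Amount],
         [Measures].[Gross Profit Margin] } ON COLUMNS
FROM [Adventure Works];
```

A derived measure, by contrast, would be computed in the data source view or the measure's source column expression before processing, so its values are aggregated and stored like any regular measure.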
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated member in a cube, what is the use of the property called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
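In a cube's calculation script the property appears as NON_EMPTY_BEHAVIOR on the calculated member; a sketch (measure names assume the Adventure Works sample cube):

```mdx
CREATE MEMBER CURRENTCUBE.[Measures].[Sales per Order] AS
    [Measures].[Internet Sales Amount] / [Measures].[Internet Order Count],
    NON_EMPTY_BEHAVIOR = { [Measures].[Internet Order Count] },  -- skip cells where the denominator is empty
    VISIBLE = 1;
```

When [Internet Order Count] is empty for a cell, the ratio is treated as empty instead of raising a division error.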
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis
Services Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through Usage-Based Optimization (UBO). Prefer the UBO method when troubleshooting real-world query performance.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
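Once writeback is enabled on a measure group, cell values can be written back with the MDX UPDATE CUBE statement; a sketch with hypothetical cube and member names (these are not from the source, only for illustration):

```mdx
UPDATE CUBE [Budget]
SET ( [Measures].[Amount],
      [Scenario].[Scenario].[Budget],
      [Date].[Fiscal Year].&[2014] ) = 500000
USE_EQUAL_ALLOCATION;  -- spread the value equally across the leaf cells
```

The written deltas land in the writeback table/partition, which is why disabling writeback keeps that table to avoid unanticipated data loss.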
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
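Processing can also be requested through an XMLA command; a sketch of a full process of one cube (the object IDs are placeholders for illustration):

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW</DatabaseID>
      <CubeID>Adventure Works</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```

Replacing the Object element with a dimension or partition reference scopes the processing to that object, and the Type element selects the processing option (ProcessFull, ProcessUpdate, ProcessData, and so on).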
Q: Name a few Business Intelligence enhancements in SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
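For example (assuming the Adventure Works sample cube), CrossJoin pairs members from two hierarchies on one axis, and NON EMPTY trims the empty combinations:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
NON EMPTY CROSSJOIN(
    [Product].[Category].[Category].Members,
    [Date].[Calendar].[Calendar Year].Members ) ON ROWS
FROM [Adventure Works];
```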
Q: Where do you put calculated members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
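A sketch of such a member (hierarchy names and member keys assume the Adventure Works sample cube): a calculated member added to the Date dimension that aggregates two years and therefore intersects every measure:

```mdx
WITH MEMBER [Date].[Calendar Year].[All Periods].[CY 2003-2004] AS
    AGGREGATE( { [Date].[Calendar Year].&[2003],
                 [Date].[Calendar Year].&[2004] } )
SELECT { [Measures].[Internet Sales Amount],
         [Measures].[Internet Order Count] } ON COLUMNS,
       { [Date].[Calendar Year].&[2003],
         [Date].[Calendar Year].&[2004],
         [Date].[Calendar Year].[All Periods].[CY 2003-2004] } ON ROWS
FROM [Adventure Works];
```

Because the member lives in the Date dimension rather than Measures, it can be queried against any measure in the cube.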
Q: How do I find the bottom 10
customers with the lowest sales in 2003 that were not null?
A: Simply using BOTTOMCOUNT will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
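The FILTER-based variant mentioned above can be sketched like this (same Adventure Works names as the query before it):

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
BOTTOMCOUNT(
    FILTER( [Customer].[Customer Geography].[Customer].Members,
            NOT ISEMPTY( [Measures].[Internet Sales Amount] ) ),  -- drop null-sales customers first
    10,
    [Measures].[Internet Sales Amount] ) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] );
```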
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default, Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works, the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
The same query restricted to the top 3 years by order quantity, using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
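To keep all years but rank them by order quantity instead of the designed attribute order, the ORDER function can be used (BDESC sorts descending while breaking the hierarchy):

```mdx
SELECT { [Measures].[Reseller Order Quantity] } ON 0,
ORDER( [Date].[Calendar].[Calendar Year].Members,
       [Measures].[Reseller Order Quantity], BDESC ) ON 1
FROM [Adventure Works];
```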
Q: How do you extract first tuple from
the set?
You could use the function Set.Item(0).
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Q: How can I setup default dimension
member in Calculation script?
You can use the ALTER CUBE statement. Syntax (placeholders in angle brackets):
ALTER CUBE CurrentCube | <YourCubeName> UPDATE DIMENSION <Dimension>, DEFAULT_MEMBER = '<DefaultMember>';
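A concrete sketch (the hierarchy name and member key assume the Adventure Works sample cube):

```mdx
ALTER CUBE CURRENTCUBE
UPDATE DIMENSION [Date].[Calendar Year],
DEFAULT_MEMBER = [Date].[Calendar Year].&[2004];
```

Placed in the calculation script, this makes 2004 the default Calendar Year for any query that does not mention that hierarchy.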
Q: What are Database Dimension and Cube Dimension? What is the difference between them?
A Database Dimension is a Dimension
which exists independent of a Cube. You can see a Database Dimension as a separate
entity/file in the Solution Explorer/Project Folder. An analogy, this is pretty
similar to a Class in a Programming Language.
A Cube Dimension is an Instance of a
Database Dimension. An analogy, this is pretty similar to an Object (Instance
of a Class) in a Programming Language.
Here are some of the
highlights/differences of Database and Cube Dimensions:
- A Database Dimension is independent of a Cube, but a Cube Dimension
is always a part of a Cube.
- One Database Dimension can be added to a Cube more than once (Role-Playing Dimension) as different instances. In this scenario, each Instance is
treated as a separate Cube Dimension.
- A Database Dimension can be added to more than one Cube, whereas a
Cube Dimension can belong to a Single Cube at any given point.
Q: What is a Linked Dimension? In what scenarios do you use it?
A Linked Dimension is a Dimension
which is based on (Linked To) another Database Dimension which might be either
located on the same Analysis Services server as the Linked Dimension or on a
different Analysis Services server.
Linked Dimensions can be used when the
exact same dimension can be used across multiple Cubes within an Organization
like a Time Dimension, Geography Dimension etc.
Here are some of the highlights of a
Linked Dimension:
- More than one Linked Dimension can be created from a Single
Database Dimension.
- These can be used to implement the concept of Conformed Dimensions.
- For an end user, a Linked Dimension appears like any other
Dimension.
Q: What are the different ways to create a Time Dimension in Analysis Services?
The Time Dimension is one of the most important and most common types of dimensions, as pretty much every metric is analyzed over time. Analysis Services offers the following ways to create a Time Dimension:
- Create Using an Existing Table: This is one of the common approaches for building a Time
Dimension. In this approach, a table is created in the underlying data
source and pre-populated with data and it is then used to create the Time
Dimension in SSAS.
- Generate a Time Table in the Data Source: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table in the underlying data source (relational database) and populates it
with the required data using the date range, attributes, and calendars
etc. which are specified at the time of creation of Time Dimension in
Business Intelligence Development Studio. This option requires permissions
to create a table in the underlying data source.
- Generate a Time Table on the Server: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table on the server and this table is used to generate the Time Dimension
by the wizard. The Dimension created using this approach is called a
Server Time Dimension.
Q: What is the Type property of a Dimension? What is the purpose of setting this property?
The Type property of a Dimension is used to specify the type of information that the Analysis Services Dimension contains, such as a Time Dimension (years, quarters, months, dates, and so on), a Geography Dimension (geographical information), or an Accounts Dimension (accounts-related information). This property is set to Regular by default.
There are basically two important uses
of setting this property:
- This property is used by the Business Intelligence Wizard to assign/generate standard classifications and MDX expressions.
- This property is used by the Analysis Services Client Applications
like Excel etc. to adjust the user interface/rendering format
appropriately. For example, when a dimension is added to a Pivot Table in Excel, certain types of dimensions are automatically added across the
columns axis (like Time Dimension) and a few other types of dimensions are
added to the rows axis (Like Product Dimension) automatically.
Q: What is a Storage Mode? What are the different storage modes applicable to Dimensions?
A Storage Mode defines the location in
which the Dimension data will be stored and the format (Relational or
Multidimensional) in which the data will be stored.
Following are the two Storage Modes
supported by Dimensions:
- ROLAP (Relational On-Line Analytical Processing):
- When the Storage Mode is set to ROLAP for a Dimension, then the
Dimension Data will be stored in the relational database tables.
- This storage Mode offers effective memory usage by avoiding
duplication of data compared to MOLAP Storage Mode.
- Using this Storage Mode will result in a slowdown in the query
performance compared to MOLAP Storage Mode.
- MOLAP (Multidimensional On-Line Analytical Processing):
- When the Storage Mode is set to MOLAP for a Dimension, then the
Dimension Data will be stored in a multidimensional format in the
Analysis Services/OLAP Server.
- This Storage Mode offers poor memory usage as it involves
duplication of data (first copy is the data in the underlying dimensional
tables and the second copy is the data in the Dimension in the OLAP
Server).
- Using this Storage Mode will result in best query performance
compared to any other Storage Modes available in SSAS.
Q: What is the difference between an Attribute Hierarchy and a User Defined Hierarchy?
An Attribute Hierarchy is a Hierarchy
created by SQL Server Analysis Services for every Attribute in a Dimension by
default. An Attribute by default contains only two levels - An "All"
level and a "Detail" level which is nothing but the Dimension
Members.
A User Defined Hierarchy is a Hierarchy
defined explicitly by the user/developer and often contains multiple levels.
For example, a Calendar Hierarchy contains Year, Quarter, Month, and Date as
its levels.
Here are some of the
highlights/differences of Attribute and User Defined Hierarchies:
- Attribute Hierarchies are always Two-Level (Unless All Level is
suppressed) whereas User Defined Hierarchies are often Multi-Level.
- By default, Every Attribute in a Dimension has an Attribute
Hierarchy whereas User Defined Hierarchies have to be explicitly defined
by the user/developer.
- Every Dimension has at least one Attribute Hierarchy by default
whereas every Dimension does not necessarily contain a User Defined
Hierarchy. In essence, a Dimension can contain zero, one, or more User
Defined Hierarchies.
- Attribute Hierarchies can be enabled or disabled. Disable the
Attribute Hierarchy for those attributes which are commonly not used to
slice and dice the data during analysis, like Address, Phone Number, and
Unit Price etc. Doing this will improve the cube processing performance
and also reduces the size of the cube as those attributes are not considered
for performing aggregations.
- Attribute Hierarchies can be made visible or hidden. When an
Attribute Hierarchy is hidden, it will not be visible to the client
application while browsing the Dimension/Cube. Attribute Hierarchies for
those attributes which are part of the User Defined Hierarchies, like Day,
Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be
hidden, since the attribute is available to the end users through the User
Defined Hierarchy and helps eliminate the confusion/redundancy for end
users.
Q: What is an Attribute Relationship? What are the different types of Attribute Relationships?
An Attribute Relationship is
a relationship between various attributes within a Dimension. By default, every
Attribute in a Dimension is related to the Key Attribute. Quite often these
default Attribute Relationships need to be modified to suit the User Defined
Hierarchies and other end user requirements.
There are basically two types of
Attribute Relationships:
- Rigid: Attribute Relationship should be set to
Rigid when the relationship between those attributes is not going to
change over time. For example, relationship between a Month and a Date is
Rigid since a particular Date always belongs to a particular Month like
1st Feb 2012 always belongs to Feb Month of 2012. Try to set the
relationship to Rigid wherever possible.
- Flexible: Attribute Relationship
should be set to Flexible when the relationship between those attributes
is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work
under one manager during this year (time period) and under a different
manager during next year (another time period).
Q: What are the KeyColumns and NameColumn properties of an Attribute? What is the difference between them?
KeyColumns is a property of an SSAS
Dimension Attribute and it forms the Key (Unique) for the attribute. It can be
bound to one or more columns in the underlying database table. When User
Defined Hierarchies are created in the dimension (Attribute Relationships
defined), setting this property becomes very critical and often requires
setting this to a combination of more than one column from the Data Source
View. For Example, say you have a Date Dimension and a hierarchy called
Calendar Hierarchy (Year -> Quarter -> Month). Here, Month repeats across different quarters and Quarter repeats across different years, making the attribute non-unique (January can belong to Q1 of any year, and Q1 can belong to any year). To make the attribute unique, KeyColumns for Month should be set to the combination of Year and Month, and for Quarter to the combination of Year and Quarter.
A NameColumn is a property of an SSAS
Dimension Attribute and it is used to identify the column from the underlying
Data Source View which provides the name of the attribute which is displayed to
the end user by making it more user friendly instead of displaying the Key
Column value. For Example, you might have ProductCategoryKey as 1, 2, 3, &
4, and ProductCategoryName as Bikes, Components, Clothing, & Accessories
respectively. Now, NameColumn will be set to ProductCategoryName so that user
sees them as Bikes, Components etc. even though the data in the background is
processed/retrieved using the Key Column values as 1, 2 etc.
Here are some of the
highlights/differences of KeyColumns and NameColumn properties:
- KeyColumns property is defaulted to the Attribute itself, and the
NameColumn property is defaulted to Key Column (when the KeyColumns is set
to only one column).
- Column(s) provided in the KeyColumns should be able to uniquely
identify all the values of the respective attribute, whereas NameColumn
need not be unique.
- KeyColumns can contain one or more columns whereas NameColumn can
contain only one column.
Q: What is an Unknown Member? What is its significance?
An Unknown Member is a built-in member
provided by SQL Server Analysis Services. It represents a Missing or Null
value. Basically when a Dimension is Processed, Analysis Services populates
each of the attributes with distinct values from the underlying data source and
in this process, if it encounters a Null value it converts it appropriately (to 0 in the case of numeric columns and to an empty string in the case of string columns) and marks it as the Unknown Member for easy interpretation by the end user.
One of the important uses of an Unknown
Member is to handle Early Arriving Facts: a scenario in which the transactional record (fact) has arrived in the system but the corresponding dimension/contextual data has not yet arrived, due to reasons such as an improperly designed data load process, a failure in the ETL process, or a technical glitch in the transactional system delaying the dimension data.
Unknown Member can be enabled or
disabled (set to None) or set to Visible or Hidden based on the end user
requirements.
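The early-arriving-fact scenario above can be spotted at the relational layer with a query like the following sketch (FactSales/DimProduct are illustrative names, not from a specific sample database):

```sql
-- Fact rows whose dimension key has no matching dimension member yet;
-- during processing such rows surface under the Unknown Member.
SELECT f.*
FROM   FactSales f
LEFT JOIN DimProduct p ON p.ProductKey = f.ProductKey
WHERE  p.ProductKey IS NULL;
```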
What
are Dimension Translations? In what scenarios do you use them?
Translation in SSAS is a mechanism to
support Localization, in which the labels, names, and captions associated with
any SSAS Object (pretty much every SSAS Object supports Translations) are
translated from one language to another based on the country/language of the
user accessing the data. A Dimension Translation is the same as a Translation
in SSAS, except that the labels, names, and captions being translated are those
associated with Dimensions, Attributes, Hierarchies, and/or Dimension Members.
Translations are very useful in achieving a
higher level of adoption of the BI/Analytics system (SSAS). They eliminate
language barriers among users from different locations and present the same
information in different languages, making a single version of truth available
to users across different geographical locations.
Here are some of the highlights of
Dimension Translations:
- There can be multiple Translations associated with a single
Dimension, Attribute, Hierarchy, and Dimension Member etc.
- The collation and language settings/information from the client
application/computer is used to determine and provide the analysis
services metadata/data to the client application.
What
is SQL Server Analysis Services (SSAS)?
- SQL Server Analysis Services (SSAS) is the On-Line Analytical
Processing (OLAP) Component of SQL Server. SSAS allows you to build
multidimensional structures called Cubes to pre-calculate and
store complex aggregations, and also to build mining models to perform
data analysis to identify valuable information like trends, patterns,
relationships etc. within the data using Data Mining capabilities
of SSAS, which otherwise could be really difficult to determine without
Data Mining capabilities.
- SSAS comes bundled with SQL Server and you get to choose whether or
not to install this component as part of the SQL Server Installation.
What
is OLAP? How is it different from OLTP?
- OLAP stands for On-Line Analytical Processing. It is a capability
or a set of tools which enables the end users to easily and effectively
access the data warehouse data using a wide range of tools like Microsoft Excel, Reporting Services, and
many other 3rd party business intelligence tools.
- OLAP is used for analysis purposes to support business decision-making
and is characterized by less frequent data updates and historical data.
OLTP (On-Line Transactional Processing), by contrast, is used to support
day-to-day business operations and is characterized by frequent data
updates and the most recent data, along with limited historical data
based on the retention policy driven by business needs.
What is a Cube?
A Cube acts as an OLAP database for the subscribers who need
to query data from an OLAP data store. A Cube is the main object of an SSAS
solution, where the majority of fine tuning, calculations, aggregation
design, storage design, relationship definitions, and a lot of other
configurations are developed. We will create a cube using our dimension
and fact tables.
What
is a Data Source? What are the different data sources supported by SSAS?
- A Data Source contains
the connection information used by SSAS to connect to the underlying
database to load the data into SSAS during processing. A Data Source
primarily contains the following information (apart from various other
properties like Query timeout, Isolation etc.):
- Provider
- Server Name
- Database Name
- Impersonation Information
- SSAS Supports both .Net and OLE DB Providers. Following are some of
the major sources supported by SSAS: SQL Server, MS Access, Oracle,
Teradata, IBM DB2, and other relational databases with the appropriate OLE
DB provider.
What
is Impersonation? What are the different impersonation options available in
SSAS?
- Impersonation allows SSAS to assume a specific identity/security
context, which SSAS then uses to perform server-side data operations
like source data access, processing, etc.
As part of impersonation, the following options are available in SSAS:
- Use a specific Windows user name and password: This option lets you specify Windows account credentials
which will be used by SSAS to perform operations like source data access,
processing etc.
- Use the service account: When
this option is selected, SSAS uses the credentials of the service account
under which the Analysis Services service is configured/running for
source data access, processing etc.
- Use the credentials of the current user: When this option is set, SSAS uses the credentials of the
current user for performing operations like DMX Open Queries, Local cubes
etc. This option cannot be used for performing server side operations
like source data access, processing etc.
- Inherit: This option lets the SSAS server decide
which impersonation mode is suitable for each type of operation. When
this option is set, by default SSAS will use the service account for
operations like processing and the credentials of the current user for
operations like Local cubes, querying the data mining models, etc.
What
is a Data Source View?
- A Data Source View (DSV) is
a logical view of the underlying database schema and offers a layer of
abstraction for the underlying database schema. This layer acts as a
source for SSAS and captures the schema related information from the
underlying database. The schematic information present in DSV includes the
following:
- Underlying database Table(s)/View(s) metadata
- Primary Key & Foreign Key relationships between the underlying database Table(s)
- Additional columns in the form of Named Calculations
- Complex logic on the underlying Table(s)/View(s) in the form of
Named Queries
- SSAS can only see the schematic information present in the DSV and
it cannot see the schematic information from the underlying database.
What
is a Named Calculation? In what scenarios do you use it?
- A Named Calculation is a new column added to a Table in DSV and is
based on an expression. This capability allows you to add an extra column
into your DSV which is based on one or more columns from underlying data
source Table(s)/View(s) combined using an expression without requiring the
addition of a physical column in the underlying database Table(s)/View(s).
- The expression used in the Named Calculation should conform to the
underlying data source dialect. For example, if the underlying data source
is SQL Server, then it should conform to T-SQL; if it is Oracle, then it
should conform to PL/SQL; etc.
- Named Calculations can be used in many scenarios, following are
some of the common scenarios:
- For creating Derived Columns. Say you have First Name and Last
Name in the underlying data source Table/View and you want to get the
Full Name as "First Name + space + Last Name". Such things can
be added as a Named Calculation.
- For performing Lookup Operations. Say you have an Employee table
which has AddressID and an Address table in which AddressID is the
Primary Key. Now, to get the address information (say Country) into the
Employee table in DSV, a Named Calculation can be added to the Employee
table with the following expression:
(
    SELECT Country
    FROM   Address
    WHERE  AddressID = Employee.AddressID
)
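The derived-column scenario above can likewise be sketched as a Named Calculation expression, assuming a SQL Server source and a table with FirstName and LastName columns (illustrative names):

```sql
-- Hypothetical Named Calculation expression (T-SQL) for a FullName
-- column added to the Employee table in the DSV.
FirstName + ' ' + LastName
```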
What
is a Named Query? In what scenarios do you use it?
- A Named Query is a SQL query/expression in your DSV which acts as a
Table. It is used to combine data from one or more Table(s)/View(s) from
the underlying data source without requiring any schematic changes to the
underlying data source Table(s)/View(s).
- The SQL Query used in the Named Query should conform to the
underlying data source dialect. For example, if the underlying data source
is SQL Server, then it should conform to T-SQL; if it is Oracle, then it
should conform to PL/SQL; etc.
- Named Queries are used in various scenarios, following are some of
the common scenarios:
- Combining data from multiple Tables/Views from the underlying data
source by using either simple or complex join operations.
- Adding filter conditions for filtering out unwanted data or selecting
the required data (limiting the data).
- Pretty much everything that can be done using a Named Calculation
can also be done using a Named Query.
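A minimal sketch of such a Named Query, assuming a SQL Server source and illustrative Employee/Address tables (the names and the IsActive filter are assumptions, not from a specific sample database):

```sql
-- Combines two underlying tables and limits the rows, without any
-- schematic change to the source database.
SELECT  e.EmployeeID,
        e.FirstName,
        e.LastName,
        a.City,
        a.Country
FROM    Employee e
JOIN    Address  a ON a.AddressID = e.AddressID
WHERE   e.IsActive = 1;
```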
What
are the pros and cons of using Tables and Named Queries in DSV?
- Following are some of the pros and cons of using Tables and Named
Queries in DSV. Tables in the below comparison refers to the Table in DSV
which references a single Table or a View in the underlying source
database.
Tables vs. Named Queries:
- Named Calculations can be added to Tables in DSV, but cannot be
added to Named Queries in DSV.
- For Tables, Keys and Relationships are automatically set (by the
wizard) based on the Keys and Relationships in the underlying database
schema; for Named Queries, Keys and Relationships have to be set
explicitly in the DSV.
- A Table can reference only one Table/View from the underlying data
source; a Named Query can reference more than one Table/View using a
SQL expression in the DSV.
- Filter/limiting conditions cannot be applied on a Table in DSV;
they can be applied as part of the SQL expression in a Named Query in
the DSV.
- Although Named Calculations and Named Queries can be used to extend
the functionality of SSAS to address the evolving business needs, it is
always a good practice to first build a good Dimensional
Model in the beginning of a Data
Warehousing/SSAS project.
What
is the purpose of setting Logical Keys and Relationships in DSV?
- Many of the user interfaces/designers/wizards in BIDS which are
part of a SSAS project depend on the Primary Key and Relationships between
Fact and Dimension tables. Hence it is important to define the Primary Key
and Relationships in DSV.
- By default, the Data Source View Wizard detects the Physical
Primary Keys and Relationships between the tables in the underlying source
database and applies the same Keys and Relationships in DSV layer.
However, Logical Keys and Relationships need to be defined explicitly in
the following scenarios:
- If the DSV table is referring to an underlying database View.
- If the DSV table is created as a Named Query.
- If any additional relationships need to be defined in the DSV
layer apart from the ones that are physically defined in the underlying
source database.
Is
it possible to combine data from multiple data sources in SSAS? If yes, how do
you accomplish it?
- SSAS allows combining data from multiple underlying data sources
into a single DSV. To be able to add Table(s)/View(s) from multiple data
sources, first you need to create a DSV using your first source and this
source acts as the primary data source. Now after the initial DSV is
created, you can add one or more data sources into DSV which will act as
secondary data sources and you can choose additional Table(s)/View(s) from
the secondary data sources which you want to include in your DSV.
- The key thing while combining data from multiple data sources is
that the Primary Data Source must support OPENROWSET queries. Hence in
most cases, SQL Server is used as the Primary Data Source.
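For illustration only, the kind of pass-through query a SQL Server primary data source can issue to reach a secondary source looks like the following sketch; the provider, connection string, and table names are placeholders, not a working configuration:

```sql
-- OPENROWSET lets the primary source run a remote query against a
-- secondary source on behalf of SSAS.
SELECT *
FROM OPENROWSET('MSDASQL',
                'DSN=SecondarySource;UID=user;PWD=password',
                'SELECT CustomerID, CustomerName FROM dbo.Customer');
```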
What
is a SQL Server Analysis Services Dimension?
A Dimension represents a set of
contextual data about the transactional data that is stored in the fact
table(s). It basically forms the story line around the data by providing
context and thereby helping the users to understand and make sense out of the
data.
Example: Let us say that there is a retail chain and the
CFO of the chain announces that the "Total Sales is $1,000,000". In
this case, $1,000,000 is merely a number; it does not provide any valuable
information and does not make any sense to the user.
Now let us say that the CFO of the
chain makes a slight change in the above announcement and says "Total
Sales for FY 2012 is $1,000,000". This makes some sense and provides some
context: the sales amount mentioned is for Financial Year 2012 (FY2012).
FY2012 represents a date (time) component and hence a Date (Time) Dimension.
Similarly, we can go on and add
additional Dimensions to this data (Fact Table/Cube) to provide more context
about the data (in this scenario $1,000,000 is the data we are talking about).
Say we change the above statement to "Total Sales of Facial
Tissues for FY 2012 in California is $1,000,000". This gives much better
context compared to the previous two statements. In this statement, Facial
Tissues represents a Product, which introduces the Product Dimension, and
California represents a Geographical Location, which introduces the Geography
Dimension. In this manner, we can add as many dimensions as we want to give
better context to the data, so that end users can analyze the data from
different dimensions and make more effective decisions.
Here are some highlights of
Dimension(s):
- It represents contextual information which adds context/meaning to
the data being analyzed.
- Helps in viewing/analyzing the data from different
dimensions/angles to get a better understanding of the data.
- A dimension is organized in the form of Attributes and Hierarchies.
- Here are some examples of a Dimension:
- Time Dimension
- Product Dimension
- Geography Dimension
- Customer Dimension
- Employee Dimension
- Here are some examples of a Dimension Attribute
- Year, Quarter, Month etc. in case of a Time Dimension
- Color, Size etc. in case of a Product Dimension
What
are Natural Key and Surrogate Key in SQL Server Analysis Services?
A Natural Key is a type of key in a
table which uniquely identifies each record and has a business meaning attached
to it.
Example: Incident/Ticket Number, SSN, Employee Email
Alias, etc.
Here are some highlights of a Natural
Key:
- This can be numeric, string, or a combination of both, depending
on the type of data you are dealing with. In the case of non-numeric
data, JOINs are slower and the key occupies more space than an integer.
- History cannot be maintained if used as a Primary Key.
- Merging data from different source systems can be difficult when
there is a conflict in the type of value coming from each source system.
A Surrogate Key is a type of key in a
table which uniquely identifies each record, but has no business meaning
attached to it. It is merely a value used to uniquely identify a record in a
table.
Example: Identity Columns, GUID (Unique identifier), etc.
Here are some highlights of a Surrogate
Key:
- Most of the time this is a numeric value (like INT, BIGINT, etc.),
and hence JOINs are faster and the key occupies less space than
character-based data.
- Allows for maintenance of history data (SCD Type 2) when used as a
Primary Key.
- Data coming from multiple source systems can be easily integrated
(especially in the scenarios like Mergers, Acquisitions etc. in the
industry).
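A minimal sketch of a dimension table illustrating both key types (table and column names are assumptions):

```sql
-- CustomerKey is an IDENTITY column with no business meaning (surrogate
-- key), while CustomerAltKey (e.g. an account number) is the natural key.
CREATE TABLE DimCustomer
(
    CustomerKey     INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    CustomerAltKey  NVARCHAR(20) NOT NULL,          -- natural/business key
    CustomerName    NVARCHAR(100) NOT NULL
);
```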
What
is a Hierarchy in SQL Server Analysis Services? What are the different types of
Hierarchies? Explain each one of them.
A Hierarchy is a collection of one or
more related Attributes which are organized in a Parent-Child fashion. An
Attribute at a higher level is a parent of an Attribute at the next level and
so on. They provide a very convenient way to perform roll up/drill down analysis
and helps in rolling up and drilling down the numeric data in a very effective
manner.
As an analogy, it is like Bing Maps.
If you want to locate a particular place manually (without doing a text
search), you usually first locate the Country, then the State within the
Country, then the City within the State, and finally the place you are
looking for within the City.
Example: A Calendar Hierarchy might contain Attributes
like Year, Quarter, Month, and Day which are organized as a Hierarchy with Year
as a parent of Quarter, Quarter as a parent of Month, and Month as a parent of
Day.
Here are the common types of
Hierarchies:
- Natural Hierarchy: A
Natural Hierarchy is a Hierarchy in which Every Child has a Single
Parent. Example: A Calendar Hierarchy.
- Balanced Hierarchy: A
Balanced Hierarchy is a Hierarchy in which no matter through what path we
traverse the Hierarchy, there is a Member at every level and every path
has the same number of Levels. Example: A Calendar
Hierarchy.
- Unbalanced Hierarchy: An
Unbalanced Hierarchy is a Hierarchy in which the number of members and the
number of Levels varies across different branches of the Hierarchy. In this
type of Hierarchy, Leaf Level Members might belong to different Levels. Example:
An Employee/Organization Hierarchy in which one manager (Manager 1) has
reports who belong to different Levels, say L1 & L2, while another
manager at the same level as Manager 1 has reports who all belong to the
same Level, say L1.
- Ragged Hierarchy: A
Ragged Hierarchy is a Hierarchy in which irrespective of the path you use
to traverse, every path has the same number of Levels but not every level
is guaranteed to have members except for the Top most Level and Bottom
most Level (Leaf Level). Example: A Geography
Hierarchy.
What
is a Slowly Changing Dimension in SQL Server Analysis Services? What are the
different types of Slowly Changing Dimensions?
The Slowly Changing Dimension (SCD)
concept is basically about how the data modifications are absorbed and
maintained in a Dimension Table. In an SCD the contents/members change over a
period of time. Apart from the existing members which are modified, new records
are added to the dimension similar to any other dimension.
Here are the major 3 types of Slowly
Changing Dimensions:
- Type 1: In this type of SCD, modifications to the
dimension members are absorbed by overwriting the existing member data
present in the Dimension Table. Example: Say you have
an Address Dimension which contains addresses of all the employees in an
organization. When the address of an employee changes, then the old data
(address) in the Dimension is overwritten with the new data (address) and
there is no way to get the old address from the table.
- Type 2: In this type of SCD, modifications to the
dimension members are absorbed by adding it as a new record into the
Dimension Table. Meaning, the history of all the changes happening to the
dimension members is maintained by creating a new record each time a dimension
member is modified. The new (modified) record and the old record(s) are
identified using some kind of a flag like say IsActive, IsDeleted etc. or
using Start and End Date fields to indicate the validity of the record.
New dimension members are inserted into the Dimension Table with
appropriate flag value which indicates that this is the active/current
record. Unlimited amount of history can be maintained in this type of
SCD. Example: Considering the above example of an
Address Dimension. When the address of an employee changes, then the old
data (address) in the Dimension Table is retained as it is and the flag is
updated to indicate that this is a historical record. The modified data
(record) is inserted as a new record into the table and the flag is
updated to indicate that this is the current record.
- Type 3: In this type of SCD, modifications to the
dimension members are absorbed with the help of additional columns in the
Dimension Table. In this SCD type, the amount of history data (number of
modifications) that can be maintained is limited by the number of columns
(fixed) in the Dimension Table, which are used for tracking the history
details. This is decided at the time of designing the Data Mart. Example:
Considering the above example of an Address Dimension. When the address of
an employee changes, then the old data (address) in the Dimension is
retained as it is and the new address is updated in a separate column to
indicate that this is the latest address.
Apart from the above listed 3 types of
SCDs, there are other types like Type 0, Type 4, Type
5, & Type 6, which are rarely used.
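The SCD Type 2 flow described above can be sketched in T-SQL as an expire-then-insert pair; the table, key value, and StartDate/EndDate/IsActive columns are assumptions for illustration:

```sql
-- 1) Expire the current record for the changed member:
UPDATE DimEmployeeAddress
SET    EndDate  = GETDATE(),
       IsActive = 0
WHERE  EmployeeAltKey = 1001 AND IsActive = 1;

-- 2) Insert the modified data as the new current record:
INSERT INTO DimEmployeeAddress
        (EmployeeAltKey, Address, StartDate, EndDate, IsActive)
VALUES  (1001, 'New address', GETDATE(), NULL, 1);
```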
What
is a Parent-Child Dimension in SQL Server Analysis Services? Explain with an
example.
A Parent-Child Dimension is a Dimension
in which two attributes in the same dimension are related to each other and
together define the lineage relationships among the dimension members. The
first attribute, which uniquely identifies each dimension member, is called
the Member Key Attribute, and the second attribute, which identifies the
parent of a dimension member, is called the Parent Key Attribute.
Example: Consider an Employee Dimension which has
EmployeeID as a Primary/Unique Key which uniquely identifies each Employee in
the organization. The same Employee Dimension contains another attribute,
ManagerID, which identifies the Manager of an Employee; ManagerID contains
one of the values present in EmployeeID, since a Manager is also an Employee
and hence also has a record of his/her own in the Employee Table. In this
scenario, EmployeeID is the Member Key Column and ManagerID is the Parent
Key Column.
Here are some highlights of a
Parent-Child Dimension:
- Both the Parent and the Child members exist in the same dimension.
- There is a Parent-Child relationship between different members
(rows) of the same dimension.
- Parent Key Column in the dimension table is a Foreign Key Column
(Can be physical or logical) which refers to the Member Key Column which
is a Primary/Unique Key Column in the same dimension table.
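The Employee/Manager example above can be sketched as a self-referencing table plus a recursive query that walks the hierarchy (names are illustrative):

```sql
-- ManagerID is a foreign key back into the same table.
CREATE TABLE DimEmployee
(
    EmployeeID   INT PRIMARY KEY,        -- member key column
    ManagerID    INT NULL                -- parent key column
        REFERENCES DimEmployee(EmployeeID),
    EmployeeName NVARCHAR(100) NOT NULL
);

-- Recursive CTE that traverses the parent-child relationship.
WITH OrgChart AS
(
    SELECT EmployeeID, ManagerID, EmployeeName, 0 AS Level
    FROM   DimEmployee
    WHERE  ManagerID IS NULL             -- top of the hierarchy
    UNION ALL
    SELECT e.EmployeeID, e.ManagerID, e.EmployeeName, o.Level + 1
    FROM   DimEmployee e
    JOIN   OrgChart o ON e.ManagerID = o.EmployeeID
)
SELECT * FROM OrgChart;
```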
What
is a Role-Playing Dimension in SQL Server Analysis Services? Explain with an
example.
A Role-Playing Dimension is a Dimension
which is connected to the same Fact Table multiple times using different
Foreign Keys. This helps the users to visualize the same cube data in different
contexts/angles to get a better understanding and make better decisions.
Example: Consider a Time Dimension which is joined to the
same Fact Table (Say FactSales) multiple times, each time using a different
Foreign Key in the Fact Table like Order Date, Due Date, Ship Date, Delivery
Date, etc. Essentially there is only one single physical dimension called Date
Dimension. However, it is joined multiple times to the Fact Table to help the
users to visualize the cube data in the context of different dates.
Here are some highlights of a
Role-Playing Dimension:
- It is a single physical Dimension Table.
- Same Dimension Table connects to the same Fact Table multiple times
using different Foreign Keys from the Fact Table.
- When a Role-Playing Dimension is added to the cube, it appears as a
different dimension (one instance for each Foreign Key to which it is
joined) to the end users and hence playing multiple roles.
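Relationally, the role-playing idea above amounts to joining one physical DimDate to the fact table once per date role, as in this sketch (table and key names assumed):

```sql
-- One physical DimDate plays two roles: Order Date and Ship Date.
SELECT  f.SalesAmount,
        od.CalendarDate AS OrderDate,
        sd.CalendarDate AS ShipDate
FROM    FactSales f
JOIN    DimDate od ON od.DateKey = f.OrderDateKey
JOIN    DimDate sd ON sd.DateKey = f.ShipDateKey;
```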
What
is a Conformed Dimension in SQL Server Analysis Services? Explain with an
example.
A Conformed Dimension is a Dimension
which connects to multiple Fact Tables across one or more Data Marts (cubes). A
Conformed Dimension is physically implemented across multiple Data Marts with
exactly the same structure, attributes, values (dimension members), meaning, and
definition. Each instance means exactly the same thing and provides exactly the
same context, irrespective of the Fact Table to which it is connected or the
Data Mart to which it belongs.
Example: A Date Dimension has exactly the same set of attributes,
same members and same meaning irrespective of which Fact Table it is connected
to or to which Data Mart it belongs to as long as it belongs to the same
organization. For instance, a Fiscal Calendar is exactly the same with same
start and end dates across all the departments within an organization.
Here are some highlights of Conformed
Dimensions:
- Same Dimension joins to multiple Fact Tables or is used across
multiple Data Marts.
- It is a master dimension and is used across multiple dimensional models.
- Each instance of a Conformed Dimension is exactly the same in every
aspect, including the attribute names, definitions, etc.
What
is a Degenerate Dimension in SQL Server Analysis Services? In what scenarios do
you use it?
A Degenerate Dimension is a Dimension
which is derived out of a Fact Table; it appears to the end user as a
separate/distinct Dimension, but its data is actually stored in the Fact
Table. It is a Dimension which does not have an underlying physical table
of its own.
Degenerate Dimensions are commonly used
when the Fact Table contains/represents Transactional data like Order Details,
etc. and each Order has an Order Number associated with it, which forms the
unique value in the Degenerate Dimension.
Example: Degenerate Dimensions having unique Order Numbers
can be used to identify the various items sold as part of a particular order.
Here are some highlights of Degenerate
Dimension:
- It is derived from the Fact Table and does not have an underlying
physical Dimension Table of its own.
- It is also called a Fact Dimension.
- Since these dimensions are built on top of Fact Table, these are
usually very large dimensions.
- The attribute of a Degenerate Dimension is not a Foreign Key in the
Fact Table.
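As a sketch, the Order Number example above lives directly in the fact table; there is no separate dimension table to join (FactSales and its columns are assumptions):

```sql
-- SalesOrderNumber acts as a degenerate dimension: grouping by it
-- identifies the items sold as part of each order.
SELECT  SalesOrderNumber,
        SUM(SalesAmount) AS OrderTotal
FROM    FactSales
GROUP BY SalesOrderNumber;
```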
What
is a Junk Dimension in SQL Server Analysis Services? In what scenarios do you
use it?
A Junk Dimension is often a collection
of unrelated Attributes like indicators, flags, codes, etc. It usually
contains data which cannot be created as separate Dimensions, as each
attribute tends to be too small and not worth having as a Dimension of
its own.
These Dimensions can be used in various
scenarios, but one of the common scenarios is when a Fact Table contains a lot
of Attributes which are like indicators, flags, etc. Using Junk Dimensions,
such Attributes can be removed/cleaned up from a Fact Table.
Example: It contains values like Yes/No, Pending/In
Progress/Completed, Open/Resolved/Closed, etc.
Here are some highlights of a Junk
Dimension:
- It is also called a Garbage Dimension.
- Junk Dimensions are usually small in size.
- It is a convenient way to consolidate many Smaller Dimensions which
are not really worth having as a separate Dimension.
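One hedged sketch of building such a dimension is to take the distinct combinations of the low-cardinality flags from the source (table and column names are illustrative):

```sql
-- Each distinct combination of flags becomes one junk-dimension row,
-- keyed by a generated surrogate key.
SELECT  ROW_NUMBER() OVER (ORDER BY IsGift, OrderStatus) AS JunkKey,
        IsGift,       -- e.g. Yes/No
        OrderStatus   -- e.g. Open/Resolved/Closed
INTO    DimOrderFlags
FROM    (SELECT DISTINCT IsGift, OrderStatus FROM StagingOrders) AS d;
```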
What
are Database Dimension and Cube Dimension? What is the difference between them?
A Database Dimension is a Dimension
which exists independent of a Cube. You can see a Database Dimension as a
separate entity/file in the Solution Explorer/Project Folder. As an analogy,
it is pretty similar to a Class in a programming language.
A Cube Dimension is an instance of a
Database Dimension. As an analogy, it is pretty similar to an Object (an
instance of a Class) in a programming language.
Here are some of the highlights/differences
of Database and Cube Dimensions:
- A Database Dimension is independent of a Cube, but a Cube Dimension
is always a part of a Cube.
- One Database Dimension can be added to a Cube more than once (Role-Playing Dimension) as
different instances. In this scenario, each Instance is treated as a
separate Cube Dimension.
- A Database Dimension can be added to more than one Cube, whereas a
Cube Dimension can belong to a Single Cube at any given point.
What
is a Linked Dimension? In what scenarios do you use it?
A Linked Dimension is a Dimension
which is based on (Linked To) another Database Dimension which might be either
located on the same Analysis Services server as the Linked Dimension or on a
different Analysis Services server.
Linked Dimensions can be used when the
exact same dimension can be used across multiple Cubes within an Organization
like a Time Dimension, Geography Dimension etc.
Here are some of the highlights of a
Linked Dimension:
- More than one Linked Dimension can be created from a Single
Database Dimension.
- These can be used to implement the concept of Conformed Dimensions.
- For an end user, a Linked Dimension appears like any other
Dimension.
What
are the different ways to create a Time Dimension in Analysis Services?
Time Dimension is one of the most
important and most common types of dimensions, as pretty much every metric is
analyzed over time. Analysis Services offers the following ways to create
a Time Dimension:
- Create Using an Existing Table: This is one of the common approaches for building a Time
Dimension. In this approach, a table is created in the underlying data
source and pre-populated with data and it is then used to create the Time
Dimension in SSAS.
- Generate a Time Table in the Data Source: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table in the underlying data source (relational database) and populates it
with the required data using the date range, attributes, and calendars
etc. which are specified at the time of creation of Time Dimension in
Business Intelligence Development Studio. This option requires permissions
to create a table in the underlying data source.
- Generate a Time Table on the Server: In this approach there will be no underlying table and at the
time of creation of a Time Dimension in SSAS, Analysis Services creates a
table on the server and this table is used to generate the Time Dimension
by the wizard. The Dimension created using this approach is called a
Server Time Dimension.
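For the first approach, the underlying date table is pre-populated by a one-off script along these lines (the DimDate table, its columns, and the date range are assumptions for illustration):

```sql
-- Populate one row per calendar day for the chosen range.
DECLARE @d DATE = '2010-01-01';
WHILE @d <= '2014-12-31'
BEGIN
    INSERT INTO DimDate (DateKey, CalendarDate, [Year], [Quarter], [Month])
    VALUES (CONVERT(INT, FORMAT(@d, 'yyyyMMdd')),  -- e.g. 20100101
            @d, YEAR(@d), DATEPART(QUARTER, @d), MONTH(@d));
    SET @d = DATEADD(DAY, 1, @d);
END;
```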
What
is Type property of a Dimension? What is the purpose of setting this property?
Type property of a Dimension is used to
specify the type of information that the Analysis Services Dimension contains.
Like a Time Dimension (Contains Years, Quarters, Months, Dates, and so on),
Geography Dimension (Contains Geographical Information), and Accounts Dimension
(Contains Accounts related information) etc. This property is set to Regular by
default.
There are basically two important uses
of setting this property:
- This property is used by the Business Intelligence Wizard
to assign/generate standard classifications and MDX expressions.
- This property is used by the Analysis Services Client Applications
like Excel etc. to adjust the user interface/rendering format
appropriately. For example, when a dimension is added to a Pivot Table in Excel,
certain types of dimensions are automatically added across the columns
axis (like Time Dimension) and a few other types of dimensions are added
to the rows axis (Like Product Dimension) automatically.
What
is a Storage Mode? What are the different storage modes applicable to
Dimensions?
A Storage Mode defines the location in
which the Dimension data will be stored and the format (Relational or
Multidimensional) in which the data will be stored.
Following are the two Storage Modes
supported by Dimensions:
- ROLAP (Relational On-Line Analytical Processing):
- When the Storage Mode is set to ROLAP for a Dimension, then the
Dimension Data will be stored in the relational database tables.
- This Storage Mode offers effective memory usage by avoiding the
duplication of data that occurs with MOLAP Storage Mode.
- Using this Storage Mode results in slower query performance
compared to MOLAP Storage Mode.
- MOLAP (Multidimensional On-Line Analytical Processing):
- When the Storage Mode is set to MOLAP for a Dimension, then the
Dimension Data will be stored in a multidimensional format in the
Analysis Services/OLAP Server.
- This Storage Mode uses more storage as it involves
duplication of data (the first copy is the data in the underlying
dimension tables and the second copy is the data stored in the Dimension
on the OLAP Server).
- Using this Storage Mode results in the best query performance
of any Storage Mode available in SSAS.
What
is the difference between Attribute Hierarchy and User Defined Hierarchy?
An Attribute Hierarchy is a Hierarchy
created by SQL Server Analysis Services for every Attribute in a Dimension by
default. An Attribute Hierarchy by default contains only two levels - an
"All" level and a "Detail" level, which is nothing but the
Dimension Members.
A User Defined Hierarchy is a Hierarchy
defined explicitly by the user/developer and often contains multiple levels.
For example, a Calendar Hierarchy contains Year, Quarter, Month, and Date as
its levels.
Here are some of the
highlights/differences of Attribute and User Defined Hierarchies:
- Attribute Hierarchies are always Two-Level (Unless All Level is
suppressed) whereas User Defined Hierarchies are often Multi-Level.
- By default, Every Attribute in a Dimension has an Attribute
Hierarchy whereas User Defined Hierarchies have to be explicitly defined
by the user/developer.
- Every Dimension has at least one Attribute Hierarchy by default
whereas every Dimension does not necessarily contain a User Defined
Hierarchy. In essence, a Dimension can contain zero, one, or more User
Defined Hierarchies.
- Attribute Hierarchies can be enabled or disabled. Disable the
Attribute Hierarchy for those attributes which are commonly not used to
slice and dice the data during analysis, like Address, Phone Number, and
Unit Price etc. Doing this will improve the cube processing performance
and also reduces the size of the cube as those attributes are not
considered for performing aggregations.
- Attribute Hierarchies can be made visible or hidden. When an
Attribute Hierarchy is hidden, it will not be visible to the client
application while browsing the Dimension/Cube. Attribute Hierarchies for
those attributes which are part of the User Defined Hierarchies, like Day,
Month, Quarter, and Year, which are part of the Calendar Hierarchy, can be
hidden, since the attribute is available to the end users through the User
Defined Hierarchy and helps eliminate the confusion/redundancy for end
users.
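The two-level versus multi-level distinction above can be illustrated with a minimal Python sketch (this is not an SSAS API; the hierarchy names and the dictionary layout are illustrative assumptions):

```python
# Conceptual sketch (not SSAS API): an attribute hierarchy has only an
# "(All)" level plus the detail members, while a user-defined hierarchy
# is explicitly designed with multiple levels.

attribute_hierarchy = {
    "name": "Month",
    "levels": ["(All)", "Month"],  # All level + detail level only
}

calendar_hierarchy = {
    "name": "Calendar",
    "levels": ["(All)", "Year", "Quarter", "Month", "Date"],
}

def user_levels(hierarchy):
    """Levels below the (All) level."""
    return [lvl for lvl in hierarchy["levels"] if lvl != "(All)"]
```

With this sketch, `user_levels(attribute_hierarchy)` yields a single detail level, while the Calendar hierarchy exposes four user-visible levels.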
What
is an Attribute Relationship? What are the different types of Attribute
Relationships?
An Attribute Relationship is
a relationship between various attributes within a Dimension. By default, every
Attribute in a Dimension is related to the Key Attribute. Quite often these
default Attribute Relationships need to be modified to suit the User Defined
Hierarchies and other end user requirements.
There are basically two types of
Attribute Relationships:
- Rigid: Attribute Relationship should be set to
Rigid when the relationship between those attributes is not going to
change over time. For example, relationship between a Month and a Date is
Rigid since a particular Date always belongs to a particular Month like 1st
Feb 2012 always belongs to Feb Month of 2012. Try to set the relationship
to Rigid wherever possible.
- Flexible: Attribute Relationship
should be set to Flexible when the relationship between those attributes
is going to change over time. For example, relationship between an
Employee and a Manager is Flexible since a particular Employee might work
under one manager during this year (time period) and under a different
manager during next year (another time period).
What
are KeyColumns and NameColumn properties of an Attribute? What is the difference
between them?
KeyColumns is a property of an SSAS
Dimension Attribute and it forms the Key (Unique) for the attribute. It can be
bound to one or more columns in the underlying database table. When User
Defined Hierarchies are created in the dimension (Attribute Relationships
defined), setting this property becomes very critical and often requires
setting this to a combination of more than one column from the Data Source
View. For Example, say you have a Date Dimension and a hierarchy called
Calendar Hierarchy (Year -> Quarter -> Month). What happens is that Month
gets repeated across different quarters and quarters get repeated across
different years, making the attribute non-unique (January can belong to Q1
of any year and, similarly, Q1 can belong to any year). So to make the attribute
unique, KeyColumns for Month should be set to something like Year and Month, and
similarly for Quarter it should be set to Year and Quarter.
NameColumn is a property of an SSAS Dimension Attribute and it identifies
the column from the underlying Data Source View which provides the name of
the attribute displayed to the end user, making it more user friendly than
showing the Key Column value. For example, you might have
ProductCategoryKey as 1, 2, 3, &
4, and ProductCategoryName as Bikes, Components, Clothing, & Accessories
respectively. Now, NameColumn will be set to ProductCategoryName so that user
sees them as Bikes, Components etc. even though the data in the background is
processed/retrieved using the Key Column values as 1, 2 etc.
Here are some of the
highlights/differences of KeyColumns and NameColumn properties:
- KeyColumns property is defaulted to the Attribute itself, and the
NameColumn property is defaulted to Key Column (when the KeyColumns is set
to only one column).
- Column(s) provided in the KeyColumns should be able to uniquely
identify all the values of the respective attribute, whereas NameColumn
need not be unique.
- KeyColumns can contain one or more columns whereas NameColumn can
contain only one column.
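The composite KeyColumns requirement can be shown with a small Python sketch (not an SSAS API; the sample rows are illustrative assumptions): month names repeat across years, so only the (Year, Month) pair uniquely identifies each member.

```python
# Conceptual sketch (not SSAS API): why KeyColumns for Month often needs
# Year + Month. Month names alone repeat across years, so they cannot
# serve as a unique key; the (Year, Month) pair can.

rows = [
    {"Year": 2011, "Month": "January"},
    {"Year": 2011, "Month": "February"},
    {"Year": 2012, "Month": "January"},   # "January" repeats across years
    {"Year": 2012, "Month": "February"},
]

name_only_keys = {r["Month"] for r in rows}
composite_keys = {(r["Year"], r["Month"]) for r in rows}

is_unique_by_name = len(name_only_keys) == len(rows)       # False
is_unique_by_composite = len(composite_keys) == len(rows)  # True
```

The same reasoning applies to Quarter, where (Year, Quarter) forms the unique key.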
What
is an Unknown Member? What is its significance?
An Unknown Member is a built-in member
provided by SQL Server Analysis Services. It represents a Missing or Null
value. Basically, when a Dimension is processed, Analysis Services populates
each of the attributes with distinct values from the underlying data source
and, in this process, if it encounters a Null value, it converts it
appropriately (to 0 in the case of numeric columns and to an empty string in
the case of string columns) and marks it as the Unknown Member for easy
interpretation by the end user.
One of the important uses of an Unknown
Member is to handle Early Arriving Facts. This is a scenario
in which the transactional record (Fact) has come into the system but
the corresponding Dimension/Contextual data is yet to come in, which could be
due to various reasons like an improperly designed data load process, a failure
in the ETL process, or a technical glitch in the transactional system causing
a delay in pushing the Dimension data.
Unknown Member can be enabled or
disabled (set to None) or set to Visible or Hidden based on the end user
requirements.
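The null-to-Unknown-Member mapping during processing can be sketched in a few lines of Python (not an SSAS API; the member names and the helper function are illustrative assumptions):

```python
# Conceptual sketch (not SSAS API): how nulls coming from the source can
# be mapped to an Unknown Member during dimension processing, so that
# early-arriving facts still have a member to roll up under.

UNKNOWN_MEMBER = "Unknown"

def to_member(value):
    """Map a null/missing source value to the Unknown Member."""
    return UNKNOWN_MEMBER if value is None else value

source_product_keys = ["Bikes", None, "Clothing", None]
members = [to_member(v) for v in source_product_keys]
```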
What
are Dimension Translations? In what scenarios do you use them?
Translation in SSAS is a mechanism to
support Localization, in which the labels, names, and captions associated with
any SSAS Object (pretty much every SSAS Object supports Translations) are
translated from one language to another language based on the country/language
of the user accessing the data. A Dimension Translation is the same as a
Translation in SSAS, but in this case the labels, names, and captions
associated with Dimensions, Attributes, Hierarchies, and/or Dimension Members
are translated from one language to another.
Translations are very useful in
achieving a higher level of adoption of the BI/Analytics system (SSAS). They
eliminate language barriers among users from different locations and present
the same information in different languages, making a single version of truth
available to users across different geographical locations.
Here are some of the highlights of
Dimension Translations:
- There can be multiple Translations associated with a single
Dimension, Attribute, Hierarchy, and Dimension Member etc.
- The collation and language settings/information from the client
application/computer is used to determine and provide the analysis
services metadata/data to the client application.
What
are Measures and Measure Groups? What is the difference between them?
A Measure is any numeric quantity/value
that represents a metric aligned to an organization's goals. This is the value
which the business users are interested in, and are interested in viewing these
values from different angles and different granularity levels. A measure is
also commonly called a fact. The terms "measures" and
"facts" are used interchangeably.
A Measure Group is a collection/group
of measures which belong to the same underlying fact table. In SSAS, typically
each Measure Group is tied to each one of the underlying fact tables.
A Measure is a single numeric value
whereas a Measure Group is a collection of measures.
What
are the different types of Measures? Explain each one of them with an example.
Below are the most common types of
measures/facts:
- Fully Additive Facts: These
are facts which can be added across all the associated dimensions. For
example, sales amount is a fact which can be summed across different
dimensions like customer, geography, date, product, and so on.
- Semi-Additive Facts: These
are facts which can be added across only a few dimensions rather than all dimensions.
For example, bank balance is a fact which can be summed across the
customer dimension (i.e. the total balance of all the customers in a bank
at the end of a particular quarter). However, the same fact cannot
be added across the date dimension (i.e. the total balance at the end of
quarter 1 is $X million and $Y million at the end of quarter 2, so at the
end of quarter 2, the total balance is only $Y million and not $X+$Y).
- Non-Additive Facts: These
are facts which cannot be added across any of the dimensions in the cube.
For example, profit margin is a fact which cannot be added across any of
the dimensions. For example, if product P1 has a 10% profit and product P2
has a 10% profit then your net profit is still 10% and not 20%. We
cannot add profit margins across product dimensions. Similarly, if your
profit margin is 10% on Day1 and 10% on Day2, then your net Profit Margin
at the end of Day2 is still 10% and not 20%.
- Derived Facts: Derived
facts are the facts which are calculated from one or more base facts,
often by applying additional criteria. Often these are not stored in the
cube and are calculated on the fly at the time of accessing them. For
example, profit margin.
- Factless Facts: A
factless fact table is one which only has references (Foreign Keys) to the
dimensions and it does not contain any measures. These types of fact
tables are often used to capture events (valid transactions without a net
change in a measure value). For example, a balance enquiry at an automated
teller machine (ATM). Though there is no change in the account balance,
this transaction is still important for analysis purposes.
- Textual Facts: Textual
facts refer to the textual data present in the fact table, which is not
measurable (non-additive), but is important for analysis purposes. For
example, codes (i.e. product codes), flags (i.e. status flag), etc.
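The non-additive behavior of profit margin described above can be verified with a short Python sketch (the product figures are illustrative assumptions): summing the per-product margins gives a wrong answer, while recomputing the margin from the additive base facts gives the right one.

```python
# Conceptual sketch: profit margin is non-additive. Summing margins across
# products is wrong; the correct total margin must be recomputed from the
# underlying additive facts (profit and sales).

products = [
    {"name": "P1", "sales": 100.0, "profit": 10.0},  # 10% margin
    {"name": "P2", "sales": 300.0, "profit": 30.0},  # 10% margin
]

# Naively adding the two 10% margins yields 20%, which is incorrect.
wrong_total_margin = sum(p["profit"] / p["sales"] for p in products)

# The correct approach: aggregate the additive facts first, then divide.
total_sales = sum(p["sales"] for p in products)
total_profit = sum(p["profit"] for p in products)
correct_total_margin = total_profit / total_sales  # still 10%
```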
What
is the purpose of Dimension Usage settings? Explain different types of
relationships between Facts and Dimensions.
The Dimension Usage tab in the Cube
Designer in SQL Server Business Intelligence Development Studio defines the relationship between a Cube
Dimension and Measure Group(s). A Dimension which is related to one or more
Measure Groups, directly or indirectly, is called a Cube Dimension. A Cube
Dimension is an instance of a database Dimension, as explained in the previous tip.
Following are the four different types
of relationships between a Cube Dimension and a Measure Group:
- Regular: In a Regular relationship,
primary key column of a dimension is directly connected to the fact table.
This type of relationship is similar to the relationship between a
dimension and a fact in a Star Schema, and
it can be based on either the physical primary key-foreign key
relationship in the underlying relational database or the logical primary
key-foreign key relationship defined in the Data Source View.
- Referenced: In a
Referenced relationship, the primary key column of a dimension is indirectly
connected to the fact table through a key column in the intermediate
dimension table. This type of relationship is similar to the indirect
relationship between a dimension and a fact, through an intermediate
dimension, in a Snowflake Schema.
- Fact: In a Fact relationship, the
dimension table and the fact table are one and the same. Basically, a Fact Dimension or Degenerate Dimension is created using one or more columns from the fact table and this
degenerate dimension is used while defining/establishing the relationship
in the case of a fact relationship.
- Many-to-Many: In a
Many-to-Many relationship, a dimension is indirectly connected to a
Measure Group through an intermediate fact table which joins with the
dimension table. It is analogous to a scenario, where one project can have
multiple project managers and one project manager can manage multiple projects.
What
are Calculated Members? How do they differ from Measures?
Calculated Members are members of
a measure group and are defined based on a combination of one or more base
measures, arithmetic/conditional operators, numeric values, functions, etc.
For example, profit is a calculated member/calculated measure, which is defined
based on various base measures like selling price, cost price, tax amount,
freight amount, etc.
The value of a measure (base measure)
is stored in a cube as part of the cube processing process. Whereas the value
of a calculated member/measure is calculated on the fly in response to a user
request and only the definition is stored in the cube.
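The stored-value versus stored-definition distinction can be sketched in Python (not MDX or an SSAS API; the measure names and values are illustrative assumptions):

```python
# Conceptual sketch (not MDX): a base measure's values are stored in the
# cube, while a calculated member keeps only its definition and is
# evaluated on the fly when a user requests it.

stored_measures = {
    "Sales Amount": 500.0,  # materialized during cube processing
    "Total Cost":   380.0,
}

# Only the definition is kept (a function here; an MDX expression in SSAS).
calculated_members = {
    "Profit": lambda m: m["Sales Amount"] - m["Total Cost"],
}

def get_value(name):
    """Return a stored value, or evaluate a calculated member on demand."""
    if name in stored_measures:
        return stored_measures[name]
    return calculated_members[name](stored_measures)
```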
What
are Named Sets? What are the two types of Named Sets?
A Named Set is a set of dimension members
(usually a subset of dimension members) and is defined using MDX
(Multidimensional Expressions). Often Named Sets are defined for improved
usability by the end users and client applications. Apart from that, they can also
be used for various calculations at the cube level. Similar to calculated
members/measures, named sets are defined using a combination of cube/dimension
data, arithmetic operators, numeric values, functions, etc. Some of the
examples of Named Sets are top 50 customers, top 10 products, top 5 students,
etc.
Named Sets are of two types: Static
Named Sets and Dynamic Named Sets.
Static Named Sets, when defined in a
cube, are evaluated during the cube processing process. Dynamic Named Sets
are evaluated each time a query invokes them.
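The static/dynamic difference can be illustrated with a Python sketch (not MDX; the customer data and the "top 2" definition are illustrative assumptions):

```python
# Conceptual sketch (not MDX): a named set is a reusable, named subset of
# members. A static set is materialized once; a dynamic set re-runs its
# definition per query, so later data changes are reflected.

sales_by_customer = {"A": 120.0, "B": 340.0, "C": 90.0, "D": 210.0}

def top_customers(data, n=2):
    """Set definition: top-n customers by sales."""
    return sorted(data, key=data.get, reverse=True)[:n]

# Static: evaluated once (e.g. at processing time) and reused as-is.
static_top2 = top_customers(sales_by_customer)

# The data changes after the static set was materialized...
sales_by_customer["C"] = 500.0

# Dynamic: the definition is re-evaluated, picking up the change.
dynamic_top2 = top_customers(sales_by_customer)
```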
What
are KPIs? What are the different properties associated with a KPI?
KPI stands for Key Performance Indicator.
A KPI is a measure of an organization's performance in a pre-defined area of
interest. KPIs are defined to align with the pre-defined organizational goals
and help the business decision makers gain insights into their business
performance.
Often KPIs have the following five
commonly used properties:
- Name: Indicates the name of the
Key Performance Indicator.
- Actual/Value: Indicates
the actual value of a measure pre-defined to align with organizational
goals.
- Target/Goal: Indicates
the target value (i.e. goal) of a measure pre-defined to align with
organizational goals.
- Status: It is a numeric value and
indicates the status of the KPI like performance is better than expected,
performance is as expected, performance is not as expected, performance is
much lower than expected, etc.
- Trend: It is a numeric value and
indicates the KPIs trend like performance is constant over a period of
time, performance is improving over a period of time, performance is
degrading over a period of time, etc.
Apart from the above listed properties,
most of the time KPIs contain the following two optional properties:
- Status Indicator: It is a
graphical Indicator used to visually display the status of a KPI. Usually
colors like red, yellow, and green are used or even other graphics like
smiley or unhappy faces.
- Trend Indicator: It is a
graphical indicator used to visually display the trend of a KPI. Usually
up arrow, right arrow, and down arrow are used.
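A KPI's Status property can be sketched as a small Python function (not an SSAS API; the thresholds and the -1/0/1 encoding are illustrative assumptions, though a small numeric status range like this is typical):

```python
# Conceptual sketch: a KPI Status is a small numeric value derived from
# Actual vs Target. The 0.9 tolerance threshold here is an assumption.

def kpi_status(actual, target, tolerance=0.9):
    ratio = actual / target
    if ratio >= 1.0:
        return 1    # performance meets or beats the goal
    if ratio >= tolerance:
        return 0    # performance is close to the goal
    return -1       # performance is well below the goal

revenue_kpi = {"name": "Revenue", "actual": 9.5, "target": 10.0}
status = kpi_status(revenue_kpi["actual"], revenue_kpi["target"])
```

A client application would then map the numeric status to a Status Indicator such as a green, yellow, or red graphic.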
What
are Actions in SSAS? What are the different types of Actions in SQL Server
Analysis Services?
Actions in SSAS allow us to extend the
cube functionality and enable the users to interact with the cube. An Action in
simple terms is basically an event, which can be initiated by a
user/application and it can take various forms depending upon the type of
Action defined.
Actions are primarily of following
three types:
- Drillthrough Actions: A Drillthrough Action retrieves
the detail level information associated with the cube data based on which
the Drillthrough Action is defined.
- Reporting Actions: A
Reporting Action retrieves an SSRS report which is associated with the
cube data. The command which invokes the SSRS report contains the report
URL along with the report parameters.
- Standard Actions: A
Standard Action retrieves the action element associated with the cube
data. Standard actions are further categorized into 5 different
subcategories and the action element varies for each of these
subcategories. The following are the types of Standard Actions:
- Dataset Action: Returns
a dataset to the client application and the action content is an MDX
expression.
- Proprietary Action: Performs
an operation as defined by the client application. The action content for
this type of action is specific to the calling client application and the
client application is responsible for interpreting the meaning of the
Action.
- Rowset Action: A
Rowset Action returns a Rowset to the client application. The action
content is a command to retrieve the data.
- Statement Action: The
action content for this type of Action is an OLE DB command and it
returns a command string to the client application.
- URL Action: The
Action Content for this type of action is a URL and it returns a URL to
the client application, which can usually be opened in a web browser. This
is the default action.
What
are partitions in cubes? How do they differ from table partitions at a SQL
Server database level?
A partition is physical storage space
which contains either all or a portion of measure group data. Each measure
group in SSAS has one partition by default.
A partition can be bound either to a
table in the underlying relational database or to a query that points to the
table(s) in the underlying database and filters the data.
In terms of storage, cube partitions in
SSAS and table partitions in a database
are similar. Both these types of partitions are used to improve the
performance. However, partitions in SSAS offer additional benefits including:
- Each partition can be processed separately (i.e. a measure group
can be split across multiple partitions, for example one partition for
each year). Only the partitions in which data has been modified can be
processed thereby improving the processing time of the cube.
- Partitions provide improved manageability by allowing us to define
storage mode, aggregation design, etc. at the partition level and these settings
can vary between different partitions belonging to the same measure group.
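The per-partition processing benefit described above can be sketched in Python (not an SSAS API; the one-partition-per-year layout and status strings are illustrative assumptions):

```python
# Conceptual sketch: with one partition per year, only partitions whose
# source data changed need reprocessing, which shortens cube processing.

partitions = {2011: "processed", 2012: "processed", 2013: "processed"}
changed_years = {2013}  # only the current year's fact rows changed

reprocessed = []
for year in partitions:
    if year in changed_years:
        partitions[year] = "reprocessed"
        reprocessed.append(year)
```

Historical partitions stay untouched, so the processing window scales with the volume of changed data rather than the whole measure group.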
What
are the different Storage Modes supported by Cube Partitions?
There are primarily two types of data
in SSAS: summary and detail data. Based on the approach used to store each of
these two types of data, there are three standard storage modes supported by
partitions:
- ROLAP: ROLAP stands for Relational
Online Analytical Processing. In this storage mode, both summary and detail
data are stored in the relational database/data warehouse. This storage mode
offers low latency, but it requires
large storage space as well as slower processing and query response times.
- MOLAP: MOLAP stands for
Multidimensional Online Analytical Processing. In this storage mode, both
summary and detail data is stored on the OLAP server (multidimensional
storage). This storage mode offers faster query response and processing
times, but offers higher latency and requires an average amount of storage
space. This storage mode leads to duplication of data, as the detail data
is present in both the relational and the multidimensional storage.
- HOLAP: HOLAP stands for Hybrid
Online Analytical Processing. This storage mode is a combination of ROLAP
and MOLAP storage modes. In this storage mode, summary data is stored in
OLAP server (Multidimensional storage) and detail data is stored in the
relational data warehouse. This storage mode offers a balance of storage
space, query response time, latency, and processing time.
There are different variations of these
Standard Storage Modes.
What
is proactive caching in SQL Server Analysis Services?
Proactive caching is an advanced
feature in SSAS and it enables a cube to reflect the most recent data present
in the underlying database by automatically refreshing the cube based on the
predefined settings. This feature allows the users to view the data in near real-time.
Proactive caching can be configured to
refresh the cache (MOLAP cache) either on a pre-defined schedule or in response
to an event (change in the data) from the underlying relational database.
Proactive caching settings also determine whether the data is queried from the
underlying relational database (ROLAP) or is read from the outdated MOLAP
cache, while the MOLAP cache is rebuilt.
Proactive caching helps minimize
latency and achieve high performance.
What
are aggregates? What is the purpose of defining an aggregation design in
Analysis Services?
Aggregates are summarized values, each
of which corresponds to a combination of an attribute from each dimension and a
measure group. An aggregate in SSAS is the differentiating factor between OLAP and
OLTP, and is the fundamental principle of SSAS/OLAP, which offers blazing fast
performance.
In theory, many people believe that
SSAS stores aggregated values for every combination of each attribute from each
dimension and in each measure group. However, in reality, SSAS stores only a
part of all combinations and not all the possible combinations. In some
scenarios, it might be helpful to create a certain percentage of aggregations
every time the cube is processed, without leaving the decision to SSAS. This
is achieved by defining an aggregation design.
The Aggregation Design Wizard is
used to design aggregations in SSAS and it provides the following options as
part of the aggregation design process:
- Design aggregations until estimated storage reaches "X"
MB.
- Design aggregations until performance gain reaches "X"
percentage.
- Design aggregations until the person designing the aggregations,
clicks the "Stop" button.
- Do not design aggregations at all (0% aggregation).
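The reason aggregations make OLAP queries fast can be sketched in Python (not an SSAS API; the fact rows and the (Year, Category) grain are illustrative assumptions): a query at a pre-aggregated grain becomes a lookup instead of a scan over detail rows.

```python
# Conceptual sketch: an aggregation is a precomputed summary at some
# combination of attributes, built once at processing time.

detail_rows = [
    ("2012", "Bikes", 100.0), ("2012", "Bikes", 150.0),
    ("2012", "Clothing", 80.0), ("2013", "Bikes", 200.0),
]

# Designed aggregation: Sales by (Year, Category).
aggregation = {}
for year, category, amount in detail_rows:
    key = (year, category)
    aggregation[key] = aggregation.get(key, 0.0) + amount

def query_sales(year, category):
    """Answer a query at the aggregated grain via a direct lookup."""
    return aggregation[(year, category)]
```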
What
is Usage Based Optimization in SSAS? How is it performed?
Usage Based Optimization is the process
of defining aggregation design based on the actual queries executed by the
users/applications accessing the cube, which is captured using a log.
SSAS offers a wizard called Usage Based Optimization Wizard to perform Usage Based Optimization by
defining aggregation design. The Usage Based Optimization Wizard provides the
following options as part of the Aggregation Design Process:
- Design Aggregations until estimated storage reaches "X"
MB.
- Design Aggregations until performance gain reaches "X"
percentage.
- Design Aggregations until the person designing the aggregations
clicks the "Stop" button.
As we can see, the above options
offered by the Usage Based Optimization Wizard are similar to the ones offered
by the Aggregation Design Wizard. However, in the case of the Aggregation
Design Wizard, all possible queries are given equal importance, whereas in
the case of the Usage Based Optimization Wizard, the importance is based on
the actual queries being fired.
To use the Usage Based Optimization
Wizard, Query Log Properties need to be configured at the SSAS Server Level for
the tool to be able to log the queries fired by users for accessing the cube.
What
are the different aggregation functions available in SSAS? Can you explain each
one of them?
SSAS supports the following types of
aggregation functions for Measures:
- Fully Additive
- Sum: This
is the default aggregation type and calculates the sum of all the child
members of a member at any level in the hierarchy, except for leaf level.
- Count: Calculates
the count of all the child members of a member at any level in the
hierarchy except for the leaf level.
- Semi Additive
- Min: Returns
the lowest value among all the child members.
- Max: Returns
the highest value among all the child members.
- FirstChild: Returns
the value of the first child of a member.
- LastChild: Returns
the value of the last child of a member.
- FirstNonEmpty: Returns
the value of the first non-empty child of a member.
- LastNonEmpty: Returns
the value of the last non-empty child of a member.
- ByAccount: This
aggregation type is applicable only if there is an Account Dimension in
the cube. A Dimension is treated as an Account Dimension by SSAS, only
when the Type property of the dimension is set to "Account".
This function aggregates the data based on the aggregation function set
for members of an Account Dimension. If an Account Dimension does not
exist, then setting this aggregation type is equivalent to No Aggregation
(None).
- AverageOfChildren: Calculates
the average of values of all the non-empty child members of a member at
any level of a hierarchy, except at the leaf level.
- Non Additive
- None: When
an aggregation function is set to None, no aggregation is performed and
the value of the member (both leaf and non-leaf) is returned directly.
- DistinctCount: Returns
the count of unique/distinct child members of a member at any level of a
hierarchy, except for the leaf level.
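The contrast between Sum and a semi-additive function like LastNonEmpty can be sketched in Python (not an SSAS API; the quarterly balance figures are illustrative assumptions, with None marking an empty period):

```python
# Conceptual sketch: Sum vs LastNonEmpty over the time dimension for a
# semi-additive measure such as account balance.

quarter_balances = [100.0, 120.0, None, 130.0]  # Q1..Q4; Q3 is empty

# Summing balances across time gives a meaningless total.
total_if_summed = sum(b for b in quarter_balances if b is not None)

def last_non_empty(values):
    """Return the last non-empty value, as LastNonEmpty would."""
    for v in reversed(values):
        if v is not None:
            return v
    return None

year_end_balance = last_non_empty(quarter_balances)  # correct for balances
```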
What
is Time Intelligence? How is it implemented in SSAS?
Time Intelligence is a technique, which
allows us to align the Time Dimension in SSAS with our actual Calendar, thereby
making time calculations like Period to Date, Comparison across Parallel Time Periods, Cumulative aggregates,
etc. very simple without the need for us to write explicit MDX
queries/expressions.
For implementing Time Intelligence, the Type property
of the Time Dimension should be explicitly set to "Time" and the
attributes of this dimension should be set appropriately like year, month, etc.
SSAS offers a wizard called the Business
Intelligence Wizard which allows us to add different types of Intelligence to
SSAS like Time Intelligence, Account Intelligence, and Dimension Intelligence,
etc.
What
are translations? What is its significance in SSAS?
Translations in SSAS allow us to bind
labels/properties of those objects in SSAS which can be represented in multiple
languages. In simple terms, Translations allow us to display the Labels and
Captions of various SSAS objects in different languages. In SSAS both metadata
and data can be translated. The objects which support Translations in SSAS
include databases, cubes, dimensions, attributes, hierarchies, measure groups,
calculated members, KPIs, and various other objects.
In today's world, businesses are growing
and expanding to a very large extent and tend to have an international
presence. In such situations, it is essential that SSAS objects support
localization and that people from different geographical locations are able
to see the information in their local language.
What
are perspectives? How they can be used to implement security in SSAS?
A perspective is a visual layer on top
of SSAS and is used to display a subset of the cube/dimension based on either a
specific subject area or based on the target audience, or any other scenario
which might require exposing a subset of cube/dimension to the
users/applications to simplify the view of data.
Perspectives are a very useful feature,
especially when cubes are very large, containing many dimensions and
measure groups. Such large cubes can be overwhelming for users if the
data is not exposed through suitable perspectives.
Perspectives are not a security
mechanism. They cannot be used to implement security in SSAS. The security
settings/restrictions applied at the cube/dimension/data level apply at the
perspective level and no separate security settings can be defined at the
perspective level.
What
are the different levels where security can be implemented in SSAS?
Security is one of the most common and
most important needs for any application especially when working with sensitive
data. In SSAS, security can be implemented at various levels:
- Server Level: A user
is either allowed or denied access to the OLAP Server.
- Database Level: A user
is either allowed or denied access to specific databases on the SSAS
Server.
- Cube Level: Users
can be allowed or denied access to specific cubes within a database
especially when a database contains multiple cubes.
- Dimension Level: Users are either allowed or denied access to specific dimensions,
thereby enabling or restricting them from browsing the data using
those specific dimensions.
- Measure Level: Either
one or more measures can be exposed or restricted from the user.
- Cell Level: This
can be of two types: Security for dimension members and security for data
cells. With dimension members, users can be allowed or restricted to see
specific members of a dimension. With data cells, users can be allowed or
restricted to see a specific range of cells. In both of these scenarios,
users can be given read, read/write, etc. permissions.
What
is writeback functionality in SSAS? In what scenarios is it useful?
Writeback is a functionality in SSAS
which allows the users to write the data back into the cube. Meaning, while
browsing the cube, they can make changes to the data and those changes are
written back into the cube.
Writeback can be enabled primarily at
two levels: the dimension and/or the partition. For dimension level
writeback, the users can modify the members of a dimension. For partition
writeback, users are allowed to modify the cells of data and hence it is
commonly referred to as cell level writeback.
Users require special permissions to be
able to use the writeback functionality to write the data back into the cube.
For dimension writeback, users should have the read/write permissions and the
"enable dimension writeback" property should be set. In the case of
cell/partition level writeback users need to have read/write permissions and
the partition should be enabled for writeback.
Writeback functionality can be useful
in many scenarios, such as performing What-If analysis. Say a business user
wants to understand how much impact a change in freight and other taxes can
have on profitability; they can achieve this simply by setting a different
set of values for freight and taxes and seeing how the profitability changes.
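The What-If scenario above can be sketched in Python (not an SSAS API; the cell names and amounts are illustrative assumptions): write a new freight value back, then observe the recomputed profit.

```python
# Conceptual sketch: What-If analysis in the spirit of cell writeback.

cube_cells = {"Sales": 1000.0, "Cost": 700.0, "Freight": 50.0, "Taxes": 80.0}

def profit(cells):
    """Derived profit from the current cell values."""
    return cells["Sales"] - cells["Cost"] - cells["Freight"] - cells["Taxes"]

baseline = profit(cube_cells)

cube_cells["Freight"] = 30.0   # "write back" a lower freight cost
what_if = profit(cube_cells)   # profit recomputed under the new value
```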
What
are the deployment options available in SSAS?
Below are the different options to
deploy an SSAS solution to an OLAP Server:
- Business Intelligence Development Studio (BIDS): This is one of the primary tools used for developing SQL Server
BI solutions and is the most frequently used IDE by the developer
community in SQL Server 2008. BIDS can be used to deploy the SSAS Objects/Solutions. BIDS was replaced by the SQL Server Data Tools in SQL
Server 2012, but has much of the same functionality.
- Analysis Services Deployment Wizard: Analysis Services Deployment Wizard is a tool included as part of the SSAS installation and can
be used to deploy the SSAS objects/solutions.
- XML for Analysis Services (XMLA): It is
used for deploying the SSAS objects to the OLAP Server. Apart from
deployment purposes, XMLA scripts can
be used for various other purposes like managing SSAS objects,
creating/modifying SSAS objects and processing SSAS objects, etc. It is
also known as the Analysis Services Scripting Language (ASSL).
What
is processing in SSAS?
Processing loads data
into SSAS objects such as Cubes, Partitions, and Dimensions on the OLAP
Server from the underlying relational data warehouse. As part of
the processing step(s), aggregations are computed, data is loaded into one or
more pre-defined partitions, and any other actions that involve data
modification are carried out. The processed data is stored in the
respective locations (the OLAP Server, the relational data warehouse, etc.)
depending on the Storage Mode defined for the different
objects.
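As an illustration, processing can be triggered by sending an XMLA Process command to the server from an XMLA query window in SSMS; the database and cube IDs below are hypothetical:

```xml
<!-- Minimal processing sketch; SalesDW and Sales are hypothetical IDs. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>SalesDW</DatabaseID>
      <CubeID>Sales</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```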
What
is database synchronization in SSAS? What is its advantage?
Database synchronization is the process
of updating the data and metadata of a database on the target SSAS Server
based on the data and metadata of the same database on the source SSAS Server.
Analysis Services offers a tool called the "Synchronize Database Wizard" to synchronize
databases between two different SSAS instances.
The primary advantage of a database
synchronization via the Synchronize Database Wizard is that, while the
databases are being synchronized, the destination database
remains available and users can access it as usual. Once
the synchronization is complete, Analysis Services drops the outdated copy
(the old database) and automatically switches to the newly
updated database.
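Besides the wizard, synchronization can be scripted as an XMLA Synchronize command executed on the target server; the server name and database ID below are hypothetical:

```xml
<!-- Minimal synchronization sketch; SourceServer and SalesDW are
     hypothetical names. Run this against the target instance. -->
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=SourceServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>SalesDW</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
```

Scripting the command this way also lets you schedule synchronization, for example from a SQL Server Agent job.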
What
are the different processing modes available in SSAS? Explain each one of them.
Below are the processing options
available in SSAS:
- Process Add: This
processing mode is available for Dimensions, Measure Groups and
Partitions. When this processing mode is applied to Dimensions, it adds
new members into the dimensions from the underlying database and updates
the descriptions of the attributes. When this processing mode is applied
to Measure Groups and Partitions, it adds the incremental fact data into
the cube and processes only the affected partitions.
- Process Clear: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Cubes, Mining Models and Structures as well as OLAP (SSAS) databases. This
processing mode is used to remove the data from an Analysis Services
object. Data is removed from an object and all the subsequent child level
objects.
- Process Clear Structure: This
processing mode is available only for Mining Structures as part of the
Data Mining component of SSAS. This processing mode is used to clear the
training data previously loaded into a mining structure.
- Process Data: This
processing mode is available for Dimensions, Measure Groups, Partitions,
and Cubes. This processing mode is used to process the data without
actually building or creating the aggregations and indexes.
- Process Default: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Mining Models and Structures, Cubes, and OLAP (SSAS) databases. When this
process is applied to any of these objects, it first determines the current
state of the object from a processing standpoint and then does the
necessary processing to bring the object to a fully processed state.
- Process Full: This
processing mode is available for Dimensions, Measure Groups, Partitions,
Mining Models and Structures, Cubes, and OLAP (SSAS) Databases. When this
processing mode is applied to an object, the object itself and all the
subsequent lower level objects are fully processed. The data is cleared
from an object if it already has some previously processed data.
- Process Index: This
processing mode is available for Dimensions, Measure Groups, Partitions,
and Cubes. This option works only on processed objects; running it on
unprocessed objects results in an error. When applied to a processed
object, it creates aggregations and indexes if they don't already exist,
and recreates them if they do.
- Process Structure: This
processing mode is available for Mining Structures and Cubes. When this
processing mode is applied to a cube, it creates the cube structure and it
also processes the cube dimensions if they are not already processed. This
processing mode loads the data into the Mining Structure, when applied on
it.
- Process Update: This
processing mode is available only for Dimensions. When this processing
mode is used, it freshly reads all the dimension data from the database
and updates the data in the dimension.
What
are the different ways to process objects in SSAS?
SSAS objects can be processed using
various techniques. The following are the most popular tools and techniques
used to process objects in SSAS:
- SQL Server Integration Services (SSIS): Analysis Service Processing Task in SSIS can be used to process SSAS objects.
- Business Intelligence Development Studio (BIDS): SSAS Objects can be processed via Business Intelligence Development Studio from within the SSAS Project. BIDS was replaced by the
SQL Server Data Tools in SQL Server 2012, but has much of the same
functionality.
- SQL Server Management Studio (SSMS): SQL Server Management Studio can be used in multiple ways to
process SSAS objects.
- Object Explorer:
Using SSMS, one can directly connect to Analysis Services Server and
process the objects through the Object Explorer (GUI).
- XMLA: An XML for Analysis Services (XMLA) script can be executed directly in SSMS to process the SSAS
objects.
- SQL Server Agent: An
XMLA script can be executed using a SQL Server Agent Job
which also helps in scheduling the script to perform the processing
during off hours.
- Command Line: SSAS offers the following
two command line utilities which can be used to process SSAS objects:
- PowerShell: PowerShell is
a well-known and very powerful command-line shell and scripting language
for which SQL Server ships a provider. It can be used to process SSAS
objects and, beyond that, to perform various other operations within SQL Server.
- ascmd:
The ascmd utility
can be used to execute XMLA scripts as well as MDX and DMX queries
against an instance of SSAS. Using this utility, one can execute the
relevant scripts to process the SSAS objects.
- Programmatically:
Analysis Management Objects (AMO) is a collection of SSAS management
classes, which can be used for processing SSAS objects. Apart from this,
AMO can be used for various management activities in SSAS.
What
are the programmability options available in SSAS?
AMO (Analysis Management Objects) is a
set of classes used for managing an Analysis Services instance and can be used
to perform various management activities related to SSAS. Using AMO, the
following operations can be performed on the SSAS objects:
- Creation
- Modification
- Deletion
- Processing
With AMO, one cannot query the SSAS
objects. To query data from SSAS programmatically, one can use ADOMD.NET,
a .NET-based data provider. It is primarily used to communicate with
SSAS and issues commands in Multidimensional Expressions (MDX), Data Mining
Extensions (DMX), and the Analysis Services Scripting Language (ASSL) to query
the data.
What
are the different querying and programming languages supported by SSAS?
The following languages are supported
by SSAS:
- T-SQL: Transact-SQL queries
can be used to query certain aspects of SSAS.
- MDX: It stands for Multidimensional Expressions. MDX queries are the primary means of querying data in SSAS.
- DMX: It stands for Data Mining
Extensions. DMX is used for queries related to Data Mining in SSAS.
- AMO: It stands for Analysis
Management Objects. It is .NET based and can be used to manage SSAS
objects, creating/modifying/deleting SSAS objects and implementing
security in certain scenarios.
What
are the different tools used for accessing cubes?
Here is a list of different tools which
are used for accessing Cubes in Analysis Services:
- Business Intelligence Development Studio (BIDS): BIDS is used to access the cube and preview the data at the
design time. This option is useful only for the developer community and is
not applicable to business analysts and decision makers. BIDS was
replaced by the SQL Server Data Tools in SQL Server 2012, but has much of
the same functionality.
- SQL Server Management Studio (SSMS): SSMS is used to access the cube by people in different roles.
Development and testing teams use it to validate the data, and the
(relatively few) business analysts and decision makers who are
familiar with SSMS also use it for accessing the cubes.
- Microsoft Office Excel: Excel,
being one of the most widely used data analysis tools, has the capability
to connect to cubes and perform different kinds of reporting such as pivot
tables, charts, etc. It is used by many decision makers, business
analysts, and even developers for reporting purposes.
- SQL Server Reporting Services (SSRS): SSRS allows Analysis Services databases or cubes to be used as a data source
for building reports. This option is available both in BIDS/SQL Server Data
Tools when building canned reports and in Report Builder when doing
ad-hoc reporting.
- PerformancePoint Services (PPS): PPS, which is now a part of SharePoint 2010, allows building KPIs,
scorecards, reports, and dashboards with Analysis Services cubes as the data
source.
These are the major tools in the
Microsoft BI stack which support Analysis Services data sources. Apart from
these, there are various third party tools which support Analysis Services as a
data source.
How
do you capture which users are connecting to SSAS, which objects of SSAS are
being used, which queries are being executed, etc.?
Analysis Services offers Dynamic Management Views (DMV),
which are along the same lines as the DMVs for the relational database engine.
Using the DMVs is the most common approach to capturing these statistics and to
monitor the current status of the SSAS server. With the SSAS DMVs, one can
capture various statistics including:
- Utilization of resources such as CPU, memory, etc.
- Usage of aggregations, hierarchies, and other components of SSAS.
- Who is connecting to the OLAP database and what are the different
objects being used.
- Queries being executed by the users, time taken for executing
different user queries, and so on.
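As an illustration, a few of the commonly used SSAS DMVs can be queried with a SQL-like SELECT from an MDX query window in SSMS (DMV queries support only a restricted syntax, e.g. no joins):

```sql
-- Who is connected, and from where
SELECT * FROM $SYSTEM.DISCOVER_CONNECTIONS;

-- Active sessions, including the last command per session
SELECT * FROM $SYSTEM.DISCOVER_SESSIONS;

-- Commands currently executing and their elapsed time
SELECT * FROM $SYSTEM.DISCOVER_COMMANDS;

-- CPU and read statistics per object (cube, dimension, partition)
SELECT * FROM $SYSTEM.DISCOVER_OBJECT_ACTIVITY;
```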
1.
What do you mean by Microsoft Business Intelligence and what components of SQL
Server support it?
Microsoft defines its BI solution
as a platform to provide better and more accurate information in an
easily understandable format for quicker and better decision making. It
consists of BI tools from SQL Server: SQL Server Integration Services (SSIS),
SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS),
Microsoft SharePoint, and the Office products.
There are 5 key areas
to test:
1.
Cube design: the stages of
creating a cube in BIDS, i.e. data source, DSV, dimension, cube.
2.
Performance
tuning: troubleshooting query performance as well as processing
performance.
3.
MDX: the multidimensional expression
language that we all love.
4.
Client tools: creating reports and
dashboards on Strategy Companion, Excel, Tableau, Panorama, etc.
5.
Administration: installing and
upgrading SSAS servers, configuring SSAS server settings, security, processing,
deployment, migration, backing up and restoring the cubes.
Beyond these 5, I
think it is very important for an SSAS developer (and an ETL developer) to
understand the concept of dimensional modelling well. Hence I like to throw in one
or two Kimball questions. Usually candidates understand dimensional modelling,
at least the basics, but we need to make sure.
1.
Cube design
Out of the 5 things
above, Cube Design is the most important one. It is the bread and butter of the
job. We need to establish 2 things here:
a) Do the candidates
know basic cube design (the things your company uses now)?
b) Do they know
advanced cube design (the features your company doesn’t use now)?
Because it depends on
the SSAS features your company uses, what constitutes “basic” differs from
company to company. But it’s probably not far from this list:
- Data source: connection string, impersonation
- DSV: table, view, named query, relationships
between entities
- Dimension: hierarchy, sort order, attribute
relationships, OrderByAttribute, display folder, default member
- Cube: referenced dimension, many-to-many,
partitioning, aggregation, measure expression, format string, calculated
members, error configuration, display folder
- Role: membership, cube access
Advanced cube design:
- DS & DSV: multiple DS, multiple DSVs,
logical primary key, friendly name, retrieve relationship, schema
restriction, number of connections
- Dimension: parent child dim, translations,
date calculation dimension, multiple KeyColumns, ragged dim, type 2 dim,
custom rollup, unary operator, write enabled, ROLAP.
- Cube: actions (URL & drill through),
translations, perspectives, HOLAP, proactive caching, input dimension
(dimension which is not connected to any MG, but used in calculation),
IsAggregatable, KPI.
- Role: dimensional security, cell security
- AMO: doing all of the above programmatically
using C#.
I usually like to
establish first whether the candidate can do the job, i.e. the basic stuff above /
the things we use in the company. I do this by picking 3 from the basic stuff and seeing
if they are comfortable. If they are, I pick 3 from the advanced stuff. If they
are not comfy with the basics, the question I need to answer is whether the
candidate would be able to sort it out by themselves on the job. I do that by
giving them some hints.
Whenever possible, I
prefer to wrap the question in a business context, rather than bare theory. For
example, question a) below has a business context, whereas question b) is
theory:
a) Explain how you
handle currency conversion.
They can use measure expression, many-to-many, but we know immediately if they have done this stuff or not.
b) Explain different
modes of impersonation.
That is just theory. I prefer to give a context i.e. when would we want to use service account and when to use a specific user name? Or: if the connection string already defines how to connect to the database, then what’s impersonation for?
2.
Performance tuning
I think what I wrote
here [Optimising
Cube Query and Processing Performance]
is quite comprehensive to be used as a base for our questions. As per its
title, it covers the troubleshooting of both query performance and processing
performance.
For query performance
we can ask about MDX queries, aggregation, partitioning, and server settings. For
processing performance: dimensions, materialized views, partitioning and, most
important of all, incremental processing. I explain more about
processing performance here.
Again I like to wrap
them in a case, for example: a cube takes 5 hours to process and we need to get
it down to 1 hour. How can we do that? Many candidates never optimise dimension
processing, but any good SSAS developer will at least understand partitioning.
An expert developer may even give you a lecture about why aligning table
partitioning with incremental partition processing is so important.
Another favourite
question of mine is what to do if a) a query is slow, and b) a certain measure
is slow (other measures are fine). For a) trap the MDX, run it in MDX Studio
(thanks Mosha) and find out which part makes it slow by altering the query bit
by bit. For b) establish first if it’s a calculated measure or base measure. If
it’s calculated, look at the MDX and analyse on MDX Studio. If it’s a base
measure, look at the storage mode (ROLAP?), the dimension usage tab
(relationship type = fact or many to many? Try removing some relationships
& reprocess), measure expression (is it a simple A/B formula?), simplify
the attributes involved in a query (perhaps the issue is with the attributes,
not with the measures?).
3.
MDX
This is always
tricky. Not about figuring out what to ask, but about what level of MDX you need
in your company/project. Any SSAS developer can do basic MDX, right? Not quite.
Yes, they can write a basic MDX select statement, but they might not know even
basic things like filtering and sorting. So we need to test those. I’d
recommend the following list:
Count, YTD, parallel
period, filter, existing, order, first/last child, ancestors/descendants,
currentmember, if/iif/case, head & tail, crossjoin, nonempty,
except, scope, top/bottomcount.
These are considered
everyday use. They separate the SSAS developers who only ever query their cubes
through client tools from those who write MDX to query the cubes themselves.
Obviously we don’t need to ask all of them; just pick 3 or 4.
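A short query exercising a few of these functions makes a good whiteboard test, e.g. "give me the top 10 products by year-to-date sales". All object names below are hypothetical:

```mdx
-- TopCount + NonEmpty + YTD in one query; [Sales], [Product], [Date]
-- and [Sales Amount] are hypothetical names.
SELECT
    [Measures].[Sales Amount] ON COLUMNS,
    TOPCOUNT(
        NONEMPTY([Product].[Products].[Product].Members,
                 [Measures].[Sales Amount]),
        10,
        SUM(YTD([Date].[Calendar].CurrentMember),
            [Measures].[Sales Amount])
    ) ON ROWS
FROM [Sales]
WHERE ([Date].[Calendar].[Month].&[201406]);
```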
Before we test those
functions, it’s worth establishing if they grasp the concepts of
multidimensional database (OLAP) or not. For example: tuple, set, cube space,
axis, level, member. If they can’t differentiate a tuple from a set, it’s
probably not worth asking them any MDX function.
You might argue:
“Oh, but we don’t use MDX here.” But it is essential that they understand the
concepts and the basic functions. You can’t effectively compose SSRS reports
displaying data from your cubes if your SSAS developer can’t write MDX. You
can’t effectively build a cube if you don’t understand multidimensional
database concepts.
XMLA (ASSL) is a bit
different. It is not as important as MDX because you can script it on
SSMS. Whether it is to create or to alter SSAS objects, or to process them, you
can generate the script from SSMS. Besides, we have AMO for creating/altering SSAS
objects programmatically. With AMO you get a proper development environment, with
looping and branching, so you have more control. And it’s .NET. True, you
can use tools like Excel, Panorama & ProClarity to get the MDX, or use
Profiler to trap Strategy Companion & Tableau’s MDX statements, but you
can’t optimise the calculated members if your SSAS developer doesn’t know MDX.
Another thing to
consider is whether you need to cover DAX. This depends on whether you are
using (or will be using) PowerPivot or not. But DAX is becoming more and more
central in MS OLAP technology so in the long run (like 2 years from now) you
will need to cover DAX. But for now it’s still optional, depending on the use
of PowerPivot.
4.
Client Tools
Specific features differ
from one client tool to the next, but the basic ideas are similar:
creating reports, charts and dashboards. Some of the specific features to test
are: creating a custom set, creating a calculated member, time calculation functions
such as YTD and QTD, hyperlinks to different reports, and drilldown. One of my
favourite questions is whether a specific calculation should be done in the
client tool or in SSAS.
The administration of
the client tool is usually done by somebody else, not the SSAS developer, so we
shouldn’t worry about it in this interview. But if you are a small company then
usually one person does it all, in which case you will need to ask the
candidate about installation, migration, user security, and other aspects of
the client tool.
5.
Administration
Administration tasks
are not the responsibility of an SSAS developer. They are the responsibility of
the SQL DBA. These days, in large corporations, SQL Server DBAs are asked to
manage all 4 aspects of SQL Server. They manage not only the relational engine,
but also SSAS server, SSIS packages and SSRS servers. But in smaller companies
SSAS developers are involved in administering the servers. So I include this
section (section 5, Admin) in the interview.
The
Questions
Enough with the
background, let’s get on with the questions. I’m going to label each question
with 1-5 indicating the 5 areas above, and LMH (low, medium, high) indicating
the difficulty. I’ll put the label in curly brackets: {}. Hopefully the labels
help you compose the right mix of interview questions.
1.
How do you ensure that January,
February, March, etc. will be in the correct sequence when the user browses the
cube? {1-L}
2.
In SSAS, how do you design currency
conversion for a) from many transaction currencies to 1 reporting currency, b)
from 1 transaction currency to several reporting currencies and c) from many
transaction currencies to many reporting currencies? {1-M}
3.
A user reported that a Panorama report
that she uses every day is very slow today. Normally it opens in 1 or 2
seconds, but today it didn’t open at all (she has waited for 4 minutes).
Explain how you would handle this case systematically.
{4-L}
Note: you can still ask this question even if you are not using Panorama.
4.
We have a risk analysis cube. This cube
is 500 GB in size, containing 20 billion fact rows. Explain how we: a) process
this cube efficiently; b) ensure a good query performance. {2-M}
5.
Explain (step by step) how you build a
cube from a data mart. {1-L}
6.
Explain how you migrate a cube from a
dev AS server to production. {5-L}
Follow up with advantage/disadvantage of each approach. {5-M}
7.
A cube has 300 partitions and we want
to process only the partitions where the fact table row changed. We don’t want
to process the entire measure group. How do we do that? {2-M}
8.
We have a cube with 1 measure and 1
dimension. That dimension has only 1 attribute. Write an MDX to list the
members of that attribute of which measure1 is bigger than 10. We want to sort
it in descending order, i.e. largest amount first. {3-L}
9.
Like above but sort on the attribute,
not on the measure. {3-L}
10. People say that we shouldn’t use cell security
because it’s slow. Explain how we should use it then? How do you combine it
with dimensional security? {2-H}
11. What’s the difference between a tuple and a set?
What is an axis? {3-L}
12. You need to grant access to 200 users to access a
cube. Each user can only access certain data within the cube. The access is by
product and region, i.e. user1-20 can only access product1-50 and region1-3,
user21-40 can only access product30-80 and region2-5. How would you do that?
{5-M}
13. Some users need to see the sales figures in USD,
other users in GBP. How would you do that? {1-H}
14. You need to upgrade 4 SSAS servers (dev, UAT, prod,
DR) from 2005 to 2008. Explain how you would approach this. {5-L}
15. People say that we can have a real time cube using
ROLAP or ProActive Caching. Explain how, and the plus/minus of each approach.
{1-H}
16. Cube processed OK but all measures return no value.
How do you troubleshoot this issue systematically? {1-M}
17. How do you do Slowly Changing Dimension type 2 in
SSAS? {1-M}
18. The data mart only has 1 measure: amount. This
measure could be asset, expense, income, etc depending on an attribute called
“Account”. How do you implement this in SSAS cube? {1-H}
19. The value of measure “price” is only valid if the
product dimension is at the leaf level and the location dimension is at the
“store” level. How do you set it? {2-M}
20. Attribute relationship: In what case would we want
to set: a) the cardinality to “one to one”? b) The relationship type to rigid?
What’s the risk (of doing so)? {1-M}
21. Why do you need to design aggregations? How do you
design aggregations (usage-based and non usage-based)? Explain how you test it.
{1-H}
22. People say that perspective is not a security
measure. What do they mean by that? {1-M}
23. Star vs snowflake: what’s the benefit of having a
snowflake data mart in SSAS? {1-M}
24. We have a dimension which is as big as the fact
table (in terms of the number of rows). What do you suggest we do with this
dimension? And with regards to SSAS, what would you do differently? {1-H}
25. Excel: how do you change language from English to
French when browsing a cube?
Answers:
1.
Order by key and put month number in
the key column. If they can’t answer this you should worry.
2.
It’s best if the candidate can explain
about the currency conversion BI wizard. But if he can explain the “traditional
way” it’s quite good too: using calculated measure and many-to-many
relationship.
3.
As with any support call, we need to
find out which report. Try to reproduce what the user did in NovaView. Find
what the MDX is (Tools menu, Direct MDX, ctrl-alt-V), execute in SSMS. If this
is slow then find out which part of the MDX makes it slow by building the query
step-by-step. Then consider: a) aggregation, b) block computation, c) convert
to base measure (do calculation in the relational/ETL), d) improve the MDX if
it’s a calculated measure, e) partitioning. Most of the time, when a user says
that usually it is 2 seconds but today it timed out, it is not the MDX or the
cube structure. It’s usually either the server or the connection. So check that
a) the server is up and running and b) from Panorama you can “open” the cube in
that server.
4.
Incremental processing (IP). If you
have huge cubes and the candidate has never done IP before, end the interview.
If you have huge cubes in your company (100 GB), your questions should be
focused on performance (processing & query). The job spec should clearly
say: wanted: SSAS performance tuning specialist (not a “normal” AS developer)
and prepare to pay 20% premium.
5.
It’s an easy question; every single
candidate should be able to answer this question as it is bread and butter. But
their answer shows their class. One of the important things I’m looking for
here is requirement analysis. An SSAS developer who “just builds it” without
analysing the requirements is usually less helpful than one who analyses the
requirements. The other thing I’m looking for here is performance, e.g. when
building the DSV, do they ensure that the SQL is efficient? When creating a
calculated measure, do they ensure that the MDX is efficient? The last thing
I’m looking for here is how they do unit testing.
6.
a) backup-restore, b) script
(XMLA)-execute-process, c) deployment wizard.
7.
First detect (in the relational database) which
partitions changed (you need some kind of flag/timestamp column on the fact
table). In the metadata table, update the status of each partition whose
data has changed. Then, based on this metadata table, process the changed partitions
using either a) AMO or b) the “Analysis Services Execute DDL” task in SSIS.
8.
See below.
9.
See below.
11. A tuple is a single coordinate in cube space (one member from each of one
or more hierarchies); a set is a collection of tuples (it could be just 1
tuple). A tuple is enclosed in parentheses: (); a set is enclosed in curly
brackets: {}. An axis is an edge of a multidimensional result set returned by
an MDX query (columns, rows, pages, sections, chapters)
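In MDX syntax the difference looks like this (hypothetical hierarchy and member names):

```mdx
-- A tuple: one coordinate, enclosed in parentheses
([Date].[Calendar].[2014], [Product].[Category].[Bikes])

-- A set: a collection of tuples of the same dimensionality, enclosed in
-- curly brackets (a set containing a single tuple is also valid)
{ ([Date].[Calendar].[2014], [Product].[Category].[Bikes]),
  ([Date].[Calendar].[2014], [Product].[Category].[Clothing]) }
```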
12. Put users into AD groups and add these groups as
the members of the cube roles. For each role, define the dimensional security
(dimension data tab) on product and region dims.
13. Create 2 roles: USD and GBP and add users into
these roles via AD groups. In each role, set the default member in the currency
dimension using dimensional security (dimension data tab in the role).
14. Upgrade Dev first (document the process), test the
cubes by running the reports (Panorama/Strategy Companion/Excel, etc). Use the
documented process to upgrade UAT, and test the UAT cubes. Then upgrade prod
and DR at the same time (using the documented process). Or DR first, test then
Prod. Before upgrading Dev, you might want to copy a Prod cube to Dev and
export some reports into Excel. After upgrading Dev, run the same reports and
compare with the results before upgrade.
15. There are 2 things here: dims and MGs. In reality, a
real-time cube doesn’t mean that all dims and all MGs are real time; usually
only 1 or 2 MGs are, and usually all dims are MOLAP, perhaps with the exception
of 1 or 2 dims. To make an MG real time, we can define it with ROLAP storage
mode. For performance reasons it is best not to define the whole MG as ROLAP, but
only 1 partition (usually the last one, if partitioned by month/period), unless
it’s a small MG (<100,000 rows). For dims, to make one ROLAP, set the
StorageMode property. ProActive Caching (PC) provides automatic management of
MOLAP storage: the cube incorporates the changes in the relational tables
based on a notification mechanism. You can do PC on partitions and dims. If the
fact table (or a partition of it) is small, and data is changing
frequently (like every minute), it is best to use ROLAP. If the fact table (or a
partition of it) is large (like 8m rows in a partition), and the data
is rarely changed, it is best to use PC.
16. Do “explore data” on the fact table in the DSV. Run the
SQL if it’s a named query. If this doesn’t return anything, check the fact
table. If it returns rows, check the dimension usage tab. Focus on 1 MG:
remove all the relationships (keep the cube dims), reprocess the cube and see if you
get a measure at the top level. If you do, add the relationships back to the dims one
by one until you find out which dim is causing the issue.
17. No special settings; just create the dim as normal
from the SCD2 table. The only difference is that it has 3 extra attributes:
IsCurrent, Valid From date & Valid To date.
18. Parent child dim, unary operator column, custom
rollup.
19. Scope statement
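A sketch of such a scope assignment in the cube's MDX script, assuming hypothetical hierarchy and level names; it nulls out Price everywhere except at product leaves and the store level:

```mdx
-- Hypothetical names throughout: [Product].[Products].[Product] is the
-- leaf level of the product hierarchy, [Location].[Locations].[Store]
-- the store level of the location hierarchy.
SCOPE ([Measures].[Price]);
    -- null out Price above (or below) the store level
    SCOPE (Except([Location].[Locations].Members,
                  [Location].[Locations].[Store].Members));
        THIS = NULL;
    END SCOPE;
    -- and above the leaf (product) level
    SCOPE (Except([Product].[Products].Members,
                  [Product].[Products].[Product].Members));
        THIS = NULL;
    END SCOPE;
END SCOPE;
```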
20. a) If for each member of attribute1 there is only 1
member of attribute2, b) if “member mappings” are fixed. Risk: if a member
changes its mapping.
21. Why: to increase query performance. How (usage
based): OLAP query log. How (non usage based): based on attributes which are
queried/used most. Refer to SSAS2008PerfGuide section 3.4.3.1. Test in Mosha’s
MDX Studio, before and after, 2 different cubes, same query.
22. You can’t prevent access to a dim/attribute/measure
using a perspective. Users just can’t see it, but if they know the name of the
dim/attribute/measure they can still access it.
23. Flexibility in building a dimension, speed of
processing. No difference in query performance.
24. Dimensional modelling: break it into smaller
dimensions and/or snowflake the dim. SSAS: limit the number of attributes
(don’t just take all attributes; be selective), index those attributes
(relational), limit the number of rows going into the cube by using a where
clause in the DSV/view, and put attributes with lower grain (a smaller number of
distinct values) into a referenced dim, linked via the main dim to the MG.
Number 8 (note the sort expression is [Measures].[Measure1], matching the question):

select order
    ( filter
      (
        {[dim1].[attribute1].[All].Children},
        [Measures].[Measure1] > 10
      ),
      [Measures].[Measure1],
      desc
    ) on rows,
    [Measures].[Measure1] on columns
from [Cube];
Number 9:

select order
    ( filter
      (
        {[dim1].[attribute1].[All].Children},
        [Measures].[Measure1] > 10
      ),
      [dim1].[attribute1].CurrentMember.Name,
      desc
    ) on rows,
    [Measures].[Measure1] on columns
from [Cube];
Learn More About
SSAS
Before discussing
any further questions, it is crucial to understand the basic concepts as well
as their applications. To start, Analysis Services is generally used for
analytical processing, Online Analytical Processing (OLAP), and data mining. SQL
Server Analysis Services enables the integration and management of data from various
data sources to cater for business intelligence applications. It is an
integral part of SQL Server, enabling the creation and management of highly
complex, multidimensional data structures in addition to the identification
of vital data through data mining. There are specific questions, listed in
the following paragraphs, that will certainly come up in SSAS interviews
regarding these core concepts.
Describe the Significance
of SSAS
In order to perform
analysis and forecasting operations, you need SSAS in SQL Server. The
benefits of SSAS are many; it:
- Is convenient with user-friendly interactive features.
- Enables creation and management of
data warehouses.
- Offers high end security
principles for enhanced efficiency.
- Provides speedy analysis and
troubleshooting options.
How Many Languages
does SSAS Use?
Another basic
question that will examine your knowledge regarding SSAS will refer to the
languages used. Here are the four main languages:
- SQL (Structured Query Language).
- MDX (Multidimensional
Expressions) for analysis.
- DMX (Data Mining Extensions) for
data mining operations.
- ASSL (Analysis Services Scripting
Language) for managing Analysis Services databases.
What Types of Data
Sources does SSAS Support?
Data sources are the bridge between SSAS and the database from which data is loaded during processing. A data source will invariably have a provider, database name, server name, and impersonation information. Both .NET and OLE DB providers are supported by SSAS. Here is a list of supported sources:
- SQL Server
- MS Access
- Oracle
- Teradata
- IBM DB2
- Any other relational database with an appropriate OLE DB provider
What is Your Role
as an Analysis Services Information Worker?
This question speaks to your role in the organization; there should be a clear understanding of what is expected from you and how you are going to accomplish company goals. The role of the modern-day Analysis Services information worker has its origin in the domain expert. In addition, you may be assigned anything from business analysis and technical training to help desk/operations management and network administration.
Definition of
Certain Terms and their Significance
You will more than likely come across many technical terms related to SSAS. It is important to know and understand them in order to gain an edge in interviews.
How Do You Define a
Partition and How Does it Affect Operations?
A partition is the physical location of stored cube data, and it is an effective and highly flexible technique for managing large cubes. It enhances the efficiency of Analysis Services because partitioning, combined with techniques such as aggregation, makes queries run faster by letting them scan only the measure-group data that provides the answers.
What is the
Difference Between OLAP And OLTP?
Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) function in different ways.
- OLAP is designed for daily business decisions, whereas OLTP is used for daily business operations.
- The frequency of data updates is very high in OLTP, with the most recent data integrated continually. In OLAP, data update frequency is relatively low.
- OLAP uses a large amount of historical data, while OLTP uses only a very limited amount of historical data.
What are some UDM
Benefits?
The Unified Dimensional Model (UDM) bridges the gap between the user and the data sources. It ensures high performance by effectively managing queries involving huge volumes of data and complicated processes such as interactive analysis.
Questions Related
to the Functional Aspects
How Much Do You
Know About Impersonation?
Impersonation is the means SSAS uses, during operations such as processing and data access, to present credentials on behalf of the client application or ascertain its security context. There are four main impersonation options:
- Use the service account.
- Use the current user's credentials.
- Use a specific Windows user name and password.
- Default: let SSAS select the impersonation method appropriate to the type of operation.
How Do You Use
Named Calculation and Named Query in SSAS?
A named calculation is essentially a column with an expression added to a table in the DSV. It should always conform to the language of the underlying data source. A named query, on the other hand, combines data from one or more tables of the underlying data source to form a new table based on a SQL query.
How Do You Combine
Multiple Data Sources in SSAS?
To combine multiple data sources, you create a DSV from a source that serves as the primary data source throughout operations. Generally, SQL Server is the best source for creating the initial DSV. Afterwards, additional data sources can be added to the DSV as secondary sources, and you can incorporate as many tables as you want into the DSV from these secondary sources.
The Microsoft Business Intelligence (BI) suite
provides a very powerful technology called SQL Server Analysis Services
(SSAS) to deliver OnLine Analytical Processing (OLAP) and data mining for
business intelligence applications.
The purpose of analysis services is to turn data into information and to provide quick and easy access to that information for decision makers. SSAS provides OLAP by letting you design, create and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases and provides many data mining algorithms for mining data from data sources. So for delivering OLAP and data mining it uses client and server technologies.
The main idea of SSAS is to provide fast results from data sources when we apply a query, because in order to make a decision we need data across various dimensions. OLAP does this by pre-calculating and storing aggregates. This requires a lot of disk space, but it provides the ability to explore the data very quickly because the data has already been processed.
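To make the idea of pre-calculated aggregates concrete, here is a minimal sketch in Python (the product names and numbers are invented for illustration): at "processing" time the fact rows are rolled up by dimension, so at query time the answer is a dictionary lookup rather than a scan of the fact table.

```python
# Toy fact rows: (product, year, units) -- invented sample data.
facts = [
    ("Shirts", 2011, 3000),
    ("Shirts", 2012, 1000),
    ("Pants", 2011, 4500),
    ("Pants", 2012, 200),
]

# "Processing" step: pre-calculate aggregates per (product, year) cell,
# the way an OLAP engine materializes aggregations up front.
aggregates = {}
for product, year, units in facts:
    key = (product, year)
    aggregates[key] = aggregates.get(key, 0) + units

# Query time: answering "units of Shirts in 2011" is a constant-time
# lookup instead of a scan over the fact table.
print(aggregates[("Shirts", 2011)])  # 3000
```

This is of course only the trade-off the paragraph describes: extra storage for the materialized aggregates, in exchange for fast exploration.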
Now there are two basic questions that come into your mind:
- Why
- How
Am I right? So now I will answer both the
questions as in the following:
- Why: Most of us use relational databases. In a relational database, data is stored and you can fetch it by applying queries, generating reports and so on; but with queries and reports we can only access data about certain things, whereas by utilizing Analysis Services we can play with the data, slicing and dicing it with very little effort.
The most important thing is that you can give the power of data analysis to someone who is not a DBA, database professional, or developer.
- How: This is the most important question: how to use Analysis Services for OLAP and data mining. I can't say it is very easy to work with Analysis Services; it's a challenge in itself. There are many tutorials available, so you can use them. This is my first article on SSAS, and as soon as I get time I will write a full series of articles on it.
Now I am starting the core part of analysis services. Here I am not jumping directly into the hands-on of how to create a cube at all. I am first providing an overview of the internal structure of SSAS.
SSAS uses both client and server components to provide OLAP and data mining functionality:
- The
Server component is implemented as a Microsoft Windows service. SSAS
supports multiple instances on the same computer, so each instance is
implemented as a separate instance of a Windows service.
- Clients
communicate with analysis services using XMLA (XML for Analysis), a
SOAP-based protocol for issuing commands and receiving responses, exposed
as a web service.
- Query
commands can be issued using the following languages:
- SQL: This is the structured query language; most of us know it very well, so I will not describe it here.
- MDX
(Multidimensional Expression): MDX is a query language to retrieve
multidimensional data from SSAS. MDX is based on the XMLA specification.
- DMX (Data Mining Extensions): This language works with data mining models. You can use DMX to create new data mining model structures, to train these models, and to browse, manage, and predict against them.
- ASSL (Analysis Services Scripting Language): Most of the communication of the SSAS client happens using SOAP messages. ASSL is the XML dialect used for these messages, made up of two parts: the first is a DDL, or object definition language, that defines and describes an instance of Analysis Services; the second is a command language that sends action commands, such as Create, Alter, or Process, to an instance of Analysis Services.
Since I have already said that SSAS is based on
client and server components, I will now explain both components of the
Architecture in detail.
- Server Architecture: This runs as a Windows service. The Msmdsrv.exe application is the server component. This application consists of security, an XMLA listener, a query processor, and other components that perform the core server tasks, such as parsing client statements, managing metadata, and serving query results.
- Client Architecture: SSAS has a thin-client architecture. All queries and calculations are resolved by the server, so a server connection is required for each request. Several providers ship with SSAS to support various programming languages; these providers communicate using SOAP packets.
SSAS also has a web architecture with a fully scalable middle tier that suits deployment by both small and large organizations, and it provides middle-tier support for web services.
Client and server can also communicate without using any provider, by sending XMLA in SOAP packets directly.
I know many of you are feeling bored after reading so much theory; thanks for your patience. I will now describe the very important keywords, the core concepts that are the building blocks of SSAS. So if you have started to feel sleepy, now is the time to wake up and read these keywords very carefully, because without them we cannot understand SSAS.
Now without wasting any time I am explaining the following keywords:
- Cube: This is the basic unit of storage and analysis in SSAS. It is a multidimensional data structure that holds data that has been aggregated so it can be returned quickly when a query fires. A cube is a related set of measures and dimensions used to analyze data; a cube is essentially synonymous with a Unified Dimensional Model (UDM).
- Dimension: Dimensions are groups of attributes based on columns of tables or views. Dimensions are independent of any cube, so they can be used in multiple cubes. Dimensions are the criteria by which business data is analyzed, such as time or geography. For example, time can be broken into years, quarters, months, weeks and so on.
- Dimension Table: This table contains the dimension attributes by which you want to summarize your data; in the case of a time dimension, this table can contain Year, Month, Quarter, Week and so on.
- Fact
Table: A
fact table contains the data that measures the organization's business
operations. A key characteristic of a fact table is that it contains
numerical data (facts) that can be summarized to provide information about
the history of the operation of the organization. Any table that you are
using for a function like sum or average can be called a fact table.
- Measures: A cube is based on one or more measures, which are simply columns of the fact table that you want to analyze.
- Schema: A fact table and a dimension table are always related, and this relationship inside the cube forms a schema. There are multiple types of schema, but the two most common are the star and snowflake schemas. I will write a separate article explaining the various kinds of schema, because I don't want to cut the topic short here.
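The keywords above can be tied together in a small sketch (all member names and numbers are made up): the fact table holds the measure, the dimension tables give it meaning, and the "cube" is just that measure aggregated over every combination of dimension members.

```python
from itertools import product as cartesian

# Hypothetical dimension tables: the members of each dimension.
time_dim = [2011, 2012]
geo_dim = ["India", "Nepal"]

# Hypothetical fact table: (year, country) foreign keys plus a numeric measure.
fact_sales = [
    (2011, "India", 3000),
    (2012, "India", 1000),
    (2011, "Nepal", 3100),
    (2012, "Nepal", 1200),
]

# Build the cube: aggregate the measure for every (year, country) cell.
cube = {}
for year, country in cartesian(time_dim, geo_dim):
    cube[(year, country)] = sum(m for y, c, m in fact_sales
                                if y == year and c == country)

# "Slicing" fixes one dimension member: total units for 2011 across countries.
slice_2011 = sum(v for (year, _), v in cube.items() if year == 2011)
print(slice_2011)  # 6100
```

A real SSAS cube does the same thing at scale, with the aggregation design deciding which of these cells are materialized in advance.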
Here I have described all the basic concepts as thoroughly as possible in this article. Now I will list the topics, an index of sorts, for the hands-on articles that I will cover next. If required between hands-on pieces, I will post theoretical articles like this one, so that alongside the hands-on work we also understand the concepts.
- Create
SSAS Project in BIDS
- Create
a data source
- Create
Data Source View
- Create
Dimension
- Create
Cube
These are the basic topics that I will cover in future articles, and surely this list will grow as I write more.
Business intelligence is a collection of skills, technologies and practices by which we gather data, analyze it, and provide it for forecasting and better decision making.
A BI system has three important phases: Collect, Analyze and View.
- Collect data: Data in an enterprise can be stored in various formats, ranging from normalized, structured RDBMS data to Excel sheets or unstructured file formats. So the first step in BI is to collect all this unstructured and scattered data and bring it into one uniform format. This can be achieved in three steps:
1. Extract: In this step we read unorganized data from the sources by understanding their data structures.
2. Transform: In this step we transform the data into a standard format.
3. Load: Finally we load the standard-format data into a data warehouse or data mart (the next part explains what a data warehouse, data mart and OLAP are).
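The Extract / Transform / Load steps above can be sketched in a few lines of Python (the source formats, field names and the in-memory "warehouse" are all invented for illustration):

```python
import csv
import io

# Extract: read raw rows from two differently shaped sources.
crm_csv = io.StringIO("name,amount\nAlice,100\nBob,250\n")  # a CSV export
legacy_rows = [("CAROL", "300 USD")]                        # a legacy file

extracted = list(csv.DictReader(crm_csv))
extracted += [{"name": n, "amount": a} for n, a in legacy_rows]

# Transform: bring every row into one standard format.
def transform(row):
    return {
        "customer": row["name"].title(),
        "amount": int(str(row["amount"]).replace("USD", "").strip()),
    }

standardized = [transform(r) for r in extracted]

# Load: append the standardized rows into the "warehouse" (a plain list here).
warehouse = []
warehouse.extend(standardized)

print(len(warehouse), sum(r["amount"] for r in warehouse))  # 3 650
```

In practice this is what an ETL tool such as SSIS automates: connectors do the extract, transformation components standardize the rows, and destinations load the warehouse tables.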
- Analyze data: Once the data is loaded into the data warehouse, you run tools and algorithms so that you can analyze and forecast information.
- View data: Once you have analyzed the data, you will want to view it. How people want to view data can vary from a simple tabular format to complex graphical charts, so in this section we need good reporting tools.
A data warehouse is a database used to store data for reporting and data analysis. Data in a data warehouse can come from disparate sources such as structured RDBMSs, files, or any other source. The ETL process (explained in the previous question) fetches data from these sources and loads it into the data warehouse.
A data warehouse can be further divided into data marts. A data warehouse focuses on enterprise-wide data, while a data mart focuses on specific processes.
Data warehouses are also termed OLAP systems (OLAP is explained in the next question). The database designs for these systems do not follow conventional normalization (1st, 2nd or 3rd normal form). Most of them use denormalized designs like the star schema and snowflake design, and they normally store data in fact and dimension tables.
Both OLTP and OLAP are types of IT systems. OLTP (online transaction processing) deals with transactions (insert, update, delete and simple search), while OLAP (online analytical processing) deals with analyzing historical data, forecasting, and so on.
Below is a simple table that chalks out the differences.
| | OLTP | OLAP |
|---|---|---|
| Design | Normalized (1st, 2nd and 3rd normal form). | Denormalized (dimension and fact design). |
| Source | Daily transactions. | OLTP. |
| Motive | Faster inserts, updates and deletes; improved data quality by reducing redundancy. | Faster analysis and search by combining tables. |
| SQL complexity | Simple and medium. | Highly complex due to analysis and forecasting. |
The most important goal of an OLAP application is analysis of data, and the most important things in any analysis are numbers. With an OLAP application we want to get those numbers, forecast them, and analyze them for better business growth. These numbers can be total sales, number of customers, etc.
These numbers are termed "measures", and measures are mostly stored in fact tables.
A "dimension" describes what these measures actually mean. For example, in the table below there are two measures, 3000 units and 1500 $; one dimension is "ProductWiseSales" and the other is "AgeWiseSalary".
Dimensions are stored in dimension tables.
| Dimension | Measures |
|---|---|
| ProductWiseSales | 3000 units |
| AgeWiseSalary | 1500 $ |
A cube gives us a multidimensional view of the data using dimensions and measures. For instance, in the table below we have "Units" as the measure and three dimensions: product, year and country.
| Product Name | Units | Year | Country |
|---|---|---|---|
| Shirts | 3000 | 2011 | India |
| Shirts | 1000 | 2012 | India |
| Pants | 4500 | 2011 | India |
| Pants | 200 | 2012 | India |
| Shirts | 3100 | 2011 | Nepal |
| Shirts | 1200 | 2012 | Nepal |
| Pants | 600 | 2011 | Nepal |
| Pants | 1200 | 2012 | Nepal |
If we arrange the dimensions and measures in cube format we get a better picture. In other words, a cube is the intersection of multiple measures and their dimensions; visualized graphically, the same data forms a cube.
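To see the intersection idea in code, here is a small Python sketch that pivots the table above into (product, year) cells, summing units across the country dimension (the rows are taken from the table; the pivot itself is just for illustration):

```python
# Rows from the table above: (product, units, year, country).
rows = [
    ("Shirts", 3000, 2011, "India"), ("Shirts", 1000, 2012, "India"),
    ("Pants", 4500, 2011, "India"),  ("Pants", 200, 2012, "India"),
    ("Shirts", 3100, 2011, "Nepal"), ("Shirts", 1200, 2012, "Nepal"),
    ("Pants", 600, 2011, "Nepal"),   ("Pants", 1200, 2012, "Nepal"),
]

# Pivot: collapse the country dimension, leaving product x year cells.
by_product_year = {}
for prod, units, year, country in rows:
    cell = (prod, year)
    by_product_year[cell] = by_product_year.get(cell, 0) + units

# Shirts sold in 2011 across all countries: 3000 (India) + 3100 (Nepal).
print(by_product_year[("Shirts", 2011)])  # 6100
```

Changing which dimensions you keep and which you collapse is exactly the "slice and dice" that a cube makes cheap.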
A star schema consists of fact and dimension tables. The fact tables hold the measures, and the dimension tables give context to the fact tables.
In the figure "Star design" below, there are four dimension tables, each referencing the fact table for measure values. The references between dimension and fact tables are simple foreign key relationships.
Figure: Star design
The snowflake design is very similar to the star design. The exception is the dimension tables: in a snowflake schema, dimension tables are normalized, as shown in the figure "Snowflake design" below. The design is very similar to the star design shown previously, but the products and vendor tables are separate tables, related in a more normalized format. Summing up: the star design is a purely denormalized design, while the snowflake design can have normalized dimension tables.
Figure: Snowflake design
| | Snowflake Schema | Star Schema |
|---|---|---|
| Normalization | Can have normalized dimension tables. | Purely denormalized dimension tables. |
| Maintenance | Less redundancy, so less maintenance. | More redundancy due to the denormalized format, so more maintenance. |
| Query | Complex queries due to normalized dimension tables. | Simple queries due to the purely denormalized design. |
| Joins | More joins due to normalization. | Fewer joins. |
| Usage guidelines | When you are concerned about integrity and duplication. | When speed and performance matter more than data integrity. |
SSIS (SQL Server Integration Services) helps to collect data; in other words, it does ETL (extract, transform and load) as explained in the previous question.
SSAS (SQL Server Analysis Services) helps us analyze data by creating cubes, facts, dimensions and measures.
SSRS (SQL Server Reporting Services) helps us view this analyzed data in different formats, such as graphical and tabular.
- Data flow defines the flow of data from a source to a destination; in other words, it defines an individual ETL.
- Control flow defines workflow (iterations, conditional checks, etc.) and component execution (for example: send email, copy files, invoke web services, FTP). Control flow has nothing to do with data; only the data flow deals with data, i.e. movement, transformation and loading.
- A package is a collection of control flows.
SSAS -
Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services
What is Tabular Model Designer (SSAS Tabular)
The tabular model designer is part of SQL Server 2012 SQL Server Data Tools (SSDT), integrated with Microsoft Visual Studio 2010 or later, with additional project type templates specifically for developing professional tabular model solutions.
Benefits:
When
you install SQL Server Data Tools (SSDT), new project templates for creating
tabular models are added to the available project types. After creating a new
tabular model project by using one of the templates, you can begin model
authoring by using the tabular model designer tools and wizards.
In
addition to new templates and tools for authoring professional multidimensional
and tabular model solutions, the Visual Studio environment provides debugging
and project lifecycle capabilities that ensure you create the most powerful BI
solutions for your organization.
Comparing Tabular and Multidimensional Solutions (SSAS)
http://technet.microsoft.com/en-us/library/hh212940.aspx
Tabular models hosted in SQL Server 2012 Analysis Services provide a comparatively lightweight, easy-to-build-and-deploy solution for business intelligence. However, as you increase the data load, add more users, or run complex queries, you need a strategy for maintaining and tuning performance. This paper describes strategies and specific techniques for getting the best performance from your tabular models, including processing and partitioning strategies, DAX query tuning, and server tuning for specific workloads.
Download "Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services" case study with detailed analysis:
http://msdn.microsoft.com/en-us/library/dn393915.aspx
- Types of Dimensions
- Types of Measures
- Types of relationships between dimensions and
measuregroups: None (IgnoreUnrelatedDimensions), Fact, Regular, Reference,
Many to Many, Data Mining
- Star Vs Snowflake schema and Dimensional
modeling
- Data storage modes - MOLAP, ROLAP, HOLAP
- MDX Query syntax
- Functions commonly used in MDX, like Filter and Descendants, sort flags like BASC, and others
- Difference between EXISTS AND EXISTING, NON
EMPTY keyword and function, NON_EMPTY_BEHAVIOR, ParallelPeriod, AUTOEXISTS
- Difference between static and dynamic set
- Difference between natural and unnatural
hierarchy, attribute relationships
- Difference between rigid and flexible
relationships
- Difference between attribute hierarchy and
user hierarchy
- Dimension, Hierarchy, Level, and Members
- Difference between database dimension and cube
dimension
- Importance of CALCULATE keyword in MDX script,
data pass and limiting cube space
- Effect of materialize
- Partition processing and Aggregation Usage
Wizard
- Perspectives, Translations, Linked Object
Wizard
- Handling late arriving dimensions / early
arriving facts
- Proactive caching, Lazy aggregations
- Partition processing options
- Role playing Dimensions, Junk Dimensions,
Conformed Dimensions, SCD and other types of dimensions
- Parent Child Hierarchy, NamingTemplate
property, MemberWithLeafLevelData property
- Cube performance, MDX performance
- How to pass parameter in MDX
- SSAS 2005 vs SSAS 2008
- Dimension security vs Cell security
- SCOPE statement, THIS keyword, SUBCUBE
- CASE (CASE, WHEN, THEN, ELSE, END) statement,
IF THEN END IF, IS keyword, HAVING clause
- CELL CALCULATION and CONDITION clause
- RECURSION and FREEZE statement
- Common types of errors encountered while
processing a dimension / measure groups / cube
- Logging and monitoring MDX scripts and cube
performance
- What is the difference between SSAS 2005 and SSAS 2008?
- In 2005 it's not possible to create an empty cube, but in 2008 we can create an empty cube.
- A new feature in Analysis Services 2008 is the Attribute Relationships tab in the Dimension Designer; implementing attribute relationships was complex in SSAS 2005.
- We could create only 2000 partitions per measure group in SSAS 2005; that partition limit is removed in SSAS 2008.
You can answer more, but if you end with these, the interviewer will feel that you have real experience.
- What is a data warehouse (in short, DWH)?
The data warehouse is an informational environment that:
- Provides an integrated and total view of the enterprise
- Makes the enterprise's current and historical information easily available for decision making
- Makes decision-support transactions possible without hindering operational systems
- Renders the organization's information consistent
- Presents a flexible and interactive source of strategic information
Or: a warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process (Bill Inmon's definition). The terms are defined as follows:
Subject-oriented: It addresses a specific business domain, e.g. banking, retail, insurance, etc.
Integrated: It should be able to integrate data from various source systems, e.g. SQL Server, Oracle, DB2, etc.
Time-variant: It should be able to maintain the data for various time periods.
Non-volatile: Once data is inserted, it cannot be changed.
- What is a data mart?
A data mart is a subset of an organizational data store, usually oriented to a specific purpose or major data subject, that may be distributed to support business needs. Data marts are analytical data stores designed to focus on specific business functions for a specific community within an organization.
Data marts are often derived from subsets of data in a data warehouse, though in the bottom-up data warehouse design methodology the data warehouse is created from the union of organizational data marts.
There are three types of data marts:
- Dependent
- Independent
- Logical data mart
- What is the difference between a data mart and a data warehouse?
A data warehouse holds the complete data, whereas a data mart is a subset of it.
Ex: All the organization's data, whether related to the finance department, HR, or the banking department, is stored in the data warehouse, whereas a data mart stores only the finance data or the HR department data. So a data warehouse is a collection of different data marts.
- Have you ever worked on performance tuning? If yes, what are the steps involved?
We need to identify the bottlenecks to tune performance; to overcome the bottlenecks we do the following:
- Avoid named queries
- Avoid unnecessary relationships between tables
- Define proper attribute relationships
- Use a proper aggregation design
- Partition the data properly
- Use a proper dimension usage design
- Avoid unnecessary many-to-many relationships
- Avoid unnecessary measures
- Set AttributeHierarchyEnabled = FALSE for attributes that are not required
- Do not include even a single measure that is not necessary
- What are the difficulties faced in cube development?
This question is asked either to test whether you are really experienced or when the interviewer does not have any more questions to ask.
You can mention any area you find difficult to work in, but the best answers are the following:
- Defining attribute relationships
- Calculations
- Defining dimension usage (many-to-many relationships)
- Analyzing the requirements
- Explain the flow of creating a cube?
Steps to create a cube in SSAS:
- Create a data source.
- Create a data source view.
- Create dimensions.
- Create a cube.
- Deploy and process the cube.
- What is a data source or DS?
The data source is the physical connection information that Analysis Services uses to connect to the database that hosts the data. The data source contains the connection string, which specifies the server and the database hosting the data, as well as any necessary authentication credentials.
- What is a data source view or DSV?
A data source view is a persistent set of tables from a data source that supply the data for a particular cube. BIDS also includes a wizard for creating data source views, which you can invoke by right-clicking the Data Source Views folder in Solution Explorer.
- The data source view is the logical view of the data in the data source.
- The data source view is the only thing a cube can see.
- What is a named calculation?
A named calculation is a SQL expression represented as a calculated column. This expression appears and behaves like a column in the table. A named calculation lets you extend the relational schema of existing tables or views in a data source view without modifying the tables or views in the underlying data source.
A named calculation is used to create a new column in the DSV using hard-coded values, existing columns, or both.
- What is a named query?
A named query in a DSV is similar to a view in a database. It is used to create a virtual table in the DSV that does not affect the underlying database. A named query is mainly used to merge two or more tables in the data source view or to filter the columns of a table.
- Why do we need named queries?
A named query is used to join multiple tables or to remove unnecessary columns from a table. You can achieve the same thing in the database using views, but named queries are the best bet when you don't have access to create views in the database.
- How will you add a new column to an existing table in the data source view?
By using named calculations we can add a new column to an existing table in the data source view. Named calculations are explained above.
- What is a dimension table?
A dimension table contains hierarchical data by which you'd like to summarize. It contains specific business information; the name of each member of the dimension is called an "attribute".
The key attribute of the dimension must contain a unique value for each member of the dimension; it is called the "primary key column". The primary key column of each dimension table corresponds to one of the key columns in any related fact table.
- What is a fact table?
A fact table contains the basic information that you wish to summarize. The table that stores the detailed values for measures is called the fact table. Simply put, it is "the table which contains METRICS" used to analyze the business.
It consists of two sections:
1) Foreign keys to the dimensions
2) Measures/facts (numerical values used to monitor business activity)
- What is a factless fact table?
This is a very important interview question. A "factless fact table" is a table that is similar to a fact table except that it does not have any measures; that is, the table has only links to the dimensions. These tables enable you to track events; indeed, they exist for recording events.
Factless fact tables are used for tracking a process or collecting stats. They are called so because the fact table does not have aggregatable numeric values; it holds mere key values referring to the dimensions from which the stats can be collected.
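A quick sketch of the idea, using a hypothetical student-attendance example (a classic factless-fact illustration): the "fact" rows carry only dimension keys, yet counting rows still answers questions about events.

```python
# Hypothetical factless fact table: each row is just dimension keys
# (date_key, student_key, class_key) -- no numeric measure at all.
attendance_facts = [
    (20240101, "S1", "Math"),
    (20240101, "S2", "Math"),
    (20240102, "S1", "Math"),
    (20240102, "S1", "History"),
]

# Even without measures we can answer questions by counting rows:
# how many Math attendances were recorded?
math_attendance = sum(1 for _, _, cls in attendance_facts if cls == "Math")

# On how many distinct days did student S1 attend anything?
s1_days = len({d for d, s, _ in attendance_facts if s == "S1"})

print(math_attendance, s1_days)  # 3 2
```

In SSAS terms, the implicit count of rows plays the role of the measure for such a table.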
- What are attribute relationships, and why do we need them?
Attribute relationships are the way of telling the Analysis Services engine how the attributes within a dimension are related to each other. If proper relationships are defined, processing time decreases, and both cube processing and MDX query performance improve.
In Microsoft SQL Server Analysis Services, attributes within a
dimension are always related either directly or indirectly to the key
attribute. When you define a dimension based on a star schema, which is where
all dimension attributes are derived from the same relational table, an
attribute relationship is automatically defined between the key attribute and
each non-key attribute of the dimension. When you define a dimension based on a
snowflake schema, which is where dimension attributes are derived from multiple
related tables, an attribute relationship is automatically defined as follows:
- Between
the key attribute and each non-key attribute bound to columns in the main
dimension table.
- Between
the key attribute and the attribute bound to the foreign key in the
secondary table that links the underlying dimension tables.
- Between
the attribute bound to foreign key in the secondary table and each non-key
attribute bound to columns from the secondary table.
- How many types of attribute relationships are there?
There are two types of attribute relationships:
- Rigid
- Flexible
Rigid: In a rigid relationship, the relationship between the attributes is fixed; members will not change levels or their respective attribute relationships.
Example: the time dimension. We know that the month "January 2009" will only ever belong to the year "2009" and won't be moved to any other year.
Flexible: In a flexible relationship, the relationship between the attributes can change over time.
Example: an employee and a department. An employee can be in the accounts department today, but it is possible that the employee will be in the marketing department tomorrow.
- How many types of dimensions are there and what are they?
There are three types of dimensions:
- Conformed dimension
- Junk dimension
- Degenerate dimension
- What are conformed dimensions, junk dimensions and degenerate dimensions?
Conformed dimension: A dimension that is shared across multiple fact tables or data models, so that it has the same meaning and content wherever it is used.
Junk dimension: A number of very small dimensions may be lumped together to form a single dimension, a junk dimension, in which the attributes are not closely related. Grouping random flags and text attributes and moving them into a separate dimension is known as creating a junk dimension.
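The mechanics of building a junk dimension can be sketched in Python (the flag names and values below are hypothetical): one row per distinct combination of the low-cardinality flags, each given its own surrogate key, so the fact table stores a single key instead of several flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
payment_types = ["Cash", "Card"]
is_promo = [True, False]
order_status = ["Open", "Shipped", "Cancelled"]

# One row per distinct combination of the flags, each with a surrogate key.
junk_dimension = [
    {"junk_key": i, "payment_type": p, "is_promo": f, "order_status": s}
    for i, (p, f, s) in enumerate(product(payment_types, is_promo, order_status), start=1)
]

# The fact table now references junk_key instead of three separate columns.
print(len(junk_dimension))  # 2 * 2 * 3 = 12 combinations
```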
Degenerate dimension: A degenerate dimension stores its values in the fact table, and no corresponding dimension table exists. A degenerate dimension is a dimension key without a corresponding dimension table.
Example: In the PointOfSale Transaction fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number.
The Date dimension corresponds to Date Key, and the Product dimension corresponds to Product Key. In a traditional parent-child database, POS Transaction Number would be the key to the transaction header record that contains all the information valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this information into other dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but has no corresponding dimension table.
- What are the types of database schema?
There are three types of database schema:
- Star
- Snowflake
- Starflake
- What is star, snowflake and starflake schema?
Star schema: In a star schema the fact table is directly linked to all dimension tables. The star schema's dimensions are denormalized, with each dimension represented by a single table. A central fact table connects a number of individual dimension tables.
Snowflake: The snowflake schema is an extension of the star schema, where each point of the star explodes into more points. In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema that dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy. In a snowflake schema the fact table is linked directly to some dimension tables, while other dimension tables are reached through intermediate dimension tables.
Starflake: A hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) schemas.
- How will you hide an attribute?
We can hide an attribute by setting "AttributeHierarchyVisible = False" in the properties of the attribute.
- How will you make an attribute not process?
By setting "AttributeHierarchyEnabled = False", we can keep an attribute from being processed.
- What is the use of the IsAggregatable property?
In Analysis Services we generally see that every dimension attribute has an All member. This comes from the IsAggregatable property of the attribute. You can set its value to False so that the All member is not shown. The All member is the default member for that attribute, so if you hide it you must set another member as the default; otherwise Analysis Services will pick some value as the default, which can confuse users browsing the data who are unaware of the changed default member.
- What are key, name and value columns of an attribute?
Key
column of any attribute: Contains the column or columns that
represent the key for the attribute, which is the column in the underlying
relational table in the data source view to which the attribute is bound. The
value of this column for each member is displayed to users unless a value is
specified for the NameColumn property.
Name
column of an attribute: Identifies the column that
provides the name of the attribute that is displayed to users, instead of the
value in the key column for the attribute. This column is used when the key
column value for an attribute member is cryptic or not otherwise useful to the
user, or when the key column is based on a composite key. The NameColumn
property is not used in parent-child hierarchies; instead, the NameColumn
property for child members is used as the member names in a parent-child
hierarchy.
Value
columns of an attribute: Identifies the column that provides the
value of the attribute. If the NameColumn element of the attribute is
specified, the same DataItem values are used as default values for the
ValueColumn element. If the NameColumn element of the attribute is not
specified and the KeyColumns collection of the attribute contains a single
KeyColumn element representing a key column with a string data type, the same
DataItem values are used as default values for the ValueColumn element.
- What is a hierarchy, what are its types and what is the difference between them?
A hierarchy is a very important part of any OLAP engine; it allows users to drill down from summary levels to detail. Hierarchies represent the way users expect to explore data at more detailed levels. A hierarchy is made up of multiple levels, creating a structure based on end-user requirements.
Year -> Quarter -> Month -> Week are all levels of a calendar hierarchy.
There are two types of hierarchies:
- Natural hierarchy
- Unnatural hierarchy
Natural hierarchy: The attributes are intuitively related to one another. There is a clear relationship from the top of the hierarchy to the bottom.
Example: Dates. Year, quarter and month follow from each other and, in part, define each other.
Unnatural hierarchy: The attributes are not clearly related.
Example: Geography. We may have country -> state -> city, but it is not clear where Province might sit.
- What is Attribute hierarchy?
An attribute hierarchy is created for every attribute in a dimension,
and each hierarchy is available for dimensioning fact data. This hierarchy
consists of an “All” level and a detail level containing all members of the
hierarchy.
You can organize attributes into user-defined hierarchies to provide navigation paths in a cube. Under certain circumstances, you may want to disable or hide some attributes and their hierarchies.
- What is the use of the AttributeHierarchyDisplayFolder property?
AttributeHierarchyDisplayFolder: Identifies the folder in which to display the associated attribute hierarchy to end users. For example, if I set the property value to "Test" for all the attributes of a dimension, then a folder with the name "Test" is created and all the attributes are placed into it.
- What is use of AttributeHierarchyEnabled?
AttributeHierarchyEnabled: Determines whether an
attribute hierarchy is generated by Analysis Services for the attribute. If the
attribute hierarchy is not enabled, the attribute cannot be used in a
user-defined hierarchy and the attribute hierarchy cannot be referenced in
Multidimensional Expressions (MDX) statements.
- What is use of AttributeHierarchyOptimizedState?
AttributeHierarchyOptimizedState: Determines the level
of optimization applied to the attribute hierarchy. By default, an attribute
hierarchy is FullyOptimized, which means that Analysis Services builds indexes
for the attribute hierarchy to improve query performance. The other option,
NotOptimized, means that no indexes are built for the attribute hierarchy.
Using NotOptimized is useful if the attribute hierarchy is used for
purposes other than querying, because no additional indexes are built for the
attribute. Other uses for an attribute hierarchy can be helping to order
another attribute.
- What is use of AttributeHierarchyOrdered ?
AttributeHierarchyOrdered: Determines whether
the associated attribute hierarchy is ordered. The default value is True.
However, if an attribute hierarchy will not be used for querying, you can save
processing time by changing the value of this property to False.
- What
is the use of AttributeHierarchyVisible ?
AttributeHierarchyVisible
: Determines whether the attribute hierarchy is visible to client
applications. The default value is True. However, if an attribute hierarchy
will not be used for querying, you can save processing time by changing the
value of this property to False.
- What are types of storage modes?
There are three standard storage modes in OLAP applications
- MOLAP
- ROLAP
- HOLAP
- Compare the three storage modes?
Summary and comparison:

| Basic Storage Mode | Storage Location for Detail Data | Storage Location for Summary/Aggregations | Storage Space Requirement | Query Response Time | Processing Time | Latency |
|---|---|---|---|---|---|---|
| MOLAP | Multidimensional format | Multidimensional format | Medium (detail data is stored in compressed format) | Fast | Fast | High |
| HOLAP | Relational database | Multidimensional format | Small | Medium | Fast | Medium |
| ROLAP | Relational database | Relational database | Large | Slow | Slow | Low |
- What is MOLAP and its advantages?
MOLAP (Multidimensional Online Analytical Processing): MOLAP is the most commonly used storage type. It is designed to offer maximum query performance to the users. The data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the Analysis Services database. The data inside the cube refreshes only when the cube is processed, so latency is high.
Advantages:
- Since the data is stored on the OLAP server in an optimized format, queries (even complex calculations) are faster than ROLAP.
- The data is compressed, so it takes up less space.
- Because the data is stored on the OLAP server, you don't need to keep the connection to the relational database.
- Cube browsing is fastest using MOLAP.
- What is ROLAP and its advantages?
ROLAP (Relational Online Analytical Processing): ROLAP does not have the high-latency disadvantage of MOLAP. With ROLAP, the data and aggregations are stored in relational format. This means there is zero latency between the relational source database and the cube.
The disadvantage of this mode is performance: it gives the poorest query performance because no objects benefit from multidimensional storage.
Advantages:
- Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
- Since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
- Low latency.
- What is HOLAP and its advantages?
Hybrid Online Analytical Processing (HOLAP): HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations need to be reprocessed when changes occur. With HOLAP you get medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you query only aggregated data or use a cached query, query performance is similar to MOLAP. But when you need to get at the detail data, performance is closer to ROLAP.
Advantages:
- HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
- Cubes are smaller than MOLAP since the detail data is kept in the relational database.
- Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
- Low latency, since processing takes place when changes occur and detail data is kept in the relational database.
- What are Translations and their use?
Translation: The translation feature in Analysis Services allows you to display captions and attribute names in a specific language. It helps in providing globalization for the cube.
- What is a Database dimension?
All the dimensions created using the New Dimension Wizard are database dimensions. In other words, the dimensions that exist at the database level are called database dimensions.
- What is a Cube dimension?
A cube dimension is an instance of a database dimension within a cube. A database dimension can be used in multiple cubes, and multiple cube dimensions can be based on a single database dimension.
- Difference between Database dimension and Cube dimension?
- The database dimension has only Name and ID properties, whereas a cube dimension has several more properties.
- A database dimension is created once, whereas a cube dimension is a reference to a database dimension.
- A database dimension exists only once, whereas multiple cube dimensions can be created from it using the role-playing dimension concept.
- How will you add a dimension to a cube?
To add a dimension to a cube, follow these steps:
- In Solution Explorer, right-click the cube, and then click View Designer.
- In the Design tab for the cube, click the Dimension Usage tab.
- Either click the Add Cube Dimension button, or right-click anywhere on the work surface and then click Add Cube Dimension.
- In the Add Cube Dimension dialog box, use one of the following steps:
- To add an existing dimension, select the dimension, and then click OK.
- To create a new dimension to add to the cube, click New dimension, and then follow the steps in the Dimension Wizard.
- What is SCD (slowly changing dimension)?
Slowly changing dimensions (SCD) determine how historical changes in the dimension tables are handled. Implementing an SCD mechanism enables users to know which category an item belonged to on any given date.
- What are the types of SCD?
SCD is a concept for storing historical changes, and whenever someone finds a new way to store them, a new type comes into the picture. Basically there are three types of SCD:
- SCD Type 1
- SCD Type 2
- SCD Type 3
- What is Type 1, Type 2, Type 3 of SCD?
Type 1: In a Type 1 slowly changing dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | California |
Advantages: This is the easiest way to handle the Slowly Changing
Dimension problem, since there is no need to keep track of the old information.
Disadvantages: All history is lost. By applying this methodology, it is
not possible to trace back in history.
Usage: About 50% of the time.
When to use Type 1: Type 1 slowly changing dimension should be used
when it is not necessary for the data warehouse to keep track of historical
changes.
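Type 1 behavior can be sketched in a few lines of Python (the customer table is held as a hypothetical dict keyed by surrogate key):

```python
# Type 1 SCD: overwrite the attribute in place; no history is kept.
customers = {1001: {"name": "Christina", "state": "Illinois"}}

def scd_type1_update(dim, key, **changes):
    """Overwrite the existing row's attributes; the old values are lost."""
    dim[key].update(changes)

scd_type1_update(customers, 1001, state="California")
print(customers[1001]["state"])  # California; "Illinois" is gone
```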
Type 2: In a Type 2 slowly changing dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
After Christina moved from Illinois to California, we add the new information as a new row into the table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
| 1005 | Christina | California |
Advantages: This allows us to accurately keep all historical
information.
Disadvantages:
- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
- This necessarily complicates the ETL process.
Usage: About 50% of the time.
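A minimal Python sketch of the Type 2 mechanism (hypothetical data; real implementations usually also carry effective-date and current-flag columns, omitted here for brevity):

```python
# Type 2 SCD: insert a new row with a new surrogate key; keep the old row.
customers = [
    {"customer_key": 1001, "name": "Christina", "state": "Illinois"},
]

def scd_type2_update(dim, name, new_surrogate_key, **changes):
    """Add a new row for the changed entity; the old row stays as history."""
    current = next(r for r in dim if r["name"] == name)
    new_row = {**current, "customer_key": new_surrogate_key, **changes}
    dim.append(new_row)

scd_type2_update(customers, "Christina", 1005, state="California")
# Both the Illinois row (1001) and the California row (1005) are now present.
```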
Type 3: In a Type 3 slowly changing dimension, there are two columns to indicate the particular attribute of interest, one indicating the original value and one indicating the current value. There is also a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:

| Customer Key | Name | State |
|---|---|---|
| 1001 | Christina | Illinois |
To accommodate a Type 3 slowly changing dimension, we will now have the following columns: Customer Key, Name, Original State, Current State, Effective Date.
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of the change is January 15, 2003):

| Customer Key | Name | Original State | Current State | Effective Date |
|---|---|---|---|---|
| 1001 | Christina | Illinois | California | 15-JAN-2003 |
Advantages:
- This does not increase the size of the table, since new information is updated in place.
- This allows us to keep some part of history.
Disadvantages: Type 3 will not be able to keep all history where an
attribute is changed more than once. For example, if Christina later moves to
Texas on December 15, 2003, the California information will be lost.
Usage: Type 3 is rarely used in actual practice.
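A minimal Python sketch of Type 3 (hypothetical column names), showing why only one prior value survives:

```python
# Type 3 SCD: original and current values live in separate columns of one row.
customer = {
    "customer_key": 1001, "name": "Christina",
    "original_state": "Illinois", "current_state": "Illinois",
    "effective_date": None,
}

def scd_type3_update(row, new_state, effective_date):
    """Move the new value into current_state; original_state is preserved.
    A second change would overwrite current_state, losing the middle value."""
    row["current_state"] = new_state
    row["effective_date"] = effective_date

scd_type3_update(customer, "California", "2003-01-15")
# original_state is still "Illinois"; current_state is now "California".
```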
- What is a role-playing dimension, with two examples?
Role-playing dimensions: A dimension can play different roles in a fact table. You can recognize a role-playing dimension when there are multiple columns in a fact table that each have a foreign key to the same dimension table.
Ex1: There are three dimension keys in the FactInternetSales and FactResellerSales tables that all refer to the DimTime table; the same time dimension is used to track sales by each of those dates. For cubes that contain either of these fact tables, the corresponding role-playing dimensions are automatically added to the cube.
Ex2: In retail banking, for a checking account cube we could have a transaction date dimension and an effective date dimension. Both dimensions have date, month, quarter and year attributes. The formats of the attributes are the same in both dimensions; for example, the date attribute is in 'dd-mm-yyyy' format. Both dimensions have members from 1993 to 2010.
- What is a measure group, and a measure?
Measure groups: A cube can contain more than one measure group. These measure groups can contain different dimensions and be at different granularities, but as long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity.
Creating multiple measure groups: To create a new measure group in the Cube Editor, go to the Cube Structure tab, right-click on the cube name in the Measures pane and select 'New Measure Group'. You then select the fact table to create the measure group from, and the new measure group is created; any columns that aren't used as foreign key columns in the DSV are automatically created as measures, and you also get an extra measure of aggregation type Count. It's a good idea to delete any measures you are not going to use at this stage.
Measures
: Measures are the numeric values that our users want to aggregate,
slice, dice and otherwise analyze, and as a result, it’s important to make sure
they behave the way we want them to. One of the fundamental reasons for using
Analysis Services is that, unlike a relational database it allows us to build
into our cube design business rules about measures: how they should be
formatted, how they should aggregate up, how they interact with specific
dimensions and so on.
- What is attribute?
An attribute is a specification that defines a property of an object,
element, or file. It may also refer to or set the specific value for a given
instance of such.
- What is a surrogate key?
A surrogate key is a system-generated key that acts like an alternate primary key for the table in the database. Data warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate key is generated by the system, not by the user. One difference between a primary key and a surrogate key in some databases is that the primary key uniquely identifies a record while a surrogate key uniquely identifies an entity.
Ex: An employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record, while the surrogate key will be generated by the system (say, a serial number), since the surrogate key is NOT derived from the data.
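The idea can be sketched in Python (hypothetical employee data): the surrogate key comes from a system counter, never from the data itself, so two employees with identical natural attributes still get distinct keys.

```python
import itertools

# A surrogate key is system-generated and carries no business meaning.
_next_key = itertools.count(1)  # stand-in for a database IDENTITY/sequence

def add_employee(dim, name, hire_year):
    row = {"employee_key": next(_next_key), "name": name, "hire_year": hire_year}
    dim.append(row)
    return row["employee_key"]

employees = []
k1 = add_employee(employees, "John Smith", 1998)
k2 = add_employee(employees, "John Smith", 2004)  # same name, distinct surrogate key
assert k1 != k2
```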
- How many types of relationships are there between a dimension and a measure group?
There are six types of relationship between a dimension and a measure group:
- No Relationship
- Regular
- Referenced
- Many to Many
- Data Mining
- Fact
- What are the regular, no-relationship, fact, referenced and many-to-many types, with examples?
No relationship: The dimension and measure group are not related.
Regular: The dimension table is joined directly to the fact table.
Referenced: The dimension table is joined to an intermediate table, which in turn is joined to the fact table.
Many to many: The dimension table is joined to an intermediate fact table; the intermediate fact table is joined, in turn, to an intermediate dimension table to which the fact table is joined.
Data mining: The target dimension is based on a mining model built from the source dimension. The source dimension must also be included in the cube.
Fact: The dimension table is the fact table.
- What are calculated members and what is their use?
Calculations are items in the cube that are evaluated at runtime.
Calculated members: You can create customized measures or dimension members, called calculated members, by combining cube data, arithmetic operators, numbers, and/or functions.
Example: You can create a calculated member called Marks that converts dollars to marks by multiplying an existing dollar measure by a conversion rate. Marks can then be displayed to end users in a separate row or column. Calculated member definitions are stored, but their values exist only in memory. In the preceding example, values in marks are displayed to end users but are not stored as cube data.
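The "definition stored, values computed at query time" behavior can be sketched in Python (the measure names and conversion rate below are hypothetical, not real MDX):

```python
# Stored measure values per product, plus a calculated member whose values
# exist only at query time -- the definition is kept, the results are not.
DOLLARS = {"Bikes": 1200.0, "Helmets": 300.0}
MARK_RATE = 1.95  # hypothetical dollars-to-marks conversion rate

calculated_members = {
    "Marks": lambda product: DOLLARS[product] * MARK_RATE,  # stored definition
}

def query(measure, product):
    if measure in calculated_members:   # evaluated on demand, never persisted
        return calculated_members[measure](product)
    return DOLLARS[product]             # read from "stored" cube data

print(query("Marks", "Bikes"))
```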
- What are KPIs and what is their use?
In Analysis Services, a KPI is a collection of calculations associated with a measure group in a cube that are used to evaluate business success. We use a KPI to see the state of the business at a particular point; it is represented with graphical items such as traffic lights, gauges, etc.
- What are actions, how many types of actions are there? Explain with examples.
Actions are a powerful way of extending the value of SSAS cubes for the end user. Users can click on a cube, or a portion of a cube, to start an application with the selected item as a parameter, or to retrieve information about the selected item.
One of the objects supported by a SQL Server Analysis Services cube is the action. An action is an event that a user can initiate when accessing cube data. The event can take a number of forms. For example, a user might be able to view a Reporting Services report, open a Web page, or drill through to detailed information related to the cube data.
Analysis Services supports three types of actions:
Report action: Returns a Reporting Services report that is associated with the cube data on which the action is based.
Drillthrough: Returns a result set that provides detailed information related to the cube data on which the action is based.
Standard: Has five action subtypes that are based on the specified cube data:
- Dataset: Returns a multidimensional dataset.
- Proprietary: Returns a string that can be interpreted by a client application.
- Rowset: Returns a tabular rowset.
- Statement: Returns a command string that can be run by a client application.
- URL: Returns a URL that can be opened by a client application, usually a browser.
- What
is partition, how will you implement it?
You can use the Partition Wizard to define partitions for a measure
group in a cube. By default, a single partition is defined for each measure
group in a cube. Access and processing performance, however, can degrade for
large partitions. By creating multiple partitions, each containing a portion of
the data for a measure group, you can improve the access and processing
performance for that measure group.
- What is the minimum and maximum number of partitions required for a measure group?
In SSAS 2005 a maximum of 2,000 partitions can be created per measure group; that limit is lifted in later versions.
In any version the minimum is ONE partition per measure group.
- What are Aggregations and their use?
Aggregations provide performance improvements by allowing Microsoft SQL Server Analysis Services (SSAS) to retrieve pre-calculated totals directly from cube storage instead of having to recalculate data from an underlying data source for each query. To design these aggregations, you can use the Aggregation Design Wizard. This wizard guides you through the following steps:
- Selecting standard or custom settings for the storage and caching options of a partition, measure group, or cube.
- Providing estimated or actual counts for objects referenced by the partition, measure group, or cube.
- Specifying aggregation options and limits to optimize the storage and query performance delivered by designed aggregations.
- Saving and optionally processing the partition, measure group, or cube to generate the defined aggregations.
After you use the Aggregation Design Wizard, you can use the Usage-Based Optimization Wizard to design aggregations based on the usage patterns of the business users and client applications that query the cube.
- What is a perspective? Have you ever created a perspective?
Perspectives are a way to reduce the complexity of cubes by hiding elements like measure groups, measures, dimensions, hierarchies etc. It's nothing but slicing of a cube; for example, if we have retail and hospital data and an end user is subscribed to see only hospital data, then we can create a perspective accordingly.
- What are deploy, process and build?
Build: Verifies the project files and creates several local files.
Deploy: Deploys the structure of the cube (the skeleton) to the server.
Process: Reads the data from the source and builds the dimensions and cube structures.
Elaborating on the same:
Build: A build is a version of a program. As a rule, a build is a pre-release version and as such is identified by a build number, rather than by a release number. Reiterative (repeated) builds are an important part of the development process. Throughout development, application components are collected and repeatedly compiled for testing purposes, to ensure a reliable final product. Build tools, such as make or Ant, enable developers to automate some programming tasks. As a verb, to build can mean either to write code or to put individual coded components of a program together.
Deployment: During development of an Analysis Services project in Business Intelligence Development Studio, you frequently deploy the project to a development server in order to create the Analysis Services database defined by the project. This is required to test the project; for example, to browse cells in the cube, browse dimension members, or verify key performance indicator (KPI) formulas.
- What is the maximum size of a dimension?
The maximum size of a dimension is 4 GB.
- What are the types of processing? Explain each.
The main processing options in SSAS are:
- Process Full
- Process Data
- Process Index
- Process Incremental
- Process Structure
- Unprocess
- Process Default
Process
Full: Processes an Analysis Services object and all the objects that it
contains. When Process Full is executed against an object that has already been
processed, Analysis Services drops all data in the object, and then processes
the object. This kind of processing is required when a structural change has
been made to an object, for example, when an attribute hierarchy is added,
deleted, or renamed. This processing option is supported for cubes, databases,
dimensions, measure groups, mining models, mining structures, and partitions.
Process Data: Processes data only, without building aggregations or indexes. If there is data in the partitions, it will be dropped before re-populating the partition with source data. This processing option is supported for dimensions, cubes, measure groups, and partitions.
Process
Index: Creates or rebuilds indexes and aggregations for all processed
partitions. This option causes an error on unprocessed objects. This processing
option is supported for cubes, dimensions, measure groups, and partitions.
Process Incremental: Adds newly available fact data and processes only the relevant partitions. This processing option is supported for measure groups and partitions.
Process
Structure: If the cube is unprocessed, Analysis Services will process, if it is
necessary, all the cube’s dimensions. After that, Analysis Services will create
only cube definitions. If this option is applied to a mining structure, it
populates the mining structure with source data. The difference between this
option and the Process Full option is that this option does not iterate the
processing down to the mining models themselves. This processing option is
supported for cubes and mining structures.
Unprocess
: Drops the data in the object specified and any lower-level constituent
objects. After the data is dropped, it is not reloaded. This processing option
is supported for cubes, databases, dimensions, measure groups, mining models,
mining structures, and partitions.
Process
Default: Detects the process state of an object, and performs processing
necessary to deliver unprocessed or partially processed objects to a fully
processed state. This processing option is supported for cubes, databases,
dimensions, measure groups, mining models, mining structures, and partitions.
- What
is a cube?
The basic unit of storage and analysis
in Analysis Services is the cube. A cube is a
collection of data that’s been aggregated to allow queries to return data
quickly.
For example, a cube of order data might be aggregated by time period and
by title, making the cube fast when you ask questions concerning orders by week
or orders by title.
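The pre-aggregation idea behind a cube can be sketched in Python (hypothetical order data): totals are computed once per (week, title) combination, so later queries read the aggregate instead of rescanning detail rows.

```python
from collections import defaultdict

# Hypothetical detail rows, as they might sit in the relational source.
orders = [
    {"week": 1, "title": "Book A", "amount": 10.0},
    {"week": 1, "title": "Book A", "amount": 5.0},
    {"week": 1, "title": "Book B", "amount": 7.0},
    {"week": 2, "title": "Book A", "amount": 3.0},
]

# Pre-aggregate by (week, title) -- the core idea behind cube storage.
aggregates = defaultdict(float)
for o in orders:
    aggregates[(o["week"], o["title"])] += o["amount"]

# "Orders by week and title" now resolves from the aggregate, not the detail.
print(aggregates[(1, "Book A")])  # 15.0
```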
- What is AMO?
AMO stands for Analysis Management Objects. It is used to create or alter cubes from .NET code.
- After creating the cube, if we add a new column to the OLTP table, how do you add this new attribute to the cube?
Open the data source view and right-click; you will find the REFRESH option. Clicking REFRESH adds the new column to the table, which can then be added to the cube.
REAL TIME INTERVIEW QUESTIONS -
- What is the size of the Cube in your last Project?
The answer to this question varies from project to project and mainly depends on how BIG your database is and how COMPLEX the database design is. Generally, for a database with a TRANSACTION TABLE of 50 crore records, the cube size will be around 100 GB. So you can go with 100 GB as the answer to this question.
- What is the size of the database in your last Project?
You can expect this question immediately after you answer 100 GB to the last question. The database size will be 600 to 800 GB for a cube of about 100 GB. So go with 800 GB for this question.
- What is the size of the fact (transaction) table in your last Project?
This will be the next question if you answer 800 GB as your database size. Here the interviewer is not expecting a SIZE in GB but the NUMBER OF ROWS in the transaction table. Go with 57 crore records for this question.
- How frequently do you process the cube?
You have to be very careful here. The frequency of processing the cube depends on HOW FREQUENTLY YOU ARE GETTING NEW DATA. Once the new data comes in, the SSIS team loads it and sends a mail to the SSAS team after the load is completed successfully. Once the SSAS team receives the mail, they look for the best time to PROCESS.
Typically we get data either weekly or monthly, so you can say that the processing of the cube is done either weekly or monthly.
- How frequently do you get DATA from clients?
This answer should be based on your last answer. If you answered WEEKLY to the last question, then the answer to this question should also be WEEKLY; if MONTHLY for the last question, then this answer should also be MONTHLY.
- What
type of Processing Options you used to process the cube in your Project?
This
is the toughest question to answer. It depends on the DATA you have and the CLIENT'S
requirements. Let me explain.
- If the
database is SMALL, let’s say only 1 crore records, then people do a
FULL PROCESS as it won’t take much time.
- If the
database is MEDIUM, let’s say around 15 crore records, then people
prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL PROCESS,
as it takes a little bit of time.
- If the
database is HUGE, let’s say more than 35 to 40 crore records, then
people prefer to do an INCREMENTAL PROCESS unless CLIENTS ask us to do a FULL
PROCESS, as it takes a lot of time. In this case we TRY to convince clients
to go INCREMENTAL, and if they don’t agree then we don’t have any other
option.
- An INCREMENTAL
process comes into the picture ONLY when there are no updates to the OLD
data, i.e. no changes to already existing data; otherwise there is NO OTHER
OPTION than a FULL PROCESS.
- How
do you provide security to the cube?
By
defining roles we provide security to cubes. Using roles we can restrict users
from accessing restricted data. The procedure is as follows -
- Define
Role
- Set
Permission
- Add
appropriate Users to the role
- How
do you move the cube from one server to another?
There
are many ways to do this. Let me explain four here, and cleverly you can say
“I have worked on 4 SSAS projects to date and implemented a different type in
each of the four.”
- Backup
and restore – This is the simplest way. Take the backup from the development
server and copy it to the client’s FTP folder. After doing this, drop
a mail to the client’s Admin and he will take care of the RESTORE part.
- Directly
PROCESS the cube in the PRODUCTION environment. For this you need access to
Production, which will not be given by clients unless the clients are
*********. One of the clients I worked for gave me FULL access ..
- Under
Start –> All Programs –> Sql Server –> Analysis Services you can
see the Deployment Wizard. This is one way of moving the cube. This method has
some steps to follow. First deploy your cube, and FOUR files will be
created in the BIN folder of the PROJECT folder. Copy those FOUR files and paste
them in any directory on the production server. Then OPEN the DEPLOYMENT Wizard in
production, and when it asks for the Database file, point to the location
where you copied the files. After that, NEXT, NEXT, NEXT … OK .. the cube will be
deployed and processed.
- This
way is the most beautiful one: Synchronization. In this we first deploy
and process the cube in the STAGING ENVIRONMENT, and then we go to the
production server. Connect to Analysis Services in SSMS and select
Synchronize by right-clicking the Databases folder in SSMS of Analysis
Services. Then select the STAGING SERVER as the source and click OK. The
changes in the cube present on the Staging server will be copied to the
production server.
- What
is the toughest challenge you faced in your Project?
There
are a couple of areas where we faced difficulty.
- While
working on RELATIONSHIPS between Measure Groups and Dimensions.
- Working
on Complex calculations
- Performance
tuning
- How
did you create Partitions of the cube in your last Project?
Partitions
can be created on different data. A few people do it PRODUCT NAME wise, and
many prefer to do it DATE wise; you go with DATE wise.
For
dates, we can create them MONTH wise, WEEK wise, QUARTER wise and sometimes YEAR
wise. This all depends on how much data is coming in per WEEK or MONTH or QUARTER
… If you are getting 50 lakh records per month, then say you do it MONTH wise.
- How
many dimensions in your last cube?
47
to 50.
- How
many measure groups in your last cube?
Total
10, of which 4 are Fact tables and the remaining 6 are Factless fact tables.
- What
is the Schema of your last cube?
Snowflake
- Why
not STAR Schema ?
My database design doesn’t support a
STAR Schema.
- What
are the different relationships that you used in your cube?
- Regular
- Referenced
- Many to
Many
- Fact
- No
Relationship
- Have
you created KPIs? If so, explain.
Don’t
add much to this, as the follow-up questions can be tricky. Just say that you
worked on a couple of KPIs and have basic knowledge of them. (Don’t worry,
this is not MANDATORY.)
- How
do you define Aggregations in your Project?
We
defined the aggregations for the MOST FREQUENTLY USED data in SSRS reports.
- Size
of SSAS team in your last Project?
Just 2 guys, as we guys are really in
demand and there is a lot of scarcity :)
- How
many Resources worked on the same Cube in your Project?
Only
2, one on the morning shift and another on the evening shift.
- How
much time does it take to Process the Cube?
This
is a very, very important question. This again depends on the SIZE of the
database, the complexity of the database and your server settings. For a database with
50 crore transaction records, it generally takes 3.5 hrs.
- How
many Calculations have you done in your Project?
If
you answer more than 5000 you will get caught out, unless you
are super good at MDX. The best answer for you is “Worked on 50 calculations”.
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
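For illustration, a calculated measure can be sketched as an MDX script definition. This is a hedged example: the measure names are taken from the Adventure Works sample cube and are assumptions, not something defined in this document.

```mdx
-- Calculated measure: evaluated at query time, values are NOT stored in the cube.
CREATE MEMBER CURRENTCUBE.[Measures].[Gross Profit Margin] AS
    ( [Measures].[Internet Sales Amount]
      - [Measures].[Internet Total Product Cost] )
    / [Measures].[Internet Sales Amount],
FORMAT_STRING = 'Percent';
```

A derived measure, by contrast, would be defined in the data source view as a named calculation (for example, a `SalesAmount - TotalProductCost` column), so the value is computed before aggregation and stored in the cube like any other measure.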
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated
member in a cube what is the use of property
called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
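A sketch of how this property appears in an MDX script, using assumed Adventure Works measure names (the member name and format are illustrative only):

```mdx
-- Ratio calculation guarded by NON_EMPTY_BEHAVIOR: when the selected measure
-- is empty, the whole expression is treated as empty instead of erroring.
CREATE MEMBER CURRENTCUBE.[Measures].[Avg Sale Per Order] AS
    [Measures].[Internet Sales Amount] / [Measures].[Internet Order Count],
NON_EMPTY_BEHAVIOR = { [Measures].[Internet Order Count] },
FORMAT_STRING = 'Currency';
```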
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis Services Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through UBO – Usage-Based Optimization. Always prefer the UBO method in real-time performance troubleshooting.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services 2005 offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Q: Name few Business Analysis
Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server 2005 Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
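As a sketch against the Adventure Works sample (hierarchy and measure names are assumptions), CrossJoin can combine two hierarchies on one axis:

```mdx
-- CrossJoin pairs every year with every product category, so non-contiguous
-- slices of the cube can be placed and aggregated on a single axis.
SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    NONEMPTY(
        CROSSJOIN(
            [Date].[Calendar].[Calendar Year].Members,
            [Product].[Category].[Category].Members ),
        [Measures].[Internet Sales Amount] ) ON ROWS
FROM [Adventure Works];
```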
Q: Where do you put calculated
members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
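A minimal sketch of a calculated member placed outside the Measures dimension, assuming the Adventure Works Date hierarchy (member and hierarchy names are assumptions):

```mdx
-- A calculated member on the Date dimension rather than on Measures:
-- because it lives in another dimension, it intersects every measure,
-- so any measure can be viewed as "CY 2003 vs CY 2002".
CREATE MEMBER CURRENTCUBE.[Date].[Calendar].[All Periods].[CY 2003 vs CY 2002] AS
    ( [Date].[Calendar].[Calendar Year].&[2003] )
    - ( [Date].[Calendar].[Calendar Year].&[2002] );
```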
Q: How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?
A: Simply using BOTTOMCOUNT will return customers with null sales; you have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
1.What is Analysis
Services
1.Microsoft SQL Server 2012 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications.
2.Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases.
3.For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
2.What are the
features of Analysis Services
1.Ease of use with a lot of
wizards and designers.
2.Flexible data model creation and management
3.Scalable architecture to handle OLAP
4.Provides integration of administration tools, data sources, security, caching, and reporting etc.
5.Provides extensive support for
custom applications
3.What is the need for SSAS 2012 component?
1.Analysis Services is the only
component in SQL Server using which we can perform Analysis and Forecast
operations
2.Faster Analysis and Troubleshooting.
3.Ability to create and manage Data warehouses
4.Apply efficient Security Principles
4. What languages are used in SSAS
2012
1.Structured Query Language (SQL)
2.Multidimensional Expressions (MDX) - an industry standard query language oriented toward analysis
3.Data Mining Extensions (DMX) - an industry standard query language oriented toward data mining.
4.Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database objects
5.What is the difference between a
derived measure and a calculated measure?
1. The difference between a derived measure and a calculated measure is when the calculation is performed.
2. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube.
3. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube.
4. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
6. What is FASMI ?
A
database is called an OLAP database if it satisfies the FASMI
rules:
1. Fast Analysis – queries in the OLAP scenario should be answered in
five seconds or less.
2. Shared – must support access to data by many
users, with attention to sensitivity and write-backs.
3. Multidimensional – the data inside the OLAP
database must be multidimensional in structure.
4. Information – the OLAP database must support large
volumes of data.
7.What is a partition?
1. A partition in Analysis
Services is the physical location of stored cube data.
2. Every cube has at least one partition
by default. Each time we create a measure group, another partition is
created.
3. Queries run faster against a partitioned
cube because Analysis Services only needs to read data from the partitions that
contain the answers to the queries.
4. Queries run even faster when a partition
also stores aggregations, the precalculated totals for additive measures.
Partitions are a powerful and flexible means of managing cubes, especially
large cubes.
8. What are the roles of an Analysis Services
Information Worker?
The role of an Analysis Services
information worker is the traditional "domain expert" role in
business intelligence (BI): someone who understands the data employed by a
solution and is able to translate the data into business information. The role
of an Analysis Services information worker often has one of the following job
titles: Business Analyst (Report Consumer), Manager (Report Consumer),
Technical Trainer, Help Desk/Operation, or Network Administrator.
9.Explain the TWO-Tier Architecture of
SSAS?
1. SSAS uses both server and client components
to supply OLAP and data mining functionality to BI applications.
2. The server component is implemented as a
Microsoft Windows service. Each instance of Analysis Services is implemented as a
separate instance of the Windows service.
3. Clients communicate with Analysis Services
using XMLA (XML for Analysis), the standard protocol for issuing commands
and receiving responses, exposed as a web service.
10. How many types of dimension are
possible in SSAS?
1. Bill of Materials
2. Currency
3. Channel
4. Customer
5. Geography
6. Organizations
7. Products
8. Promotion
9. Regular
10. Scenario
11. Time
12. Unary
13. Accounts
1. Define Data Warehouse.
Ans. A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources.
2. Which are the three layers in which a
data warehouse maintains its functions?
Ans. A data warehouse maintains its functions in staging, integration and access.
3. Can you elaborate it?
Ans. Staging: It is used to store raw data for use by developers.
Integration: It is used to integrate data and to have a
level of abstraction from users.
Access: It is for accessing data out of the data
warehouse for its users.
4. What is Data Mart?
Ans. A Data Mart is the access layer of the data warehouse environment that is used to make data accessible to its users. It is a subset of the Data Warehouse.
5. What is dimensional approach for
storing data in data warehouse?
Ans. In the dimensional approach, transaction data are partitioned into either “facts”, which are generally numeric transaction data, or “dimensions”, which are the reference information that gives context to the facts.
6. What is normalization approach for
storing data in data warehouse?
Ans. In the normalization approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.).
7. How do we organize sales transaction
data in the dimensional approach?
Ans. A sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
8. What is the main advantage in
using the dimensional approach?
Ans. The main advantage of using the dimensional approach is that the data warehouse is easier for the user to understand and to use, and the retrieval of data from the data warehouse tends to operate very quickly.
9. What are the main disadvantages of
using the dimensional approach?
Ans. There are mainly two disadvantages of using the dimensional approach to storing data in a data warehouse.
1. To maintain the integrity of facts and
dimensions, the process of loading the data warehouse from different
operational systems gets complicated.
2. If an organization adopting the dimensional
approach changes the way in which it does business, it is difficult to
modify the data warehouse structure.
10. What are the advantages and
disadvantage of using normalization approach for storing data in data
warehouse?
Ans. Advantages: 1. The main advantage is that it is easy to add information to the database.
Disadvantages
1. It is difficult to join data from different
sources into meaningful information.
2. It is also difficult to access the information
without a precise understanding of the source of the data and the data structure of
the data warehouse.
11. What is dimension?
Ans. It is a data element. It categorizes each item in a data set into non-overlapping regions. It helps to “slice and dice” data in the data warehouse.
12. Can you describe different types
of dimensions?
Ans. There are three types of dimensions available in the dimensional approach.
Conformed dimension: It is a set of data attributes that is
physically implemented in multiple database tables using the same structure,
attributes, domain values, definitions and concepts in each implementation. It
cuts across many facts.
Dimensions are conformed when they are either the
same (with keys) or one is a perfect subset of the other.
Junk Dimension: It is a grouping of low-cardinality flags and
indicators. These flags and indicators are removed from the fact table and
placed into a useful dimensional framework.
Role-Playing Dimension: Often dimensions are recycled for multiple
applications within the same database, e.g. a “date” dimension can be used for
“sales date” or “hire date”. This is referred to as a role-playing dimension.
13. What is cube?
Ans. A cube is a data structure that allows fast analysis of data. It provides the capability of manipulating and analyzing data from different perspectives.
14. What does data cleansing (or data
scrubbing) do?
Ans. It will remove errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse. (It is part of the quality assurance process.)
15. What do you know about data
mining?
Ans. It is the process of researching data marts and data warehouses to detect specific patterns in the data sets. It may be performed on databases and multi-dimensional data cubes with ad-hoc query tools and OLAP.
The queries and reports are designed to answer
specific questions to uncover trends or hidden relationships in the data.
16. What do you know about drill-down
and drill-up?
Ans. Drill-down is the ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.
Drill-up is the ability of a data-mining tool to move
back up to higher, less detailed levels in a data mart, data warehouse or
multi-dimensional data cube.
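Drill-down has a direct MDX counterpart. A sketch using DRILLDOWNMEMBER against the Adventure Works sample (the hierarchy, measure and member names are assumptions):

```mdx
-- Show all calendar years, but drill the 2003 member down to its quarters.
SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    DRILLDOWNMEMBER(
        { [Date].[Calendar].[Calendar Year].Members },
        { [Date].[Calendar].[Calendar Year].&[2003] } ) ON ROWS
FROM [Adventure Works];
```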
17. What is a fact table?
Ans. It is the primary table that contains measurements (mostly numeric data like grades, wages, etc.).
18. What is OLAP(On-Line Analytical
Processing)?
Ans. Online retrieval and analysis of data to disclose business trends and statistics not directly visible in the data is known as OLAP.
19. What is OLTP(On-Line Transaction
Processing)?
Ans. It refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
20. What are advantages and
disadvantages of data warehouse?
Ans. Advantages of using a data warehouse include:
1. Integrating data from multiple
sources.
2. Performing new types of analysis and reducing cost to access historical data.
3. Improving turnaround time for reporting and analysis.
4. Supporting ad-hoc query and reporting.
Disadvantage
1. Long initial implementation time
and associated high cost.
2. Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries.
3. Data owners lose control over their data, raising ownership, privacy and security issues.
4. Adding a new data source takes time and incurs high cost.
Basic MDX Questions
1. What is MDX?
MDX is a language that articulates selections, calculations, and some metadata definitions against an Online Analytical Processing (OLAP) database, and provides some capabilities for specifying how query results are to be represented.
2. Where do you use MDX in SQL Server 2005 OLAP?
MDX queries and expressions in SQL Server 2005 Analysis Services are used to do the following:
• Return data to a client application from a SQL Server 2005 Analysis Services cube.
• Format query results.
• Perform cube design tasks, including the definition of calculated members, named sets, scoped assignments, and key performance indicators (KPIs).
• Perform administrative tasks, including dimension and cell security.
3. What are different APIs that support MDX?
• OLE DB for OLAP
• ADO MD,
• ADOMD.Net,
• XMLA(XML for Analysis),
• Hyperion Sybase C and Java APIs,
4. What is Tuple in SSAS?
• A tuple is the intersection of one (and only one) member taken from one or several of the dimensions in the cube. A tuple identifies (or has the potential to identify) a single Cell in the multi-dimensional matrix.
• To compose a tuple with more than one dimension, you must wrap the members in parentheses (Tuple).
• You can’t compose an empty tuple. () is not a valid tuple.
5. Why can't a tuple have more than one member from the same dimension?
• Well, if you do, it is inevitable that the ‘tuple’ that you create will end up pointing to more than one cell.
6. What is SET in SSAS?
• A set is a collection of tuples with the same dimensionality. It may have more than one tuple, but it can also have only one tuple, or even have zero tuples, in which case it is an empty set.
• If a set contains tuples with different dimensionality, it is not a valid set.
• Curly braces, as in { Tuple1, Tuple2 }, are used to denote a set.
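As a sketch against the Adventure Works sample cube (dimension and member names are assumed from that sample), the first expression is a tuple and the second is a set of two tuples with the same dimensionality:

```mdx
-- A tuple: one member from each of two dimensions, wrapped in parentheses
( [Date].[Calendar Year].&[2003], [Measures].[Internet Sales Amount] )

-- A set: two tuples with the same dimensionality, wrapped in curly braces
{ ( [Date].[Calendar Year].&[2003], [Measures].[Internet Sales Amount] ),
  ( [Date].[Calendar Year].&[2004], [Measures].[Internet Sales Amount] ) }
```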
7. What are query Axes in MDX?
• Query axes specify the edges of a cell set returned by a Multidimensional Expressions (MDX) SELECT statement.
• A query can have up to 128 query axes. The first five have names: Columns, Rows, Pages, Sections, and Chapters.
• However, a query that uses axis 1 must also use axis 0, and a query that uses axis 2 must also use axes 1 and 0. You cannot skip an axis in a query - you'll get an error.
• The axis designations can come in any order.
• For axes beyond axis 4, you must use the numbers, because there are no names for them.
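For illustration, a minimal query using the first two axes, sketched against the Adventure Works sample cube (measure and hierarchy names assumed):

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,  -- axis 0
       [Product].[Category].Members ON ROWS                -- axis 1
FROM [Adventure Works];
```

ON COLUMNS and ON ROWS could equally be written as ON 0 and ON 1.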
8. Explain the difference between Query Axis and Slicer Axis ?
Query Axis
- Query axes specify the edges of a cellset
returned by a Multidimensional Expressions (MDX) SELECT statement.
- The query axis is defined in the SELECT clause of the SELECT statement in MDX.
- Query axes—the set of hierarchies from which
data is retrieved for multiple members
Slicer axis
- The slicer axis filters the data returned by
the Multidimensional Expressions (MDX) SELECT statement.
- The slicer axis is defined in the WHERE clause
of the SELECT statement in MDX.
- Slicer axis—the set of hierarchies from which
data is retrieved for a single member.
9. Explain the MDX SELECT Statement?
• In MDX, the SELECT statement specifies a result set that contains a subset of multidimensional data that has been returned from a cube.
• The ON keyword is used with an axis name to specify where dimensions from your database are displayed.
• Each axis is made up of one or more tuples or a set.
• MDX uses curly braces, { and }, to enclose a set of elements from a particular dimension or set of dimensions.
• We can separate different elements with commas (,). Element names may be enclosed by [ ] characters, and may have multiple parts separated by dot (.) characters.
• The FROM clause in an MDX query names the cube from which the data is being queried.
• An optional WHERE clause determines which dimension or member to use as the slicer axis, restricting the returned data to a specific dimension or member.
10. What is the use of colon operator in MDX Select statement?
• You can specify a set as a range of members in that order by listing two members from the same level as endpoints and putting a colon between them to mean “These members and every member between them.”
• For example
SELECT
{ [Time].[Sep,2004] : [Time].[Mar,2005] } on columns,
{ [Product].[Tools] : [Product].[Home Audio] } on rows
FROM [Sales]
WHERE ([Customer].[Lubbock, TX], [Measures].[Unit Sales])
11. How to establish cube context in MDX query?
• Every MDX query runs within a specified cube context. In the SELECT statement, the FROM clause determines the cube context.
• Although the FROM clause specifies the cube context as within a single cube, this does not have to limit you from working with data from more than one cube at a time.
• You can use the MDX LookupCube function to retrieve data from cubes outside the cube context. Additionally, functions such as the Filter function, are available that allow temporary restriction of the context while evaluating the query.
12. What is use of .Members function in MDX?
• The .Members operator takes a dimension, hierarchy, or level on its left-hand side,
and returns a set of all members associated with that metadata scope. For example, [Customer].Members results in the set of all customers
13. What is the use of .Children Function in MDX?
• The Children function returns a naturally ordered set that contains the children of a specified member. If the specified member has no children, this function returns an empty set.
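A sketch assuming the Adventure Works sample cube (hierarchy and member names assumed): listing the children of one product category, i.e. its subcategories.

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- Children of the Bikes category member
       [Product].[Product Categories].[Category].[Bikes].Children ON ROWS
FROM [Adventure Works];
```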
14. What is use of Descendants() function and specify the Flags usage?
• Returns the set of descendants of a member at a specified level or distance, optionally including or excluding descendants in other levels.
• Syntax: Descendants( Member_Expression | Set_Expression [ , Level_Expression | Distance [ , Desc_Flag ] ] )
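For example, a sketch against the Adventure Works sample cube (names assumed), returning the months of one calendar year:

```mdx
-- SELF keeps only members at the target level (months of 2003)
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       DESCENDANTS( [Date].[Calendar].[Calendar Year].&[2003],
                    [Date].[Calendar].[Month], SELF ) ON ROWS
FROM [Adventure Works];
```

Desc_Flag values include SELF, BEFORE, AFTER, SELF_AND_AFTER, SELF_AND_BEFORE, SELF_BEFORE_AFTER, and LEAVES.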
15. What is use of NON EMPTY Keyword in MDX?
• The NON EMPTY Keyword is used to remove the entirely empty slices from the query result.
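As a sketch (Adventure Works names assumed), the keyword prefixes the axis set; without it every customer would appear, including those with no sales on the opposing axis:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- Drop rows that are empty across all columns
       NON EMPTY [Customer].[Customer Geography].[Customer].Members ON ROWS
FROM [Adventure Works];
```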
16. What precisely does the dimensionality of a tuple refer to?
• The "dimensionality" of a tuple refers to the set of dimensions whose members compose it. The order in which dimensions appear in a tuple is an important part of a tuple's dimensionality.
17. Explain CrossJoin() function use in MDX?
• In many cases, you will want to take the cross-product of members (or tuples) in two different sets (that is, specify all of their possible combinations). The CrossJoin() function is the most direct way of combining the two sets in this way.
• The syntax is as follows: CrossJoin (set1, set2)
• CrossJoin() only takes two sets as inputs. If you want to take the CrossJoin() of three or more sets, such as times, scenarios, and products, you can do it by nesting calls to CrossJoin().
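A sketch against the Adventure Works sample cube (names assumed), putting every (year, category) combination on the rows axis:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       -- All combinations of calendar years and product categories
       CROSSJOIN( [Date].[Calendar Year].Members,
                  [Product].[Category].Members ) ON ROWS
FROM [Adventure Works];
```

The * operator is shorthand for the same cross join: set1 * set2.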
18. Explain the Filter() function use in MDX?
• Filter() lets you reduce a set by including in the resulting set only those elements that meet some criteria.
• Filter() takes one set and one Boolean expression as its arguments and returns that subset where the Boolean expression is true.
• For example, the expression
Filter ( { [Product].[Product Category].Members }, [Measures].[Dollar Sales] >= 500 )
19. Explain the Order() function use in MDX?
• To put the tuples in a set into a sequence based on associated data values, we need to use the Order() function.
• The syntax for the Order() function is: Order (set1, expression [, ASC DESC BASC BDESC])
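For example, sketched against the Adventure Works sample cube (names assumed), sorting categories by order quantity, highest first:

```mdx
-- BDESC breaks (ignores) the natural hierarchy order while sorting
SELECT { [Measures].[Reseller Order Quantity] } ON COLUMNS,
       ORDER( [Product].[Category].Members,
              [Measures].[Reseller Order Quantity], BDESC ) ON ROWS
FROM [Adventure Works];
```

The B-prefixed flags (BASC, BDESC) sort across the whole set regardless of hierarchy; plain ASC and DESC preserve the hierarchy and sort within it.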
20. Explain DIMENSION PROPERTIES function usage in MDX?
• DIMENSION PROPERTIES is a keyword used to retrieve the properties of the members in the cube.
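A sketch against the Adventure Works sample cube (names assumed), returning the intrinsic caption and unique-name properties alongside each row member:

```mdx
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS,
       [Customer].[Customer Geography].[Customer].Members
       -- Property list goes between the set and the ON clause
       DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS
FROM [Adventure Works];
```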
What is Analysis Services? List out the features?
Microsoft SQL Server 2005 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting you design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets you design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle-tier server for analytical processing, OLAP, and data mining. It manages multidimensional cubes of data and provides access to heaps of information, including aggregation of data. One can create data mining models from data sources and use them for business intelligence, including reporting features.
Analysis Services provides a combined view of the data used in OLAP or data mining. Services here refer to OLAP and data mining. Analysis Services assists in creating, designing, and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
• Ease of use with a lot of wizards and designers.
• Flexible data model creation and management.
• Scalable architecture to handle OLAP.
• Provides integration of administration tools, data sources, security, caching, reporting, etc.
• Provides extensive support for custom applications.
What is UDM? Its significance in SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following benefits:
• Allows the user model to be greatly enriched.
• Provides high-performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for SSAS component?
• Analysis Services is the only component in SQL Server using which we can perform analysis and forecast operations.
• SSAS is very easy to use and interactive.
• Faster analysis and troubleshooting.
• Ability to create and manage data warehouses.
• Apply efficient security principles.
Explain the TWO-Tier Architecture of SSAS?
• SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
• The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
• Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
What are the components of SSAS?
• An OLAP engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
• Drilling refers to the process of exploring details of the data.
• Slicing refers to the process of placing data in rows and columns.
• Pivoting refers to switching categories of data between rows and columns.
• In OLAP, we will be using what are called dimensional databases.
What is FASMI?
A database is called an OLAP database if it satisfies the FASMI rules:
• Fast Analysis - defined in the OLAP scenario as five seconds or less.
• Shared - must support access to data by many users in the factors of sensitivity and write-backs.
• Multidimensional - the data inside the OLAP database must be multidimensional in structure.
• Information - the OLAP database must support large volumes of data.
What languages are used in SSAS?
• Structured Query Language (SQL)
• Multidimensional Expressions (MDX) - an industry-standard query language oriented towards analysis.
• Data Mining Extensions (DMX) - an industry-standard query language oriented toward data mining.
• Analysis Services Scripting Language (ASSL) - used to manage Analysis Services database objects.
How are Cubes implemented in SSAS?
• Cubes are multidimensional models that store data from one or more sources.
• Cubes can also store aggregations.
• SSAS cubes are created using the Cube Wizard.
• We also build dimensions when creating cubes.
• Cubes can see only the DSV (logical view).
What is the difference between a derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren't stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when a partition also stores aggregations, the precalculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
While creating a new calculated member in a cube, what is the use of the property called non-empty behavior?
Non-empty behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression will return an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we are establishing a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If each selected measure is empty, then the expression is also treated as empty and no error is returned.
What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server 2005 Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
What are the roles of an Analysis Services Information Worker?
The role of an Analysis Services information worker is the traditional "domain expert" role in business intelligence (BI): someone who understands the data employed by a solution and is able to translate the data into business information. The role of an Analysis Services information worker often has one of the following job titles: Business Analyst (Report Consumer), Manager (Report Consumer), Technical Trainer, Help Desk/Operation, or Network Administrator.
What are the different ways of creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through UBO (Usage-Based Optimization). Always prefer the UBO method in real-time performance troubleshooting.
What is WriteBack? What are the pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services 2005 offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Name a few Business Analysis Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server 2005 Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement                            | Type      | Applied to                          | No data source
Time Intelligence                      | Cube      | Cube                                | No
Account Intelligence                   | Dimension | Dimension or cube                   | No
Dimension Intelligence                 | Dimension | Dimension or cube                   | Yes
Custom Aggregation                     | Dimension | Dimension (unary operator) or cube  | No
Semiadditive Behavior                  | Cube      | Cube                                | Yes
Custom Member Formula                  | Dimension | Dimension or cube                   | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube                   | Yes
Dimension Writeback                    | Dimension | Dimension or cube                   | Yes
What MDX functions do you most commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me: "CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don't roll up to a natural ancestor." Indeed, CrossJoin has easily been my bread and butter.
Where do you put calculated members?
The reflexive answer is "in the Measures dimension" but this is the obvious answer. So I always follow up with another question: "If you want to create a calculated member that intersects all measures, where do you put it?" A high percentage of candidates can't answer this question, and the answer is "In a dimension other than Measures." If they can answer it, I immediately ask them why. The answer is "Because a member in a dimension cannot intersect its own relatives in that dimension."
How do I find the bottom 10 customers with the lowest sales in 2003 that were not null?
A: Simply using bottomcount will return customers with null sales. You will have to combine it with NONEMPTY or FILTER.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
How in MDX query can I get top 3
sales years based on order quantity?
By default Analysis Services returns members in an order specified during attribute design. Attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Lets say we want to see order counts for each year. In Adventure Works MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
By default Analysis Services returns members in an order specified during attribute design. Attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Lets say we want to see order counts for each year. In Adventure Works MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
How do you extract first tuple from
the set?
Use could usefunction Set.Item(0)
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
Use could usefunction Set.Item(0)
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
How can I setup default dimension
member in Calculation script?
You can use ALTER CUBE statement. Syntax:
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER='';
You can use ALTER CUBE statement. Syntax:
ALTER CUBE CurrentCube | YourCubeName UPDATE DIMENSION , DEFAULT_MEMBER='';
Q: What is Analysis Services? List out
the features?
Microsoft SQL Server 2014 Analysis Services (SSAS) delivers online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting us design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets us design, create, and visualize data mining models that are constructed from other data sources by using a wide variety of industry-standard data mining algorithms.
Analysis Services is a middle tier server for analytical processing, OLAP, and Data mining. It manages multidimensional cubes of data and provides access to heaps of information including aggregation of data. One can create data mining models from data sources and use it for Business Intelligence also including reporting features.
Analysis service provides a combined view of the data used in OLAP or Data mining. Services here refer to OLAP, Data mining. Analysis services assists in creating, designing and managing multidimensional structures containing data from varied sources. It provides a wide array of data mining algorithms for specific trends and needs.
Some of the key features are:
§ Ease of use, with many wizards and designers.
§ Flexible data model creation and management.
§ Scalable architecture to handle OLAP.
§ Integration of administration tools, data sources, security, caching, reporting, and so on.
§ Extensive support for custom applications.
Q: What is UDM? Its significance in
SSAS?
The role of a Unified Dimensional Model (UDM) is to provide a bridge between the user and the data sources. A UDM is constructed over one or more physical data sources, and then the end user issues queries against the UDM using one of a variety of client tools, such as Microsoft Excel. At a minimum, when the UDM is constructed merely as a thin layer over the data source, the advantages to the end user are a simpler, more readily understood model of the data, isolation from heterogeneous backend data sources, and improved performance for summary type queries. In some scenarios a simple UDM like this is constructed totally automatically. With greater investment in the construction of the UDM, additional benefits accrue from the richness of metadata that the model can provide.
The UDM provides the following
benefits:
• Allows the user model to be greatly enriched.
• Provides high performance queries supporting interactive analysis, even over huge data volumes.
• Allows business rules to be captured in the model to support richer analysis.
Q: What is the need for the SSAS component?
§ Analysis Services is the only SQL Server component with which we can perform analysis and forecast operations.
§ SSAS is very easy to use and interactive.
§ Faster analysis and troubleshooting.
§ Ability to create and manage data warehouses.
§ Apply efficient security principles.
Q: Explain the TWO-Tier Architecture of
SSAS?
§ SSAS uses both server and client components to supply OLAP and data mining functionality to BI applications.
§ The server component is implemented as a Microsoft Windows service. Each instance of Analysis Services is implemented as a separate instance of the Windows service.
§ Clients communicate with Analysis Services using the standard XMLA (XML for Analysis) protocol for issuing commands and receiving responses, exposed as a web service.
Q: What are the components of SSAS?
§ An OLAP engine is used for enabling fast ad hoc queries by end users. A user can interactively explore data by drilling, slicing, or pivoting.
§ Drilling refers to the process of exploring details of the data.
§ Slicing refers to the process of placing data in rows and columns.
§ Pivoting refers to switching categories of data between rows and columns.
§ In OLAP, we use what are called dimensional databases.
Q: What is FASMI?
A database is called an OLAP Database if the database satisfies the FASMI rules :
§ Fast – most queries should be answered in about five seconds or less.
§ Analysis – must cope with any business logic and statistical analysis relevant to the application.
§ Shared – must support access to data by many users, with attention to data sensitivity and write-backs.
§ Multidimensional – the data inside the OLAP database must be multidimensional in structure.
§ Information – the OLAP database must support large volumes of data.
Q: What languages are used in SSAS?
§ Structured Query Language (SQL).
§ Multidimensional Expressions (MDX) – an industry-standard query language oriented toward analysis.
§ Data Mining Extensions (DMX) – an industry-standard query language oriented toward data mining.
§ Analysis Services Scripting Language (ASSL) – used to manage Analysis Services database objects.
Q: How are cubes implemented in SSAS?
§ Cubes are multidimensional models that store data from one or more sources.
§ Cubes can also store aggregations.
§ SSAS cubes are created using the Cube Wizard.
§ We also build dimensions when creating cubes.
§ Cubes see only the DSV (logical view).
Q: What is the difference between a
derived measure and a calculated measure?
The difference between a derived measure and a calculated measure is when the calculation is performed. A derived measure is calculated before aggregations are created, and the values of the derived measure are stored in the cube. A calculated measure is calculated after aggregations are created, and the values of a calculated measure aren’t stored in the cube. The primary criterion for choosing between a derived measure and a calculated measure is not efficiency, but accuracy.
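The timing difference can be sketched as a Python toy model (hypothetical numbers; not the SSAS engine itself): a derived measure is computed per source row before aggregation and stored, while a calculated measure is computed from the aggregated totals at query time.

```python
# Toy model of derived vs. calculated measures (hypothetical data).
rows = [
    {"sales": 100.0, "cost": 60.0},
    {"sales": 200.0, "cost": 110.0},
]

# Derived measure: computed per row BEFORE aggregation, then aggregated/stored.
derived_profit = sum(r["sales"] - r["cost"] for r in rows)

# Calculated measure: computed AFTER aggregation, from the aggregated values.
total_sales = sum(r["sales"] for r in rows)
total_cost = sum(r["cost"] for r in rows)
calculated_profit = total_sales - total_cost

# For additive arithmetic the two agree; for ratios they usually do not,
# which is why accuracy (not efficiency) drives the choice.
derived_margin = sum((r["sales"] - r["cost"]) / r["sales"] for r in rows) / len(rows)
calculated_margin = (total_sales - total_cost) / total_sales
print(derived_profit, calculated_profit)                      # 130.0 130.0
print(round(derived_margin, 3), round(calculated_margin, 3))  # 0.425 0.433
```

The ratio case shows why "when the calculation happens" matters: averaging per-row margins is not the same as taking the ratio of the totals.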
Q: What is a partition?
A partition in Analysis Services is the physical location of stored cube data. Every cube has at least one partition by default. Each time we create a measure group, another partition is created. Queries run faster against a partitioned cube because Analysis Services only needs to read data from the partitions that contain the answers to the queries. Queries run even faster when partition also stores aggregations, the pre calculated totals for additive measures. Partitions are a powerful and flexible means of managing cubes, especially large cubes.
Q: While creating a new calculated member in a cube, what is the use of the property called Non-Empty Behavior?
Non-Empty Behavior is an important property for ratio calculations. If the denominator is empty, an MDX expression returns an error just as it would if the denominator were equal to zero. By selecting one or more measures for the Non-Empty Behavior property, we establish a requirement that each selected measure first be evaluated before the calculation expression is evaluated. If every selected measure is empty, the expression is also treated as empty and no error is returned.
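The behavior can be sketched in Python (a toy model, not SSAS itself): if the measure nominated for non-empty behavior is empty, the whole expression is treated as empty instead of being evaluated and erroring.

```python
def safe_ratio(numerator, denominator):
    """Mimic Non-Empty Behavior for a ratio calculation: if the
    nominated measure (here the denominator) is empty (None),
    treat the whole expression as empty instead of evaluating it."""
    if denominator is None:
        return None  # expression is empty; no division error is raised
    return numerator / denominator

print(safe_ratio(50.0, 200.0))  # 0.25
print(safe_ratio(50.0, None))   # None (empty), not an error
```

Without the empty check, the second call would be the equivalent of dividing by an empty cell and would raise an error.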
Q: What is a RAGGED hierarchy?
Under normal circumstances, each level in a hierarchy in Microsoft SQL Server Analysis Services (SSAS) has the same number of members above it as any other member at the same level. In a ragged hierarchy, the logical parent member of at least one member is not in the level immediately above the member. When this occurs, the hierarchy descends to different levels for different drilldown paths. Expanding through every level for every drilldown path is then unnecessarily complicated.
Q: What are the roles of an Analysis
Services Information Worker?
The role of an Analysis Services information worker is the traditional "domain expert" role in business intelligence (BI): someone who understands the data employed by a solution and is able to translate the data into business information. The role of an Analysis Services information worker often has one of the following job titles: Business Analyst (Report Consumer), Manager (Report Consumer), Technical Trainer, Help Desk/Operations, or Network Administrator.
Q: What are the different ways of
creating Aggregations?
We can create aggregations for faster MDX statements using the Aggregation Wizard or through Usage-Based Optimization (UBO). Prefer the UBO method in real-world performance troubleshooting.
Q: What is WriteBack? What are the
pre-conditions?
The Enable/Disable Writeback dialog box enables or disables writeback for a measure group in a cube. Enabling writeback on a measure group defines a writeback partition and creates a writeback table for that measure group. Disabling writeback on a measure group removes the writeback partition but does not delete the writeback table, to avoid unanticipated data loss.
Q: What is processing?
Processing is a critical and resource intensive operation in the data warehouse lifecycle and needs to be carefully optimized and executed. Analysis Services offers a high performance and scalable processing architecture with a comprehensive set of controls for database administrators.
We can process an OLAP database, individual cube, Dimension or a specific Partition in a cube.
Q: Name few Business Analysis
Enhancements for SSAS?
The following table lists the business intelligence enhancements that are available in Microsoft SQL Server Analysis Services (SSAS). The table also shows the cube or dimension to which each business intelligence enhancement applies, and indicates whether an enhancement can be applied to an object that was created without using a data source and for which no schema has been generated.
Enhancement | Type | Applied to | No data source
Time Intelligence | Cube | Cube | No
Account Intelligence | Dimension | Dimension or cube | No
Dimension Intelligence | Dimension | Dimension or cube | Yes
Custom Aggregation | Dimension | Dimension (unary operator) or cube | No
Semiadditive Behavior | Cube | Cube | Yes
Custom Member Formula | Dimension | Dimension or cube | No
Custom Sorting and Uniqueness Settings | Dimension | Dimension or cube | Yes
Dimension Writeback | Dimension | Dimension or cube | Yes
Q: What MDX functions do you most
commonly use?
This is a great question because you only know this answer by experience. If you ask me this question, the answer practically rushes out of me. “CrossJoin, Descendants, and NonEmpty, in addition to Sum, Count, and Aggregate. My personal favorite is CrossJoin because it allows me to identify non-contiguous slices of the cube and aggregate even though those cube cells don’t roll up to a natural ancestor.” Indeed, CrossJoin has easily been my bread and butter.
Q: Where do you put calculated members?
The reflexive answer is “in the Measures dimension” but this is the obvious answer. So I always follow up with another question. “If you want to create a calculated member that intersects all measures, where do you put it?” A high percentage of candidates can’t answer this question, and the answer is “In a dimension other than Measures.” If they can answer it, I immediately ask them why. The answer is “Because a member in a dimension cannot intersect its own relatives in that dimension.”
Q: How do I find the bottom 10
customers with the lowest sales in 2003 that were not null?
A: Simply using BottomCount will return customers with null sales. You will have to combine it with NonEmpty or Filter.
SELECT { [Measures].[Internet Sales Amount] } ON COLUMNS ,
BOTTOMCOUNT(
NONEMPTY(DESCENDANTS( [Customer].[Customer Geography].[All Customers]
, [Customer].[Customer Geography].[Customer] )
, ( [Measures].[Internet Sales Amount] ) )
, 10
, ( [Measures].[Internet Sales Amount] )
) ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar].[Calendar Year].&[2003] ) ;
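The interplay of NonEmpty and BottomCount can be mimicked in Python (hypothetical data; a sketch of the set semantics, not MDX itself):

```python
# (customer, internet_sales) pairs; None plays the role of an empty cell.
sales = [("Ann", 120.0), ("Bob", None), ("Cho", 15.0),
         ("Dee", None), ("Eli", 90.0), ("Fay", 5.0)]

def non_empty(tuples):
    # NonEmpty: drop members whose measure value is empty.
    return [t for t in tuples if t[1] is not None]

def bottom_count(tuples, n, key=lambda t: t[1]):
    # BottomCount: sort ascending by the measure and take the first n.
    return sorted(tuples, key=key)[:n]

# In MDX, empty cells rank lowest, so BOTTOMCOUNT alone returns the
# null-sales customers; filtering with NonEmpty first gives the lowest
# customers that actually had sales.
print(bottom_count(non_empty(sales), 3))
# [('Fay', 5.0), ('Cho', 15.0), ('Eli', 90.0)]
```

This is why the filtering step has to happen inside the BottomCount call in the MDX above, not afterward.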
Q: How in MDX query can I get top 3
sales years based on order quantity?
By default, Analysis Services returns members in the order specified during attribute design. The attribute properties that define ordering are "OrderBy" and "OrderByAttribute". Let's say we want to see order counts for each year. In Adventure Works, the MDX query would be:
SELECT {[Measures].[Reseller Order Quantity]} ON 0
, [Date].[Calendar].[Calendar Year].Members ON 1
FROM [Adventure Works];
Same query using TopCount:
SELECT
{[Measures].[Reseller Order Quantity]} ON 0,
TopCount([Date].[Calendar].[Calendar Year].Members,3, [Measures].[Reseller Order Quantity]) ON 1
FROM [Adventure Works];
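TopCount's semantics (order the set descending by the measure, keep the first n) can be sketched in Python with hypothetical member values:

```python
# (calendar_year, reseller_order_quantity) pairs -- hypothetical values.
order_qty = [("CY 2001", 11), ("CY 2002", 74), ("CY 2003", 153), ("CY 2004", 117)]

def top_count(members, n, key=lambda m: m[1]):
    # TopCount: sort the set descending by the measure and take the top n.
    return sorted(members, key=key, reverse=True)[:n]

print(top_count(order_qty, 3))
# [('CY 2003', 153), ('CY 2004', 117), ('CY 2002', 74)]
```

Note that, like the MDX function, the result set comes back re-ordered by the measure rather than in the attribute's design order.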
Q: How do you extract first tuple from
the set?
You could use the function Set.Item(0).
Example:
SELECT {{[Date].[Calendar].[Calendar Year].Members
}.Item(0)}
ON 0
FROM [Adventure Works]
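Item(0) is ordinary zero-based indexing into the set. In Python terms, roughly (a toy stand-in, not MDX):

```python
# A Python list standing in for an MDX set of year members.
calendar_years = ["CY 2001", "CY 2002", "CY 2003", "CY 2004"]

def item(mdx_set, i):
    # Set.Item(i): return the i-th tuple of the set, counting from zero.
    return mdx_set[i]

print(item(calendar_years, 0))  # 'CY 2001'
```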
Q: How can I setup default dimension
member in Calculation script?
You can use the ALTER CUBE statement. Syntax (placeholders in angle brackets):
ALTER CUBE CurrentCube | <CubeName> UPDATE DIMENSION <DimensionName>, DEFAULT_MEMBER='<MemberName>';
Please go through the links below:
http://20fingers2brains.blogspot.com/2013/12/ssas-2012-interview-questions-with.html
http://www.sqlsaturday.com/downloadsessionfile.aspx?presentationid=6372
http://www.informit.com/articles/article.aspx?p=1439192
http://dwbi1.wordpress.com/category/analysis-services/
https://www.udemy.com/blog/ssas-interview-questions/
http://www.sqlserverquest.com/p/ssas-interview-questions.html
http://www.questions-interviews.com/database/sql-server-analysis-services-ssas.aspx
http://www.iqspdf.com/2014/03/msbi-ssis-ssas-ssrs-interview-questions-and-answers.html
http://stackoverflow.com/questions/18766323/ssas-adding-exisiting-cube-from-one-environment-to-another
http://help.zaptechnology.com/Zap%20CubeXpress/CurrentVersion/Content/Topics/HowTo/ImportCreatedCube.htm
http://panigorthi.blogspot.com/2012/06/how-to-get-list-of-ssas-databases.html
Part 1: An Introduction to SSAS Performance and SQL Sentry Performance Advisor for Analysis Services
Over the past couple of years, since SQL Sentry introduced Performance Advisor for Analysis Services, I’ve heard many people say they see SSAS as a “black box” on their servers with no real visibility into troubleshooting performance issues. Performance Advisor (PA) lifts the shroud from Analysis Services, but for some it may not be obvious how to interpret all the information gained from this new visibility.
There are many resources out there that cover OLAP architecture and MDX, but much less in the way of an explanation to all the various metrics that are available and how they correlate with each other. In response, I developed a presentation I’ve given several times now at SQL Saturdays that I believe serves to “bridge the gap” for DBA’s that may be new to SSAS, or possibly even developers who lack a dedicated administrator and need to be able to tell where the bottleneck might be on their SSAS server. I was also privileged to be invited to an interview on RunAs Radio where I spent some time discussing this with Richard Campbell and Greg Hughes.
Finally, I’ve had the time to put much of this same advice down here. This is the first of a couple of posts I’ll dedicate to the topic. Anyone working with SSAS should find this useful, but users of PA for SSAS should find themselves even more proficient in troubleshooting SSAS performance issues. We’ll start with a very basic introduction to the service itself and how a query is handled, and move into specific metrics and identifying bottlenecks shortly.
Basic Architecture
It’s important to understand that SSAS is its own engine. The executable is msmdsrv.exe. It’s licensed with SQL Server, but in large production environments you aren’t likely to be running it on the same machine as a SQL Server. There are two primary types of activity that occur here: querying and processing.
Types of Activity
Querying can be done by various methods, but the most common type of query is a Multidimensional Expression (MDX). This is just a standard language, similar to SQL, optimized for calculations. I won’t go into the hows and whys of MDX itself, as there are many resources already available on its mastery. While this series won’t tell you how to write MDX, by the time we’re done you should be able to tell, with a few basic pieces of information described later, whether there is room for query optimization or if the bottleneck lies elsewhere.
Processing is essentially the maintenance activity of your cubes. It’s how cubes are created and updated with new data and/or how existing data is recalculated and reorganized. Just as with SQL Server, what level of processing you perform, and when, is essential to ensuring proper performance without interfering with peak querying times. We’ll dive into those options later.
The Formula Engine
Under the covers there are two primary “engines” within SSAS: the Formula Engine (FE) and the Storage Engine (SE). The FE accepts MDX requests and parses the query to process. It sends requests to the SE when data is needed, then performs calculations on the retrieved data to provide a final result set. An interesting thing about the FE is that it is single threaded. If you are watching the CPUs of your multi-core SSAS server while a request is being handled and you only see one core spike, it’s probably because the FE was doing the work at that time.
The Storage Engine
The SE, on the other hand, is multi-threaded. The SE reads and writes data to and from the file system for the server. This can include working against the Windows file cache. This is an important distinction between Analysis Services and the relational engine in SQL Server. SQL Server’s relational engine has an internal cache and then goes to disk for any other needed IO; it does not utilize the Windows file cache for database files. SSAS, on the other hand, does. This means that when SSAS is interacting with the file system on your server, it does not necessarily result in physical disk IO. SSAS utilizes its own caches as well as the file system cache to get what it needs.
SSAS Caches
So how does SSAS cache data? Each engine within SSAS has its own set of caches. The FE caches store flat values (the Flat Cache) as well as calculated data (the Calculation Cache). It’s important to understand that these caches are often scoped, which means the data might only be available for a limited time or only for a particular user, even in cache. The flat cache in particular is restricted to a maximum of 10% of the TotalMemoryLimit property in SSAS, which we will discuss later.
So, that said, when we talk about SSAS cache optimization, warming, etc., we are more often dealing with the SE caches. I know that’s a bit of a simplification, but this is an introduction after all. ;)
So the SE consists of a Dimension Cache and a Measure Group Cache to store the data retrieved from the corresponding sources in the file system itself.
Anatomy of an MDX Query
So now that we understand the basic components under the covers in SSAS, how are they used to handle a request? First, the query is accepted and the FE parses it. If the appropriate data is already available in the proper scope in the FE cache, some or all may come from there. The FE then requests any other needed data from the SE. The SE then retrieves whatever it can from its caches. If still more data is needed, the SE queries the file system on the server. The data from the file system may, at least in part, come from the Windows file cache. Remember we mentioned this distinction earlier. Anything else results in physical disk IO. These steps may be iterated as needed while data is moved from SE caches to FE caches, etc., but you get the picture. Another graphical representation of this process can be found in Appendix A of the Microsoft Best Practices article “Identifying and Resolving MDX Query Performance Bottlenecks in SQL Server 2005 Analysis Services”, authored by Carl Rabeler and Eric Jacobsen. I highly recommend reviewing this article for a very in-depth look into this topic. Even though the title mentions SQL Server 2005, it’s still very relevant and useful.
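The lookup chain in that paragraph can be sketched as a Python toy model (hypothetical names; real SSAS caching is far more involved than a dictionary lookup):

```python
def resolve(key, fe_cache, se_cache, file_cache, disk):
    """Toy model of the MDX resolution chain: FE cache -> SE cache ->
    Windows file cache -> physical disk. Each miss falls through to the
    next layer, and each layer populates the ones above it."""
    if key in fe_cache:
        return fe_cache[key], "FE cache"
    if key in se_cache:
        value = se_cache[key]
    elif key in file_cache:            # file-cache hit: no physical disk IO
        value = file_cache[key]
        se_cache[key] = value
    else:                              # only here does physical disk IO occur
        value = disk[key]
        file_cache[key] = value
        se_cache[key] = value
    fe_cache[key] = value
    return value, "lower layers"

fe, se, fc = {}, {}, {}
disk = {("Internet Sales", "CY 2003"): 153.0}
value, source = resolve(("Internet Sales", "CY 2003"), fe, se, fc, disk)
value2, source2 = resolve(("Internet Sales", "CY 2003"), fe, se, fc, disk)
print(source, source2)  # lower layers FE cache
```

The second request is answered straight from the FE cache, which is the whole point of cache warming: the expensive file-system trip happens at most once per scope.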
What can impact MDX performance?
So now that we have a good understanding of how SSAS handles MDX requests, the question we need to answer is, “What can impact MDX performance, and how can we tell?” That is the question we will answer in the next post.