Saturday, 11 March 2017

When to use Many to Many Relationships in Multi-dimensional SSAS

In SSAS, there is one thing that is really quite confusing: many-to-many relationships. This is where you have two dimensions that have a many-to-many relationship between them, for example, a Student may have many classes, and a class may have many students. If you want to relate class name to student name in a pivot table, how can you do that? Well, you need a bridge table, also called a Factless Fact table, and then you need to import this as a fact table, create a measure group, which is hidden, called Count, and then in the Dimension Usage tab of the designer, define a many-to-many relationship between the two tables.

Many to many relationships cannot be created between two fact tables, in SSAS. However, you could import those Fact tables as dimensions, along with a bridge table of the kind mentioned above, and treat your facts as dimension attributes. The problem with this is that you won't be able to do summing and all the usual aggregations that you can in a fact table.

Types of Hierarchy in a Data Warehouse

In Data Modeling, there are various types of hierarchy (such as a balanced hierarchy), and then there are various ways to implement a particular type of hierarchy in a database (such as a parent-child hierarchy). This post aims to explain both.

Types of Hierarchy

Balanced Hierarchy

In a balanced Hierarchy all the branches descend to the same level.

Example 1: Year > Quarter > Month > Day
Example 2: Division > Branch > Sales person 

Here are two representations of a balanced hierarchy (both are balanced): 

Un-Balanced Hierarchy

In an un-balanced hierarchy all the branches do not descend to the same level.

Example: A director with two employees who report to him, one of whom is a manager with multiple employees reporting to him, whereas the other has no employees reporting to him.



Ragged Hierarchy

In a ragged hierarchy, branches extend to different levels.

Example: In a company where everybody's pay is set according to what pay band they fall into, a CEO might be in pay band A, and he has Directors reporting to him who are in pay band B, and he also has a PA reporting to him in pay band C.


Multi-Parent Hierarchy

Many may not consider the following to be an example of a real hiearchy, but since we might refer to it as a hierarchy in the real world, I thought I would include it. In a multi-parent hierarchy, a lower level has multiple parents.

Example: Division > Branch > Sales Person, where the same sales person works at two branches.


Hierarchy Implementations

That's all very well and good, but how do we implement these in the database?

Parent-Child Hierarchy Structure

Here is an example of how you might implement any of the types of hierarchy listed above, except a multi-parent hierarchy. This is called a Parent-Child hierarchy, and I have chosen to implement the ragged hierarchy above:

Parent Child
A A
A B1
A B2
A1 C1
B1 C1
B1 C2

To implement a multi-parent hierarchy, you would need to have a multiple rows for the same member with multiple parents.

Natural Hierarchy Structure

A Natural Hierarchy structure cannot represent a multi-parent or a ragged hierarchy. Obviously, you would usually not start with Level 1, since they are all the same. So if Level 1 was the company, and Level 2 was the regional divisions withing the company, and Level 3 was the branch within the regional division, you would just start with Level 2, the regional divisions.

Level 1 Level 2 Level 3
A B1 C1
A B1 C2
A B2 C3
A B2 C4

Many-to-Many Hierarchy Structure

To represent a multi-parent hierarchy, you need a many-to-many structure, and in cases where a node has multiple parents, you need three three tables, one for the parents, one for the child, and one connecting both together with foreign keys from both (this is called a Factless Fact or a Bridge table, in Kimball methodology).

In SSAS, you import a bridge table into SSAS as a Fact table, and use a many-to-many relationship to link together the two dimensions, with a Measure Group of Count, which is hidden.

Facts, Dimensions, Measures, Attributes, Members

With all the dimensional modeling terms out there, it can be easy to get confused. Here is a quick guide to some of them for people who think in tables (which, I'm guessing, is most people who work with databases):

Fact can either refer to a fact table, or a row within that table, or a column within that table, or even a cell within that table. It is a very broad term. But with relation to the data within the table, a fact is always a number: 'a numeric fact about an event that happened'. Fact tables have foreign keys to the primary keys within dimension tables, which link together to describe the facts. So if the fact table records at each row a sale line, the fact might be the line total, and to find out what branch the sale line was recorded at, you'd have to link to the Branch dimension.

Measures are measurements of facts. So if your fact table records sale lines, then the measure might be $4.00 for a particular sale line, or $400,000 for all sale lines this week, i.e. a roll-up of all the sale lines. So a measure can refer to either a particular cell within the fact table, or a roll-up of cells.

Dimension can refer to either a dimension table, or a column within that table. It means something to group your facts into. So if your fact is a sale line, then a dimension might be the branch that sale line was recorded at.

Attributes refer to the columns within the dimension table.

Members refer to the cells within a particular attribute.

Degenerate Dimensions are columns within a fact table that are not numeric or foreign keys. They would normally go in a dimension of their own, but because there are so many of them, the 'carnality' (i.e. the number of distinct values within the column) is so high (i.e. many unique values compared to the number of rows within the column) they are put in the fact table, to save an expensive join to the dimension tables. Joins only work well when the number of rows in one table result set (table expression) is much smaller than the number of rows in another.

Example

So when a manager comes to you and says they want to see total sales by branch by year, they are telling you they want one measure (Total Sales), which means a 'Fact' will have to exist in a 'Fact table' called Sales, and this can be at any grain (sale line, sale, sale person, branch sales) so long as you can sum it up to give the total amount sales in a branch. They are also telling you they want two dimensions, which in our case will be represented in two dimension tables, a date table, and a branch table.

But note, your foreign key in your Fact table will have to join to these two dimension tables, so you need to ensure that your dimension tables go down to a low enough grain. If your fact contains the sales person FK, not the branch FK, your Branch dimension will need to go down to the Sales person grain.

Reasons Not to Use Junk Dimensions

Junk dimensions can be a pain in the neck. Often people will say that it is better just to add extra columns to the fact table, especially if the junk dimension represented only 'Yes' or 'No'. Here are some reasons why:

  • If you ever want to update junk dimensions, it can involve a change in the meaning of existing members (cells) in the junk dimension.
  • It is hard to maintain a surrogate key for a junk dimension
  • If you ever want to move an attribute (column) of a junk dimension into a fully fledged dimension, this will break all the reports that previously used that key from the junk dimension.
Nevertheless, junk dimensions do have these advantages:
  • Junk dimensions reduce the number of dimensions within a cube, making it easier to browse
  • Reducing the number of dimensions means the cube performs better during querying

When to Snowflake

According to standard Kimbal methodology, Star schemas should really be used for everything, except when it is impossible to represent the data in a Star structure, such as when you have many-to-many relationships. Here you need a Factless Fact (or Bridge) table, made up only of Foreign Keys.

The other time you might need an almost Factless Fact table is to represent the current status of data in a Star schema, such as when you want to represent a current list of all products available. This would require a column ProductAvailable in your Fact table, so that it can be summed, and a foreign key relationship to the Product Dimension. You may as well stick this column into the Product Dimension and then import that table into SSAS as both a Fact and a Dimension.