Saturday, 11 March 2017

Facts, Dimensions, Measures, Attributes, Members

With all the dimensional modeling terms out there, it can be easy to get confused. Here is a quick guide to some of them for people who think in tables (which, I'm guessing, is most people who work with databases):

Fact can either refer to a fact table, or a row within that table, or a column within that table, or even a cell within that table. It is a very broad term. But with relation to the data within the table, a fact is always a number: 'a numeric fact about an event that happened'. Fact tables have foreign keys to the primary keys within dimension tables, which link together to describe the facts. So if the fact table records at each row a sale line, the fact might be the line total, and to find out what branch the sale line was recorded at, you'd have to link to the Branch dimension.

Measures are measurements of facts. So if your fact table records sale lines, then the measure might be $4.00 for a particular sale line, or $400,000 for all sale lines this week, i.e. a roll-up of all the sale lines. So a measure can refer to either a particular cell within the fact table, or a roll-up of cells.

Dimension can refer to either a dimension table, or a column within that table. It means something to group your facts into. So if your fact is a sale line, then a dimension might be the branch that sale line was recorded at.

Attributes refer to the columns within the dimension table.

Members refer to the cells within a particular attribute.

Degenerate Dimensions are columns within a fact table that are not numeric or foreign keys. They would normally go in a dimension of their own, but because there are so many of them, the 'carnality' (i.e. the number of distinct values within the column) is so high (i.e. many unique values compared to the number of rows within the column) they are put in the fact table, to save an expensive join to the dimension tables. Joins only work well when the number of rows in one table result set (table expression) is much smaller than the number of rows in another.

Example

So when a manager comes to you and says they want to see total sales by branch by year, they are telling you they want one measure (Total Sales), which means a 'Fact' will have to exist in a 'Fact table' called Sales, and this can be at any grain (sale line, sale, sale person, branch sales) so long as you can sum it up to give the total amount sales in a branch. They are also telling you they want two dimensions, which in our case will be represented in two dimension tables, a date table, and a branch table.

But note, your foreign key in your Fact table will have to join to these two dimension tables, so you need to ensure that your dimension tables go down to a low enough grain. If your fact contains the sales person FK, not the branch FK, your Branch dimension will need to go down to the Sales person grain.

No comments:

Post a Comment