Saturday, 11 March 2017

Reasons Not to Use Junk Dimensions

Junk dimensions can be a pain in the neck. Often people will say that it is better just to add extra columns to the fact table, especially if the junk dimension represented only 'Yes' or 'No'. Here are some reasons why:

  • If you ever want to update junk dimensions, it can involve a change in the meaning of existing members (cells) in the junk dimension.
  • It is hard to maintain a surrogate key for a junk dimension
  • If you ever want to move an attribute (column) of a junk dimension into a fully fledged dimension, this will break all the reports that previously used that key from the junk dimension.
Nevertheless, junk dimensions do have these advantages:
  • Junk dimensions reduce the number of dimensions within a cube, making it easier to browse
  • Reducing the number of dimensions means the cube performs better during querying

When to Snowflake

According to standard Kimbal methodology, Star schemas should really be used for everything, except when it is impossible to represent the data in a Star structure, such as when you have many-to-many relationships. Here you need a Factless Fact (or Bridge) table, made up only of Foreign Keys.

The other time you might need an almost Factless Fact table is to represent the current status of data in a Star schema, such as when you want to represent a current list of all products available. This would require a column ProductAvailable in your Fact table, so that it can be summed, and a foreign key relationship to the Product Dimension. You may as well stick this column into the Product Dimension and then import that table into SSAS as both a Fact and a Dimension.

Sunday, 26 February 2017

Indexes over simplified

There is a ton of literature on how SQL Server indexes work, and I won't repeat all that here. But I want to over-simplify it to help beginners understand what kinds of indexes they should be adding to their queries.

Lets say you have a query like this:

SELECT BusinessEntityID
     , FirstName
FROM Person.Person
WHERE LastName 'Smith'

What type of index should you add to this query? 

Well, because its looking at three columns (BusinessEntityID, FirstName and LastName), you'll want an index to cover all three. 

But what type? There are two types of covering index:
  • Key Column cover: A covering index where the key to the index is made up of all the columns
  • Include Column cover: A covering index where the key to the index is made up of only one column (you can specify more), but the rest are included.
A Key column index over Lastname and FirstName (the order is important) looks like this:
  • Jones
    • Abraham
    • Isaac
  • Smith
    • Bob
    • Trudy
An include index with a key of Lastname including Firstname, looks like this:
  • Jones - Isaac
  • Jones - Abraham
  • Smith - Bob
  • Smith - Trudy
In short, a Key Column index is sorted by both LastName, and then FirstName. But it includes the firstname within the index structure itself, so that after having found the key (LastName), it doesn't have to go back to the table to find the other data (FirstName).

So for the following query, the ideal index would have its key on LastName, since that is what SQL Server will have to search through the data to find first, since we're filtering on it, and then once its found all rows with LastName of 'Smith', SQL Server will have to fetch the FirstName and BusinessEntityID for these rows, so we can include them in the index, to make it faster for SQL Server to return them.

SELECT BusinessEntityID
     , FirstName
FROM Person.Person
WHERE LastName 'Smith'

We could also have created an index with three columns LastName>FirstName>BusinessEntityID. This would be just as fast. In fact, it would help out in other scenarios too! For example, if we added an "ORDER BY LastName, FirstName" clause to the end. Because the FirstName would be indexed (i.e. arranged in order), not merely included.

But what if we change the query to this?

SELECT LastName 
     , FirstName
FROM Person.Person
WHERE FirstName 'Bob'

Because the key column index is ordered by LastName first, but the query needs to retrieve rows with FirstName first, the query will have to scan through every row of the index to find the data, making it less efficient.

But even though this is filtering on FirstName, it can actually still use the rowcover index we created. Of course, this won't be as fast as the query before, but SQL Server will still be able to make use of this index, making it a better option than having a key on Lastname and including firstname, which would not be able to be used if we put Firstname in the WHERE clause.

Friday, 3 February 2017

Database Corruption

Got database corruption? Want to check if you've got database corruption? Want to know what to do about database corruption? Here is the link for you!

This link is amazing! After following the link, scroll down to the bottom to find additional links to the following 10 parts:

Part I: What Is Corruption?
Part II: Simulating Corruption
Part III: Preventing Corruption
Part IV: CHECKSUM Page Verification
Part V: Storage Problem Alerts
Part VI: Regular Corruption Checks
Part VII: Backups
Part VIII: The Importance of Backups
Part IX: Responding to Corruption
Part X: Page Level Retore Operations
Part XI: Full Recovery Operations
Part XII: Recovery Sample
Part XIII: Recap

Here is the link:

http://sqlmag.com/blog/sql-server-database-corruption-part-i-what-corruption

Tuesday, 31 January 2017

Power BI Dashboards - No Drill Down - Aghhh!

A new feature was added to Power BI last year: drill down in charts. Microsoft didn't give us drill down in tables, but they did in charts.

But even with charts, there's a snag: the feature disappears when you pin a report to a dashboard. To access it, you need to click the Title of the report in the Dashbord, which will take the user through to see the report. 

Saturday, 14 January 2017

Simple Speak Vs Business Speak

The man said, “I am a digital and social media strategist. I deliver programs, products and strategies to our corporate clients across the spectrum of communications functions.”

When asked by the journalist what this meant, he replied, “I teach big companies how to use facebook.”

I like to use simple speak rather than its bloated counterpart, because people actually understand me when I do. This even applies to technical terminology. For example, I call a Sharepoint Document Library, a folder. I like to call a precedent constraint in SSIS, a line.

"Business Intelligence" itself is too much of a mouthful for me. I prefer to say, "Turning data into information" or, "making data easier to understand."

Why use simple speak in the context of business intelligence, you ask? Because I like to make data easier to understand.

Here is a video outlining the matter perfectly:

Tuesday, 3 January 2017

Agile, Waterfall or Neither for SSRS Reports?

Anyone who is familiar the Waterfall project methodology knows that part of it involves collecting all requirements up front. Compare that to Agile where the project consumes requirements in a more segmented way, through 'user stories'. In my view, these methodologies may have their use in other forms of programming, such as web-development, but when it comes to report writing, neither are ideal (though Agile comes closest).

In BI requirements always change. In fact, its unfair to demand that business user know exactly what they want. The probably do not know the data structure behind the data. It may be that what they want cannot be produced, and this is not realized until later in the development. Once the user has seen the result of the initial database query, they may realize (as they investigate the result) that they needed something quite different, or something extra. They simply couldn't have known that before. BI is all about investigation. That makes it different from other types of development.

In my view, those 'BI Professionals' who demand a REQUIREMENTS DOCUMENT up front, are amateurs. They have often come from development backgrounds, and have great skills in coding. But they don't know about the business realities of BI.

Being a great BI Professional is about so much more than just coding. You have to be able to work with the business; find out what they want, why they want it, and be good at explaining data realities to them, translating technical realities about the data into understandable language.

BI development is a learning process - that's why its called Business Intelligence, not Business Reporting or Business Dashboarding. Its not as simple as creating a report, a dashboard, a cube, a model, or writing a query. That often comes at the end of the process, once the data, usually copied into Excel and looked over by the user several times, has been verified and agreed to. Only then should it be instantiated into a report (turned into a report) or dashboard or tabular model or cube.