Saturday, 24 December 2016

A Quick Comment On Data Vault

If you don't know what Data Vault is, this blog post is not for you.

If you are considering using it in your business, here is my reaction:

Data Vault 2.0, as advanced in the major work Building a Scalable Data Warehouse with Data Vault 2.0 by Dan Linstedt (with a forward by none other than Bill Inmon) is a major investment for any business. If you thought the Inmon method was over-engineered, you ain't seen nothing yet.

Data Vault basically replaces your ODS layer providing an even more complicated data storage solution, which attempts to keep a record of literally everything (all data, regardless of how clean it is) that ever gets sent to your database. Data proceeds from the Data Vault through your ETL solution to your presentation layer, where your Star and Snowflake schemas are based.

For the vast, vast majority of businesses, Data Vault is the wrong route to go down. It will seriously complicate your ETL. It will be hard to hard to find developers who are familiar with the methodology. It will slow down the development of your data solutions.

For a select few businesses, who have a need to store almost everything that ever gets sent to their database, Data Vault is a breakthrough methodology that will help you out in a big way.


My SSRS Wish List

I have often felt that SSRS recieves the least love from Microsoft. Even Multidimensional Cubes received enough attention while being developed to allow them to become a mature product. I do not feel the same way about SSRS.

Yes, with SQL Server 2016 they have Mobile Reports now, but frankly, there are quite a few limitations of Mobile Reports, and I would say Mobile Reports have received even less love than Paginated reports.

Here are the features I would most like to see in SSRS:

  • Far more integration with Excel. I would like to be able to turn features on and off in excel when exporting from SSRS to Excel, and even publish data from Excel back to the SQL Server data source through the SSRS defined dataset.
  • Dynamic slicing and drill downs with linked charts! SSRS Mobile Reports can do it. Power BI can do it. Why, O, why, don't we have this feature in Microsoft's enterprise level reporting tool?
  • The ability to set the order in which parameters and data sets are loaded. I want to be able to load datasets before parameters, parameters before other datasets. Parameters and datasets are two of the most crucial concepts in any reporting tool. I would like to determine how they are ordered.

The relationship between Power BI and SQL Server

Power BI is constantly being updated. Therefore, if this post is out of date very soon, please forgive me.


Power BI and SQL Server are the two major data products sold by Microsoft. They share many of the same features and functionality, but work in different ways. This raises some questions:
  • How do Power BI and SQL Server relate to each other? 
  • How do Power BI and SQL Server compare against one another?
  • When should Power BI and SQL Server be used?
In this blog post I attempt to answer these questions, but be warned, my answers will not be comprehensive. There is so much more that could be said, but here are a few of the main things I have noticed:

How Power BI and SQL Server relate to each other

Power BI and SQL Server can both do a lot of the same things. If your organisation wants to take data from various source systems, transform it and use it in dashboards, both Power BI and SQL Server can do that. In fact, SQL Server Reporting Services will soon be able to integrate with Power BI (the rollout is planned late 2017; this post is being made late 2016).

SQL Server can do some things Power BI cannot, and Power BI can do some things SQL Server cannot. For example, SQL Server Integration Services can output data in formats other than dashboards. It can output data as XML files, CSV files, Excel files. It can also extract data from sources that Power BI can't, such as looping through a folder and extracting file names. But more on that in the next section.

Power BI is designed for power-users (non-technical users with a considerable knowledge of the product). It is good for use within a single department, as part of a single project. Any time that only a very limited number of users will need to access reporting, Power BI is ideal.

SQL Server, on the other hand, is designed as an enterprise level data tool. We're talking large-scale roll out to the entire business, huge amounts of data, data that has to conform to set business standards and government regulations.

Power BI grew out of Microsoft Excel. The Power BI team is separate now, but it still integrates with Excel well and brings many of the usability features of Office to the sphere of data manipulation and presentation.

SQL Server is still Microsoft's main data offering for business. It is a more mature product than Power BI, offering many more features and use cases than its younger brother.

How Power BI and SQL Server compare against one another

Reporting

Generally speaking, Power BI offers a better chart experience than SQL Server Reporting Services, with dynamic slicing, dynamic drill-downs and the ability to link charts together for slicing and drill down. Power BI also offers more charting 3rd party options to download. There is one feature of Power BI that I personally do not think is as helpful as some in the BI industry seem to think it is: that is, search. Power BI lets users construct their own reports in the search bar, but to do this they have to be pretty familiar with how to use it, and I'm not confident they are. 

SSRS, however, allows for better presentation of tables and matrices. It also allows for more customization in the way the report appears.

Data Integration

ETL in Power BI is a dream. It is so easy, one wonders why Microsoft didn't do ETL this way in SSIS. However, Power BI does not have all the features of SSIS. Pretty much everything that can be done in Power BI can be done in SSIS, but it cannot do many of the things SSIS can do, such as For Each Loops.

Moreover, SSIS is an enterprise level solution that can feed data into any data destination, whereas the Power BI ETL tool only ever works with Power BI reports, which rules it out for many uses, such as loading a SQL Server database. 

Power BI does not offer any scripting module, so one cannot do advanced things with it like scanning a folder structure for data about files.

Data Storage

Power BI is an in-memory database, whereas SQL Server offers a number of disk based alternatives, including the SQL Server Database Engine and SSAS Multidimentional Cubes. This is not so bad, but the biggest drawback to Power BI is that other data tools, such as SQL Server, cannot connect to it to extract the data! This is probably the worst feature of using Power BI, and the main reason to stick with SQL Server to create your data-sets. You can then import them into Power BI to create dashboards.

SQL Server boasts far more security features and customization than Power BI, although Power BI does have some, such as row based security. 

When should Power BI and SQL Server be used?

Because of the limitations of each product, Power BI and SQL Server should be used together. It is important that Microsoft BI professionals get to grips with both products. 

For most enterprise level needs, SQL Server will still be the first port of call for the BI professional. This is especially true with regards to data storage and data integration. However, this should be supplemented with the advanced reporting and dashboarding features in Power BI such as dynamic slicing in charts.

Thursday, 1 December 2016

What NOT to Use A Multi-Dimensional Cube For

The whole point of a Multidimensional Cube is make it easier for business users to slice aggregated numbers about their business. This means two things:
  1. If the data your business users request contains no numbers, a Cube is not the best option for your solution.
  2. If your business users do not want aggregations, a Cube is not the best option for your solution.
I feel this needed stating, as I have come across the sentiment that all reporting should come from a Cube, and that is simply not so. 

Saturday, 19 November 2016

What does Deterministic mean in SQL?

In reading SQL Server books, you may come across the term, "deterministic" without it being explained. So what does it mean?

Deterministic means that the SQL query determines two things about the result of the query:

  1. Firstly, it has to determine which rows you get back, but only if you're not selecting all rows. Let's say you had a table (dbo.Table) with three rows, and your query was SELECT * FROM dbo.Table. Your query would in this case be determining your result set, because its determining that it will return all rows from dbo.Table. Even if the data in the table changes (let's say a row was added) your query is still deterministic in that its determining that you'll get back all rows. But let's say we did SELECT TOP 1 * FROM dbo.Table. In this case, your query is not deterministic because its selecting a limited number of rows (in this case, 1 row) but it's not determining which row (out of the three in the table). Of course, if you only had one row in your table, it would be deterministic, because it would be selecting all rows. Again, lets take the query SELECT * FROM dbo.Table WHERE ID = 3. This query is deterministic even though there might be multiple rows with an ID of 3. This is because the query is telling SQL Server to return all rows where the ID is 3. So its determining the result set in that sense. But if you had SELECT TOP 1 * FROM dbo.Table WHERE ID = 3 then your query will only be deterministic if there is a single row with an ID of 3. If there are multiple rows with an ID of 3, then your query will only return one of those rows, but it hasn't determined which one, so it is called 'indeterministic'.
  2. Secondly, it has to determine the order in which those rows are presented. To do this in SQL you add an ORDER BY clause, though that is not always enough.

Making A Query Fully Deterministic

Lets say you have a table with three records in it:

   SELECT [Name], [Age] FROM [dbo].[NameAgeTable]

Name Age
John 45
Peter 23
Mike 87

How do you determine which results you get back?

In your query you have not specified a TOP clause, so your code has already determined which rows you get back, because it returns all of them! To determine the order in which the rows appear, you simply use ORDER BY, like this:

   SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]

Name Age
Peter 23
John 45
Mike 87

The ORDER BY clause sorts out the row order in-determinism, and it also makes sure that if you were to use a TOP clause, your query would determine which rows you got back! This is because TOP works in conjunction with ORDER BY where it does the ordering first, and then selects the top n records. So now you have sorted both row-order in-determinism and rows returned in-determinism, just by using the ORDER BY clause.

But there is a complication: what if the data in your ORDER BY column(s) had multiple instances of the same value in it, like this:

   SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]

NameAge
Peter45
John45
Mike45

If your ORDER BY column(s) has more than one instance of the same value, your query becomes in-deterministic again, because it has failed to determine the order in which the rows are returned.

So, to make your query fully deterministic, you need two things:
  1. An ORDER BY clause
  2. A unique result for each row returned by the ORDER BY list of columns. This can be achieved in various ways:
    1. By adding columns to the ORDER BY column list
    2. Using a SQL Server constraint
    3. Using the WITH TIES option. Don't know what one of those is? Read on...

Making A TOP Query Partially Deterministic without a unique ORDER BY list, using WITH TIES

Helpfully, if you are trying to make your TOP query deterministic, and you've set the ORDER BY column list you want, but the ORDER BY column list is not unique, there is still a way to determine which rows are returned by your query, without having a unique ORDER BY column list. The trick is to use the WITH TIES option, as follows. This gets rid of Rows Returned In-Determinism.

To ensure your query always returns the same rows (provided the data doesn't change) even when the ORDER BY list produces non-unique results, add the WITH TIES option. For example:

 SELECT TOP 3 WITH TIES [CityName] FROM [dbo].[Address] ORDER BY [CityName]

But be careful! The way this works is by returning all the duplicates from the table you're querying that appear in your result set. This means that if you do a TOP 2, you might more than 2 rows returned.

For example, lets say your data looked like this:

   SELECT * FROM [dbo].[LetterNumber]


Letter
A
B
B
C

If you used TOP with WITH TIES you would end up with the following:

   SELECT TOP 2 WITH TIES [Letter] FROM [dbo].[LetterNumber]

Letter
A
B
B

not:

Letter
A
B

Moreover, unless your ORDER BY list is unique, you will still have Row Order In-Determinism, because multiple rows with the same value (or combination of values over multiple columns) in the ORDER BY list will still have no determined order. So your query will only be partially deterministic. But of course, since it doesn't matter which way around your rows appear if they are identical, this doesn't matter much.

Now you know what deterministic means and how to make your queries deterministic.

Saturday, 12 November 2016

Where to type your DAX queries in SSMS and Excel

New to DAX querying and wondering where you are meant to type in your code? This is the post for you.

SSMS prior to 2017

Before one of the update to SSMS in 2017, you would have to type your DAX query into the MDX query window. Odd, huh? After the update, SSMS provided you with a seperate DAX query window. But if you still have an old version of SSMS, here is what you do:

In SSMS, click the Connect button in Object Explorer > right click the database you want to query > click New Query > select MDX > now start typing your DAX code. Yup, you read it right. You type DAX into the MDX query window and click Execute (or press F5)

Be careful with the Cube pane on the left - if you drag columns over from this pane the code will appear in the MDX language, not DAX. To query with DAX you have to type it manually.

Excel 2016

In Excel 2016, select the Data tab on the ribbon > click New Query > click From Database > click From SQL Server Analysis Services Database > type the server name and the database name > select the drop down arrow beneath to allow you to type your DAX query.

To edit your DAX query in Excel 2016, select the Data tab on the ribbon > click Show Queries > hover the mouse over the query (do not click) > click Edit > click the cog symbol next to Source on the right under Applied Steps > edit your DAX query.


Microsoft Database Query Languages


Microsoft provides various different types of database software with various compatible query languages:

Database Compatible Query Language
Microsoft Access SQL
Microsoft SQL ServerTSQL
Microsoft SQL Server Analysis Services Multidimensional Cubes MDX
Microsoft SQL Server Analysis Services Tabular Models DAX / MDX
Microsoft Power BI*
Microsoft Power Query for Excel*
DAX*

* With Power BI and Power Query for Excel one can query the data model stored in these programs using the DAX language. However, to manipulate the data as its being brought into the model, these programs allows you to use a language called Power Query Formula Reference (informally known as "M").


Friday, 28 October 2016

Cool SSMS trick you've probably never heard of! (Quick Tip)

Here's something you may never have tried before: ever wanted to indent multiple lines with spaces rather than tabs at the same time in SQL Server Management Studio (SSMS)? Now you can:

Here's how:

--Copy this to SSMS, hold Alt, click and drag your pointer *FROM HERE  then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer             then start typing
--Copy this to SSMS, hold Alt, click and drag your pointer    TO HERE* then start typing

--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space
--   ||  Now hold alt, click and drag a line between the two pipes and press space

--Cool, huh? I learnt this one from Itzik Ben Gan, SQL Master. Also, if you want to copy and paste in blocks, you can do the same. Try holding alt and not holding alt when you place your cursor (it does different things).