In reading SQL Server books, you may come across the term, "deterministic" without it being explained. So what does it mean?
Deterministic means that the SQL query determines two things about the result of the query:
SELECT [Name], [Age] FROM [dbo].[NameAgeTable]
How do you determine which results you get back?
In your query you have not specified a TOP clause, so your code has already determined which rows you get back, because it returns all of them! To determine the order in which the rows appear, you simply use ORDER BY, like this:
SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]
The ORDER BY clause sorts out the row order in-determinism, and it also makes sure that if you were to use a TOP clause, your query would determine which rows you got back! This is because TOP works in conjunction with ORDER BY where it does the ordering first, and then selects the top n records. So now you have sorted both row-order in-determinism and rows returned in-determinism, just by using the ORDER BY clause.
But there is a complication: what if the data in your ORDER BY column(s) had multiple instances of the same value in it, like this:
SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]
If your ORDER BY column(s) has more than one instance of the same value, your query becomes in-deterministic again, because it has failed to determine the order in which the rows are returned.
So, to make your query fully deterministic, you need two things:
To ensure your query always returns the same rows (provided the data doesn't change) even when the ORDER BY list produces non-unique results, add the WITH TIES option. For example:
SELECT TOP 3 WITH TIES [CityName] FROM [dbo].[Address] ORDER BY [CityName]
But be careful! The way this works is by returning all the duplicates from the table you're querying that appear in your result set. This means that if you do a TOP 2, you might more than 2 rows returned.
For example, lets say your data looked like this:
SELECT * FROM [dbo].[LetterNumber]
If you used TOP with WITH TIES you would end up with the following:
SELECT TOP 2 WITH TIES [Letter] FROM [dbo].[LetterNumber]
not:
Moreover, unless your ORDER BY list is unique, you will still have Row Order In-Determinism, because multiple rows with the same value (or combination of values over multiple columns) in the ORDER BY list will still have no determined order. So your query will only be partially deterministic. But of course, since it doesn't matter which way around your rows appear if they are identical, this doesn't matter much.
Now you know what deterministic means and how to make your queries deterministic.
Deterministic means that the SQL query determines two things about the result of the query:
- Firstly, it has to determine which rows you get back, but only if you're not selecting all rows. Let's say you had a table (dbo.Table) with three rows, and your query was SELECT * FROM dbo.Table. Your query would in this case be determining your result set, because its determining that it will return all rows from dbo.Table. Even if the data in the table changes (let's say a row was added) your query is still deterministic in that its determining that you'll get back all rows. But let's say we did SELECT TOP 1 * FROM dbo.Table. In this case, your query is not deterministic because its selecting a limited number of rows (in this case, 1 row) but it's not determining which row (out of the three in the table). Of course, if you only had one row in your table, it would be deterministic, because it would be selecting all rows. Again, lets take the query SELECT * FROM dbo.Table WHERE ID = 3. This query is deterministic even though there might be multiple rows with an ID of 3. This is because the query is telling SQL Server to return all rows where the ID is 3. So its determining the result set in that sense. But if you had SELECT TOP 1 * FROM dbo.Table WHERE ID = 3 then your query will only be deterministic if there is a single row with an ID of 3. If there are multiple rows with an ID of 3, then your query will only return one of those rows, but it hasn't determined which one, so it is called 'indeterministic'.
- Secondly, it has to determine the order in which those rows are presented. To do this in SQL you add an ORDER BY clause, though that is not always enough.
Making A Query Fully Deterministic
Lets say you have a table with three records in it:SELECT [Name], [Age] FROM [dbo].[NameAgeTable]
Name | Age |
John | 45 |
Peter | 23 |
Mike | 87 |
How do you determine which results you get back?
In your query you have not specified a TOP clause, so your code has already determined which rows you get back, because it returns all of them! To determine the order in which the rows appear, you simply use ORDER BY, like this:
SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]
Name | Age |
Peter | 23 |
John | 45 |
Mike | 87 |
The ORDER BY clause sorts out the row order in-determinism, and it also makes sure that if you were to use a TOP clause, your query would determine which rows you got back! This is because TOP works in conjunction with ORDER BY where it does the ordering first, and then selects the top n records. So now you have sorted both row-order in-determinism and rows returned in-determinism, just by using the ORDER BY clause.
But there is a complication: what if the data in your ORDER BY column(s) had multiple instances of the same value in it, like this:
SELECT [Name], [Age] FROM [dbo].[NameAgeTable] ORDER BY [Age]
Name | Age |
Peter | 45 |
John | 45 |
Mike | 45 |
If your ORDER BY column(s) has more than one instance of the same value, your query becomes in-deterministic again, because it has failed to determine the order in which the rows are returned.
- An ORDER BY clause
- A unique result for each row returned by the ORDER BY list of columns. This can be achieved in various ways:
- By adding columns to the ORDER BY column list
- Using a SQL Server constraint
- Using the WITH TIES option. Don't know what one of those is? Read on...
Making A TOP Query Partially Deterministic without a unique ORDER BY list, using WITH TIES
Helpfully, if you are trying to make your TOP query deterministic, and you've set the ORDER BY column list you want, but the ORDER BY column list is not unique, there is still a way to determine which rows are returned by your query, without having a unique ORDER BY column list. The trick is to use the WITH TIES option, as follows. This gets rid of Rows Returned In-Determinism.To ensure your query always returns the same rows (provided the data doesn't change) even when the ORDER BY list produces non-unique results, add the WITH TIES option. For example:
SELECT TOP 3 WITH TIES [CityName] FROM [dbo].[Address] ORDER BY [CityName]
But be careful! The way this works is by returning all the duplicates from the table you're querying that appear in your result set. This means that if you do a TOP 2, you might more than 2 rows returned.
For example, lets say your data looked like this:
SELECT * FROM [dbo].[LetterNumber]
Letter |
A |
B |
B |
C |
If you used TOP with WITH TIES you would end up with the following:
SELECT TOP 2 WITH TIES [Letter] FROM [dbo].[LetterNumber]
Letter |
A |
B |
B |
not:
Letter |
A |
B |
Moreover, unless your ORDER BY list is unique, you will still have Row Order In-Determinism, because multiple rows with the same value (or combination of values over multiple columns) in the ORDER BY list will still have no determined order. So your query will only be partially deterministic. But of course, since it doesn't matter which way around your rows appear if they are identical, this doesn't matter much.
Now you know what deterministic means and how to make your queries deterministic.