Home About Us SQL Interview Book Contact Us RSS
Articles
Tools
Code Snippets
Links
Tips & Tricks
FAQ
Resources
Articles
Code Snippets
Links
FAQ
Resources

Saravana Kumar
Vinod Kumar


Order By Clause – insiders perspective

In this article I will walk through some of the aspects in using Order by from a developers stand point. There are tons of pitfals and advantages in optimizing queries. And more often than not we say the query has got Order by hence cannot be optimized much. Here let me walk through some of the techniques you can adopt to optimize and improve performance even whist using Order by clause.

Introduction

Sorting data for output is a fundamental requirement for any application. This is often thought about as an requirement driven. But that is NOT the only requirement. Even the DBMS engine might choose for sorting for its operations like DISTINCT, GROUP BY, UNION clauses to identitfy duplicate records etc. Hence sorting is also used to effictiveness by the SQL Server engine also for its processing. The query engine on the contrary also use the sorting technique to merge two columns for comparision whist using it with inner join / outer joins. We will not deal with this implementation here but we will try to address the same on later articles.

The Insiders Note

Order by as I described earlier has been a requirement driven by business process more often. Rule #1 is the effect of number of rows the query performance. Increase in the #Number of rows and then applying the Order by is directly propotional. More the number of rows more the time it would take to return. Fundamentally we need to understand, to apply the order by clause the query engine needs all the records eligible for the criteria before starting to resolve the same. Order by clause cannot operate on a streaming basis. This is not possible. A tentative trend for the option discussed would look like:

Now Rule #2, The effect of column length is a big debate next. Now sorting a column on a Integer column Vs a Char column is the focus here. I would rate and the tests show that a sorting on an integer column is relatively faster compared to the counterpart strings. Most of our present applications are 32 bits and so are our hardwares tuned for 32 bit applications. So even as a comparision to a smallInt to a conventional Int column. We will get great benefits in sorting our Int columns over the conventional smallint columns. Here we are tapping the capabilities of the OS rather than the limitation / processing power of the application using it. Working on the same lines. If you can think of ordering rows by an integer column and then by a varchar column then the query would perform faster compared to a mere varchar column sorting with large keys.

Rule #3, this is an interesting note that I came across recently. Most of our database design revolve around using the Varchar columns rather than the conventional Char columns. The advantage of using the same is that Varchar columns are of variable in lenght an occupy space required for the data alone. Whist Char columns would Pad spaces behind the data to take up the space defined in its definition. Great, understandable but here is the tip for its usage with Order by clause. Using Varchar columns is fine. But columns length definition during creation does matter in the Order by clause. For example, below is an example of three tables (test1, test2, test3), all the three tables have a column name called name (defined as varchar(10), varchar(20) and Varchar(30) respectively). All the tables have around 80000 rows and all the name column contains a randomly generated name of four characters. All the tables contain the same data. The results are interesting ain't it?

Rule #4, More the cardinality of the rows and more the uniqueness of the rows more easier it is to Order them. Here take for instance we are ordering by a Varchar name column. If there are duplicate rows or if there are name in which the first few characters are the same then there is an overhead in Ordering the same. On the contrary if the records are completely unique in nature then the order by clause would be very fast.

Rule #5, This is a fairly simple rule in sorting. Using a clustered index key for sorting is very advantageous. This is because the keys are sorted logically and stored in this order itself. Hence the sorting algorithm neednot have to sort the same again. It is a fairly simple process internally. But having said that we might not be able to have clustered indexes on such columns always. But having an index of such columns that are used frequently in the Where Clause and the Order by Clause would be benificial for sure. The query engine would try to capitalize on the fact that the index pages are sorted.

Order by clause - Usages Scenarios

1. I am sure we all are aware of usage of Order by in our day-to-day applications. But just thought to add a line on its uniqueness too here in this article. Order clause is the only caluse that can refence an expression or an alias.

For Example:

Select name [myname] from test1 Order by myName

Select name [myname], salary * bonus [latestSalary] from test1 Order by latestSalary

2. Order by clause can also have an expression to do the sorting. Something on the lines of the example given below. Internally it is implemented as a pseudo column just before the sorting takes place. Once the sorting is done then this pseudo column is removed before the query engine can start displaying the output.

For Example:

Select name [myname], salary * bonus [latestSalary] from test1 Order by salary * bonus

3. Another interesting implementation is the usage using an index value from select list. This is one of the uniqueness of using Order by clause. If you are sure of the positions of the columns in the select clause then you can sort them in the index key of that order.

For Example:

Select id, name from test1 Order by 2

The above example will sort the tables data sorting by the name column. This is an interesting implementation.

4. I've been asked sometimes to sort data randomly. There can be multiple ways to achieve the same. But the best around is using the newID() trick. You saw in my usage scenario #2 above that we used an expression in the Order by caluse. It is an extention to the same. We use the NewID() function to generate a random GUID and then sort the rows internally using the GUID's. Hence we can be sure that it will sort randomly every time the query is hit.

For Example:

Select name from test1 Order by NewID()

Conclusion

I think we did have an insiders perspective to using Order by clause and its usage. This article gives some of the best practices you can incorporate while using Order By and also takes a look into how database designs can impact in performance. Applications come with requirements but it the design and the architecture of usage that decide on how these can scale out and perform to the expectations of the end-users.