Wednesday, October 13, 2010

How to dynamically number rows in a SELECT Transact-SQL statement

In a Microsoft Support article available at http://support.microsoft.com/kb/186133, a technique is mentioned that enables a developer to dynamically number rows in a SELECT Transact-SQL statement. This technique is handy for paginated display of data. 

To achieve the same functionality, typically, developers resort to techniques like copying data (or primary keys) to a temporary table with an identity column and then selecting data for the requested page from the temporary table (or joining it to the source table if only primary keys were copied). Both of these (and other) approaches are tremendously slower compared to the technique outlined in the article mentioned above.
As an example, consider the following query:

select row_number() over (order by a.au_id) as sequence, a.au_id, a.au_lname, a.au_fname, a.city
from pubs.dbo.authors a

sequence             au_id       city
-------------------- ----------- --------------------
1                    172-32-1176 Menlo Park
2                    213-46-8915 Oakland
3                    238-95-7766 Berkeley
4                    267-41-2394 San Jose
5                    274-80-9391 Oakland
6                    341-22-1782 Lawrence
7                    409-56-7008 Berkeley
8                    427-17-2319 Palo Alto
9                    472-27-2349 Covelo
10                   486-29-1786 San Francisco
11                   527-72-3246 Nashville
12                   648-92-1872 Corvallis
13                   672-71-3249 Walnut Creek
14                   712-45-1867 Ann Arbor
15                   722-51-5454 Gary
16                   724-08-9931 Oakland
17                   724-80-9391 Oakland
18                   756-30-7391 Oakland
19                   807-91-6654 Rockville
20                   846-92-7186 Palo Alto
21                   893-72-1158 Vacaville
22                   899-46-2035 Salt Lake City
23                   998-72-3567 Salt Lake City

(23 row(s) affected)

As you can see, there is now a dynamically generated sequence in the result set. The best thing is that this sequence number is generated by sorting on a specific column (or a set thereof) and, so, any given row is guaranteed to always have the same sequence number provided that it’s sorted on a primary key (typically a numeric, auto-incrementing one, though, any primary key incrementing in a natural sort order would work).

Now imagine you want to paginate over this result set. To make the example simple, let’s just say that you want to select rows 5 through 10 (in a real world application, you’d calculate start and end sequence numbers based on the following information: total number of pages in result set, currently requested page, and rows per page). Here’s how you’d do that:

select *
from (
    select row_number() over (order by a.au_id) as sequence, a.au_id, a.city
    from pubs.dbo.authors a
) as ResultSet
where sequence >= 5 and sequence <= 10

sequence             au_id       city
-------------------- ----------- --------------------
5                    274-80-9391 Oakland
6                    341-22-1782 Lawrence
7                    409-56-7008 Berkeley
8                    427-17-2319 Palo Alto
9                    472-27-2349 Covelo
10                   486-29-1786 San Francisco

(6 row(s) affected)

As these examples demonstrate, with 2005 and later versions of SQL Server, there is no need to use temporary tables or complex cursors for paginating through large sets of data.

Happy coding.