SQL: Selecting n rows from each category
A common requirement for websites is to display the 'top' items from different categories in a list. This could be determined by date, price, popularity, other metrics or an aggregate function.
Most people would use a series of individual queries for this, but here we show how a single more complex query can return these results.
Table definitions
For the following example we're using two tables defined as follows:
CREATE TABLE categories (
id serial,
name varchar UNIQUE NOT NULL,
description text,
sort int NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
CREATE TABLE articles (
id serial,
categoryid int NOT NULL REFERENCES categories(id),
title varchar NOT NULL,
contents text,
PRIMARY KEY (id)
);
As you can see each article must be assigned to a category by way of the id reference.
Each table also contains a sort field. When displaying the data we want to sort categories in ascending order and articles in descending order.
The old way
In order to display the first n articles in each category using standard SQL we need to use multiple queries. First to select the categories and then a separate query for each category to get the desired subset:
$categories = `SELECT id AS categoryid, name AS category FROM categories ORDER BY sort;`
foreach($categories) {
/* display Category heading */
$articles = `SELECT id, title FROM articles WHERE categoryid='$categoryid' ORDER BY sort DESC LIMIT 5;`
foreach($articles) {
/* display Article title and link */
}
}
So if there are ten different categories it requires eleven separate calls to the database. There has to be a better way.
Using Window Functions
Using a window function we can partition results within a single query, grouping and sorting articles by category and then taking the top n from each via a subquery on the derived table.
Yes, that sounds like gibberish, but lets start with the subquery:
SELECT
c.name AS category,
a.id,
a.title,
row_number() OVER (PARTITION BY categoryid ORDER BY a.sort DESC) AS row
FROM
articles a INNER JOIN categories c ON (a.categoryid=c.dboid)
ORDER BY
c.sort;
This returns a list of all your articles, grouped by category and sorted, with an extra value row indicating the position of the article in its category. Note that we're sorting both the category and the articles in different parts of the query.
A sample of this output can be seen below:
category | id | title | row
----------------------------+-----+----------------------------------------+-----
Breakfast | 169 | Good Morning Green Smoothie | 1
Breakfast | 168 | Morning Energy Cocktail | 2
Breakfast | 167 | Quinoa Porridge | 3
Breakfast | 166 | Scrambled Tofu with Pesto & Tomatoes | 4
Breakfast | 165 | Super Food Smoothie | 5
Breakfast | 121 | Fresh Fruit Plate | 6
Breakfast | 110 | Coconut Oil & Lemon Juice Detox | 7
Breakfast | 115 | Morning Energy Cocktail | 8
Smoothies & Juices | 129 | The Coconut Cleanse Fertility Smoothie | 1
Smoothies & Juices | 128 | Berry Magic Smoothie | 2
Smoothies & Juices | 127 | Chocolate Dream | 3
Smoothies & Juices | 126 | Green Juice | 4
Smoothies & Juices | 125 | Green Peace Smoothie | 5
Smoothies & Juices | 124 | Green Smoothie Magic | 6
Smoothies & Juices | 123 | Green & Beet Juice | 7
Smoothies & Juices | 122 | Liver Flush | 8
Smoothies & Juices | 108 | Candida Drink | 9
The outer query then just filters out articles with a row count greater than n (as illustrated in the above table). In this case taking just the first five articles from each category, and only the columns we need (category, id and title) are returned:
$articles = `SELECT category, id, title FROM (
SELECT c.name AS category, a.id, a.title, row_number() OVER (PARTITION BY categoryid ORDER BY a.sort DESC) AS row
FROM articles a INNER JOIN categories c ON (a.categoryid=c.dboid)
ORDER BY c.sort
) AS foo WHERE row <= 5;`
$lastcategory = "";
foreach($articles) {
if($category != $lastcategory) {
/* display Category heading */
$lastcategory = $category;
}
/* display Article title and link */
}
This is a huge improvement over the original model as we can use a single query to return only the information we want to display. In some cases you might use another query to retrieve more details at the category level but that's still a maximum of two queries.
If you also cache query results, as you should, the page will require only a single file access instead of potentially dozens of separate database calls.
References
- PostgreSQL: PostgreSQL: Window Functions
- PostgreSQL: PostgreSQL: Table Expressions
Related Articles - Window functions
- SQL Selecting rows before and after
- SQL Selecting n rows from each category