skip to content

SQL: Selecting rows before and after

The introduction of WITH queries (Common Table Expressions or CTEs) in PostgreSQL gives us a chance to revisit and revise some past practices.

Using CTEs in combination with Window Functions allows us to better manipulate data at the SQL level rather than relying on PHP or other scripting languages.

Preparding our dataset

For the following discussion we're using this temporary data table in PostgreSQL:

CREATE TABLE tmp_events ( id serial, date date not null, name varchar not null, PRIMARY KEY(id) ); CREATE TABLE

With data imported using:

COPY tmp_events (date, name) FROM stdin; 2018-01-01 New Year's Day 2018-04-02 Easter Monday 2018-05-10 Ascension Day 2018-03-30 Good Friday 2018-05-21 Whit Monday 2018-05-27 Mother's Day 2018-06-17 Father's Day 2018-07-14 Bastille Day 2018-05-01 Labour Day 2018-08-15 Assumption Day 2018-11-01 All Saints Day 2018-11-11 Armistice Day 2018-12-26 St Stephens Day 2018-05-08 V-E Day 2018-12-25 Christmas Day \. COPY 15

Our goal is to extract a list of events before and after the highlighted "All Saints Day" event. Notice that we've shuffled the order of the events we're importing to make it more realistic.

Now obviously we can select all events ordered by date:

# SELECT * FROM tmp_events ORDER BY date; id | date | name ----+------------+----------------- 16 | 2018-01-01 | New Year's Day 19 | 2018-03-30 | Good Friday 17 | 2018-04-02 | Easter Monday 24 | 2018-05-01 | Labour Day 29 | 2018-05-08 | V-E Day 18 | 2018-05-10 | Ascension Day 20 | 2018-05-21 | Whit Monday 21 | 2018-05-27 | Mother's Day 22 | 2018-06-17 | Father's Day 23 | 2018-07-14 | Bastille Day 25 | 2018-08-15 | Assumption Day 26 | 2018-11-01 | All Saints Day 27 | 2018-11-11 | Armistice Day 30 | 2018-12-25 | Christmas Day 28 | 2018-12-26 | St Stephens Day (15 rows)

With this we can import the data into a PHP array, locate the 'current' event, and take a slice, but that means loading all the data into an array which gets more and more inefficient as the dataset grows.

What we'd actually like is a query to give us just the events either side of the 'current' event.

The old approach

Without CTAs we need two separate queries - one to look backwards in time and one to look forwards:

<?PHP $current_id = 26; $current_date = '2018-11-01'; // select previous two events SELECT * FROM tmp_events WHERE ( date <= '$current_date' AND id != '$current_id' ) ORDER BY date DESC, name DESC LIMIT 2 // select next two events SELECT * FROM tmp_events WHERE ( date >= '$current_date' AND id != '$current_id' ) ORDER BY date, name LIMIT 2 ?>

So we're making two queries, which each return two events, and then combining and displaying them on the page. Workable, but messy.

The CTE approach

The first step is to sort our events and assign a rank/number to each:

# SELECT id, date, name, row_number() OVER (ORDER BY date) FROM tmp_events; id | date | name | row_number ----+------------+-----------------+------------ 16 | 2018-01-01 | New Year's Day | 1 19 | 2018-03-30 | Good Friday | 2 17 | 2018-04-02 | Easter Monday | 3 24 | 2018-05-01 | Labour Day | 4 29 | 2018-05-08 | V-E Day | 5 18 | 2018-05-10 | Ascension Day | 6 20 | 2018-05-21 | Whit Monday | 7 21 | 2018-05-27 | Mother's Day | 8 22 | 2018-06-17 | Father's Day | 9 23 | 2018-07-14 | Bastille Day | 10 25 | 2018-08-15 | Assumption Day | 11 26 | 2018-11-01 | All Saints Day | 12 27 | 2018-11-11 | Armistice Day | 13 30 | 2018-12-25 | Christmas Day | 14 28 | 2018-12-26 | St Stephens Day | 15 (15 rows)

The new row_number column is now in date order, making it easier to identify the rows we want to extract. But we only have the id value, which is not entirely helpful. We need another (sub-)query to convert our id value into a row number:

SELECT row_number FROM (output of previous query) WHERE id = '$current_id'; row_number ------------ 12 (1 row)

The final step is to select the rows within ±2 rows of the current event - in this case rows 10 to 14:

SELECT * FROM (first query), (second query) WHERE ABS((first query).row_number - (second query).row_number) <= 2 ORDER BY date;

Now if you're not familiar with common table expressions, you might be confused at this point as to how this all comes together. Actually it's not that difficult. What we end up with is:

WITH cte AS ( SELECT id, date, name, row_number() OVER (ORDER BY date) FROM tmp_events ), current AS ( SELECT row_number FROM cte WHERE id = '$current_id' ) SELECT cte.* FROM cte, current WHERE ABS(cte.row_number - current.row_number) <= 2 ORDER BY cte.row_number;

The WITH statements create tempory in-memory tables with the results of the contained query for later reference. In this case the virtual tables cte and current.

Which gives us exactly the result we want:

id | date | name | row_number ----+------------+-----------------+------------ 23 | 2018-07-14 | Bastille Day | 10 25 | 2018-08-15 | Assumption Day | 11 26 | 2018-11-01 | All Saints Day | 12 27 | 2018-11-11 | Armistice Day | 13 30 | 2018-12-25 | Christmas Day | 14 (5 rows)

Not only have we reduced our code to a single query, we also now only need the id of the current events, and no longer the date as all sorting is done in the SQL.

< SQL

Post your comment or question
top