Monday 10 September 2018

"Randomly" order data across multiple pages with MySQL

Last week I looked at how to randomly order a MySQL result set by using ORDER BY RAND(). This is useful for a one-off query in which you need to draw a prize winner or similar but isn't so useful if you need to page through the resultset across several pages, because the order will change as you move from page to page.
Other issues with using ORDER BY RAND() when paging are:
  • if you page forward or back to the pages you were aready on the data will be different from the first time you looked at that page
  • you may see the same records on the next page as you saw on the previous page
  • ordering using the RAND function is slow because it has to apply the RAND function to every row in the resultset which can be slow if it's large
There are a couple of fairly simple solutions which are detailed below.

Seed RAND with a value

Seeding the RAND function with a value in MySQL will result in a repeatable sequence of column values. Each time the query is executed the order of records will be the same. The seed value could then be passed from page to page to ensure the same order is used.
For example, where the seed value is 12:
SELECT ...
FROM ...
WHERE ...
ORDER BY RAND(12)
An advantage of using this solution is that it doesn't require any additional columns or periodic updates to the database, and you could have a different random resultset for each visitor; the seed value could be randomly selected in code on the first page and then passed from page to page.
The obvious disadvantage is that this will be slow for large resultsets.

Store a random number in the database which is updated regularly

Another option is to instead have an additional indexed column in the table which stores a randomly generated value. This is then updated periodically e.g. once a day.
The query to get data would then be along these lines, where my_random_column is the column that stores the random value:
SELECT ...
FROM ...
WHERE ...
ORDER BY my_random_value
A periodic process to update the table would do this:
UPDATE ...
SET my_random_value = RAND()
The advantage of using this method is that it isn't as slow as using RAND() every time the query is run and can use the index to help sort the data.
The disadvantages are that it requires an additional column in the table; an automatic process to re-randomize the value on a periodic basis; additional code to create the random value whenever a new record is created; and the additional process may cause extended locks on the table while the update is running depending on your storage engine.

Related posts:

0 comments:

Post a Comment