A Brief Introduction to Pagination
For many software engineers, endpoints for retrieving a list of entities are probably their favorite to implement: all they need to do is build an SQL statement that grabs all the available items, along with some other minor tasks. It’s all fun and games until the number of entities in the database becomes a bottleneck and each GET request returns tens of thousands of items. You start to see server-side logs bombarded with error messages highlighted in red. Your dear users, previously satisfied with your application, begin to complain about the increasingly lackluster performance. Your product manager is on the brink of losing it and taking out their frustration on you. What?! You say you want to prevent all this from happening? Pagination is what you need! It can drastically improve your endpoints’ performance and save your servers from overload. Today, I will define pagination and introduce the most common pagination techniques.
What Is Pagination?
Let’s start with a basic overview of pagination. Pagination is a technique that breaks a huge result set into smaller, more manageable chunks. In simple terms, pagination is the division of large amounts of data into several pieces, or “pages,” as the name suggests. Some of you might want to ask, “Why do we need to complicate our code by including pagination? Why can’t we take the easy way out by giving our users all the records and calling it a day?” There are two main reasons. First, it is easier for users to spot the information they need in a paginated table, which improves the overall user experience. Second, pagination reduces the number of items returned per request. This not only speeds up data delivery but also saves the resources otherwise spent on transferring large amounts of data.
Frontend Pagination vs. Backend Pagination
Pagination can take place in both the frontend and the backend. Before we jump into how we should approach pagination, I have to differentiate between frontend and backend pagination. Frontend pagination focuses on improving users’ visual perception by showing them information in easily digestible portions, which is more closely related to UI/UX design. On the other hand, the main purpose of backend pagination is to avoid overwhelming backend servers and enhance the efficiency of data transfer. Compared to its frontend counterpart, backend pagination is more relevant to us engineers. In the next section, I will mainly introduce backend pagination techniques. (If you are also interested in frontend pagination, please feel free to discuss it with the UI/UX designers of your company. They are probably more than glad to enlighten you on this topic.)
How to Implement Pagination on the Backend?
There are many different ways of implementing pagination for your endpoints. Here, I will cover some of the most mainstream pagination techniques.
Offset-based pagination is one of the simplest pagination techniques. Endpoint users pass an offset and a limit to tell the endpoint which page of data they need. The offset is the number of records to skip before the target data. The limit is the maximum number of records on one page.
When constructing SQL statements for retrieving paginated data, we can directly utilize these two parameters. Let’s say that we want to retrieve data from the users table. We want to get the third page of data, and each page contains ten records. In this case, we can write the following SQL statement to get our desired data.
SELECT * FROM users
ORDER BY id ASC
LIMIT 10 OFFSET 20;
The limit is 10 because for each page we only retrieve 10 records from the users table. The offset is 20 because we need to skip the first two pages, each containing ten rows, to get the third page of 10 records.
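To see the statement above in action, here is a minimal sketch in Python using the standard-library sqlite3 module. The in-memory table, the nickname column, and the fetch_page helper are assumptions made up for illustration; they are not part of any particular application.

```python
import sqlite3

def fetch_page(conn, limit, offset):
    """Return one page of users, ordered by id, using LIMIT/OFFSET."""
    rows = conn.execute(
        "SELECT id, nickname FROM users ORDER BY id ASC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return rows.fetchall()

# Hypothetical in-memory table with 35 users, ids 1..35.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO users (id, nickname) VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 36)],
)

# Third page with ten records per page: skip 20 rows, return the next 10.
third_page = fetch_page(conn, limit=10, offset=20)
print([row[0] for row in third_page])  # ids 21 through 30
```

Passing the limit and offset as bound parameters, rather than formatting them into the SQL string, keeps user-supplied values out of the statement text.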
Pros of Offset-Based Pagination:
Offset-based pagination is easy to implement because we can directly use offset and limit for building SQL statements. There isn’t any complex logic involved.
Users of an endpoint that implements offset-based pagination can jump to any page. You might consider this a universal quality shared by all pagination techniques, but some pagination techniques do not have it, as I will demonstrate later.
Cons of Offset-Based Pagination:
Offset-based pagination suffers from poor performance, especially when the offset is a large number. The reason is simple: the database has to read the rows to be skipped (as many as the offset) along with the target data, only to discard them eventually, which wastes computing resources.
Offset-based pagination is prone to returning inconsistent results. I will use the following two examples to demonstrate why:
a. Data Deletion: Assume a user is viewing the first page when someone deletes a record from the first page. The first record on the second page then moves up to become the last record on the first page. When the user navigates to the second page, they never see that record, because it is no longer on the second page and they have already left the first.
b. Data Insertion: Assume a user is viewing the first page when someone inserts a record into the first page. Then the last record on the first page will shift backwards to the second page. When the user navigates to the second page, they will see the last record from the first page again, which is not supposed to happen.
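Both scenarios can be reproduced with a small sketch in Python using the standard-library sqlite3 module. The table contents and the make_db and page_ids helpers are hypothetical, chosen only to make the shifting rows visible.

```python
import sqlite3

def make_db():
    """Create a hypothetical in-memory users table holding ids 1..20."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 21)]
    )
    return conn

def page_ids(conn, page, limit=10):
    """Return the ids on a 1-indexed page using LIMIT/OFFSET."""
    offset = (page - 1) * limit
    rows = conn.execute(
        "SELECT id FROM users ORDER BY id ASC LIMIT ? OFFSET ?", (limit, offset)
    )
    return [row[0] for row in rows]

# a. Deletion: a row on page 1 disappears between the two requests.
conn = make_db()
first_page = page_ids(conn, 1)                    # ids 1..10
conn.execute("DELETE FROM users WHERE id = 3")
second_page = page_ids(conn, 2)                   # now starts at id 12
skipped = set(range(1, 21)) - {3} - set(first_page) - set(second_page)
print(skipped)   # {11}: this row was never shown to the user

# b. Insertion: a new row lands on page 1 between the two requests.
conn = make_db()
first_page = page_ids(conn, 1)                    # ids 1..10
conn.execute("INSERT INTO users (id) VALUES (0)")
second_page = page_ids(conn, 2)                   # now starts at id 10
repeated = set(first_page) & set(second_page)
print(repeated)  # {10}: this row is shown on both pages
```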
Page-based pagination is essentially the same as offset-based pagination. The only difference is that page-based pagination uses page numbers to keep track of the current page that a user is viewing. We can easily convert a page number to an offset using the following formula:
offset = (page number - 1) * limit
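As a small sketch, the formula can be wrapped in a Python helper. The function name and the validation rules (1-indexed pages, positive limit) are assumptions for illustration.

```python
def page_to_offset(page_number: int, limit: int) -> int:
    """Convert a 1-indexed page number into a row offset."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    if limit < 1:
        raise ValueError("limit must be 1 or greater")
    return (page_number - 1) * limit

# Third page with ten records per page skips the first 20 rows.
print(page_to_offset(3, 10))  # 20
```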
Cursor-based pagination is a pagination technique that uses cursors, or pointers, to keep track of the last record that endpoint users have seen. A cursor usually contains the following information:
The columns used to identify the cursor location and their corresponding values (There has to be at least one unique column to ensure the uniqueness of the cursor)
The order (ascending or descending) in which the results are sorted by these columns
Whether the endpoint user requests results in front of or behind the cursor
Here is an example:
This cursor allows endpoint users to retrieve data sorted by nickname and ID in ascending order, behind (that is, after) the row with the following column values:
nickname column: "Shawn"
ID column: "23bf6f87-a4f5-4a9f-b441-1c98f7027e04"
Please note that only including the nickname column might be insufficient here because the nickname column is likely to contain duplicated values. Without a unique cursor, the database cannot locate the requested data with certainty. Adding the ID column as a secondary identifier ensures the uniqueness of the cursor.
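One possible way to represent the cursor described above is sketched below in Python. The field names ("columns", "order", "direction") and the choice of base64-encoded JSON as the wire format are assumptions for illustration, not a standard; real APIs define their own cursor formats.

```python
import base64
import json

# Hypothetical cursor: sort by nickname, then id, both ascending,
# and ask for the rows that come after this position.
cursor = {
    "columns": [
        {"name": "nickname", "value": "Shawn", "order": "asc"},
        {"name": "id", "value": "23bf6f87-a4f5-4a9f-b441-1c98f7027e04", "order": "asc"},
    ],
    "direction": "after",
}

def encode_cursor(cursor: dict) -> str:
    """Serialize a cursor into an opaque, URL-safe token."""
    raw = json.dumps(cursor, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(token: str) -> dict:
    """Turn a token produced by encode_cursor back into a cursor."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(raw.decode("utf-8"))

token = encode_cursor(cursor)
print(decode_cursor(token) == cursor)  # True
```

Encoding the cursor as an opaque token means endpoint users can simply pass it back without needing to understand its contents. Note that base64 is not encryption, so a cursor should not carry sensitive values unless it is also signed or encrypted.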
The endpoints need to calculate and return cursors for going backward and forward along with the paginated data so that endpoint users can navigate through records in your data source with ease.
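A minimal sketch of the forward direction, in Python with the standard-library sqlite3 module, might look like the following. The table contents and the fetch_after helper are hypothetical, and for brevity the cursor is just the (nickname, id) pair of the last row returned; a backward cursor would mirror the comparison and sort order.

```python
import sqlite3

def fetch_after(conn, limit, after=None):
    """Return (rows, next_cursor), sorted by nickname then id, ascending."""
    if after is None:
        rows = conn.execute(
            "SELECT nickname, id FROM users "
            "ORDER BY nickname ASC, id ASC LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        nickname, user_id = after
        # Rows strictly after the cursor: a later nickname, or the same
        # nickname with a larger id (the unique tie-breaker).
        rows = conn.execute(
            "SELECT nickname, id FROM users "
            "WHERE nickname > ? OR (nickname = ? AND id > ?) "
            "ORDER BY nickname ASC, id ASC LIMIT ?",
            (nickname, nickname, user_id, limit),
        ).fetchall()
    next_cursor = rows[-1] if rows else None
    return rows, next_cursor

# Hypothetical data with duplicated nicknames.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, nickname TEXT)")
conn.executemany(
    "INSERT INTO users (id, nickname) VALUES (?, ?)",
    [("a1", "Alice"), ("s1", "Shawn"), ("s2", "Shawn"), ("s3", "Shawn"), ("z1", "Zoe")],
)

page1, cur1 = fetch_after(conn, limit=2)
page2, cur2 = fetch_after(conn, limit=2, after=cur1)
page3, cur3 = fetch_after(conn, limit=2, after=cur2)
print(page1)  # [('Alice', 'a1'), ('Shawn', 's1')]
print(page2)  # [('Shawn', 's2'), ('Shawn', 's3')]
print(page3)  # [('Zoe', 'z1')]
```

The second page shows why the ID column matters: with nickname alone, the cursor "Shawn" could not distinguish the three rows that share that nickname.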
Pros of Cursor-Based Pagination:
Cursor-based pagination usually has the best performance among all pagination techniques, especially when there are a lot of records in the database. The reason is that, unlike offset-based pagination, cursor-based pagination does not need to query unwanted data only to throw it away. The cursor enables the database to efficiently pinpoint and retrieve the requested records.
Cursor-based pagination is resilient to shifting rows because it does not use offset to determine which rows to return. Even with old rows removed or new rows added, cursor-based pagination won’t skip a row or display the same row on two different pages.
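The resilience to shifting rows can be illustrated with a short sketch in Python using the standard-library sqlite3 module. The table and the page_after helper are hypothetical, and the cursor here is simply the id of the last row seen.

```python
import sqlite3

def page_after(conn, last_id, limit=10):
    """Return up to `limit` ids that sort after `last_id`."""
    rows = conn.execute(
        "SELECT id FROM users WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_id, limit),
    )
    return [row[0] for row in rows]

# Hypothetical in-memory table holding ids 1..20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 21)])

first_page = page_after(conn, last_id=0)           # ids 1..10

# Rows before the cursor change between the two requests.
conn.execute("DELETE FROM users WHERE id = 3")
conn.execute("INSERT INTO users (id) VALUES (0)")

second_page = page_after(conn, last_id=first_page[-1])
print(second_page)  # ids 11 through 20: nothing skipped, nothing repeated
```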
Cons of Cursor-Based Pagination:
The implementation of cursor-based pagination is rather complicated. (It is so complicated that I feel the need to cover it in a separate technical blog to prevent this one from getting too lengthy. Please stay tuned!)
Endpoint users can’t jump to a specific page since all the navigation relies on forward and backward cursors.
Which Pagination Technique Should I Use?
The rule of thumb is to implement offset-based (or page-based) pagination when you do not expect the number of records to grow very large as the number of users increases, or when it is vital to let users jump to any page. Otherwise, cursor-based pagination is the better choice because of its superior performance.
In this article, I started by explaining what pagination does and why it is necessary. Then, I introduced some of the most commonly used pagination techniques and their respective advantages and drawbacks. After that, I gave some advice on when to use which pagination technique. I hope this article has given you enough information to make informed decisions on whether and how to implement pagination for your endpoints.