Storage Patterns: Pagination
This post will show how to support pagination for returning large data sets from a smart contract. It builds on concepts introduced in my post on doubly-linked lists.
When to Paginate
When large amounts of data are involved, it may not be possible for a smart contract to return an entire data set at once. Pagination is a way for a caller to make multiple smaller requests to retrieve the data. Some callers may not need the entire data set, in which case it’s useful to be able to retrieve a subset.
There are two potential limits to how much data a smart contract can return at once: gas limits and execution time. For a function call that’s part of a transaction (e.g. from another smart contract), gas is often a limiting factor. Each byte of data returned from the call consumes gas, as does iterating through the data set. Even if the caller is willing to supply a huge amount of gas, there is a block gas limit, which is the maximum amount of gas a single block can consume. If a transaction exceeds this limit, it can never be mined into a block.
For view
functions being called from outside the EVM (e.g. from JavaScript in a web app), gas is not a limiting factor because there is no transaction being executed. The node processing the call does the computation locally and returns the result. Each node gets to set its own processing limits—typically limiting execution time. If the call takes too long, it will fail.
When a data set may be large enough to approach either of these limits, use pagination.
Using Cursors
There are many valid API designs for pagination, but using a cursor is probably the most flexible. When a client requests a page of data, the server—in this case a smart contract—returns the requested data as well as a cursor. The cursor is an opaque value that the client needs to pass in its next request. This opaque value specifies where enumeration should continue.
Paging Through an Array
The following code supports appending to an array and enumerating that array using a very simple cursor: the index of the first item being requested.
Here’s a brief explanation of the fetchPage
function:
howMany
indicates how many items should be returned. If there aren’t enough remaining items in the array, the function will return fewer items.cursor
is the aforementioned cursor. It simply indicates the starting index for enumeration. The first call should pass0
, and subsequent calls should pass the returnednewCursor
.- If
cursor
is outside the bounds of the array, the array access will throw an error.
In the array example, the cursor is just an index into the array. The client could ignore the returned newCursor
and simply calculate the right cursor by counting how many items it had already received. In the next section, I’ll show an example where this is not the case, and the cursor truly needs to be returned from the smart contract.
Paging Through a Linked List
In a linked list, each node stores a link to the next node in the list. There is no way to jump straight to the nth item in the list. In my post about doubly-linked lists in Solidity, I built a doubly-linked list where nodes are stored in an array, and the node’s index in the array is its unique identifier. This means it’s possible to directly access a node by ID.
A node’s ID a good candidate for a cursor. When a page is requested, the returned cursor will be the ID of the next node in the list (the first node the client has not yet seen).
The following code is adapted from the doubly-linked list post but with one small change. Instead of using bytes
as the payload type, I’ve used bytes32
.1 The last function, fetchPage
, is new:
Here’s a brief explanation of the fetchPage
function:
- Like in the array example,
fetchPage
accepts parameterscursor
andhowMany
. - A
cursor
of 0 indicates that the client wants to start at the beginning of the list. That means starting atnodes[0].next
, which is the head of the doubly-linked list. - Arrays kept in memory cannot be resized, so it’s necessary to declare their length up front. Before enumeration, it’s impossible to know the correct length, so an array of size
howMany
is returned along with an extra return valuelength
that indicates how much of the array is valid data. - The
newCursor
return value is the index of the next node in the list. A client passes that to resume enumeration.
Getting a Consistent Snapshot
Pagination often has a downside when it comes to concurrency. It can be difficult for clients to deal with data changing during the course of enumeration. For example, in the linked list case, the node indicated by the cursor could be deleted, making the next call to fetchPage
fail.
Ethereum provides a nice workaround for these issues. When calling a view
function from client software, it’s possible to specify a block number to execute the call in the context of that block. By making each call with the same block number, a client can work with a consistent snapshot of the data.
Summary
- Pagination is necessary when gas or execution time limits prevent a client from reading the entirety of a large data set at once.
- Cursors are a flexible way to support pagination.
- Clients should treat cursors as opaque.
- A cursor should allow enumeration to be resumed efficiently.
- To guarantee a consistent snapshot, clients should consider specifying the same block number for all calls while paging through data.
- Solidity doesn’t (without an experimental new encoder) support returning dynamically-sized arrays of dynamically-sized arrays, making a return type of
bytes[]
impossible. ↩