Implementing Relay-Style Paginated Fields in Ruby: An Opinionated Guide

Aug 12, 2023

Introduction

If you are using GraphQL as your API layer, one common performance issue is unpaginated fields. These fields return an unbounded list of nodes, which inflates network payloads and leads to slow-loading pages. GraphQL clients have no choice but to fetch the entire list of nodes, even when they need only some of them.

If you dig deeper, the root cause is often a lack of well-documented pagination practices and of tooling to support them. When implementing a new API field, engineering teams default to (unbounded) plural fields rather than the (paginated) slicing approach.

In this blog post, we dive into the world of GraphQL pagination. You can use it as a guide to implementing Relay-style, cursor-based paginated GraphQL fields with graphql-ruby connections.

Drawing on my own experience building a GraphQL backend with graphql-ruby that supports a React/Next.js client, this guide aims to lower the conceptual barriers and make pagination adoption a breeze for your team. It can also help teams avoid implementing their own variants of pagination and move toward a consistent codebase.

Background: What does a paginated field look like?

From the GraphQL client's perspective, a paginated field is a special type of object (a connection) with subfields such as edges, node, and pageInfo.

For example, you can query a list of onboarding employees of a company:

{
  company {
    onboardingEmployees(first: 10, after: "RW1wbG95ZWU6NmEyMmEzMDMtZGY5Ny00NDdjLTg2ZGEtOTE5ZmExZTc5NTVi") {
      edges {
        cursor
        node {
          name
        }
      }
      pageInfo {
        hasNextPage
      }
    }
  }
}

It returns data in this shape:

{
  "data": {
    "company": {
      "onboardingEmployees": {
        "edges": [
          {
            "cursor": "RW1wbG95ZWU6NmEyMmEzMDMtZGY5Ny00NDdjLTg2ZGEtOTE5ZmExZTc5NTVi",
            "node": {
              "name": "John Snow"
            }
          },
          {
            "cursor": "RW1wbG95ZWU6NGNjM2I3OGEtOGFhOS00NzAzLThlODQtYWYyNDkyZWQ1YWUx",
            "node": {
              "name": "Jane Doe"
            }
          }
        ],
        "pageInfo": {
          "hasNextPage": false
        }
      }
    }
  }
}

Each person's data lives under a node, which is nested inside the edges list. GraphQL clients (e.g., a React component) need logic to read data in this shape.

Edges and nodes are concepts borrowed from graph theory; Apollo has an interesting article on them if you are curious.

The pagination itself happens in the field arguments: onboardingEmployees(first: 10, after: "<opaque-relay-cursor>"). You can pass pagination arguments such as first/after or last/before. This is better than an unbounded list field because it lets the client control how many items it gets and where in the list they start.

Cursors are up to the backend to parse and resolve to a record in the model. Usually, a cursor is a UUID prefixed with a type name. More on that in later sections.
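To see what such a cursor carries, here is a minimal sketch that decodes the after cursor from the example query above, using only Ruby's core String#unpack1 ("m0" means strict base64). The payload turns out to be a type name, Employee, followed by the record's UUID; the variable names are illustrative only.

```ruby
# The `after` cursor from the example query above.
cursor = "RW1wbG95ZWU6NmEyMmEzMDMtZGY5Ny00NDdjLTg2ZGEtOTE5ZmExZTc5NTVi"

# "m0" decodes strict base64; String#unpack1 is part of Ruby core.
payload = cursor.unpack1("m0")
puts payload # prints Employee:6a22a303-df97-447c-86da-919fa1e7955b

# Split once on the first colon: type name, then the record's UUID.
type, uuid = payload.split(":", 2)
puts type # prints Employee
puts uuid # prints 6a22a303-df97-447c-86da-919fa1e7955b
```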

There is also a pageInfo field that returns page information, including whether the current page has a previous or next page. Although the official spec does not prohibit it, we discourage API implementers from returning aggregate fields, such as the total number of pages, under pageInfo.

Implementing a new paginated GraphQL field

Overview

The overall steps to implement a new paginated field are as follows:

  • Add a new field with a connection type
  • Choose a pagination algorithm (the meaty part)
  • Implement the connection class and its required methods

Field and type naming convention

To make our paginated field names more consistent and to leverage what graphql-ruby does for us, we use the following naming conventions:

  • GraphQL field: a plural noun, e.g., field :onboarding_employees. We don't need modifiers, since the field type already tells clients that it is a connection field that accepts pagination arguments.
  • Connection class: a name ending with Connection, e.g., OnboardingEmployeesConnection.

When the field's return type name ends with Connection (as the types generated by .connection_type do), graphql-ruby treats it as a connection field: it defines the first/after/last/before pagination arguments for us, and we don't have to add the connection: true option to the field definition.

To put the pieces together, we suggest defining such a field in our schema:

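# Types::EmployeeType is a hypothetical GraphQL object type for a single employee.
field :onboarding_employees, Types::EmployeeType.connection_type, null: false do
  argument :my_awesome_filter, String
end

# Implement the field and wire up the custom Connection class.
# graphql-ruby strips first/after/last/before from the resolver's keyword
# arguments and hands them to the returned connection object instead.
def onboarding_employees(my_awesome_filter:)
  Connections::OnboardingEmployeesConnection.new(
    employees_for(my_awesome_filter), # hypothetical helper returning the items to paginate
    context: context,
    # … other args
  )
end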

Here, we assume you build custom connections by default. We don't encourage using the built-in ActiveRecordRelationConnection, because it requires exposing ActiveRecord models to the GraphQL layer, which breaks modularity and does not work well in a large, modular Rails app.

graphql-ruby also provides the GraphQL::Pagination::ArrayConnection helper class and convenience methods. One implementation choice is to inherit from ArrayConnection, which gives a shorter class implementation. However, consider the performance implications for long lists: the entire list is held in Ruby memory. Also, ArrayConnection's cursors are based on array indexes, not UUIDs, which matters if you want a unified cursor format across different paginated fields.

Pagination field arguments

Standard arguments: first/after and last/before

The arguments usually come in pairs. You can supply either or both pairs, and the backend returns the page based on the pagination algorithms discussed below.

The order must be consistent between first/after and last/before pairs.

Opinion: Returning the last page with before:null

Some UI components, such as Material data tables, require the backend to support jumping to the last page.

To support this use case, we recommend a workaround: clients pass last: <PageSize> and before: null to the field, and the API returns the last page of the result set.

This diverges slightly from the Relay connection spec, which defines that when before is null, it is not applied to the result set. We think this is a reasonable compromise.

Additional field arguments

Besides the pagination arguments, you can define custom arguments alongside them. The Relay connection spec does not limit the number of arguments a field takes, so you can support as many as you see fit.

We often see arguments such as filters and sort order that further control the returned data, e.g., onboardingPeoplePaginated(includedPersonTypes: ["employee", "contractor"]).

However, custom arguments don't come for free. You need to write code in your GraphQL resolver to read them and filter or sort the results accordingly. Bear in mind that every custom argument adds complexity to your connection class.

Connection class structure

According to the graphql-ruby documentation, a custom connection class must implement four methods:

  • #nodes, which returns a paginated slice of @items based on the given arguments
  • #has_next_page, which returns true if there are items after the ones in #nodes
  • #has_previous_page, which returns true if there are items before the ones in #nodes
  • #cursor_for(item), which returns a String to serve as the cursor for item

We recommend following the same structure in your own connection classes.


Notes:

  • Most interface methods are thin wrappers that return instance variables such as @nodes and @has_next_page
  • The pagination algorithm is implemented in the private method #load_nodes
  • We usually cache the results of #load_nodes to avoid recomputing the nodes
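The following is a dependency-free sketch of that structure, not graphql-ruby's own code. In a real app the class would subclass GraphQL::Pagination::Connection, which receives the items and the first/after values for you; here they are passed to the constructor so the example runs with plain Ruby. The Employee struct, the Employee: cursor prefix, and the forward-only (first/after) support are simplifying assumptions.

```ruby
# Hypothetical node type, for illustration only.
Employee = Struct.new(:uuid, :name)

# Stand-in for a GraphQL::Pagination::Connection subclass (forward pagination only).
class OnboardingEmployeesConnection
  def initialize(items, first: nil, after: nil)
    @items = items
    @first_value = first
    @after_value = after
  end

  # Interface methods: thin wrappers around the results cached by #load_nodes.
  def nodes
    load_nodes
    @nodes
  end

  def has_next_page
    load_nodes
    @has_next_page
  end

  def has_previous_page
    load_nodes
    @has_previous_page
  end

  def cursor_for(item)
    ["Employee:#{item.uuid}"].pack("m0") # strict base64
  end

  private

  # All pagination logic lives here; the early return caches the results.
  def load_nodes
    return if @nodes

    start = 0
    if @after_value
      after_uuid = @after_value.unpack1("m0").delete_prefix("Employee:")
      index = @items.index { |item| item.uuid == after_uuid }
      # This sketch rejects cursors that do not point at a known item.
      raise ArgumentError, "unknown cursor: #{@after_value}" unless index

      start = index + 1
    end

    remaining = @items.drop(start)
    @nodes = @first_value ? remaining.first(@first_value) : remaining
    @has_next_page = remaining.length > @nodes.length
    @has_previous_page = start > 0
  end
end

employees = [
  Employee.new("a1", "John Snow"),
  Employee.new("b2", "Jane Doe"),
  Employee.new("c3", "Arya Stark")
]
connection = OnboardingEmployeesConnection.new(employees, first: 2)
puts connection.nodes.map(&:name).inspect # prints ["John Snow", "Jane Doe"]
puts connection.has_next_page             # prints true
```

Keeping the algorithm in one private method means the three page-info methods can never disagree with #nodes, since they all read the same cached computation.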

Pagination algorithms

Conceptually, pagination means cutting a set of things into slices and returning one of them.

Zooming in, the cutting or slicing part means applying the first/after/last/before arguments to:

  • the set we operate on (how it is filtered and which column it is ordered by; both are optional)
  • where the cut starts
  • where the cut ends

The first/after arguments form one pair, and last/before form another. Clients can supply either pair.

For the actual implementation, there are generally three choices:

  1. Slicing in ActiveRecord or SQL
  2. Slicing a Ruby array
  3. Hybrid of 1) and 2)

We discuss their pros and cons separately.

1) Slicing in ActiveRecord or SQL

For approach 1), it boils down to constructing an ActiveRecord query from the arguments we pass in.

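The pagination arguments end up as SQL clauses: first and last become a LIMIT, and the decoded cursors become an OFFSET or a WHERE condition on the ordering column.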

This approach usually needs more code, but it performs better because the database does the heavy lifting, with help from indexes and caching. It is usually much faster than approach 2).
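As a rough illustration, the sketch below turns pagination arguments into a SQL string plus bind values. It rests on assumptions not stated above: a hypothetical employees table ordered by a unique, indexed uuid column (a real app would more likely order by something like created_at with uuid as a tie-breaker), cursors already decoded to UUIDs, and keyset conditions (WHERE uuid > ?) used in place of OFFSET. It also fetches one extra row so the caller can compute has_next_page or has_previous_page without a COUNT query. In a Rails app you would express the same query through ActiveRecord.

```ruby
# Builds [sql, binds] for one page. Table and column names are hypothetical.
def page_query(first: nil, after: nil, last: nil, before: nil)
  conditions = []
  binds = []
  if after
    conditions << "uuid > ?"
    binds << after
  end
  if before
    conditions << "uuid < ?"
    binds << before
  end
  where = conditions.empty? ? "" : " WHERE #{conditions.join(' AND ')}"

  if first
    # One extra row tells the caller whether a next page exists.
    ["SELECT * FROM employees#{where} ORDER BY uuid ASC LIMIT ?", binds + [first + 1]]
  elsif last
    # Read backwards from the end; the caller drops the extra row and reverses the rest.
    ["SELECT * FROM employees#{where} ORDER BY uuid DESC LIMIT ?", binds + [last + 1]]
  else
    ["SELECT * FROM employees#{where} ORDER BY uuid ASC", binds]
  end
end

sql, binds = page_query(first: 10, after: "6a22a303-df97-447c-86da-919fa1e7955b")
puts sql # prints SELECT * FROM employees WHERE uuid > ? ORDER BY uuid ASC LIMIT ?
p binds  # prints ["6a22a303-df97-447c-86da-919fa1e7955b", 11]
```

Keyset conditions let the database seek directly to the cursor through the index, whereas a large OFFSET still makes it walk past every skipped row.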

2) Slicing a Ruby array

This implementation fetches the entire set of records from the database or another source. Then we apply sorting, filtering, and slicing as we would on any other Ruby array.

The advantage is that the algorithm is relatively short and easy to maintain.

The disadvantage is that the database no longer does the heavy lifting for us. This is especially problematic when the set grows large and we have to keep everything in memory. Therefore, this approach only works for relatively stable sets that we know won't grow large.
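Here is a minimal sketch of approach 2), assuming the items are already filtered and sorted in memory and that cursor_of (a parameter invented for this example) maps an item to its decoded cursor. It follows the cutting steps from the Relay connection spec: drop everything up to and including after, drop everything from before onward, then keep at most first items from the front and at most last items from the back. Because the whole array is in memory, the page-info flags simply check whether anything was cut off on either side.

```ruby
# Returns one page of `items` plus page info. Cursors that are not found are
# ignored, as in the Relay spec's ApplyCursorsToEdges step.
def slice_page(items, cursor_of:, first: nil, after: nil, last: nil, before: nil)
  from = 0
  to = items.length # exclusive end index

  if after
    index = items.index { |item| cursor_of.call(item) == after }
    from = index + 1 if index
  end
  if before
    index = items.index { |item| cursor_of.call(item) == before }
    to = index if index
  end

  to = [to, from + first].min if first
  from = [from, to - last].max if last

  {
    nodes: items[from...to] || [],
    has_previous_page: from > 0,
    has_next_page: to < items.length
  }
end

items = (1..25).to_a
identity = ->(item) { item }

first_page = slice_page(items, cursor_of: identity, first: 10)
last_page = slice_page(items, cursor_of: identity, last: 10, before: nil)
p first_page[:nodes] # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p last_page[:nodes]  # prints [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
```

With before: nil, last: 10 returns the final ten items (16 through 25 here), which matches the last-page behavior from the opinion section above. Note that this page need not line up with the pages reached through first/after, where the final page would be 21 through 25.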

3) Hybrid of 1) and 2)

The hybrid implementation usually filters and orders in the database, fetches the resulting list, and then slices it as a Ruby array. So it shares the same concerns as approach 2).

Cursor encoding in base64

We propose following the Relay style and encoding the cursor in base64. For example, Shopify uses such a cursor in its paginated API:

eyJsYXN0X2lkIjo3MDE3MjQ0MTY0MTUyLCJsYXN0X3ZhbHVlIjoiNzAxNzI0NDE2NDE1MiJ9

Once decoded, it reads:

{"last_id":7017244164152,"last_value":"7017244164152"}

The encoded data is usually specific to each field's implementation. That means we can define our own data structure and encode it as a base64 string at the end.
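As an illustration of such a field-specific payload, this sketch serializes a hash to JSON and base64-encodes it using Ruby's standard json library and the core Array#pack. Fed the payload shown above, it reproduces the example cursor string; that does not imply Shopify builds its cursors this way, and encode_cursor / decode_cursor are hypothetical names.

```ruby
require "json"

# Serialize a field-specific payload and wrap it in strict base64.
def encode_cursor(payload)
  [JSON.generate(payload)].pack("m0")
end

# Reverse the steps: base64-decode, then parse the JSON payload.
def decode_cursor(cursor)
  JSON.parse(cursor.unpack1("m0"))
end

cursor = encode_cursor({ "last_id" => 7017244164152, "last_value" => "7017244164152" })
puts cursor # prints eyJsYXN0X2lkIjo3MDE3MjQ0MTY0MTUyLCJsYXN0X3ZhbHVlIjoiNzAxNzI0NDE2NDE1MiJ9
puts decode_cursor(cursor)["last_id"] # prints 7017244164152
```

Using string keys on the encoding side keeps the round trip symmetric, because JSON.parse returns string keys by default.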

Note that it's fine for developers to decode Relay cursors for troubleshooting. But the intent of encoding is to make cursors opaque to API clients: a cursor carries no meaning beyond its value. In other words, clients should not decode cursors while building UIs.

Finally, graphql-ruby provides base64 encoding helpers by default, so we can use #encode like this:

def cursor_for(employee)
  # Base64-encoded; the payload format is custom-made and matches the example cursors above
  encode("Employee:#{employee.uuid}")
end
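The backend also needs the inverse: turning a cursor back into a UUID it can look up. Below is a self-contained sketch; encode here is a plain-Ruby stand-in for graphql-ruby's helper (strict base64 via Array#pack, which may differ from the encoder your schema is configured with), and uuid_from_cursor, CURSOR_TYPE, and the Employee struct are hypothetical. It uses the Employee: prefix found in the example cursors at the top of this post and rejects cursors with any other prefix or a malformed UUID.

```ruby
# Hypothetical node type and cursor prefix for this field.
Employee = Struct.new(:uuid, :name)
CURSOR_TYPE = "Employee"
UUID_FORMAT = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

# Stand-in for graphql-ruby's encode helper.
def encode(plaintext)
  [plaintext].pack("m0")
end

def cursor_for(employee)
  encode("#{CURSOR_TYPE}:#{employee.uuid}")
end

# Inverse of cursor_for. Raises ArgumentError for invalid base64 (raised by
# unpack1), a foreign type prefix, or a malformed UUID.
def uuid_from_cursor(cursor)
  type, uuid = cursor.unpack1("m0").split(":", 2)
  unless type == CURSOR_TYPE && uuid&.match?(UUID_FORMAT)
    raise ArgumentError, "invalid cursor: #{cursor}"
  end

  uuid
end

jane = Employee.new("4cc3b78a-8aa9-4703-8e84-af2492ed5ae1", "Jane Doe")
cursor = cursor_for(jane)
puts cursor                   # prints RW1wbG95ZWU6NGNjM2I3OGEtOGFhOS00NzAzLThlODQtYWYyNDkyZWQ1YWUx
puts uuid_from_cursor(cursor) # prints 4cc3b78a-8aa9-4703-8e84-af2492ed5ae1
```

Validating the type prefix keeps a cursor issued by one paginated field from being silently accepted by another.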

Summary

GraphQL fields that return an unbounded list of items are problematic because they give API clients no choice but to fetch the entire list every time. Engineering teams may be aware of the problem, but there is generally a lack of documentation about how to build paginated fields.

This post is a hands-on guide to implementing Relay-style, cursor-based paginated GraphQL fields with graphql-ruby. It covers a range of implementation topics, including field naming conventions, connection class structure, pagination algorithms, and cursor encoding. These guidelines should lower the adoption barriers and make it easier for engineering teams to build performant yet flexible GraphQL APIs!



