2023-07-12

REST API design: lessons learned from going down the wrong (and right) paths

Recently I had a chance to participate in a sort of retrospective at work on how we organize and write REST API endpoints. That made me look back at some API design decisions I’ve made (and seen) over the years and realize that some of these choices are not at all obvious, and that you often don’t see the tradeoffs until you’ve done it both ways and seen the results.

So here are some of the less obvious REST API design lessons that I’ve learned over the years.

1. Do not nest objects in URL paths

Suppose you have task objects that always belong to project objects. What should the URL for getting information about a task be? Should it be this:

GET /api/projects/{projectId}/tasks/{taskId}

or this?

GET /api/tasks/{taskId}

Having gone both ways, I now confidently think that the second choice is the better one. If you have a taskId, then projectId is redundant, and by including projectId in the URL you’re making the caller look up a piece of information that they may not have or need otherwise. And what happens when the caller provides the wrong projectId? Should you return 404 Not Found even though you can unambiguously locate the task? You’ll also have to add extra validation code to check that the task is really in the right project, something that you wouldn’t need to do with the shorter version of the URL.
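To make that extra validation concrete, here is a minimal Python sketch of the two lookup styles against an in-memory store. The TASKS dictionary, the NotFound exception, and both function names are made up for illustration; a real handler would sit behind whatever framework and database you actually use.

```python
# Hypothetical in-memory store: task ID -> task record.
TASKS = {
    42: {"id": 42, "projectId": 7, "title": "Write docs"},
}


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


def get_task(task_id):
    # Flat route: GET /api/tasks/{taskId}
    task = TASKS.get(task_id)
    if task is None:
        raise NotFound(f"task {task_id}")
    return task


def get_task_nested(project_id, task_id):
    # Nested route: GET /api/projects/{projectId}/tasks/{taskId}
    task = get_task(task_id)
    # Extra check the flat route never needs: does the task actually
    # belong to the project named in the path?
    if task["projectId"] != project_id:
        raise NotFound(f"task {task_id} in project {project_id}")
    return task
```

The nested version needs one more branch, one more failure mode to document, and one more value the caller has to supply.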

You may ask: “But how do you add a task to a project then?”

Well, how about this?

POST /api/tasks
...
{
    "projectId": 1234567,
    ...
}

You may then ask, “But how do you look up tasks for a project if you can’t do GET /api/projects/{projectId}/tasks?”

Like this:

GET /api/tasks?projectId=<projectId>

As a bonus, you get an endpoint to list all the tasks irrespective of which project they’re in. And this approach treats projectId as one of multiple optional filters that may be applied on /api/tasks, alongside maybe status, assignedTo, or any others.
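Here is a rough Python sketch of what "projectId as one optional filter among several" could look like on the server side. The sample records, the ALLOWED_FILTERS set, and the list_tasks function are assumptions for illustration, and the query values are assumed to be already parsed into the right types.

```python
# Hypothetical task records; field names follow the examples in the text.
TASKS = [
    {"id": 1, "projectId": 10, "status": "open", "assignedTo": "ana"},
    {"id": 2, "projectId": 10, "status": "done", "assignedTo": "bo"},
    {"id": 3, "projectId": 20, "status": "open", "assignedTo": "ana"},
]

# Query parameters this sketch accepts as filters (an assumption).
ALLOWED_FILTERS = {"projectId", "status", "assignedTo"}


def list_tasks(query):
    """Handle GET /api/tasks?...; `query` maps parameter names to values."""
    # Unknown parameters are ignored here; a real API might reject them.
    filters = {k: v for k, v in query.items() if k in ALLOWED_FILTERS}
    return [
        task for task in TASKS
        if all(task[field] == value for field, value in filters.items())
    ]
```

With no filters the same function serves the list-everything case, and adding another filter later means adding one more name to the allowed set.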

2. Of the 2xx HTTP status codes, you only need 200 OK

This one wasn’t obvious to me until a coworker recently pointed out that he’s never seen a case where the caller had to make a decision based on the type of 2xx return code they got back.

And indeed, although RFC 9110 (the current HTTP semantics spec) defines seven different 2xx success codes, in practice client-side code almost never needs to make a decision based on whether it receives, say, 200 OK or 201 Created. Alright, in very rare cases it’s convenient when the server returns 204 No Content, because when you’re debugging it confirms that the response is really supposed to be missing a body. But you can accomplish the same thing with the Content-Length header, or by some other means.

I have, however, seen developers spend hours going back and forth on which 2xx code should be used for a specific endpoint. So to save everyone’s time, let’s just use 200 OK and forget about the rest.
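For what it’s worth, this is roughly what client code tends to look like anyway. The sketch below is a hypothetical handler, not taken from any particular HTTP library: it branches on success versus failure and never on which success code came back.

```python
def is_success(status_code):
    """True for any 2xx status, whichever one the server picked."""
    return 200 <= status_code < 300


def handle_response(status_code, body):
    # Hypothetical client-side handler: 200 and 201 take the same path.
    if is_success(status_code):
        return body
    raise RuntimeError(f"request failed with HTTP {status_code}")
```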

3. Encode timestamps in a human-readable format

If you’re sending or receiving timestamps, it may be tempting to use something like the Unix epoch time in milliseconds (the number of milliseconds since 00:00 UTC on Jan 1, 1970). It’s not a bad choice, and it’s quite convenient to work with.

Except it’s not human-readable. Something like ISO 8601 (e.g. 2023-07-12T23:57:23Z) is easier for a human to read, and you can even drop the colons (2023-07-12T235723Z) to make it URL-compatible. (Strictly speaking, ISO 8601’s basic format drops the hyphens as well: 20230712T235723Z.)

It may take a bit of extra work to configure your framework to emit times in the right format, but it will make using (and debugging) the API a lot easier.

By the way, don’t forget to include the time zone in your timestamps, even if it’s just a Z at the end.
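If your framework doesn’t do this for you, the conversion itself is short. Here is a minimal Python sketch using only the standard library; the function names are mine, and it assumes you always emit UTC with whole-second precision.

```python
from datetime import datetime, timezone

# ISO 8601 extended format, UTC, whole seconds, with the trailing Z.
ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def epoch_ms_to_iso(epoch_ms):
    """Format Unix epoch milliseconds as an ISO 8601 UTC string."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return dt.strftime(ISO_UTC)  # sub-second precision is dropped


def iso_to_epoch_ms(text):
    """Parse a string produced by epoch_ms_to_iso back to epoch milliseconds."""
    dt = datetime.strptime(text, ISO_UTC).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


print(epoch_ms_to_iso(1689206243000))  # 2023-07-12T23:57:23Z
```

Converting to UTC explicitly before formatting is what keeps the trailing Z truthful.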

4. Pagination: avoid offsets and provide a link to the next page

There are broadly two ways to do pagination:

Page number and page size: e.g. https://.../things?page=10&pageSize=20. This will work alright up to maybe page 10,000 or so, at which point, if the server is fetching the results from a database, you’ll start running into a quirk with OFFSET that can substantially degrade performance. So while this pagination approach is convenient when the results go into a paginated UI, it’s a bad idea to use it for an endpoint that will be called by export jobs or by anything else that will ask for a large number of pages with high offsets. Instead, it’s much better to use…
WHERE-based pagination: where the caller provides something akin to a WHERE clause, like https://.../things?fromId=12345678. If this argument corresponds to a DB index, the response times should stay constant-ish as the fromId argument increases, unlike with OFFSET, where response times tend to increase linearly.
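The two approaches boil down to two different queries. The sketch below uses an in-memory SQLite table to show them side by side; the things table, the function names, 1-based page numbers, and an inclusive fromId are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO things (id, name) VALUES (?, ?)",
    [(i, f"thing-{i}") for i in range(1, 101)],
)


def page_by_offset(page, page_size):
    # Page number and page size: the database still has to step over
    # every skipped row before it reaches the requested page.
    return conn.execute(
        "SELECT id, name FROM things ORDER BY id LIMIT ? OFFSET ?",
        (page_size, (page - 1) * page_size),
    ).fetchall()


def page_from_id(from_id, page_size):
    # WHERE-based: the primary-key index lets the database seek
    # straight to from_id and read page_size rows from there.
    return conn.execute(
        "SELECT id, name FROM things WHERE id >= ? ORDER BY id LIMIT ?",
        (from_id, page_size),
    ).fetchall()
```

On a 100-row table both return instantly, of course; the difference only shows up once the offset gets large.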

Oh, and if anyone will be iterating over multiple pages of responses, it really helps to include the URL of the next page in the response body:

{
    "data": [ ... ],
    "metadata": {
        "nextPage": "https://..."
    }
}
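On the client side, a nextPage link turns pagination into a simple loop. Here is a small Python sketch; iter_all and the fake pages are illustrative, fetch_json stands for whatever HTTP call you use, and it assumes the last page omits nextPage or sets it to null.

```python
def iter_all(fetch_json, first_url):
    """Yield every item across pages by following metadata.nextPage.

    `fetch_json` is any callable that takes a URL and returns the decoded
    JSON body as a dict; it is passed in so the sketch does not depend on
    a particular HTTP library.
    """
    url = first_url
    while url:
        body = fetch_json(url)
        yield from body["data"]
        # Assumption: the last page omits nextPage or sets it to null.
        url = body.get("metadata", {}).get("nextPage")


# Fake two-page API for demonstration; the URLs are placeholders.
FAKE_PAGES = {
    "https://example.test/things?fromId=1": {
        "data": [1, 2],
        "metadata": {"nextPage": "https://example.test/things?fromId=3"},
    },
    "https://example.test/things?fromId=3": {"data": [3], "metadata": {}},
}

items = list(iter_all(FAKE_PAGES.__getitem__, "https://example.test/things?fromId=1"))
print(items)  # [1, 2, 3]
```

The caller never has to know how the server paginates, so the server is free to switch strategies without breaking anyone.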
