Adding a new API version

Roshan Paiva
Nov 18, 2015

We’re going through the process of adding a new API version (v2). The tricky part is that this new API version includes a refactor of our existing database schema, which means breaking changes to our old API version (v1).

Here’s how we’re approaching this.

Goals:

- We want to maintain 100% uptime

- We want to maintain backward compatibility — v1 needs to operate as though nothing has changed

There are many ways to approach this:

  • We could serve v2 from a different schema, keeping the v1 schema intact, and migrating data from v1 over to v2.

The problem with this is that keeping both schemas in sync could very quickly become a nightmare. If someone were to move from v1 to v2, their systems should immediately have access to all their previous data, so we would have to continuously keep both schemas up to date.

  • We could migrate the v1 schema over to the new schema, leaving us with just one schema to worry about.

This is what we chose to do. It leaves us with a cleaner DB schema, rids us of the old (mis-named) columns, and keeps all our data in one schema.

Here’s our plan:

Phase 1 — v2 ready to deploy, v1 reads from new, writes to old and new columns

  1. Run the schema update with one caveat: we only make additions (non-destructive changes), i.e. new columns and new tables. Deleting the old ones comes later.
  2. Develop the new v2 api version. This will use the updated schema (the schema that we will end up with).
  3. Update v1 such that it reads from the updated columns, and writes to both old and new columns.
  4. Ensure we have good test coverage for both v1 and v2.
  5. Run a migration script to update the new columns with data from the old (where applicable).

We run this through our deployment pipeline and let it bake for a while, giving us a chance to find errors etc.

The benefit of this is that if something breaks, we still have our old columns, data is still being written to them, and it’s very easy for us to revert. The v1 code will still work.
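As a concrete sketch of the Phase 1 steps, here’s a minimal Python example using SQLite as a stand-in database. The `users` table and its mis-named `fname` column (being replaced by `first_name`) are hypothetical, purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fname TEXT)")
conn.execute("INSERT INTO users (fname) VALUES ('Ada')")

# Step 1: additive schema change only -- the old column stays in place.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")

# Step 3 (writes): v1 writes to BOTH columns, so a rollback loses nothing.
def v1_update_name(user_id, name):
    conn.execute(
        "UPDATE users SET fname = ?, first_name = ? WHERE id = ?",
        (name, name, user_id),
    )

# Step 3 (reads): v1 reads the new column, falling back to the old one
# for rows the migration script has not reached yet.
def v1_get_name(user_id):
    row = conn.execute(
        "SELECT COALESCE(first_name, fname) FROM users WHERE id = ?",
        (user_id,),
    ).fetchone()
    return row[0]

# Step 5: backfill the new column from the old where it is still empty.
conn.execute("UPDATE users SET first_name = fname WHERE first_name IS NULL")

print(v1_get_name(1))  # -> Ada
```

Because the old column is still fully populated, reverting this phase is just a code rollback; no data migration is needed.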

Phase 2 — v1 reads and writes only to new columns

Once we have confirmed Phase 1 is good:

  1. We update v1 code to read and write only to the new columns.

We run this through our deployment pipeline. All our tests should still pass as though nothing changed. v1 and v2 should still be operational, thus achieving our backward compatibility goal.

Phase 3 — Clean up

Once Phase 2 has baked for a while,

  1. It’s now safe to delete the old columns from our schema.

Again, all our tests should still pass as though nothing changed.
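Dropping the old column, once nothing writes to it, might look like this against the same hypothetical SQLite table. The sketch uses the portable rebuild-and-rename pattern, since some databases (older SQLite versions among them) lack a direct `ALTER TABLE ... DROP COLUMN`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, fname TEXT, first_name TEXT)"
)
conn.execute("INSERT INTO users (fname, first_name) VALUES ('Ada', 'Ada')")

# Rebuild the table without the old column, then swap it into place.
conn.executescript("""
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO users_new (id, first_name) SELECT id, first_name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(cols)  # -> ['id', 'first_name']
```

Since v1 and v2 stopped referencing the old column back in Phase 2, this step changes nothing from the application’s point of view.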

Maintaining uptime

To maintain 100% uptime, we do the following:

  1. Phase 1 is deployed as a canary release first. We test it internally, then with a subset of our users. We monitor logs, error rates, API performance, throughput, and system health.
  2. When we feel comfortable, we gradually roll Phase 1 out to all the servers in a rolling deployment, still maintaining uptime.
  3. Since we only perform DB additions, no DB downtime is necessary, because v1 (on the non-canary servers) continues to function.
  4. Similarly, Phase 2 starts as a canary release. Once it is rolled out to all servers and everything is confirmed operational, we run Phase 3 a few weeks later, after confirming nothing is still writing to the old columns.

We’re still midway through this process. Let me know your thoughts on this approach.
