Database Primary Key

Choosing a database primary key between big integers, ordered UUID, and nanoid.

Database Primary Key
Photo by Niklas Ohlrogge / Unsplash

First, let's talk about natural and surrogate keys and why the latter is superior.

In real-life situations, not many circumstances require using a natural key as the database's primary key. Natural keys are usually real-world observable data that carry business meanings or intelligence. Examples of natural keys include SSNs, car plate numbers, employee numbers, parts numbers, ISBN, or any prefix + number, i.e., cus_8764.

A surrogate key is a piece of meaningless data value and has no relationship to the data associated with that key. No doubt, surrogate keys are a popular choice for PK because they can be unique, immutable, and do not directly reveal information.

An excellent DB PK should be as compact as possible for better performance since it will be the index of the database table. Here are simple comparisons between the top 3 most popular primary key choices.

Big Integers Ordered UUIDv4 nanoid
Example 18446744073709551615 70E2E8DE-500E-4630-B3CB-166131D35C21 FYaebyAFTGca
Data Type BIGINT BINARY(16) CHAR(12)
Size 8 bytes 16 bytes 12 bytes
Sequential Yes Yes No
Max Records 2⁶⁴ 2¹²² 64¹² (about 2⁷²)
Expose timestamp No Yes No
Expose total records Yes No No
Generation at Database App App

Integer

The advantage of using auto incremental big integers is that they have the most compact value and are in the most human-readable format, which means they can be read, understood, and remembered. Another advantage is that you can quickly double-click the value to copy. Unlike the alternatives, they often have symbols in between. Simple numbers are considered a plus in debugging, for example: where id = '123'.

On the downside, running numbers give away the total count of your data, and the first, following, or previous records are easily guessable. They also required a roundtrip to the database to get the next value.

UUID / GUID

UUID or GUID (Globally Unique Identifier) is a popular choice for most software engineers. They are the most useful when you want to merge records between tables or across databases or systems without updating all the foreign keys for all the dependent tables. Note that the algorithm to generate version 1 UUIDs involves the timestamp and the device's MAC address.

The downside of using UUID is when inserting a new row in a large table, and it will cause a significate impact on the performance because UUIDs are not sequential. However, there are ways to make UUID sequential to address this fragmentation issue.

NEWSEQUENTIALID for Microsoft SQL Server or "ordered UUIDs" might be worth consideration. For example, an ordered UUID (swap timestamp bytes to the beginning position) consists of 48-bit timestamp data and still has 72-bit randomness. Thus we can sort it from the latest or oldest without calling other columns. While it has less randomness, it is still very reliable since the odds of collision are one in 35 billion.

Ordered UUID
Ordered UUID structure

Honorable mentions

Other alternatives include Snowflake ID by Twitter, Sonyflake by Sony, and ULID (Universally Unique Lexicographically Sortable Identifier). They are all trying to solve the same issue to be compact and sequential.

Twitter Snowflake
Twitter's Snowflake structure

Conclusion

What's my take? Why not both? Get the best from both worlds:

  • Use integers as primary keys, which makes it easy for table joins.
  • Keep this PK for internal use.
  • Add the UUID/ULID column and use it for the front end and APIs.
  • Add Another column for a slug to care about SEO.

Leave the performance issue to the cloud database provider, e.g., Amazon RSD, PlanetScale, etc.

References

  1. https://web.archive.org/web/20150511162734/http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html
  2. https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439
  3. https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8
  4. https://www.informit.com/articles/printerfriendly/25862

Update: Found a more comprehensive page here: https://medium.com/geekculture/the-wild-world-of-unique-identifiers-uuid-ulid-etc-17cfb2a38fce

Update: Found another comparison with UUIDv7 and ULID: https://blog.daveallie.com/ulid-primary-keys

Subscribe to Jewei.toString()

Don’t miss out on the latest issues. Sign up now to get access to the library of members-only issues.
[email protected]
Subscribe