When Postgres stores physical rows, it aligns fields in 8-byte chunks 1. For 8-byte data types like a UUID or
bigserial you don’t have to think about field ordering because it doesn’t matter, but when a type’s size isn’t a multiple of eight, space is lost to padding unless fields are tetris’ed to maximize storage efficiency.
Say we have
CREATE TABLE foo (a int4, b int8, c int4).
int4s are 4 bytes, so they’re padded to get to 8-byte alignment:
This can be avoided by reordering the
int4s so they’re together (
CREATE TABLE foo (a int4, c int4, b int8):
Most of the time it’s not worth thinking about. Storage may not be maximally efficient, but it’s efficient enough, and the largest values in a row like
jsonb are stored out-of-band anyway. However, occasionally you have a situation where you want to store zillions of rows, and doing a little work to optimize tuple size yields returns (I came across a case like this recently which is why I’m writing about it). Eight wasted bytes is nothing, but what about eight times a million?
2nd Quadrant’s already written in good detail on column ordering (“On Rocks and Sand” – I’m giving them the lifetime award for most dramatic Postgres title) so I’ll avoid saying too much more on the subject, but I wanted to call out one of their queries that’s a handy way to reveal the type size of each column in a table:
SELECT a.attname, t.typname, t.typalign, t.typlen FROM pg_class c JOIN pg_attribute a ON (a.attrelid = c.oid) JOIN pg_type t ON (t.oid = a.atttypid) WHERE c.relname = 'user_order' AND a.attnum >= 0 ORDER BY t.typlen DESC; attname | typname | typalign | typlen -------------+-------------+----------+-------- id | int8 | d | 8 user_id | int8 | d | 8 order_dt | timestamptz | d | 8 ship_dt | timestamptz | d | 8 receive_dt | timestamptz | d | 8 item_ct | int4 | i | 4 order_type | int2 | s | 2 is_shipped | bool | c | 1 tracking_cd | text | i | -1 ship_cost | numeric | i | -1 order_total | numeric | i | -1
typalign value meanings are
c = char alignment,
s = short alignment,
i = int alignment (4 bytes on most machines),
d = double alignment (8 bytes).)
You’ll generally start by putting the larger 8-byte stuff at the beginning, then work in descending order of type size, and leaving the variable-length values at the end, so this query gives you an easy starting point.
Another factor to consider is nullability.
NULL values in Postgres aren’t stored as a value, but rather as a flag in a “null bitmap” located right after a heap tuple’s header. So if you’re really trying to optimize for size, it’d make sense to order nullable fields whose values are often omitted after non-null fields.