It’s probably obvious that Postgres is my favorite database. One minor grievance that I have with the project is that its documentation is almost entirely optimized for people who ultimately are users of the database rather than developers of it. An unfortunate side effect of this is that none of the repository’s standard files (e.g. README) give much insight into how to get started with the source code.
In numerous places, references in files and in errors generated by make tasks are actively misleading because they point to an INSTALL file for further instructions. Some investigation will reveal that INSTALL doesn’t actually exist on master; it’s only generated as part of a release.
The excellent Postgres docs of course contain all the information needed to get started with development, but if they have one weakness, it’s that their overwhelming verbosity tends to obscure information.
Here I’ve tried to assemble some instructions for getting started that are useful and, more importantly, succinct. I don’t expect most of them to change all that much, but I’ll try to keep the document up-to-date in case they do.
It’s often desirable to have a stable release of Postgres running on your machine for day-to-day work along with your experimental build, so you may want to choose a non-standard install directory, data directory, and port for development.
A prefix is passed during configure to specify the target install directory. I use ./build in the current directory and refer to it as $PG_BUILD_DIR. I call my data directory $PG_DATA_DIR.
A port can be overridden with a command line argument (-p) to a server or client command like psql. It can also be overridden for an entire session by setting the PGPORT environment variable. I’ve chosen 5433 as my port.
I use the excellent direnv to manage these variables. It reads them out of an .envrc file in the source directory:
export PG_BUILD_DIR="$PWD/build"
export PG_DATA_DIR="$PWD/data/primary"
export PGPORT=5433
(Be sure to run direnv allow after saving the file.)
Clone the repository:
git clone https://github.com/postgres/postgres.git
Run configure with a prefix pointing to your chosen target build directory. Also, to save you some time later, we’ll pass a few other useful options that will enable us to debug with tools like gdb:
./configure --enable-cassert --enable-debug --prefix $PG_BUILD_DIR CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
Then build it. The -j option gives you some parallelism, which will speed things up on any modern multi-core machine.
make -j16 -s
The options passed are:
-j: Build in parallel. Pick a number based on the number of cores your computer has. I’m using an iMac Pro with 8 cores, each of which is hyper-threaded, so I specify a parallelism of 16.
-s: Build quietly. Normally build commands produce a lot of output, which can obscure warnings emitted higher up in the trace. Using -s prevents this and produces cleaner output.
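If you’d rather not hard-code the -j value, a small shell function can look up the number of logical CPUs for you. This is a sketch that assumes nproc (GNU coreutils) on Linux, sysctl -n hw.ncpu on macOS, and getconf as a last resort; the name pg_jobs is hypothetical. Both nproc and hw.ncpu count hyper-threads, so on a machine like the iMac Pro above it should arrive at 16.

```shell
# Hypothetical helper: print the number of logical CPUs, falling back to 1.
pg_jobs() {
  if command -v nproc >/dev/null 2>&1; then
    # Linux (GNU coreutils)
    nproc
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    # macOS and the BSDs
    sysctl -n hw.ncpu
  else
    # POSIX-ish fallback; print 1 if even that is unavailable
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}
```

Usage, from the source root: make -j"$(pg_jobs)" -s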
Install the result to the prefix configured above:
make install -j16 -s
Initialize a data directory and start an instance of Postgres right in your terminal. This is convenient because you can see any logging that it emits and you can restart it easily with Ctrl+C.
mkdir -p $PG_DATA_DIR
# initialize a data directory
$PG_BUILD_DIR/bin/initdb -D $PG_DATA_DIR
# start the server
$PG_BUILD_DIR/bin/postgres -D $PG_DATA_DIR -p $PGPORT
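If you’d prefer not to tie up a terminal, pg_ctl (installed into the same bin directory as postgres) can run the server in the background and write its log to a file instead. A minimal sketch, assuming PG_BUILD_DIR and PG_DATA_DIR are set as in the .envrc above; the function names and the log file location are my own, hypothetical choices:

```shell
# Hypothetical wrappers around pg_ctl. PGPORT from the environment is still
# honored by the server, so no -p is needed here.

# Start the server in the background, logging to a file in the data
# directory. -w waits until the server is ready to accept connections.
pg_start() {
  "$PG_BUILD_DIR/bin/pg_ctl" -D "$PG_DATA_DIR" -l "$PG_DATA_DIR/server.log" -w start
}

# Stop the server in "fast" mode (disconnects clients and rolls back
# open transactions instead of waiting for them).
pg_stop() {
  "$PG_BUILD_DIR/bin/pg_ctl" -D "$PG_DATA_DIR" -m fast stop
}

# Follow the server log, roughly what you'd see when running in the foreground.
pg_log() {
  tail -f "$PG_DATA_DIR/server.log"
}
```

The trade-off is that restarting becomes pg_stop followed by pg_start rather than a quick Ctrl+C, and logs have to be tailed separately.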
Now create a database and connect to it:
$PG_BUILD_DIR/bin/createdb -p $PGPORT brandur-test
$PG_BUILD_DIR/bin/psql -p $PGPORT brandur-test
Postgres doesn’t have much in the way of standard unit testing, but instead relies heavily on a thorough regression suite. Run it with:
make check
The command will start a new server, set it up, run the suite, and then tear it down. This is a reliable way to get consistent results, but is somewhat slow. A faster version is also provided which can use a server that you already have running elsewhere:
# requires $PGPORT to be set in the environment
make installcheck
There’s also a parallel version available to further improve speed (you should basically always prefer this variant):
# requires $PGPORT to be set in the environment
make installcheck-parallel
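When a test fails, pg_regress writes the differences between expected and actual output to src/test/regress/regression.diffs. A small wrapper can print that file automatically. This is a sketch assuming it’s run from the source root with PGPORT set; pg_test is a hypothetical name:

```shell
# Hypothetical helper: run the parallel regression suite against the
# already-running server and dump the diffs if anything fails.
pg_test() {
  if make installcheck-parallel; then
    echo "regression suite passed"
  else
    # Save make's exit status before running anything else.
    status=$?
    if [ -f src/test/regress/regression.diffs ]; then
      cat src/test/regress/regression.diffs
    fi
    return "$status"
  fi
}
```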
Building and testing Postgres is already pretty fast (with parallel commands, make for me takes ~30s from scratch and running the test suite takes ~15s), but if you’re going to be working with it heavily, you might want to take a few steps to make it even faster.
ccache is a clever little program that sits in front of your compiler and caches its results so that they can be returned immediately the next time it’s run with the same inputs.
It’s trivial to install (on Mac OS, I use a simple brew install ccache) and causes very few problems, so it’s a pretty easy enhancement.
Use it by telling configure that you want ccache as your C compiler:
./configure --enable-cassert --enable-debug --prefix $PG_BUILD_DIR CC="ccache gcc" CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
After warming up ccache by building once, then running make clean -j16 -s and building again, my build time drops from 30s to less than 5s. Incremental compiles are even faster.
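To check that ccache is actually being hit, you can zero its statistics, do a clean rebuild, and then look at the counters. A sketch, assuming ccache is on your PATH and you’re in the source root; ccache_check is a hypothetical name, and -j16 should be adjusted to your machine:

```shell
# Hypothetical helper: measure ccache effectiveness for one full rebuild.
ccache_check() {
  ccache -z          # reset the hit/miss counters
  make clean -s
  make -j16 -s
  ccache -s          # show statistics, including cache hits and misses
}
```

On a warm cache, nearly every compilation should show up as a hit.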
If you’re on Linux, you can try the gold linker, which is faster than the default GNU linker (ld.bfd). Unfortunately, gold only supports ELF, so it’s not available to Mac OS users.
Just export it in your $CFLAGS before running configure:
export CFLAGS="-fuse-ld=gold"
./configure ...
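One wrinkle: a CFLAGS="..." assignment on the configure command line takes precedence over an exported CFLAGS. If you also pass the debugging flags from earlier, the exported -fuse-ld=gold is dropped. A sketch of a single combined invocation, assuming Linux with gcc, ccache, and gold installed; pg_configure is a hypothetical name:

```shell
# Hypothetical wrapper (Linux only): debugging options, ccache, and the gold
# linker in one configure call so that no CFLAGS value replaces another.
pg_configure() {
  ./configure --enable-cassert --enable-debug --prefix "$PG_BUILD_DIR" \
    CC="ccache gcc" \
    CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer -fuse-ld=gold"
}
```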
Postgres has a slightly unusual tradition of code indentation which seems to have evolved to maximize the number of bytes saved at a time when that mattered, and which continues through to this day. A program similar to Go’s gofmt called pgindent ships with the Postgres source to help automatically reformat source files that are inconsistent.
You may be asked to run pgindent if someone notices that your patch isn’t compliant, and it’s generally a good idea to run it on any source files that you changed before producing a patch anyway.
A few dependencies need to be installed before pgindent can run. The most up-to-date instructions on how to do that can be found in its README (and hint: perltidy has a Homebrew formula).
After that’s done it can simply be run like so on a C file (where our current directory is the Postgres source root):
src/tools/pgindent/pgindent src/backend/utils/adt/mac.c
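pgindent accepts a list of files, so rather than naming each one by hand, you can feed it the C sources your branch has touched. A sketch, assuming your work is committed on a branch off master and you’re in the source root; pgindent_changed is a hypothetical name:

```shell
# Hypothetical helper: run pgindent over .c/.h files changed on this branch
# relative to its merge base with master (deleted files are excluded).
pgindent_changed() {
  files=$(git diff --name-only --diff-filter=d master... -- '*.c' '*.h')
  if [ -z "$files" ]; then
    # Guard: don't invoke pgindent with no arguments.
    echo "no changed C files"
    return 0
  fi
  # Postgres source paths contain no spaces, so word splitting is safe here.
  src/tools/pgindent/pgindent $files
}
```

Afterwards, git diff shows exactly what pgindent changed, which is one more reason to commit before running it.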
Given that pgindent is brittle Perl code and appears to have no test coverage, I’d recommend committing changes before using it on any of your code.
Changes to Postgres are submitted as patch files attached to emails sent to the PG Hackers mailing list.
Traditionally, Postgres required that patches be in a particular style called “context format” (as generated by the diff tool’s -c option), but that constraint has since loosened a bit as the “unified diff” format (probably what you’re used to seeing from programs like git diff) has become widely considered to be more legible.
One good method for producing a patch that will be acceptable on the mailing list is git format-patch¹. This command formats each commit as a separate file named after the commit message, and includes the entire commit message in each file for extra context. For example:
$ git format-patch master...
0001-Implement-SortSupport-for-macaddr-data-type.patch
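When you send a revised version of a patch, git format-patch’s -v (--reroll-count) option prefixes each file name with the version, e.g. v2-0001-..., which makes revisions easy to tell apart on the list. A sketch that also uses -o to keep each version in its own directory; pg_patches and the patches/ layout are hypothetical:

```shell
# Hypothetical helper: write the branch's commits as a patch series.
# With an argument N, file names get a "vN-" prefix and go into patches/vN.
pg_patches() {
  if [ -n "$1" ]; then
    git format-patch -v "$1" -o "patches/v$1" master...
  else
    git format-patch -o patches master...
  fi
}
```

For instance, pg_patches 2 would write files like patches/v2/v2-0001-Implement-SortSupport-for-macaddr-data-type.patch.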
Regardless of the tool you use, good commit hygiene is still of paramount importance, so remember to squash and fix up commits using git rebase -i before producing patch files.
If you need to test with a replica, it’s pretty easy to set that up by running a second Postgres instance listening on a different port and tweaking some configuration. Here’s a script that demonstrates how to do that:
#!/bin/bash
# bash rather than sh: `read -p` isn't POSIX.
set -e
# Data lives under ./data in the source directory; binaries come from the
# install prefix configured earlier (./build by default).
export PG_DIR="$PWD"
export PG_BUILD_DIR="${PG_BUILD_DIR:-$PWD/build}"
export PRIMARY_PORT=5433
export REPLICA_PORT=5434
read -p "Will delete $PG_DIR/data/{primary,replica}. Okay? [Ctrl+C cancels] " yn
rm -rf "$PG_DIR/data/primary"
rm -rf "$PG_DIR/data/replica"
# Initialize a new data directory for the primary, then use a bit of a shortcut
# by just copying it for use by the replica.
"$PG_BUILD_DIR/bin/initdb" -D "$PG_DIR/data/primary"
cp -r "$PG_DIR/data/primary" "$PG_DIR/data/replica"
cat <<EOT >> "$PG_DIR/data/primary/postgresql.conf"
port=$PRIMARY_PORT
EOT
# Postgres 12 and later no longer read recovery.conf (and standby_mode is
# gone): standby settings go in postgresql.conf, and an empty standby.signal
# file tells the server to start as a replica.
cat <<EOT >> "$PG_DIR/data/replica/postgresql.conf"
port=$REPLICA_PORT
shared_buffers=500MB
hot_standby=on
hot_standby_feedback=on
primary_conninfo='host=127.0.0.1 port=$PRIMARY_PORT user=$USER'
EOT
touch "$PG_DIR/data/replica/standby.signal"
cat <<EOT
READY!
======
Start primary:
$PG_BUILD_DIR/bin/postgres -D $PG_DIR/data/primary
Start replica:
$PG_BUILD_DIR/bin/postgres -D $PG_DIR/data/replica
Create a database:
$PG_BUILD_DIR/bin/createdb -p $PRIMARY_PORT mydb
Connect to primary:
$PG_BUILD_DIR/bin/psql -p $PRIMARY_PORT mydb
Connect to replica:
$PG_BUILD_DIR/bin/psql -p $REPLICA_PORT mydb
EOT
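Once both servers are up and mydb exists, you can confirm that the replica is actually streaming. On the primary, the pg_stat_replication view has one row per connected standby; on the replica, pg_is_in_recovery() returns true. A sketch, assuming PG_BUILD_DIR is set and the ports match the script above; check_replication is a hypothetical name:

```shell
# Hypothetical helper: verify streaming replication between the two servers.
check_replication() {
  # Primary (5433): one row per connected standby, with its WAL positions.
  "$PG_BUILD_DIR/bin/psql" -p 5433 -d mydb \
    -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
  # Replica (5434): true while the server is running as a standby.
  "$PG_BUILD_DIR/bin/psql" -p 5434 -d mydb -c "SELECT pg_is_in_recovery();"
}
```

A state of streaming on the primary, together with t from the replica, suggests everything is wired up correctly.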
¹ Note that git format-patch is not officially endorsed, so your mileage with it may vary.