This text-based blog borrows from, and somewhat enhances, the example shown in: https://www.youtube.com/watch?v=OvSzLjkMmQo
The enhancements show you how to create these replications in custom schemas – not just public – plus some gotcha tips when creating multiple instances on macOS. Regardless, many thanks to the folks that created the YouTube video above – it helped!
This assumes you already have PostgreSQL installed – in this case on a Mac. My binaries were here:
/Library/PostgreSQL/15/bin
A little background on PostgreSQL logical replication: it is far more flexible than physical replication (shipping whole WALs). It ships only the logical row changes decoded from the WAL, and the table to be replicated must exist on both sides.
You can replicate one table, all tables in a schema, many schemas/tables, or even the entire database. PostgreSQL logical replication does NOT replicate DDL: create table, create index, and create sequence need to be kept in sync manually yourself.
The table on the target/replica side can differ from the source, but it should contain matching columns for the columns actually being replicated. You should also have a matching primary key on both sides for the objects being replicated. While PostgreSQL logical replication will likely work without a PK, it will not be efficient. Just use a PK…
Create a PostgreSQL instance for the source, and one for the target
First on my Mac I logged in (su’d) as the postgres user
sudo su - postgres
Point the locked postgres account at your postgres binaries (add them to PATH). On a Mac at least, the EDB installer created the locked postgres account and is not going to let you create a .bash_profile, so just add the path right before you run initdb (which creates the instances for both the pub and the sub):
sudo su - postgres
export PATH=$PATH:/Library/PostgreSQL/15/bin
initdb -D /tmp/pub_db
initdb -D /tmp/sub_db
Using vi or nano, turn on logical replication and customize the client port to 5433 in /tmp/pub_db/postgresql.conf:

wal_level = logical
port = 5433
Using vi or nano, customize the subscription client port to 5434 in /tmp/sub_db/postgresql.conf:

port = 5434 # don't need to touch wal_level on the sub side - only the pub side
Start the publishing and subscription instances on ports 5433 and 5434 respectively:

pg_ctl -D /tmp/pub_db -l /tmp/pub_db/logfile start
pg_ctl -D /tmp/sub_db -l /tmp/sub_db/logfile start
ps -ef | grep -i postgres
you can connect to the pub instance from any user that has access to psql as follows:
psql -p 5433 postgres
Still on the source/pub side – create the objects to replicate, and a publication to replicate from.
I created a new database and schema, then connected to the pub database and created a table named table_1 in the schema myschema:
-- create the pub database
postgres=# create database pub;
CREATE DATABASE

-- connect to the pub database
postgres=# \c pub
You are now connected to database "pub" as user "postgres".

-- create a schema to keep your test table in - otherwise it will likely end up
-- in the "public" schema and eventually you will have a mess...
pub=# create schema myschema;
CREATE SCHEMA

-- create a table with a primary key - logical replication needs a PK for efficient replication
pub=# create table myschema.table_1(id int primary key, name varchar);
CREATE TABLE

-- insert some dummy data
pub=# insert into myschema.table_1 values(generate_series(1,10),'data'||generate_series(1,10));
INSERT 0 10

-- messing with the search_path... so you do not have to specify the schema
pub=# select * from pg_catalog.pg_database;
  oid  |  datname  | datdba | encoding | datlocprovider | datistemplate | datallowconn | datconnlimit | datfrozenxid | datminmxid | dattablespace | datcollate | datctype | daticulocale | datcollversion |               datacl
-------+-----------+--------+----------+----------------+---------------+--------------+--------------+--------------+------------+---------------+------------+----------+--------------+----------------+-------------------------------------
     5 | postgres  |     10 |        0 | c              | f             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                |
 16388 | pub       |     10 |        0 | c              | f             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                |
     1 | template1 |     10 |        0 | c              | t             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                | {=c/postgres,postgres=CTc/postgres}
     4 | template0 |     10 |        0 | c              | t             | f            |           -1 |          716 |          1 |          1663 | C          | C        |              |                | {=c/postgres,postgres=CTc/postgres}
(4 rows)

-- define the default schema for the database (applies to new sessions)
pub=# alter database pub set search_path to myschema;
ALTER DATABASE

-- define the default schema for the current session - overrides the database setting
pub=# set search_path=myschema;
SET
pub=# show search_path;
 search_path
-------------
 myschema
(1 row)
To support efficient replication, a "replica identity" should be defined – normally the primary key.
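As a side note (my own addition, not covered in the video): you can inspect and change a table's replica identity. For a table without a usable primary key, REPLICA IDENTITY FULL is a correct-but-slow fallback – it logs the entire old row for every update and delete:

```sql
-- 'd' means default, i.e. the primary key is used as the replica identity
select relname, relreplident from pg_class where relname = 'table_1';

-- for a table with no usable PK you could log the entire old row instead -
-- correct, but much less efficient on updates and deletes
alter table myschema.table_1 replica identity full;

-- switch back to using the primary key
alter table myschema.table_1 replica identity default;
```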
Syntax to create a publication is straightforward enough:
pub=# select * from pg_tables where schemaname='myschema';
 schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity
------------+-----------+------------+------------+------------+----------+-------------+-------------
 myschema   | table_1   | postgres   |            | t          | f        | f           | f
...

pub=# create publication pub_t1 for table myschema.table_1;
CREATE PUBLICATION
Now on to the target / subscription side:
psql -p 5434 postgres

create database sub;
\c sub
create schema myschema;
set search_path=myschema;
alter database sub set search_path to myschema;
create table myschema.table_1(id int primary key, name varchar);

-- another way to sync the DDL from pub to sub, table by table, is to pipe it
-- from the shell like this:
--   pg_dump -p 5433 -t myschema.table_1 -s pub | psql -p 5434 sub

create subscription mysub connection 'dbname=pub host=localhost user=postgres port=5433' publication pub_t1;
NOTICE:  created replication slot "mysub" on publisher
CREATE SUBSCRIPTION

-- note the initial data is already here
sub=# select * from myschema.table_1;
 id |  name
----+--------
  1 | data1
  2 | data2
  3 | data3
  4 | data4
  5 | data5
  6 | data6
  7 | data7
  8 | data8
  9 | data9
 10 | data10
(10 rows)
Back to pub – let's add 10 more rows:
insert into myschema.table_1 values(generate_series(11,20),'data'||generate_series(11,20));
back to sub
-- Voila! we have data on the sub side...
sub=# select * from myschema.table_1;
 id |  name
----+--------
  1 | data1
  2 | data2
  3 | data3
  4 | data4
  5 | data5
  6 | data6
  7 | data7
  8 | data8
  9 | data9
 10 | data10
 11 | data11
 12 | data12
 13 | data13
 14 | data14
 15 | data15
 16 | data16
 17 | data17
 18 | data18
 19 | data19
 20 | data20
(20 rows)
You can monitor and/or troubleshoot replication from the publisher side with:
select * from pg_stat_replication;
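A couple of other standard statistics views are also worth a look while troubleshooting (my own addition – these are built-in views, nothing custom):

```sql
-- pub side: each subscription holds a replication slot on the publisher;
-- an inactive slot or a stale confirmed_flush_lsn means the sub is behind
select slot_name, active, confirmed_flush_lsn from pg_replication_slots;

-- sub side: per-subscription worker progress
select subname, received_lsn, latest_end_lsn, latest_end_time
from pg_stat_subscription;
```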
Now insert a lot of data on the "pub" side and check the table size on the "sub" side – it will take a second or two to replicate.
Pub side insert:
delete from table_1;
-- insert a couple million rows
insert into myschema.table_1 values(generate_series(210000,2000000),'data'||generate_series(210000,2000000));
-- it will take a few seconds - check pg_stat_replication if the data is not
-- propagating, and you should see a lag...
Sub side:
select count(*) from table_1;
select pg_size_pretty(pg_relation_size('table_1'));
select to_char(count(*), 'fm999,999,999') from table_1;
back to “pub”
-- delete, then vacuum and analyze
delete from table_1;
vacuum analyze table_1;
-- or truncate - it is propagated from pub to sub as well ...
Whichever way you got rid of the pub data, the sub side is now empty too. (An exception: drop table would not carry over – pure DDL is not propagated to the sub.)
select count(*) from table_1;
select pg_size_pretty(pg_relation_size('table_1'));
-- should show zero / 0 rows
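Finally, when you're done experimenting, a sketch of tearing it all down (my own addition; dropping the subscription also removes its replication slot on the publisher):

```sql
-- on the sub instance (port 5434)
drop subscription mysub;

-- on the pub instance (port 5433)
drop publication pub_t1;
```

Then stop both instances with pg_ctl -D /tmp/pub_db stop and pg_ctl -D /tmp/sub_db stop.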