This text-based blog borrows from, and somewhat enhances, the example shown in: https://www.youtube.com/watch?v=OvSzLjkMmQo
The enhancements show you how to create these replications in custom schemas – not just public – plus some gotcha tips when creating multiple instances on macOS. Regardless, many thanks to the folks that created the YouTube video above – it helped!
This assumes you already have PostgreSQL installed – in this case on a Mac. My binaries were here:
/Library/PostgreSQL/15/bin
A little background on PostgreSQL logical replication: it is far more flexible than physical replication (shipping whole WALs). It ships only the logical row changes decoded from the WAL, and the table to be replicated must exist on both sides.
You can replicate one table, all tables in a schema, many schemas/tables, or even the entire database. PostgreSQL logical replication does NOT replicate DDL: create table, create index, and create sequence need to be kept in sync manually yourself.
The table on the target/replica side can differ from the source, but it should contain matching columns for the columns actually being replicated. You should also have a matching primary key on both sides for the objects being replicated. While PostgreSQL logical replication will likely work without a PK, it will not be efficient. Just use a PK…
Create a PostgreSQL instance for the source, and one for the target
First on my Mac I logged in (su’d) as the postgres user
sudo su - postgres
Point the locked postgres account at your postgres binaries (add them to PATH). On a Mac at least, the EDB installer created the locked postgres account and is not going to let you create a .bash_profile, so just add the path right before you run initdb (which creates the instances for both the pub and the sub):
sudo su - postgres
export PATH=$PATH:/Library/PostgreSQL/15/bin
initdb -D /tmp/pub_db
initdb -D /tmp/sub_db
Using vi or nano, turn on logical replication and customize the client port to 5433 in /tmp/pub_db/postgresql.conf:

wal_level = logical
port = 5433
Using vi or nano, customize the subscription client port to 5434 in /tmp/sub_db/postgresql.conf:

port = 5434 # don't need to touch wal_level on the sub side - only the pub side
Start the publishing and subscription instances on ports 5433 and 5434 respectively:

pg_ctl -D /tmp/pub_db -l /tmp/pub_db/logfile start
pg_ctl -D /tmp/sub_db -l /tmp/sub_db/logfile start
ps -ef | grep -i postgres
you can connect to the pub instance from any user that has access to psql as follows:
psql -p 5433 postgres
Still on the source/pub side – create the objects to replicate, and a publication to replicate from.
I created a new database and schema, then connected to the pub database and created a table named table_1 in the schema myschema:
-- create the pub database
postgres=# create database pub;
CREATE DATABASE

-- connect to the pub database
postgres=# \c pub
You are now connected to database "pub" as user "postgres".

-- create a schema to keep your test table in - otherwise it will likely end up
-- in the "public" schema and eventually you will have a mess...
pub=# create schema myschema;
CREATE SCHEMA

-- create a table with a primary key - logical replication needs a PK for efficient replication
pub=# create table myschema.table_1(id int primary key, name varchar);
CREATE TABLE

-- insert some dummy data
pub=# insert into myschema.table_1 values(generate_series(1,10),'data'||generate_series(1,10));
INSERT 0 10

-- messing with the search_path... so you do not have to specify the schema
pub=# select * from pg_catalog.pg_database;
  oid  |  datname  | datdba | encoding | datlocprovider | datistemplate | datallowconn | datconnlimit | datfrozenxid | datminmxid | dattablespace | datcollate | datctype | daticulocale | datcollversion |               datacl
-------+-----------+--------+----------+----------------+---------------+--------------+--------------+--------------+------------+---------------+------------+----------+--------------+----------------+-------------------------------------
     5 | postgres  |     10 |        0 | c              | f             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                |
 16388 | pub       |     10 |        0 | c              | f             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                |
     1 | template1 |     10 |        0 | c              | t             | t            |           -1 |          716 |          1 |          1663 | C          | C        |              |                | {=c/postgres,postgres=CTc/postgres}
     4 | template0 |     10 |        0 | c              | t             | f            |           -1 |          716 |          1 |          1663 | C          | C        |              |                | {=c/postgres,postgres=CTc/postgres}
(4 rows)

-- define the default schema for the database (applies to new sessions)
pub=# alter database pub set search_path to myschema;
ALTER DATABASE

-- define the default schema for the current session - overrides the database setting
pub=# set search_path=myschema;
SET
pub=# show search_path;
 search_path
-------------
 myschema
(1 row)
To support efficient replication, a "replica identity" should be defined – normally the primary key.
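As a side note (my own addition, not covered in the video): you can inspect and change a table's replica identity. For a table without a usable primary key, REPLICA IDENTITY FULL is a correct-but-slow fallback – it logs the entire old row for every update and delete:

```sql
-- 'd' means default, i.e. the primary key is used as the replica identity
select relname, relreplident from pg_class where relname = 'table_1';

-- for a table with no usable PK you could log the entire old row instead -
-- correct, but much less efficient on updates and deletes
alter table myschema.table_1 replica identity full;

-- switch back to using the primary key
alter table myschema.table_1 replica identity default;
```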
Syntax to create a publication is straightforward enough:
pub=# select * from pg_tables where schemaname='myschema';
 schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity
------------+-----------+------------+------------+------------+----------+-------------+-------------
 myschema   | table_1   | postgres   |            | t          | f        | f           | f
...

pub=# create publication pub_t1 for table myschema.table_1;
CREATE PUBLICATION
Now on to the target / subscription side:
psql -p 5434 postgres

create database sub;
\c sub
create schema myschema;
set search_path=myschema;
alter database sub set search_path to myschema;
create table myschema.table_1(id int primary key, name varchar);

-- another way to sync the DDL from pub to sub, table by table, is to pipe it
-- from the shell like this:
--   pg_dump -p 5433 -t myschema.table_1 -s pub | psql -p 5434 sub

create subscription mysub connection 'dbname=pub host=localhost user=postgres port=5433' publication pub_t1;
NOTICE:  created replication slot "mysub" on publisher
CREATE SUBSCRIPTION

-- note the initial data is already here
sub=# select * from myschema.table_1;
 id |  name
----+--------
  1 | data1
  2 | data2
  3 | data3
  4 | data4
  5 | data5
  6 | data6
  7 | data7
  8 | data8
  9 | data9
 10 | data10
(10 rows)
Back to pub – let's add 10 more rows:
insert into myschema.table_1 values(generate_series(11,20),'data'||generate_series(11,20));
back to sub
-- Voila! we have data on the sub side...
sub=# select * from myschema.table_1;
 id |  name
----+--------
  1 | data1
  2 | data2
  3 | data3
  4 | data4
  5 | data5
  6 | data6
  7 | data7
  8 | data8
  9 | data9
 10 | data10
 11 | data11
 12 | data12
 13 | data13
 14 | data14
 15 | data15
 16 | data16
 17 | data17
 18 | data18
 19 | data19
 20 | data20
(20 rows)
You can monitor and/or troubleshoot replication from the publisher side with:
select * from pg_stat_replication;
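A couple of other standard statistics views are also worth a look while troubleshooting (my own addition – these are built-in views, nothing custom):

```sql
-- pub side: each subscription holds a replication slot on the publisher;
-- an inactive slot or a stale confirmed_flush_lsn means the sub is behind
select slot_name, active, confirmed_flush_lsn from pg_replication_slots;

-- sub side: per-subscription worker progress
select subname, received_lsn, latest_end_lsn, latest_end_time
from pg_stat_subscription;
```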
Now insert a lot of data on the "pub" side and check the table size on the "sub" side – it will take a second or two to replicate.
Pub side insert:
delete from table_1;
-- insert a couple million rows
insert into myschema.table_1 values(generate_series(210000,2000000),'data'||generate_series(210000,2000000));
-- it will take a few seconds - check pg_stat_replication if the data is not
-- propagating, and you should see a lag...
Sub side:
select count(*) from table_1;
select pg_size_pretty(pg_relation_size('table_1'));
select to_char(count(*), 'fm999,999,999') from table_1;
back to “pub”
-- delete, then vacuum and analyze
delete from table_1;
vacuum analyze table_1;
-- or truncate - it is propagated from pub to sub as well ...
Whichever way you got rid of the pub data, the sub side is now empty too. (An exception: drop table would not carry over – pure DDL is not propagated to the sub.)
select count(*) from table_1;
select pg_size_pretty(pg_relation_size('table_1'));
-- should show zero / 0 rows
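Finally, when you're done experimenting, a sketch of tearing it all down (my own addition; dropping the subscription also removes its replication slot on the publisher):

```sql
-- on the sub instance (port 5434)
drop subscription mysub;

-- on the pub instance (port 5433)
drop publication pub_t1;
```

Then stop both instances with pg_ctl -D /tmp/pub_db stop and pg_ctl -D /tmp/sub_db stop.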