How to migrate a RDS Postgresql database to an on premise with Pglogical
In this article, I'll explain how to migrate large databases from RDS to an on premise PostgreSQL with very low downtime.
🙌 I'll ask to do your own research regarding the settings I advised you to set and how to set under the Optional settings sections. While other settings are mandatory in order to get it working, those aren't mandatory but settings you should pay attention to. The way I recommend you to set those settings is the result of several researchs mainly on the PostgreSQL official documentation, postgresqlco.nf and my own experiences. It doesn't mean it is the best way, I only wish to give you a starting point.
Notes
- I've done this migration from a Postgres 10.6 on RDS to a Postgres 11 on premise (our own servers)
- The on premise server is a Debian 10 buster
- All our tables have a Primary Key.
Pglogical
doesn't support the replication of tables withoutPrimary Key
(See the native Postgresql's logical replication) - Our apps are Multi-Tenant which means we use multiple
schemas
in Postgresql - All configurations are tuned in order to replicate 9 databases between the two hosts
- ⚠️ It can't replicate to Postgresql 12 as the version
2.3.0
(which support Postgresql 12) and2.2.0
(RDS's pglogical version) can't work together. - ⚠️ This tutorial doesn't show how to replicate the DDL commands
CREATE TABLE
,ALTER TABLES
. If you wish to, take a look the Pglogical documentation - ⚠️ Pglogical can replicate sequences but not in real time. If you wish to configure a multimaster cluster, this tutorial won't enough, again take a look at Pglogical documentation
Tips - check your logs 👋
Check your logs ! They are your best friend and they provide a lot of useful informations. Wether the PostgreSQL isn't working or when some settings are wrongly set.
RDS
You can find them in the AWS console, to find them:
- Go to
RDS
>databases
>your_database
>logs & events
⚠️ By default the latest logs are on the last page, since it is filtered by name ASC and not DESC.
On premise
To find your log folder
-- psql connected as your admin user on any database
-- Show the directory where logs are stored
postgres=> SHOW log_directory;
-- If the value was relative,
-- it is relative to the data directory
-- to find it:
postgres=> SHOW data_directory;
💡 If the value is relative like
log
and not absolute like/var/log/postgresql
. It is relative to the data directory:{{ data_directory }}/log
For PostgreSQL 11 on **Debian 10 **you can also find the logs at /var/log/postgresql/postgresql-11-main.log
Requirements
In order to follow this tutorial you'll need:
- A RDS database hosted by AWS
- PostgreSQL version > 10
- A user with
rds_superuser
permission granted to him
- An on premise PostgreSQL
- PostgreSQL version > 10 & < 12
- A user with the
SUPERUSER
flag granted to him
RDS PostgreSQL (master)
We will setup first the RDS PostgreSQL as the publisher. We wish to transfer data from RDS to the on premise.
Parameter group
We need to ajust some settings in its parameter group in order to enable the logical replication.
Mandatory settings
- Set
rds.logical_replication
to 1. This will enable WAL logs at logical level. - In
shared_preload_libraries
addpglogical
to list of values.
Optional settings
those settings depends on how much tables you wish to replicate and your RDS instance in some cases, the default values are good, in anothers you should adjust them.
👋 Tip: Check your logs ! PostgreSQL will tell you if a parameter is wrongly set in the logs. In a case where I wished to replicate 9 databases between 2 hosts, I did set max_wal_senders to 15 and in RDS logs I found that PostgreSQL automatically rose it to 20 as it "wasn't enough / incompatible". See postgresqlco.nf for more informations regarding your variables
Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients (i.e., the maximum number of simultaneously running WAL sender processes).
- Set it to twice as many replication process you expect, if you expect more than 5 replications otherwise left it to the default value.
- < 5 standby servers: Leave the default value 10
- Example: 9 databases: Set it to 18
Specifies the maximum number of replication slots (see streaming-replication-slots) that the server can support.
- Set it to twice as many replication process you expect, if you expect more than 5 replications otherwise left it to the default value.
- < 5 standby servers: Leave the default value 10
- Example: 9 databases: Set it to 18
Sets the maximum number of background processes that the system can support.
- Set it to the number of cores you have, if you have more than 8.
- <= 8 vCPU: Leave the default value 8
- Example: 32 vCPU: Set it to 32
Finally
- Reboot the database to take effects
Create the replication user
We'll create the role being ussed for the replication it self.
- Create a user, let's call it
rds_replication_user
_ - GRANT
rds_replication
andrds_superuser
to it - For each database you wish to replicate, GRANT its owner, the role which have access to the data to the rds_replication_user to have the desired rights on the database, all tables and all schemas.
-- Psql connected as the admin user (has rds_superuser) on any database
-- Create a user, let's call it *replication_user*
postgres=> CREATE ROLE rds_replication_user WITH LOGIN ENCRYPTED PASSWORD 'xxxx';
-- Grant permissions to replicate
postgres=> GRANT rds_replication TO rds_replication_user;
-- Needed for pglogical, see doc:
-- "Currently pglogical replication and administration requires superuser privileges. It may be later extended to more granular privileges."
postgres=> GRANT rds_superuser TO rds_replication_user;
-- Grant database's owner to replication_user to have the desired rights on the database, all tables and all schemas. Present and future.
postgres=> GRANT mydb_owner TO rds_replication_user;
Pglogical setup
Provider node
Connect to the database you wish to replicate with our rds_replication_user. We will setup Pglogical and the provider node, the node holds all the informations on how to use pglogical for a specific database.
-- Psql connected as replication_user on the database we wish to replicate
-- Create the extension
mydb=> CREATE EXTENSION pglogical;
mydb=> SELECT pglogical.create_node(node_name := 'mydb_provider',dsn := 'host=rds_host port=rds_port dbname=mydb user=rds_replication_user password=rds_user_password');
⚠️ By creating the Pglogical extension, a pglogical schema is created. Even if our mydb_owner role doesn't need to access it, it may cause permission issues, for example if you do a pg_dump. To fix it, we will give him the necessary permissions.
-- Psql connected as replication_user on the database we wish to replicate
# Can access the schema
mydb=> GRANT USAGE ON SCHEMA pglogical TO mydb_owner;
# Have permissions on the tables
mydb=> GRANT ALL ON ALL TABLES IN SCHEMA pglogical TO mydb_owner;
TODO Let's check everything is in order, this query should show us our recently create publication:
-- Psql connected as rds_replication_user on the database we wish to replicate
-- List all publications
mydb=> \dRp+
Add tables to the replication set
The replication set is the element which holds all tables and sequences to be replicate. One issue for Multi-Tenant database is that can only add tables of a schema one at a time. We will loop through all schemas to add them all at once.
We will call our replication set default
. This is chosen name, feel free to change it.
mydb=> DO $$
DECLARE
sch record; -- declare our variable for the 'for loop'
BEGIN
FOR sch IN SELECT nspname -- Query to fetch all schemas
FROM pg_catalog.pg_namespace
WHERE nspname NOT LIKE('pg_%') -- avoid system's schemas
AND nspname NOT IN ('information_schema','pglogical') -- Avoid information_schema and pglogical schemas
ORDER BY 1
LOOP
PERFORM pglogical.replication_set_add_all_tables('default', ARRAY[sch.nspname]); -- The command which adds the tables to the replication set
RAISE NOTICE 'Added % tables to replication set', sch.nspname; -- just a log to say what we did
END LOOP;
END; $$
-- DEBUG COMMAND
-- Shows all tables in replication set (which will be replicated)
mydb=> SELECT * FROM pglogical.replication_set_table;
Now RDS is ready to stream our data through logical replication. 🚀
On premise PostgreSQL (replica)
We will setup ou on premise PostgreSQL to stream data from RDS the master.
Install Pglogical
In order to use pglogical, we need to install it.
apt-get install postgresql-11-pglogical
postgresql.conf (For a fast initialization)
those settings depends on how much tables you wish to replicate and your RDS instance in some cases, the default values are good, in anothers you should adjust them.
Mandatory settings
It determines how much information is written to the WAL.We need to have enough informations in the WAL for the replication.
- Set it to
logical
.
Specifies the TCP/IP address(es) on which the server is to listen for connections from client applications.
- If you have
'localhost'
:- Set it to
'localhost,rds_host'
- Set it to
- If you have
'*'
:- Nothing to do 👍
Lists shared libraries to preload into server
- Add
'pglogical'
to the list of values- If you already have a value for example
'pg_stat_statement'
: Change it to'pg_stat_statement,pglogical'
- If you already have a value for example
Record commit time of transactions. It is needed for the last/first update wins conflict resolution.
- Set it to
on
Optional settings
👋 Tip: Check your logs ! Again PostgreSQL will tell you if a parameter is wrongly set in the logs.
⚠️ Please note: Those settings are tailored to initialize quickly your subscriber, if you wish to use the replicated server as a read replica for example, please read FINE TUNING AFTER INITIALIZATION at the end of the article.
Sets the maximum number of background processes that the system can support.
- Set it to the number of cores you have, if you have more than 8.
- <= 8 vCPU: Leave the default value 8
- Example: 32 vCPU: Set it to 32
max_logical_replication_workers
Specifies maximum number of logical replication workers. This includes both apply workers and table synchronization workers. Depends on the number of subscribers you have being replicated, consider setting it to (vCPU / 2) + 2
- Set it to (vCPU / 2) + 2 if you have more than 4 vCPU:
- Default: 4, good for 4 vCPU or less
- Example: 8 vCPU = 6
max_sync_workers_per_subscription
This parameter controls the amount of parallelism of the initial data copy during the subscription initialization or when new tables are added.
- Set it to vCPU / 2 (Must be lower than
max_logical_replication_workers
) if you have more than 4 vCPU- Default: 2, good for 4 vCPU or less
- Example: 8 vCPU = 4
Sets the maximum number of workers that the system can support for parallel operations.
- Set it to (vCPU / 2) - 2 if you have 16 vCPU or less
- Default value: 8 if have 16 or less cores (vCPU)
- Example: 33 vCPU = 16
Here's an example with a PostgreSQL with 56 vCPU and 64Go RAM:
# /etc/postgresql/12/main/postgresql.conf
# postgresql listen to RDS
listen_addresses = 'localhost,rds_host'
# We wish to use Logical replication
wal_level = logical
max_worker_processes=56
max_parallel_workers=26 # the number of cores
max_logical_replication_workers=30 # depends than number of subscribers (or replicas)
max_sync_workers_per_subscription=28 # depends on number of tables being replicated
pg_hba.conf
We need to authorize the RDS database to connect from its host with the premise_replication_user
, If it isn't already allowed. Here's a list of different entries from most restrictive to most permissive.
💡 Unlike physical replication, you don't need a line with
replication
as host. Only access to the database as a regular user.
# Only allow connection to the database_a from RDS for the role premise_replication_user with password
host database_a premise_replication_user rds_host md5
#### OR
# Allow connection from RDS for the role premise_replication_user with password
host all premise_replication_user rds_host md5
#### OR
# Allow remote connection with premise_replication_user role from everywhere with password.
host all premise_replication_user 0.0.0.0/0 md5
#### OR
# Allow remote connection with any role from everywhere with password.
host all all 0.0.0.0/0 md5
Restart postgresql for changes to take effect !
systemctl restart postgresql.service
Create the user
-- Psql connected as admin user on any databases
-- Doesn't have to match the user from RDS
-- REPLICATION: replicate data
-- SUPERUSER: Needed for pglogical, see doc:
-- "Currently pglogical replication and administration requires superuser privileges. It may be later extended to more granular privileges."
postgres=> CREATE ROLE premise_replication_user WITH REPLICATION SUPERUSER LOGIN ENCRYPTED PASSWORD 'xxxx';
-- Create a user to the future owner of the database
postgres=> CREATE ROLE mydb_owner WITH LOGIN ENCRYPTED PASSWORD 'xxxx'
-- Grant mydb_owner to premise_replication_user to have the desired rights on the database, all tables and all schemas. Present and future.
postgres=> GRANT mydb_owner TO premise_replication_user;
Create the database
-- Psql connected as admin user on any databases
postgres=> CREATE DATABASE mydb WITH OWNER mydb_owner;
Export the database schema from RDS to the on premise
Dump the RDS mydb's schema to our newly created mydb on our PostgreSQL on premise.
#!/bin/bash
# -s only the schema, not the data
# --no-owner & --no-privileges to not dump RDS's database owner and role permissions. Do not use it if you plan on using the very same roles between the two instances.
PGPASSWORD=rds_replication_user_password pg_dump -h rds_host -d mydb -U rds_replication_user --no-owner --no-privileges -s | PGPASSWORD=premise_mydb_owner_password psql -U premise_mydb_owner -h premise_host -d mydb
Pglogical setup
-- Psql Connected as replication_user on mydb
mydb=> CREATE EXTENSION pglogical;
-- Create the subscriber node
SELECT pglogical.create_node(node_name := 'mydb_subscriber',dsn := 'host=premise_host port=premise_port dbname=mydb user=premise_replication_user password=premise_user_password');
-- Start replication from replication set 'default'
SELECT pglogical.create_subscription(subscription_name := 'mydb_subscription', provider_dsn := 'host=rds_host port=rds_host dbname=mydb user=rds_replication_user password=rds_user_password', replication_sets := ARRAY['default']);
⚠️ Again with the Pglogical extension, a pglogical schema is created. Even if our mydb_owner role doesn't need to access it, it may cause permission issues, for example if you do a pg_dump. To fix it, we will give him the necessary permissions.
-- Psql connected as replication_user on the database we wish to replicate
# Can access the schema
mydb=> GRANT USAGE ON SCHEMA pglogical TO mydb_owner;
# Have permissions on the tables
mydb=> GRANT ALL ON ALL TABLES IN SCHEMA pglogical TO mydb_owner;
Let's check everything is working as expected:
-- Psql connected as premise_replication_user on mydb
-- List of subscription on the current database
mydb=> \dRs+
Wait until the initialization phase is over
We are currently replicating the tables' data and only that. We're not replicating the sequences (UNIQUE ID generators) nor the DDL commands (CREATE TABLE
, CREATE SCHEMA
). We need to wait the initialization to be over first.
Let's check our subscription status.
-- Psql connected on the replica (on premise here) as the replication user on your database
-- Shows the subscription status
mydb=> SELECT subscription_name, status, provider_node FROM pglogical.show_subscription_status();
Check the status
column. There are 3 main values:
initialization
is the first step. During which the replica isn't "ready" yet.replicating
The replica is ready and is streaming data. (You monitor the lag from the master, see Monitoring section down the article)down
an error occured and your replication isn't working, check the logs to discover what's the issue.
⚠️ Important: we must wait for the initialization phase to be over, once it'll be replicating
we will be ready to move to the next phase.
You have two ways to know when it'll be good:
- Check the replicated database's size regulary, it will give you an idea of how much time it will take.
- You can run the query below, it will wait until the subscription becomes
replicating
⚠️ pglogical >= 2.2.2 only. - Check your subscription status on a regular basis.
-- Psql connected on the replica (on premise here) as the replication user on your database
-- pglogical >= 2.2.2 only
-- Will wait until the initialization is over
mydb=> SELECT pglogical.wait_for_subscription_sync_complete('mydb_subscription');
(Optional) Adjust our settings postgresql.conf
Our PostgreSQL is now streaming from the RDS as it did replicate it successfully. Our previous settings were meant for a fast replication. We can raise our max_parallel_workers
setting to improve PostgreSQL's performance.
👋 Tip: Check your logs ! Again PostgreSQL will tell you if a parameter is wrongly set in the logs.
- No need to change it 👍
max_logical_replication_workers
- Should be greater than your number of subscriptions and even:
- Default: 4, good if you have less than 4 subscriptions
- Example: 8 subscriptions = 10
max_sync_workers_per_subscription
- Set it to max_logical_replication_workers / 2 rounded to lowest even number
- Default: 2, good for
max_logical_replication_workers=4
- Example:
max_logical_replication_workers=10
= 4
- Default: 2, good for
- Set it to
max_worker_processes - max_logical_replication_workers
.- Default value: 8, good if
max_worker_processes - max_logical_replication_workers
<= 8 - Example:
max_worker_processes - max_logical_replication_workers =
- Default value: 8, good if
Here's the same example with a PostgreSQL with 56 vCPU and 64Go RAM and 6 subscriptions:
# /etc/postgresql/12/main/postgresql.conf
# [NO CHANGES] postgresql listen to RDS
listen_addresses = 'localhost,rds_host'
# [NO CHANGES] We wish to use Logical replication
wal_level = logical
# [NO CHANGES] Still the number of cores
max_worker_processes=56
# All workers left by max_logical_replication_workers
max_parallel_workers=48
# depends than number of subscribers (or replicas)
max_logical_replication_workers=8
# depends on number of tables being replicated
max_sync_workers_per_subscription=4
Restart postgresql
systemctl restart postgresql.service
Do the migration
This is the final part, in which we'll shut down our apps, change their database, migrate the sequences and finally turning our apps back on.
1. Shutdown your apps
In order to avoid:
- losing data because there will be a small gap of time when switching our apps and replciating the sequences.
- Cause an issue when migrating our sequences. Pglogical doesn't replicate sequences in real time.
2. Wait for all data to be replicated
As now our apps are down and cannot write, we wish to wait until all changes are replicated, it is only a matter of 1 - 2 seconds.
2.1 Pglogical >= 2.2.2
On the replica, the query will finish when the synchronization is complete.
-- Psql connected on the replica (on premise here) as the replication user on your database
-- pglogical >= 2.2.2 only
-- Will wait until the initialization is over
mydb=> SELECT pglogical.wait_for_subscription_sync_complete('mydb_subscription');
2.2 Pglogical < 2.2.2
On the master, wait until the several lags are 0
.
-- Psql connected on the master (on premise here) as the replication user on your database
-- See the different kind of lags for each subscription connected to it. 0 means it has been fully replicated, no lag
postgres=> SELECT pid, application_name, client_addr, state, sync_state, pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag, pg_wal_lsn_diff(sent_lsn, flush_lsn) AS flush_lag, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag FROM pg_stat_replication;
3. Migrate the sequences from the master (RDS)
We will pglogical to migrate our sequences' exact values to our replica, our on premise PostgreSQL.
⚠️ Pglogical can replicate sequences over time. But it won't be the exact same values, it will increase them to try avoiding conflict ! Only if we migrate them at one time only, we will have the very same values for our sequences.
-- Psql connected on the master as replication user on mydb
mydb=> DO $$
DECLARE
sch record; -- declare our variable for the 'for loop'
BEGIN
FOR sch IN SELECT nspname -- Query to fetch all schemas
FROM pg_catalog.pg_namespace
WHERE nspname NOT LIKE('pg_%') -- avoid system's schemas
AND nspname NOT IN ('information_schema','pglogical') -- Avoid information_schema and pglogical schemas
ORDER BY 1
LOOP
-- The command which adds the sequences to the replication set
PERFORM pglogical.replication_set_add_all_sequences('default', ARRAY[sch.nspname], true); -- true here tells pglogical to replicate sequences NOW.
RAISE NOTICE 'Added % sequences to replication set', sch.nspname; -- just a log to say what we did
END LOOP;
END; $$
:bulb: It is a script for multi tenants apps (multi private schemas). It'll work for single schema app but this is why we're doing a loop.
You can check on your replica if the sequences have been migrated correctly. We will get the last_value
from a sequence and if it is > 0 then everything went fine !
-- Psql connected on the replica as mydb_owner on mydb
-- Only if you have private schemas / Tenants
mydb=> set search_path = 'a_private_schema', 'public';
-- List all sequences
mydb=> \ds
-- Inspect a sequence's last_value
mydb=> SELECT last_value FROM a_sequence;
4. Switch our app's database
Change your apps' database from RDS to the on premise PostgreSQL.
5. We've successfully migrated our databases ! :tada:
Turn back on your applications. We've successfully migrated our databases from RDS to the on premise PostgreSQL.
Monitoring
Here are several commands, that can help you monitor your replications
On master, RDS
-- On any database - See current replication tasks
postgres=> SELECT * FROM pg_stat_replication;
-- See the different kind of lags for each subscription connected to it. 0 means it has been fully replicated, no lag
postgres=> SELECT pid, application_name, client_addr, state, sync_state, pg_wal_lsn_diff(sent_lsn, write_lsn) AS write_lag, pg_wal_lsn_diff(sent_lsn, flush_lsn) AS flush_lag, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag FROM pg_stat_replication;
-- See all replications
postgres=> SELECT * FROM pg_replication_slots;
On the replica, on premise
-- On any database - See current subscription tasks and their status and details (no data regarding the lag)
postgres=> SELECT * FROM pg_stat_subscription;
-- See databases' size
postgres=> SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database WHERE datname = 'mydb' ORDER BY datname DESC;
-- Show the pglogical subscription status
postgres=> SELECT * FROM pglogical.show_subscription_status();
-- nearly same as above
postgres=> \l+ mydb
Cleanup
Now that the migration have a success we can cleanup pglogical
.
On premise (old replica)
-- On your replica (On premise Postgresql)
-- Do not subscribe anymore
mydb=> SELECT pglogical.drop_subscription('mydb_subscription');
-- Delete the subscriber node
mydb=> SELECT pglogical.drop_node(node_name := 'mydb_subscriber');
-- mydb=> DROP event TRIGGER IF EXISTS pglogical_assign_repset_trg;
-- Drop the extension
mydb=> DROP EXTENSION pglogical;
-- Drop the schema created by pglogical
mydb=> DROP SCHEMA pglogical;
On RDS (old master)
-- On your master (RDS)
-- Delete the provider node
mydb=> SELECT pglogical.drop_node(node_name := 'mydb_provider');
-- Drop the trigger if it exists, which replicates all DDL commands (CREATE TABLE, ALTER TABLE, etc.)
mydb=> DROP event TRIGGER IF EXISTS pglogical_assign_repset_trg;
-- Drop the extension
mydb=> DROP EXTENSION pglogical;
-- Drop the schema created by pglogical
mydb=> DROP SCHEMA pglogical;
Resources
Here's a list of articles and documentations that helped me write this article.
- Major-version upgrading with minimal downtime
- Pglogical's github page
- How To Configure pglogical | streaming replication for PostgreSQL
- pglogical replication
- pglogical installation and configuration of logical streaming replication using pglogical
- PostgreSQL – logical replication with pglogical
- Major-version upgrading with minimal downtime