PostgreSQL/PostDock: Auto recovery failed in master node
I use Docker service and Docker swarm to deploy the PostDock cluster
This is my docker-compose.yml setup:
version: "3.3"
networks:
postdock:
external: true
services:
pgmaster:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
CLUSTER_NODE_NETWORK_NAME: pgmaster
NODE_PRIORITY: 100
NODE_ID: 1
NODE_NAME: pgmaster
POSTGRES_PASSWORD: 123
POSTGRES_USER: postgres
POSTGRES_DB: postgres
CONFIGS: "listen_addresses:'*'"
CLUSTER_NAME: pg_cluster
REPLICATION_DB: replication_db
REPLICATION_USER: replication_user
REPLICATION_PASSWORD: replication_pass
ports:
- 4000:5432
volumes:
- /data/master_slave:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave1:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 2
NODE_NAME: pgslave1
CLUSTER_NODE_NETWORK_NAME: pgslave1
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4001:5432
volumes:
- /data/slave_1:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave2:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 3
NODE_NAME: pgslave2
CLUSTER_NODE_NETWORK_NAME: pgslave2
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4002:5432
volumes:
- /data/slave_2:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
db:
image: postdock/pgpool
environment:
PCP_USER: pcp_user
PCP_PASSWORD: pcp_pass
WAIT_BACKEND_TIMEOUT: 60
CHECK_USER: postgres
CHECK_PASSWORD: 123
CHECK_PGCONNECT_TIMEOUT: 3
DB_USERS: postgres:123
BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
REQUIRE_MIN_BACKENDS: 1
CONFIGS: "num_init_children:250,max_pool:4"
ports:
- 4003:5432
- 9899:9898
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
I run:
docker network create -d overlay postdock
docker stack deploy -c docker-compose.yml postdock
and things work smoothly.
However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:
>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)
As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public
which leads to this error message:
2018-11-16 04:45:33.310 UTC [122] FATAL: password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,
As I understand, when the auto failover succeeds, the expected recovery log should be:
>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master
Anybody have any idea about the root cause of this issue ?
postgresql pgpool automatic-failover repmgr
add a comment |
I use Docker service and Docker swarm to deploy the PostDock cluster
This is my docker-compose.yml setup:
version: "3.3"
networks:
postdock:
external: true
services:
pgmaster:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
CLUSTER_NODE_NETWORK_NAME: pgmaster
NODE_PRIORITY: 100
NODE_ID: 1
NODE_NAME: pgmaster
POSTGRES_PASSWORD: 123
POSTGRES_USER: postgres
POSTGRES_DB: postgres
CONFIGS: "listen_addresses:'*'"
CLUSTER_NAME: pg_cluster
REPLICATION_DB: replication_db
REPLICATION_USER: replication_user
REPLICATION_PASSWORD: replication_pass
ports:
- 4000:5432
volumes:
- /data/master_slave:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave1:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 2
NODE_NAME: pgslave1
CLUSTER_NODE_NETWORK_NAME: pgslave1
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4001:5432
volumes:
- /data/slave_1:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave2:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 3
NODE_NAME: pgslave2
CLUSTER_NODE_NETWORK_NAME: pgslave2
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4002:5432
volumes:
- /data/slave_2:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
db:
image: postdock/pgpool
environment:
PCP_USER: pcp_user
PCP_PASSWORD: pcp_pass
WAIT_BACKEND_TIMEOUT: 60
CHECK_USER: postgres
CHECK_PASSWORD: 123
CHECK_PGCONNECT_TIMEOUT: 3
DB_USERS: postgres:123
BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
REQUIRE_MIN_BACKENDS: 1
CONFIGS: "num_init_children:250,max_pool:4"
ports:
- 4003:5432
- 9899:9898
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
I run:
docker network create -d overlay postdock
docker stack deploy -c docker-compose.yml postdock
and things work smoothly.
However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:
>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)
As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public
which leads to this error message:
2018-11-16 04:45:33.310 UTC [122] FATAL: password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,
As I understand, when the auto failover succeeds, the expected recovery log should be:
>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master
Anybody have any idea about the root cause of this issue ?
postgresql pgpool automatic-failover repmgr
add a comment |
I use Docker service and Docker swarm to deploy the PostDock cluster
This is my docker-compose.yml setup:
version: "3.3"
networks:
postdock:
external: true
services:
pgmaster:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
CLUSTER_NODE_NETWORK_NAME: pgmaster
NODE_PRIORITY: 100
NODE_ID: 1
NODE_NAME: pgmaster
POSTGRES_PASSWORD: 123
POSTGRES_USER: postgres
POSTGRES_DB: postgres
CONFIGS: "listen_addresses:'*'"
CLUSTER_NAME: pg_cluster
REPLICATION_DB: replication_db
REPLICATION_USER: replication_user
REPLICATION_PASSWORD: replication_pass
ports:
- 4000:5432
volumes:
- /data/master_slave:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave1:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 2
NODE_NAME: pgslave1
CLUSTER_NODE_NETWORK_NAME: pgslave1
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4001:5432
volumes:
- /data/slave_1:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave2:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 3
NODE_NAME: pgslave2
CLUSTER_NODE_NETWORK_NAME: pgslave2
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4002:5432
volumes:
- /data/slave_2:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
db:
image: postdock/pgpool
environment:
PCP_USER: pcp_user
PCP_PASSWORD: pcp_pass
WAIT_BACKEND_TIMEOUT: 60
CHECK_USER: postgres
CHECK_PASSWORD: 123
CHECK_PGCONNECT_TIMEOUT: 3
DB_USERS: postgres:123
BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
REQUIRE_MIN_BACKENDS: 1
CONFIGS: "num_init_children:250,max_pool:4"
ports:
- 4003:5432
- 9899:9898
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
I run:
docker network create -d overlay postdock
docker stack deploy -c docker-compose.yml postdock
and things work smoothly.
However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:
>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)
As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public
which leads to this error message:
2018-11-16 04:45:33.310 UTC [122] FATAL: password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,
As I understand, when the auto failover succeeds, the expected recovery log should be:
>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master
Anybody have any idea about the root cause of this issue ?
postgresql pgpool automatic-failover repmgr
I use Docker service and Docker swarm to deploy the PostDock cluster
This is my docker-compose.yml setup:
version: "3.3"
networks:
postdock:
external: true
services:
pgmaster:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
CLUSTER_NODE_NETWORK_NAME: pgmaster
NODE_PRIORITY: 100
NODE_ID: 1
NODE_NAME: pgmaster
POSTGRES_PASSWORD: 123
POSTGRES_USER: postgres
POSTGRES_DB: postgres
CONFIGS: "listen_addresses:'*'"
CLUSTER_NAME: pg_cluster
REPLICATION_DB: replication_db
REPLICATION_USER: replication_user
REPLICATION_PASSWORD: replication_pass
ports:
- 4000:5432
volumes:
- /data/master_slave:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave1:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 2
NODE_NAME: pgslave1
CLUSTER_NODE_NETWORK_NAME: pgslave1
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4001:5432
volumes:
- /data/slave_1:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
pgslave2:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 3
NODE_NAME: pgslave2
CLUSTER_NODE_NETWORK_NAME: pgslave2
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4002:5432
volumes:
- /data/slave_2:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
db:
image: postdock/pgpool
environment:
PCP_USER: pcp_user
PCP_PASSWORD: pcp_pass
WAIT_BACKEND_TIMEOUT: 60
CHECK_USER: postgres
CHECK_PASSWORD: 123
CHECK_PGCONNECT_TIMEOUT: 3
DB_USERS: postgres:123
BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
REQUIRE_MIN_BACKENDS: 1
CONFIGS: "num_init_children:250,max_pool:4"
ports:
- 4003:5432
- 9899:9898
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161
I run:
docker network create -d overlay postdock
docker stack deploy -c docker-compose.yml postdock
and things work smoothly.
However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:
>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)
As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public
which leads to this error message:
2018-11-16 04:45:33.310 UTC [122] FATAL: password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,
As I understand, when the auto failover succeeds, the expected recovery log should be:
>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master
Anybody have any idea about the root cause of this issue ?
postgresql pgpool automatic-failover repmgr
postgresql pgpool automatic-failover repmgr
edited Nov 16 '18 at 11:00
asked Nov 16 '18 at 10:53
moeman
112
112
add a comment |
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53336403%2fpostgresql-postdock-auto-recovery-failed-in-master-node%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53336403%2fpostgresql-postdock-auto-recovery-failed-in-master-node%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown