PostgreSQL/PostDock: Auto recovery failed in master node

I use Docker service and Docker swarm to deploy the PostDock cluster

This is my docker-compose.yml setup:

version: "3.3"

networks:

  postdock:

    external: true



services:      

  pgmaster:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      CLUSTER_NODE_NETWORK_NAME: pgmaster

      NODE_PRIORITY: 100  

      NODE_ID: 1

      NODE_NAME: pgmaster

      POSTGRES_PASSWORD: 123

      POSTGRES_USER: postgres

      POSTGRES_DB: postgres

      CONFIGS: "listen_addresses:'*'"

      CLUSTER_NAME: pg_cluster 

      REPLICATION_DB: replication_db

      REPLICATION_USER: replication_user 

      REPLICATION_PASSWORD: replication_pass 

    ports:

      - 4000:5432

    volumes:

      - /data/master_slave:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave1:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 2

      NODE_NAME: pgslave1

      CLUSTER_NODE_NETWORK_NAME: pgslave1 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4001:5432

    volumes:

      - /data/slave_1:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave2:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 3

      NODE_NAME: pgslave2

      CLUSTER_NODE_NETWORK_NAME: pgslave2 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4002:5432

    volumes:

      - /data/slave_2:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  db:

    image: postdock/pgpool

    environment:

      PCP_USER: pcp_user

      PCP_PASSWORD: pcp_pass

      WAIT_BACKEND_TIMEOUT: 60

      CHECK_USER: postgres

      CHECK_PASSWORD: 123

      CHECK_PGCONNECT_TIMEOUT: 3 

      DB_USERS: postgres:123

      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::," 

      REQUIRE_MIN_BACKENDS: 1 

      CONFIGS: "num_init_children:250,max_pool:4"

    ports:

      - 4003:5432

      - 9899:9898

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161

I run:

docker network create -d overlay postdock



docker stack deploy -c docker-compose.yml postdock

and things work smoothly.

However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:

>>> Waiting for local postgres server start...,

expr: non-integer argument,

>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try  times with delay 10 seconds (TIMEOUT=)

As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public

which leads to this error message:

2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",

2018-11-16 04:45:33.310 UTC [122] DETAIL:  Role "public" does not exist.,

    Connection matched pg_hba.conf line 95: "host all all all md5",

psql: FATAL:  password authentication failed for user "public",

2018-11-16 04:45:37.974 UTC [125] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:39.345 UTC [127] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:40.374 UTC [128] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:41.386 UTC [129] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:42.421 UTC [130] FATAL:  no PostgreSQL user name specified in startup packet,

>>>>>> Host pgmaster:5432 is not accessible (will try  times more),

expr: non-integer argument,

As I understand, when the auto failover succeeds, the expected recovery log should be:

>>> Waiting for local postgres server start...,

>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),

>>>>>> Schema replication_db.public exists on host pgmaster:5432!,

>>> Registering node with role master

Anybody have any idea about the root cause of this issue ?

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

add a comment |

I use Docker service and Docker swarm to deploy the PostDock cluster

This is my docker-compose.yml setup:

version: "3.3"

networks:

  postdock:

    external: true



services:      

  pgmaster:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      CLUSTER_NODE_NETWORK_NAME: pgmaster

      NODE_PRIORITY: 100  

      NODE_ID: 1

      NODE_NAME: pgmaster

      POSTGRES_PASSWORD: 123

      POSTGRES_USER: postgres

      POSTGRES_DB: postgres

      CONFIGS: "listen_addresses:'*'"

      CLUSTER_NAME: pg_cluster 

      REPLICATION_DB: replication_db

      REPLICATION_USER: replication_user 

      REPLICATION_PASSWORD: replication_pass 

    ports:

      - 4000:5432

    volumes:

      - /data/master_slave:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave1:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 2

      NODE_NAME: pgslave1

      CLUSTER_NODE_NETWORK_NAME: pgslave1 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4001:5432

    volumes:

      - /data/slave_1:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave2:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 3

      NODE_NAME: pgslave2

      CLUSTER_NODE_NETWORK_NAME: pgslave2 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4002:5432

    volumes:

      - /data/slave_2:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  db:

    image: postdock/pgpool

    environment:

      PCP_USER: pcp_user

      PCP_PASSWORD: pcp_pass

      WAIT_BACKEND_TIMEOUT: 60

      CHECK_USER: postgres

      CHECK_PASSWORD: 123

      CHECK_PGCONNECT_TIMEOUT: 3 

      DB_USERS: postgres:123

      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::," 

      REQUIRE_MIN_BACKENDS: 1 

      CONFIGS: "num_init_children:250,max_pool:4"

    ports:

      - 4003:5432

      - 9899:9898

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161

I run:

docker network create -d overlay postdock



docker stack deploy -c docker-compose.yml postdock

and things work smoothly.

>>> Waiting for local postgres server start...,

expr: non-integer argument,

>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try  times with delay 10 seconds (TIMEOUT=)

As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public

which leads to this error message:

2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",

2018-11-16 04:45:33.310 UTC [122] DETAIL:  Role "public" does not exist.,

    Connection matched pg_hba.conf line 95: "host all all all md5",

psql: FATAL:  password authentication failed for user "public",

2018-11-16 04:45:37.974 UTC [125] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:39.345 UTC [127] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:40.374 UTC [128] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:41.386 UTC [129] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:42.421 UTC [130] FATAL:  no PostgreSQL user name specified in startup packet,

>>>>>> Host pgmaster:5432 is not accessible (will try  times more),

expr: non-integer argument,

As I understand, when the auto failover succeeds, the expected recovery log should be:

>>> Waiting for local postgres server start...,

>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),

>>>>>> Schema replication_db.public exists on host pgmaster:5432!,

>>> Registering node with role master

Anybody have any idea about the root cause of this issue ?

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

add a comment |

I use Docker service and Docker swarm to deploy the PostDock cluster

This is my docker-compose.yml setup:

version: "3.3"

networks:

  postdock:

    external: true



services:      

  pgmaster:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      CLUSTER_NODE_NETWORK_NAME: pgmaster

      NODE_PRIORITY: 100  

      NODE_ID: 1

      NODE_NAME: pgmaster

      POSTGRES_PASSWORD: 123

      POSTGRES_USER: postgres

      POSTGRES_DB: postgres

      CONFIGS: "listen_addresses:'*'"

      CLUSTER_NAME: pg_cluster 

      REPLICATION_DB: replication_db

      REPLICATION_USER: replication_user 

      REPLICATION_PASSWORD: replication_pass 

    ports:

      - 4000:5432

    volumes:

      - /data/master_slave:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave1:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 2

      NODE_NAME: pgslave1

      CLUSTER_NODE_NETWORK_NAME: pgslave1 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4001:5432

    volumes:

      - /data/slave_1:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave2:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 3

      NODE_NAME: pgslave2

      CLUSTER_NODE_NETWORK_NAME: pgslave2 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4002:5432

    volumes:

      - /data/slave_2:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  db:

    image: postdock/pgpool

    environment:

      PCP_USER: pcp_user

      PCP_PASSWORD: pcp_pass

      WAIT_BACKEND_TIMEOUT: 60

      CHECK_USER: postgres

      CHECK_PASSWORD: 123

      CHECK_PGCONNECT_TIMEOUT: 3 

      DB_USERS: postgres:123

      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::," 

      REQUIRE_MIN_BACKENDS: 1 

      CONFIGS: "num_init_children:250,max_pool:4"

    ports:

      - 4003:5432

      - 9899:9898

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161

I run:

docker network create -d overlay postdock



docker stack deploy -c docker-compose.yml postdock

and things work smoothly.

>>> Waiting for local postgres server start...,

expr: non-integer argument,

>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try  times with delay 10 seconds (TIMEOUT=)

As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public

which leads to this error message:

2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",

2018-11-16 04:45:33.310 UTC [122] DETAIL:  Role "public" does not exist.,

    Connection matched pg_hba.conf line 95: "host all all all md5",

psql: FATAL:  password authentication failed for user "public",

2018-11-16 04:45:37.974 UTC [125] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:39.345 UTC [127] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:40.374 UTC [128] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:41.386 UTC [129] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:42.421 UTC [130] FATAL:  no PostgreSQL user name specified in startup packet,

>>>>>> Host pgmaster:5432 is not accessible (will try  times more),

expr: non-integer argument,

As I understand, when the auto failover succeeds, the expected recovery log should be:

>>> Waiting for local postgres server start...,

>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),

>>>>>> Schema replication_db.public exists on host pgmaster:5432!,

>>> Registering node with role master

Anybody have any idea about the root cause of this issue ?

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

I use Docker service and Docker swarm to deploy the PostDock cluster

This is my docker-compose.yml setup:

version: "3.3"

networks:

  postdock:

    external: true



services:      

  pgmaster:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      CLUSTER_NODE_NETWORK_NAME: pgmaster

      NODE_PRIORITY: 100  

      NODE_ID: 1

      NODE_NAME: pgmaster

      POSTGRES_PASSWORD: 123

      POSTGRES_USER: postgres

      POSTGRES_DB: postgres

      CONFIGS: "listen_addresses:'*'"

      CLUSTER_NAME: pg_cluster 

      REPLICATION_DB: replication_db

      REPLICATION_USER: replication_user 

      REPLICATION_PASSWORD: replication_pass 

    ports:

      - 4000:5432

    volumes:

      - /data/master_slave:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave1:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 2

      NODE_NAME: pgslave1

      CLUSTER_NODE_NETWORK_NAME: pgslave1 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4001:5432

    volumes:

      - /data/slave_1:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  pgslave2:

    image: postdock/postgres

    environment:

      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"

      REPLICATION_PRIMARY_HOST: pgmaster

      NODE_ID: 3

      NODE_NAME: pgslave2

      CLUSTER_NODE_NETWORK_NAME: pgslave2 

      REPLICATION_PRIMARY_PORT: 5432

      CONFIGS: "max_replication_slots:10"

    ports:

      - 4002:5432

    volumes:

      - /data/slave_2:/var/lib/postgresql/data

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161



  db:

    image: postdock/pgpool

    environment:

      PCP_USER: pcp_user

      PCP_PASSWORD: pcp_pass

      WAIT_BACKEND_TIMEOUT: 60

      CHECK_USER: postgres

      CHECK_PASSWORD: 123

      CHECK_PGCONNECT_TIMEOUT: 3 

      DB_USERS: postgres:123

      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::," 

      REQUIRE_MIN_BACKENDS: 1 

      CONFIGS: "num_init_children:250,max_pool:4"

    ports:

      - 4003:5432

      - 9899:9898

    networks:

      - postdock

    deploy:

      placement:

        constraints:

          - node.role == manager

          - node.hostname == 192.168.1.161

I run:

docker network create -d overlay postdock



docker stack deploy -c docker-compose.yml postdock

and things work smoothly.

>>> Waiting for local postgres server start...,

expr: non-integer argument,

>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try  times with delay 10 seconds (TIMEOUT=)

As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public

which leads to this error message:

2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",

2018-11-16 04:45:33.310 UTC [122] DETAIL:  Role "public" does not exist.,

    Connection matched pg_hba.conf line 95: "host all all all md5",

psql: FATAL:  password authentication failed for user "public",

2018-11-16 04:45:37.974 UTC [125] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:39.345 UTC [127] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:40.374 UTC [128] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:41.386 UTC [129] FATAL:  no PostgreSQL user name specified in startup packet,

2018-11-16 04:45:42.421 UTC [130] FATAL:  no PostgreSQL user name specified in startup packet,

>>>>>> Host pgmaster:5432 is not accessible (will try  times more),

expr: non-integer argument,

As I understand, when the auto failover succeeds, the expected recovery log should be:

>>> Waiting for local postgres server start...,

>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),

>>>>>> Schema replication_db.public exists on host pgmaster:5432!,

>>> Registering node with role master

Anybody have any idea about the root cause of this issue ?

postgresql pgpool automatic-failover repmgr

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

edited Nov 16 '18 at 11:00

asked Nov 16 '18 at 10:53

moeman

112

asked Nov 16 '18 at 10:53

moeman

112

asked Nov 16 '18 at 10:53

moeman

112

add a comment |

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53336403%2fpostgresql-postdock-auto-recovery-failed-in-master-node%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrgtkky