PostgreSQL/PostDock: Auto recovery failed in master node












2














I use Docker service and Docker swarm to deploy the PostDock cluster



This is my docker-compose.yml setup:



version: "3.3"
networks:
postdock:
external: true

services:
pgmaster:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
CLUSTER_NODE_NETWORK_NAME: pgmaster
NODE_PRIORITY: 100
NODE_ID: 1
NODE_NAME: pgmaster
POSTGRES_PASSWORD: 123
POSTGRES_USER: postgres
POSTGRES_DB: postgres
CONFIGS: "listen_addresses:'*'"
CLUSTER_NAME: pg_cluster
REPLICATION_DB: replication_db
REPLICATION_USER: replication_user
REPLICATION_PASSWORD: replication_pass
ports:
- 4000:5432
volumes:
- /data/master_slave:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161

pgslave1:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 2
NODE_NAME: pgslave1
CLUSTER_NODE_NETWORK_NAME: pgslave1
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4001:5432
volumes:
- /data/slave_1:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161

pgslave2:
image: postdock/postgres
environment:
PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
REPLICATION_PRIMARY_HOST: pgmaster
NODE_ID: 3
NODE_NAME: pgslave2
CLUSTER_NODE_NETWORK_NAME: pgslave2
REPLICATION_PRIMARY_PORT: 5432
CONFIGS: "max_replication_slots:10"
ports:
- 4002:5432
volumes:
- /data/slave_2:/var/lib/postgresql/data
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161

db:
image: postdock/pgpool
environment:
PCP_USER: pcp_user
PCP_PASSWORD: pcp_pass
WAIT_BACKEND_TIMEOUT: 60
CHECK_USER: postgres
CHECK_PASSWORD: 123
CHECK_PGCONNECT_TIMEOUT: 3
DB_USERS: postgres:123
BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
REQUIRE_MIN_BACKENDS: 1
CONFIGS: "num_init_children:250,max_pool:4"
ports:
- 4003:5432
- 9899:9898
networks:
- postdock
deploy:
placement:
constraints:
- node.role == manager
- node.hostname == 192.168.1.161


I run:



docker network create -d overlay postdock

docker stack deploy -c docker-compose.yml postdock


and things work smoothly.



However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:



>>> Waiting for local postgres server start...,
expr: non-integer argument,
>>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)


As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public



which leads to this error message:



2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",
2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
Connection matched pg_hba.conf line 95: "host all all all md5",
psql: FATAL: password authentication failed for user "public",
2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
>>>>>> Host pgmaster:5432 is not accessible (will try times more),
expr: non-integer argument,


As I understand, when the auto failover succeeds, the expected recovery log should be:



>>> Waiting for local postgres server start...,
>>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
>>>>>> Schema replication_db.public exists on host pgmaster:5432!,
>>> Registering node with role master


Anybody have any idea about the root cause of this issue ?










share|improve this question





























    2














    I use Docker service and Docker swarm to deploy the PostDock cluster



    This is my docker-compose.yml setup:



    version: "3.3"
    networks:
    postdock:
    external: true

    services:
    pgmaster:
    image: postdock/postgres
    environment:
    PARTNER_NODES: "pgmaster,pgslave1"
    CLUSTER_NODE_NETWORK_NAME: pgmaster
    NODE_PRIORITY: 100
    NODE_ID: 1
    NODE_NAME: pgmaster
    POSTGRES_PASSWORD: 123
    POSTGRES_USER: postgres
    POSTGRES_DB: postgres
    CONFIGS: "listen_addresses:'*'"
    CLUSTER_NAME: pg_cluster
    REPLICATION_DB: replication_db
    REPLICATION_USER: replication_user
    REPLICATION_PASSWORD: replication_pass
    ports:
    - 4000:5432
    volumes:
    - /data/master_slave:/var/lib/postgresql/data
    networks:
    - postdock
    deploy:
    placement:
    constraints:
    - node.role == manager
    - node.hostname == 192.168.1.161

    pgslave1:
    image: postdock/postgres
    environment:
    PARTNER_NODES: "pgmaster,pgslave1"
    REPLICATION_PRIMARY_HOST: pgmaster
    NODE_ID: 2
    NODE_NAME: pgslave1
    CLUSTER_NODE_NETWORK_NAME: pgslave1
    REPLICATION_PRIMARY_PORT: 5432
    CONFIGS: "max_replication_slots:10"
    ports:
    - 4001:5432
    volumes:
    - /data/slave_1:/var/lib/postgresql/data
    networks:
    - postdock
    deploy:
    placement:
    constraints:
    - node.role == manager
    - node.hostname == 192.168.1.161

    pgslave2:
    image: postdock/postgres
    environment:
    PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
    REPLICATION_PRIMARY_HOST: pgmaster
    NODE_ID: 3
    NODE_NAME: pgslave2
    CLUSTER_NODE_NETWORK_NAME: pgslave2
    REPLICATION_PRIMARY_PORT: 5432
    CONFIGS: "max_replication_slots:10"
    ports:
    - 4002:5432
    volumes:
    - /data/slave_2:/var/lib/postgresql/data
    networks:
    - postdock
    deploy:
    placement:
    constraints:
    - node.role == manager
    - node.hostname == 192.168.1.161

    db:
    image: postdock/pgpool
    environment:
    PCP_USER: pcp_user
    PCP_PASSWORD: pcp_pass
    WAIT_BACKEND_TIMEOUT: 60
    CHECK_USER: postgres
    CHECK_PASSWORD: 123
    CHECK_PGCONNECT_TIMEOUT: 3
    DB_USERS: postgres:123
    BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
    REQUIRE_MIN_BACKENDS: 1
    CONFIGS: "num_init_children:250,max_pool:4"
    ports:
    - 4003:5432
    - 9899:9898
    networks:
    - postdock
    deploy:
    placement:
    constraints:
    - node.role == manager
    - node.hostname == 192.168.1.161


    I run:



    docker network create -d overlay postdock

    docker stack deploy -c docker-compose.yml postdock


    and things work smoothly.



    However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:



    >>> Waiting for local postgres server start...,
    expr: non-integer argument,
    >>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)


    As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public



    which leads to this error message:



    2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",
    2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
    Connection matched pg_hba.conf line 95: "host all all all md5",
    psql: FATAL: password authentication failed for user "public",
    2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
    2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
    2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
    2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
    2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
    >>>>>> Host pgmaster:5432 is not accessible (will try times more),
    expr: non-integer argument,


    As I understand, when the auto failover succeeds, the expected recovery log should be:



    >>> Waiting for local postgres server start...,
    >>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
    >>>>>> Schema replication_db.public exists on host pgmaster:5432!,
    >>> Registering node with role master


    Anybody have any idea about the root cause of this issue ?










    share|improve this question



























      2












      2








      2







      I use Docker service and Docker swarm to deploy the PostDock cluster



      This is my docker-compose.yml setup:



      version: "3.3"
      networks:
      postdock:
      external: true

      services:
      pgmaster:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      CLUSTER_NODE_NETWORK_NAME: pgmaster
      NODE_PRIORITY: 100
      NODE_ID: 1
      NODE_NAME: pgmaster
      POSTGRES_PASSWORD: 123
      POSTGRES_USER: postgres
      POSTGRES_DB: postgres
      CONFIGS: "listen_addresses:'*'"
      CLUSTER_NAME: pg_cluster
      REPLICATION_DB: replication_db
      REPLICATION_USER: replication_user
      REPLICATION_PASSWORD: replication_pass
      ports:
      - 4000:5432
      volumes:
      - /data/master_slave:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      pgslave1:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 2
      NODE_NAME: pgslave1
      CLUSTER_NODE_NETWORK_NAME: pgslave1
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
      ports:
      - 4001:5432
      volumes:
      - /data/slave_1:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      pgslave2:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 3
      NODE_NAME: pgslave2
      CLUSTER_NODE_NETWORK_NAME: pgslave2
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
      ports:
      - 4002:5432
      volumes:
      - /data/slave_2:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      db:
      image: postdock/pgpool
      environment:
      PCP_USER: pcp_user
      PCP_PASSWORD: pcp_pass
      WAIT_BACKEND_TIMEOUT: 60
      CHECK_USER: postgres
      CHECK_PASSWORD: 123
      CHECK_PGCONNECT_TIMEOUT: 3
      DB_USERS: postgres:123
      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
      REQUIRE_MIN_BACKENDS: 1
      CONFIGS: "num_init_children:250,max_pool:4"
      ports:
      - 4003:5432
      - 9899:9898
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161


      I run:



      docker network create -d overlay postdock

      docker stack deploy -c docker-compose.yml postdock


      and things work smoothly.



      However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:



      >>> Waiting for local postgres server start...,
      expr: non-integer argument,
      >>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)


      As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public



      which leads to this error message:



      2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",
      2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
      Connection matched pg_hba.conf line 95: "host all all all md5",
      psql: FATAL: password authentication failed for user "public",
      2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
      >>>>>> Host pgmaster:5432 is not accessible (will try times more),
      expr: non-integer argument,


      As I understand, when the auto failover succeeds, the expected recovery log should be:



      >>> Waiting for local postgres server start...,
      >>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
      >>>>>> Schema replication_db.public exists on host pgmaster:5432!,
      >>> Registering node with role master


      Anybody have any idea about the root cause of this issue ?










      share|improve this question















      I use Docker service and Docker swarm to deploy the PostDock cluster



      This is my docker-compose.yml setup:



      version: "3.3"
      networks:
      postdock:
      external: true

      services:
      pgmaster:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      CLUSTER_NODE_NETWORK_NAME: pgmaster
      NODE_PRIORITY: 100
      NODE_ID: 1
      NODE_NAME: pgmaster
      POSTGRES_PASSWORD: 123
      POSTGRES_USER: postgres
      POSTGRES_DB: postgres
      CONFIGS: "listen_addresses:'*'"
      CLUSTER_NAME: pg_cluster
      REPLICATION_DB: replication_db
      REPLICATION_USER: replication_user
      REPLICATION_PASSWORD: replication_pass
      ports:
      - 4000:5432
      volumes:
      - /data/master_slave:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      pgslave1:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 2
      NODE_NAME: pgslave1
      CLUSTER_NODE_NETWORK_NAME: pgslave1
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
      ports:
      - 4001:5432
      volumes:
      - /data/slave_1:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      pgslave2:
      image: postdock/postgres
      environment:
      PARTNER_NODES: "pgmaster,pgslave1,pgslave2"
      REPLICATION_PRIMARY_HOST: pgmaster
      NODE_ID: 3
      NODE_NAME: pgslave2
      CLUSTER_NODE_NETWORK_NAME: pgslave2
      REPLICATION_PRIMARY_PORT: 5432
      CONFIGS: "max_replication_slots:10"
      ports:
      - 4002:5432
      volumes:
      - /data/slave_2:/var/lib/postgresql/data
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161

      db:
      image: postdock/pgpool
      environment:
      PCP_USER: pcp_user
      PCP_PASSWORD: pcp_pass
      WAIT_BACKEND_TIMEOUT: 60
      CHECK_USER: postgres
      CHECK_PASSWORD: 123
      CHECK_PGCONNECT_TIMEOUT: 3
      DB_USERS: postgres:123
      BACKENDS: "0:pgmaster:5432:1:/var/lib/postgresql/data:ALLOW_TO_FAILOVER,1:pgslave1::::,2:pgslave2::::,"
      REQUIRE_MIN_BACKENDS: 1
      CONFIGS: "num_init_children:250,max_pool:4"
      ports:
      - 4003:5432
      - 9899:9898
      networks:
      - postdock
      deploy:
      placement:
      constraints:
      - node.role == manager
      - node.hostname == 192.168.1.161


      I run:



      docker network create -d overlay postdock

      docker stack deploy -c docker-compose.yml postdock


      and things work smoothly.



      However, after I update the services multiple times, then the auto failover on master node failed. In the master node log file, I noticed that the recovery process cannot detect the database replication_db and the schema replication_db.public:



      >>> Waiting for local postgres server start...,
      expr: non-integer argument,
      >>> Wait schema . on pgmaster:5432(user: public,password: *******), will try times with delay 10 seconds (TIMEOUT=)


      As you can see, there is no schema specified, only the dot sign ".", and the user is also wrong: It should be replication_user, not the user public



      which leads to this error message:



      2018-11-16 04:45:33.310 UTC [122] FATAL:  password authentication failed for user "public",
      2018-11-16 04:45:33.310 UTC [122] DETAIL: Role "public" does not exist.,
      Connection matched pg_hba.conf line 95: "host all all all md5",
      psql: FATAL: password authentication failed for user "public",
      2018-11-16 04:45:37.974 UTC [125] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:39.345 UTC [127] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:40.374 UTC [128] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:41.386 UTC [129] FATAL: no PostgreSQL user name specified in startup packet,
      2018-11-16 04:45:42.421 UTC [130] FATAL: no PostgreSQL user name specified in startup packet,
      >>>>>> Host pgmaster:5432 is not accessible (will try times more),
      expr: non-integer argument,


      As I understand, when the auto failover succeeds, the expected recovery log should be:



      >>> Waiting for local postgres server start...,
      >>> Wait schema replication_db.public on pgmaster:5432(user: replication_user,password: *******), will try 9 times with delay 10 seconds (TIMEOUT=90),
      >>>>>> Schema replication_db.public exists on host pgmaster:5432!,
      >>> Registering node with role master


      Anybody have any idea about the root cause of this issue ?







      postgresql pgpool automatic-failover repmgr






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 16 '18 at 11:00

























      asked Nov 16 '18 at 10:53









      moeman

      112




      112





























          active

          oldest

          votes











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53336403%2fpostgresql-postdock-auto-recovery-failed-in-master-node%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown






























          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53336403%2fpostgresql-postdock-auto-recovery-failed-in-master-node%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          How to change which sound is reproduced for terminal bell?

          Can I use Tabulator js library in my java Spring + Thymeleaf project?

          Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents