How to remove duplicate rows with foreign keys dependencies?




















I'm sure this is commonplace, but Google is not helping. I am trying to write a simple stored procedure in PostgreSQL 9.1 that will remove duplicate entries from a parent table cpt. The parent table cpt is referenced by a child table lab defined as:



CREATE TABLE lab (
    recid serial NOT NULL,
    cpt_recid integer,
    ........
    CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid)
        REFERENCES cpt (recid) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE RESTRICT,
    ...
);


The biggest problem I'm having is how to obtain the record that failed, so that I can use it in the EXCEPTION clause to move the child rows from lab to one acceptable key, then loop back through and delete the unnecessary records from the cpt table.



Here is the (very wrong) code:



CREATE OR REPLACE FUNCTION h_RemoveDuplicateCPT()
  RETURNS void AS
$BODY$
BEGIN
    LOOP
        BEGIN

            DELETE FROM cpt
            WHERE recid IN (
                SELECT recid
                FROM (
                    SELECT recid,
                           row_number() over (partition BY cdesc ORDER BY recid) AS rnum
                    FROM cpt) t
                WHERE t.rnum > 1)
            RETURNING recid;

            IF count = 0 THEN
                RETURN;
            END IF;

        EXCEPTION WHEN foreign_key_violation THEN
            RAISE NOTICE 'fixing unique_violation';
            RAISE NOTICE 'recid is %' , recid;
        END;
    END LOOP;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE;


































  • Have you tried the hidden column ctid? It is helpful for deleting duplicates even when all visible columns are the same.

    – Tomasz Myrta
    Jun 20 '15 at 21:08


















database postgresql exception-handling foreign-keys plpgsql














asked Jun 20 '15 at 20:34 by Alan Wayne; edited Jun 22 '15 at 1:46 by Erwin Brandstetter

























2 Answers














You can do this much more efficiently with a single SQL statement with data-modifying CTEs.



WITH plan AS (
    SELECT *
    FROM (
        SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
        FROM cpt
        ) sub
    WHERE recid <> master_recid -- ... <> self
    )
, upd_lab AS (
    UPDATE lab l
    SET cpt_recid = p.master_recid -- link to master recid ...
    FROM plan p
    WHERE l.cpt_recid = p.recid
    )
DELETE FROM cpt c
USING plan p
WHERE c.recid = p.recid
RETURNING c.recid;


db<>fiddle here (pg 11)
SQL Fiddle (pg 9.6)



This should be much faster and cleaner. Looping is comparatively expensive, and exception handling is even more expensive.

More importantly, references in lab are redirected to the respective master row in cpt automatically, which your original code did not do yet. So you can delete all dupes at once.



You can still wrap this in a plpgsql or SQL function if you like.
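
As a rough sketch of that wrapping, the statement can go into a plain SQL function. The function name remove_duplicate_cpt is made up for illustration, and the return type assumes that cpt.recid is an integer (the question only shows lab.cpt_recid integer referencing it).

```sql
-- Hypothetical wrapper; the name and the integer return type are assumptions.
CREATE OR REPLACE FUNCTION remove_duplicate_cpt()
  RETURNS SETOF integer
  LANGUAGE sql AS
$func$
WITH plan AS (
    -- every duplicate row, paired with the surviving row of its cdesc group
    SELECT *
    FROM (
        SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
        FROM cpt
        ) sub
    WHERE recid <> master_recid
    )
, upd_lab AS (
    -- repoint child rows from a duplicate to its master
    UPDATE lab l
    SET cpt_recid = p.master_recid
    FROM plan p
    WHERE l.cpt_recid = p.recid
    )
-- remove the duplicates and hand back their keys
DELETE FROM cpt c
USING plan p
WHERE c.recid = p.recid
RETURNING c.recid;
$func$;
```

Calling SELECT * FROM remove_duplicate_cpt(); would then return the recid of each deleted row, or no rows if there was nothing to clean up.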



Explanation




  1. In the 1st CTE plan, identify a master row in each partition with the same cdesc. In your case, that is the row with the minimum recid.


  2. In the 2nd CTE upd_lab, redirect all rows referencing a dupe to the master row in cpt.


  3. Finally, delete the dupes, which is not going to raise exceptions because depending rows are being linked to the remaining master row virtually at the same time.
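
To trace the three steps on concrete rows, here is a small self-contained sketch. The definition of cpt is not shown in the question, so the columns below (recid serial, cdesc text) and all sample values are assumptions. Temporary tables are used so that existing tables of the same name are shadowed for the session rather than touched.

```sql
-- Scratch tables; TEMP tables take precedence over permanent ones of the same name.
CREATE TEMP TABLE cpt (
    recid serial PRIMARY KEY,
    cdesc text NOT NULL              -- assumed column type
);
CREATE TEMP TABLE lab (
    recid     serial PRIMARY KEY,
    cpt_recid integer,
    CONSTRAINT cs_cpt FOREIGN KEY (cpt_recid)
        REFERENCES cpt (recid) ON UPDATE NO ACTION ON DELETE RESTRICT
);

INSERT INTO cpt (cdesc) VALUES ('A'), ('A'), ('B'), ('A');   -- recid 1 to 4
INSERT INTO lab (cpt_recid) VALUES (2), (4), (3), (1);       -- recid 1 to 4

WITH plan AS (
    -- step 1: dupes 2 and 4 both map to master 1; row 3 is alone in its group
    SELECT *
    FROM (
        SELECT recid, min(recid) OVER (PARTITION BY cdesc) AS master_recid
        FROM cpt
        ) sub
    WHERE recid <> master_recid
    )
, upd_lab AS (
    -- step 2: lab rows pointing at 2 or 4 now point at 1
    UPDATE lab l
    SET cpt_recid = p.master_recid
    FROM plan p
    WHERE l.cpt_recid = p.recid
    )
-- step 3: delete the dupes
DELETE FROM cpt c
USING plan p
WHERE c.recid = p.recid
RETURNING c.recid;                                 -- 2 and 4, in no guaranteed order

SELECT recid, cdesc FROM cpt ORDER BY recid;       -- (1, A), (3, B)
SELECT recid, cpt_recid FROM lab ORDER BY recid;   -- (1, 1), (2, 1), (3, 3), (4, 1)
```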



ON DELETE RESTRICT



All CTEs and the main query of a statement operate on the same snapshot of underlying tables, virtually concurrently. They don't see each others' effects on underlying tables:




  • Delete parent if it's not referenced by any other child
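
A minimal sketch of that snapshot behaviour, using a made-up scratch table snap_demo: the main query still counts the rows that the CTE deletes in the same statement.

```sql
-- Hypothetical scratch table, only for illustrating the shared snapshot.
CREATE TEMP TABLE snap_demo (id integer);
INSERT INTO snap_demo VALUES (1), (2), (3);

WITH del AS (
    DELETE FROM snap_demo
    WHERE id > 1
    RETURNING id
    )
SELECT (SELECT count(*) FROM snap_demo) AS seen_by_main   -- 3: the delete is not visible here
     , (SELECT count(*) FROM del)       AS deleted;       -- 2: rows passed along via RETURNING

SELECT count(*) FROM snap_demo;   -- 1: the next statement sees the result
```

The only way for one part of the statement to learn what another part changed is through RETURNING, which is why the DELETE in the answer joins to plan rather than re-reading lab.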


One might expect a FK constraint with ON DELETE RESTRICT to raise exceptions because, per documentation:




Referential actions other than the NO ACTION check cannot be deferred,
even if the constraint is declared deferrable.




However, the above statement is a single command and, quoting the manual again:




A constraint that is not deferrable will be checked immediately after
every command.




The key word in that quote is "command": the whole statement, CTEs included, is one command. This works for the less restrictive default ON DELETE NO ACTION too, of course.



Be wary of concurrent transactions writing to the same tables, though that is a general consideration, not specific to this task.



An exception applies to UNIQUE and PRIMARY KEY constraints, but that does not concern this case:




  • Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
































  • Wow...Very impressive and much appreciated. Thanks.

    – Alan Wayne
    Jun 23 '15 at 4:37











  • How (when) does the upd_lab get executed?

    – Alan Wayne
    Jun 27 '15 at 17:11











  • @AlanWayne: CTE plan is executed first, because the other two queries depend on it. The other two are executed in arbitrary order. But all queries see the same snapshot of the underlying tables. Hence "virtually at the same time".

    – Erwin Brandstetter
    Jun 27 '15 at 18:24

































You can select all duplicates once and loop over the result with a record variable.
You'll have access to the whole current record. The function below may serve as an example:



create or replace function show_remove_duplicates_in_cpt ()
returns setof text language plpgsql
as $$
declare
    rec record;
begin
    for rec in
        select * from (
            select
                recid, cdesc,
                row_number() over (partition by cdesc order by recid) as rnum
            from cpt
        ) alias
        where rnum > 1
    loop
        return next format ('fixing foreign key for %s %s %s', rec.recid, rec.cdesc, rec.rnum);
        return next format ('deleting from cpt where recid = %s', rec.recid);
    end loop;
end $$;

select * from show_remove_duplicates_in_cpt ();
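
The function above only reports what it would do. Below is a hedged sketch of a variant that actually performs the two steps for each duplicate. The function name is hypothetical, and keeping the minimum recid per cdesc as the surviving row is carried over from the question's ORDER BY recid.

```sql
-- Hypothetical variant that applies the changes instead of describing them.
create or replace function remove_duplicates_in_cpt_loop ()
returns integer language plpgsql
as $$
declare
    rec record;
    deleted integer := 0;
begin
    for rec in
        select * from (
            select
                recid,
                min(recid) over (partition by cdesc) as master_recid
            from cpt
        ) alias
        where recid <> master_recid
    loop
        -- repoint child rows first, so the delete cannot violate the foreign key
        update lab set cpt_recid = rec.master_recid where cpt_recid = rec.recid;
        delete from cpt where recid = rec.recid;
        deleted := deleted + 1;
    end loop;
    return deleted;   -- number of duplicate rows removed
end $$;

select remove_duplicates_in_cpt_loop ();
```

Because the child rows are moved before each delete, no exception handler is needed inside the loop.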





























Answer by Erwin Brandstetter (the single-statement CTE answer): answered Jun 22 '15 at 1:44, edited Nov 22 '18 at 23:44























Answer by klin (the record-loop function): answered Jun 20 '15 at 21:13





























