PostgreSQL aggregate union, intersection and set differences

-1

I have a table of pairs to aggregate as follows:

+---------+----------+

| left_id | right_id |

+---------+----------+

| a       | b        |

+---------+----------+

| a       | c        |

+---------+----------+

And a table of values as so:

+----+-------+

| id | value |

+----+-------+

| a  | 1     |

+----+-------+

| a  | 2     |

+----+-------+

| a  | 3     |

+----+-------+

| b  | 1     |

+----+-------+

| b  | 4     |

+----+-------+

| b  | 5     |

+----+-------+

| c  | 1     |

+----+-------+

| c  | 2     |

+----+-------+

| c  | 3     |

+----+-------+

| c  | 4     |

+----+-------+

For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:

+---------+----------+-------+--------------+-----------+------------+

| left_id | right_id | union | intersection | left_diff | right_diff |

+---------+----------+-------+--------------+-----------+------------+

| a       | b        | 5     | 1            | 2         | 2          |

+---------+----------+-------+--------------+-----------+------------+

| a       | c        | 4     | 3            | 0         | 1          |

+---------+----------+-------+--------------+-----------+------------+

What would be the best way to approach this using PostgreSQL?

UPDATE: here is a rextester link with data https://rextester.com/RWID9864

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43

The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45

add a comment |

-1

I have a table of pairs to aggregate as follows:

+---------+----------+

| left_id | right_id |

+---------+----------+

| a       | b        |

+---------+----------+

| a       | c        |

+---------+----------+

And a table of values as so:

+----+-------+

| id | value |

+----+-------+

| a  | 1     |

+----+-------+

| a  | 2     |

+----+-------+

| a  | 3     |

+----+-------+

| b  | 1     |

+----+-------+

| b  | 4     |

+----+-------+

| b  | 5     |

+----+-------+

| c  | 1     |

+----+-------+

| c  | 2     |

+----+-------+

| c  | 3     |

+----+-------+

| c  | 4     |

+----+-------+

For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:

+---------+----------+-------+--------------+-----------+------------+

| left_id | right_id | union | intersection | left_diff | right_diff |

+---------+----------+-------+--------------+-----------+------------+

| a       | b        | 5     | 1            | 2         | 2          |

+---------+----------+-------+--------------+-----------+------------+

| a       | c        | 4     | 3            | 0         | 1          |

+---------+----------+-------+--------------+-----------+------------+

What would be the best way to approach this using PostgreSQL?

UPDATE: here is a rextester link with data https://rextester.com/RWID9864

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43

The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45

add a comment |

-1

I have a table of pairs to aggregate as follows:

+---------+----------+

| left_id | right_id |

+---------+----------+

| a       | b        |

+---------+----------+

| a       | c        |

+---------+----------+

And a table of values as so:

+----+-------+

| id | value |

+----+-------+

| a  | 1     |

+----+-------+

| a  | 2     |

+----+-------+

| a  | 3     |

+----+-------+

| b  | 1     |

+----+-------+

| b  | 4     |

+----+-------+

| b  | 5     |

+----+-------+

| c  | 1     |

+----+-------+

| c  | 2     |

+----+-------+

| c  | 3     |

+----+-------+

| c  | 4     |

+----+-------+

For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:

+---------+----------+-------+--------------+-----------+------------+

| left_id | right_id | union | intersection | left_diff | right_diff |

+---------+----------+-------+--------------+-----------+------------+

| a       | b        | 5     | 1            | 2         | 2          |

+---------+----------+-------+--------------+-----------+------------+

| a       | c        | 4     | 3            | 0         | 1          |

+---------+----------+-------+--------------+-----------+------------+

What would be the best way to approach this using PostgreSQL?

UPDATE: here is a rextester link with data https://rextester.com/RWID9864

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

I have a table of pairs to aggregate as follows:

+---------+----------+

| left_id | right_id |

+---------+----------+

| a       | b        |

+---------+----------+

| a       | c        |

+---------+----------+

And a table of values as so:

+----+-------+

| id | value |

+----+-------+

| a  | 1     |

+----+-------+

| a  | 2     |

+----+-------+

| a  | 3     |

+----+-------+

| b  | 1     |

+----+-------+

| b  | 4     |

+----+-------+

| b  | 5     |

+----+-------+

| c  | 1     |

+----+-------+

| c  | 2     |

+----+-------+

| c  | 3     |

+----+-------+

| c  | 4     |

+----+-------+

For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:

+---------+----------+-------+--------------+-----------+------------+

| left_id | right_id | union | intersection | left_diff | right_diff |

+---------+----------+-------+--------------+-----------+------------+

| a       | b        | 5     | 1            | 2         | 2          |

+---------+----------+-------+--------------+-----------+------------+

| a       | c        | 4     | 3            | 0         | 1          |

+---------+----------+-------+--------------+-----------+------------+

What would be the best way to approach this using PostgreSQL?

UPDATE: here is a rextester link with data https://rextester.com/RWID9864

sql postgresql

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

edited Nov 20 '18 at 12:02

asked Nov 19 '18 at 21:19

Floris

18110

asked Nov 19 '18 at 21:19

Floris

18110

asked Nov 19 '18 at 21:19

Floris

18110

I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43

The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45

add a comment |

I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43

The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45

I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43

The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45

add a comment |

2 Answers
2

active

oldest

votes

You need scalar sub-queries that do that.

The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.

To calculate the "diff", use the except operator:

SELECT p.*, 

       (select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          intersect

          select v.value from values v where id = p.right_id

        ) t) as intersection,

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          except

          select v.value from values v where id = p.right_id

        ) t) as left_diff,

       (select count(*)

        from (

          select v.value from values v where id = p.right_id

          except

          select v.value from values v where id = p.left_id

        ) t) as right_diff

from pairs p

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

add a comment |

I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:

select p.*,

       coalesce(stats."union", 0) "union",

       coalesce(stats.intersection, 0) intersection,

       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,

       coalesce(stats.right_cnt - stats.intersection, 0) right_diff

from pairs p

left join (

       select left_id,

              right_id,

              count(*) "union",

              count(has_left and has_right) intersection,

              count(has_left) left_cnt,

              count(has_right) right_cnt

       from (

              select p.*,

                     v."value" the_value,

                     true has_left

              from pairs p

              join "values" v on v.id = p.left_id

       ) l

       full join (

              select p.*,

                     v."value" the_value,

                     true has_right

              from pairs p

              join "values" v on v.id = p.right_id

       ) r using(left_id, right_id, the_value)

       group by left_id,

                right_id

) stats on p.left_id = stats.left_id

       and p.right_id = stats.right_id;

Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

1

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53382788%2fpostgresql-aggregate-union-intersection-and-set-differences%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

You need scalar sub-queries that do that.

The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.

To calculate the "diff", use the except operator:

SELECT p.*, 

       (select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          intersect

          select v.value from values v where id = p.right_id

        ) t) as intersection,

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          except

          select v.value from values v where id = p.right_id

        ) t) as left_diff,

       (select count(*)

        from (

          select v.value from values v where id = p.right_id

          except

          select v.value from values v where id = p.left_id

        ) t) as right_diff

from pairs p

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

add a comment |

You need scalar sub-queries that do that.

The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.

To calculate the "diff", use the except operator:

SELECT p.*, 

       (select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          intersect

          select v.value from values v where id = p.right_id

        ) t) as intersection,

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          except

          select v.value from values v where id = p.right_id

        ) t) as left_diff,

       (select count(*)

        from (

          select v.value from values v where id = p.right_id

          except

          select v.value from values v where id = p.left_id

        ) t) as right_diff

from pairs p

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

add a comment |

You need scalar sub-queries that do that.

The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.

To calculate the "diff", use the except operator:

SELECT p.*, 

       (select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          intersect

          select v.value from values v where id = p.right_id

        ) t) as intersection,

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          except

          select v.value from values v where id = p.right_id

        ) t) as left_diff,

       (select count(*)

        from (

          select v.value from values v where id = p.right_id

          except

          select v.value from values v where id = p.left_id

        ) t) as right_diff

from pairs p

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

You need scalar sub-queries that do that.

The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.

To calculate the "diff", use the except operator:

SELECT p.*, 

       (select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          intersect

          select v.value from values v where id = p.right_id

        ) t) as intersection,

       (select count(*)

        from (

          select v.value from values v where id = p.left_id

          except

          select v.value from values v where id = p.right_id

        ) t) as left_diff,

       (select count(*)

        from (

          select v.value from values v where id = p.right_id

          except

          select v.value from values v where id = p.left_id

        ) t) as right_diff

from pairs p

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 21:54

a_horse_with_no_name

296k46451546

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

add a comment |

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14

add a comment |

select p.*,

       coalesce(stats."union", 0) "union",

       coalesce(stats.intersection, 0) intersection,

       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,

       coalesce(stats.right_cnt - stats.intersection, 0) right_diff

from pairs p

left join (

       select left_id,

              right_id,

              count(*) "union",

              count(has_left and has_right) intersection,

              count(has_left) left_cnt,

              count(has_right) right_cnt

       from (

              select p.*,

                     v."value" the_value,

                     true has_left

              from pairs p

              join "values" v on v.id = p.left_id

       ) l

       full join (

              select p.*,

                     v."value" the_value,

                     true has_right

              from pairs p

              join "values" v on v.id = p.right_id

       ) r using(left_id, right_id, the_value)

       group by left_id,

                right_id

) stats on p.left_id = stats.left_id

       and p.right_id = stats.right_id;

Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

1

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

add a comment |

select p.*,

       coalesce(stats."union", 0) "union",

       coalesce(stats.intersection, 0) intersection,

       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,

       coalesce(stats.right_cnt - stats.intersection, 0) right_diff

from pairs p

left join (

       select left_id,

              right_id,

              count(*) "union",

              count(has_left and has_right) intersection,

              count(has_left) left_cnt,

              count(has_right) right_cnt

       from (

              select p.*,

                     v."value" the_value,

                     true has_left

              from pairs p

              join "values" v on v.id = p.left_id

       ) l

       full join (

              select p.*,

                     v."value" the_value,

                     true has_right

              from pairs p

              join "values" v on v.id = p.right_id

       ) r using(left_id, right_id, the_value)

       group by left_id,

                right_id

) stats on p.left_id = stats.left_id

       and p.right_id = stats.right_id;

Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

1

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

add a comment |

select p.*,

       coalesce(stats."union", 0) "union",

       coalesce(stats.intersection, 0) intersection,

       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,

       coalesce(stats.right_cnt - stats.intersection, 0) right_diff

from pairs p

left join (

       select left_id,

              right_id,

              count(*) "union",

              count(has_left and has_right) intersection,

              count(has_left) left_cnt,

              count(has_right) right_cnt

       from (

              select p.*,

                     v."value" the_value,

                     true has_left

              from pairs p

              join "values" v on v.id = p.left_id

       ) l

       full join (

              select p.*,

                     v."value" the_value,

                     true has_right

              from pairs p

              join "values" v on v.id = p.right_id

       ) r using(left_id, right_id, the_value)

       group by left_id,

                right_id

) stats on p.left_id = stats.left_id

       and p.right_id = stats.right_id;

Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

select p.*,

       coalesce(stats."union", 0) "union",

       coalesce(stats.intersection, 0) intersection,

       coalesce(stats.left_cnt - stats.intersection, 0) left_diff,

       coalesce(stats.right_cnt - stats.intersection, 0) right_diff

from pairs p

left join (

       select left_id,

              right_id,

              count(*) "union",

              count(has_left and has_right) intersection,

              count(has_left) left_cnt,

              count(has_right) right_cnt

       from (

              select p.*,

                     v."value" the_value,

                     true has_left

              from pairs p

              join "values" v on v.id = p.left_id

       ) l

       full join (

              select p.*,

                     v."value" the_value,

                     true has_right

              from pairs p

              join "values" v on v.id = p.right_id

       ) r using(left_id, right_id, the_value)

       group by left_id,

                right_id

) stats on p.left_id = stats.left_id

       and p.right_id = stats.right_id;

Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

edited Nov 20 '18 at 12:04

a_horse_with_no_name

296k46451546

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

answered Nov 19 '18 at 23:06

Alexey Bashtanov

40945

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

1

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

add a comment |

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

1

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52

in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrgtkky