PostgreSQL aggregate union, intersection and set differences












-1















I have a table of pairs to aggregate as follows:



+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+


And a table of values as so:



+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+


For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:



+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+


What would be the best way to approach this using PostgreSQL?



UPDATE: here is a rextester link with data https://rextester.com/RWID9864










share|improve this question

























  • I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

    – a_horse_with_no_name
    Nov 19 '18 at 21:43











  • The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

    – Floris
    Nov 19 '18 at 21:45


















-1















I have a table of pairs to aggregate as follows:



+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+


And a table of values as so:



+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+


For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:



+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+


What would be the best way to approach this using PostgreSQL?



UPDATE: here is a rextester link with data https://rextester.com/RWID9864










share|improve this question

























  • I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

    – a_horse_with_no_name
    Nov 19 '18 at 21:43











  • The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

    – Floris
    Nov 19 '18 at 21:45
















-1












-1








-1








I have a table of pairs to aggregate as follows:



+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+


And a table of values as so:



+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+


For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:



+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+


What would be the best way to approach this using PostgreSQL?



UPDATE: here is a rextester link with data https://rextester.com/RWID9864










share|improve this question
















I have a table of pairs to aggregate as follows:



+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+


And a table of values as so:



+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+


For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:



+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+


What would be the best way to approach this using PostgreSQL?



UPDATE: here is a rextester link with data https://rextester.com/RWID9864







sql postgresql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 '18 at 12:02







Floris

















asked Nov 19 '18 at 21:19









FlorisFloris

18110




18110













  • I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

    – a_horse_with_no_name
    Nov 19 '18 at 21:43











  • The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

    – Floris
    Nov 19 '18 at 21:45





















  • I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

    – a_horse_with_no_name
    Nov 19 '18 at 21:43











  • The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

    – Floris
    Nov 19 '18 at 21:45



















I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43





I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is 5 for a,b?

– a_horse_with_no_name
Nov 19 '18 at 21:43













The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45







The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.

– Floris
Nov 19 '18 at 21:45














2 Answers
2






active

oldest

votes


















1














You need scalar sub-queries that do that.



The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.



To calculate the "diff", use the except operator:



SELECT p.*, 
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p





share|improve this answer


























  • Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

    – Floris
    Nov 19 '18 at 22:14





















1














I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:



select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;


Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.






share|improve this answer


























  • Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

    – Floris
    Nov 19 '18 at 23:52








  • 1





    in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

    – Alexey Bashtanov
    Nov 20 '18 at 0:00











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53382788%2fpostgresql-aggregate-union-intersection-and-set-differences%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














You need scalar sub-queries that do that.



The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.



To calculate the "diff", use the except operator:



SELECT p.*, 
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p





share|improve this answer


























  • Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

    – Floris
    Nov 19 '18 at 22:14


















1














You need scalar sub-queries that do that.



The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.



To calculate the "diff", use the except operator:



SELECT p.*, 
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p





share|improve this answer


























  • Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

    – Floris
    Nov 19 '18 at 22:14
















1












1








1







You need scalar sub-queries that do that.



The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.



To calculate the "diff", use the except operator:



SELECT p.*, 
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p





share|improve this answer















You need scalar sub-queries that do that.



The UNION can also be expressed by an OR which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.



To calculate the "diff", use the except operator:



SELECT p.*, 
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 19 '18 at 22:02

























answered Nov 19 '18 at 21:54









a_horse_with_no_namea_horse_with_no_name

296k46451546




296k46451546













  • Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

    – Floris
    Nov 19 '18 at 22:14





















  • Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

    – Floris
    Nov 19 '18 at 22:14



















Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14







Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?

– Floris
Nov 19 '18 at 22:14















1














I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:



select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;


Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.






share|improve this answer


























  • Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

    – Floris
    Nov 19 '18 at 23:52








  • 1





    in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

    – Alexey Bashtanov
    Nov 20 '18 at 0:00
















1














I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:



select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;


Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.






share|improve this answer


























  • Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

    – Floris
    Nov 19 '18 at 23:52








  • 1





    in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

    – Alexey Bashtanov
    Nov 20 '18 at 0:00














1












1








1







I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:



select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;


Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.






share|improve this answer















I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:



select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;


Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 20 '18 at 12:04









a_horse_with_no_name

296k46451546




296k46451546










answered Nov 19 '18 at 23:06









Alexey BashtanovAlexey Bashtanov

40945




40945













  • Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

    – Floris
    Nov 19 '18 at 23:52








  • 1





    in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

    – Alexey Bashtanov
    Nov 20 '18 at 0:00



















  • Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

    – Floris
    Nov 19 '18 at 23:52








  • 1





    in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

    – Alexey Bashtanov
    Nov 20 '18 at 0:00

















Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52







Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the pairs table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values table has 2,9 million rows btw. Too bad my question got downvoted.

– Floris
Nov 19 '18 at 23:52






1




1





in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00





in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.

– Alexey Bashtanov
Nov 20 '18 at 0:00


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53382788%2fpostgresql-aggregate-union-intersection-and-set-differences%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?