PostgreSQL aggregate union, intersection and set differences
I have a table of pairs to aggregate as follows:
+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+
And a table of values as so:
+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+
For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:
+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+
What would be the best way to approach this using PostgreSQL?
UPDATE: here is a rextester link with data https://rextester.com/RWID9864
sql postgresql
add a comment |
I have a table of pairs to aggregate as follows:
+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+
And a table of values as so:
+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+
For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:
+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+
What would be the best way to approach this using PostgreSQL?
UPDATE: here is a rextester link with data https://rextester.com/RWID9864
sql postgresql
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is5
fora,b
?
– a_horse_with_no_name
Nov 19 '18 at 21:43
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45
add a comment |
I have a table of pairs to aggregate as follows:
+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+
And a table of values as so:
+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+
For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:
+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+
What would be the best way to approach this using PostgreSQL?
UPDATE: here is a rextester link with data https://rextester.com/RWID9864
sql postgresql
I have a table of pairs to aggregate as follows:
+---------+----------+
| left_id | right_id |
+---------+----------+
| a | b |
+---------+----------+
| a | c |
+---------+----------+
And a table of values as so:
+----+-------+
| id | value |
+----+-------+
| a | 1 |
+----+-------+
| a | 2 |
+----+-------+
| a | 3 |
+----+-------+
| b | 1 |
+----+-------+
| b | 4 |
+----+-------+
| b | 5 |
+----+-------+
| c | 1 |
+----+-------+
| c | 2 |
+----+-------+
| c | 3 |
+----+-------+
| c | 4 |
+----+-------+
For each pair, I would like to calculate the length of the union, intersection and set differences (each way) comparing the values, so that the output would look like this:
+---------+----------+-------+--------------+-----------+------------+
| left_id | right_id | union | intersection | left_diff | right_diff |
+---------+----------+-------+--------------+-----------+------------+
| a | b | 5 | 1 | 2 | 2 |
+---------+----------+-------+--------------+-----------+------------+
| a | c | 4 | 3 | 0 | 1 |
+---------+----------+-------+--------------+-----------+------------+
What would be the best way to approach this using PostgreSQL?
UPDATE: here is a rextester link with data https://rextester.com/RWID9864
sql postgresql
sql postgresql
edited Nov 20 '18 at 12:02
Floris
asked Nov 19 '18 at 21:19
FlorisFloris
18110
18110
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is5
fora,b
?
– a_horse_with_no_name
Nov 19 '18 at 21:43
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45
add a comment |
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is5
fora,b
?
– a_horse_with_no_name
Nov 19 '18 at 21:43
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is
5
for a,b
?– a_horse_with_no_name
Nov 19 '18 at 21:43
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is
5
for a,b
?– a_horse_with_no_name
Nov 19 '18 at 21:43
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45
add a comment |
2 Answers
2
active
oldest
votes
You need scalar sub-queries that do that.
The UNION can also be expressed by an OR
which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except
operator:
SELECT p.*,
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
add a comment |
I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because thepairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. Thevalues
table has 2,9 million rows btw. Too bad my question got downvoted.
– Floris
Nov 19 '18 at 23:52
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53382788%2fpostgresql-aggregate-union-intersection-and-set-differences%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You need scalar sub-queries that do that.
The UNION can also be expressed by an OR
which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except
operator:
SELECT p.*,
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
add a comment |
You need scalar sub-queries that do that.
The UNION can also be expressed by an OR
which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except
operator:
SELECT p.*,
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
add a comment |
You need scalar sub-queries that do that.
The UNION can also be expressed by an OR
which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except
operator:
SELECT p.*,
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p
You need scalar sub-queries that do that.
The UNION can also be expressed by an OR
which makes that query somewhat shorter to write. But for the intersection you need a query that is a bit longer.
To calculate the "diff", use the except
operator:
SELECT p.*,
(select count(distinct value) from values where id in (p.left_id, p.right_id)) as "union",
(select count(*)
from (
select v.value from values v where id = p.left_id
intersect
select v.value from values v where id = p.right_id
) t) as intersection,
(select count(*)
from (
select v.value from values v where id = p.left_id
except
select v.value from values v where id = p.right_id
) t) as left_diff,
(select count(*)
from (
select v.value from values v where id = p.right_id
except
select v.value from values v where id = p.left_id
) t) as right_diff
from pairs p
edited Nov 19 '18 at 22:02
answered Nov 19 '18 at 21:54
a_horse_with_no_namea_horse_with_no_name
296k46451546
296k46451546
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
add a comment |
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
Thank you for introducing me to "scalar subqueries". I had no idea there was such a thing. Another user posted a very similar solution a minute ago, however he/she noted that it could be quite inefficient. Unfortunately now deleted (which I really wished didn't happen as I was writing a response). Is there any way to make such a query more efficient? I suppose I could save the resulting table as a materialized view perhaps?
– Floris
Nov 19 '18 at 22:14
add a comment |
I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because thepairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. Thevalues
table has 2,9 million rows btw. Too bad my question got downvoted.
– Floris
Nov 19 '18 at 23:52
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
add a comment |
I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because thepairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. Thevalues
table has 2,9 million rows btw. Too bad my question got downvoted.
– Floris
Nov 19 '18 at 23:52
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
add a comment |
I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.
I don't know what causes your slowness, as I cannot see table sizes and/or explain plans. Presuming both tables are large enough to make nested loops inefficient and to not dare thinking about joining values to itself, I'd try to rewrite it free from scalar subqueries like this:
select p.*,
coalesce(stats."union", 0) "union",
coalesce(stats.intersection, 0) intersection,
coalesce(stats.left_cnt - stats.intersection, 0) left_diff,
coalesce(stats.right_cnt - stats.intersection, 0) right_diff
from pairs p
left join (
select left_id,
right_id,
count(*) "union",
count(has_left and has_right) intersection,
count(has_left) left_cnt,
count(has_right) right_cnt
from (
select p.*,
v."value" the_value,
true has_left
from pairs p
join "values" v on v.id = p.left_id
) l
full join (
select p.*,
v."value" the_value,
true has_right
from pairs p
join "values" v on v.id = p.right_id
) r using(left_id, right_id, the_value)
group by left_id,
right_id
) stats on p.left_id = stats.left_id
and p.right_id = stats.right_id;
Each join condition here allows hash and/or merge join, so the planner will have a chance to avoid nested loops.
edited Nov 20 '18 at 12:04
a_horse_with_no_name
296k46451546
296k46451546
answered Nov 19 '18 at 23:06
Alexey BashtanovAlexey Bashtanov
40945
40945
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because thepairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. Thevalues
table has 2,9 million rows btw. Too bad my question got downvoted.
– Floris
Nov 19 '18 at 23:52
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
add a comment |
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because thepairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. Thevalues
table has 2,9 million rows btw. Too bad my question got downvoted.
– Floris
Nov 19 '18 at 23:52
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the
pairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values
table has 2,9 million rows btw. Too bad my question got downvoted.– Floris
Nov 19 '18 at 23:52
Very cool. Thank you @alexey-bashtanov and @a_horse_with_no_name both for your answers. I learned a lot from both of these. In the end, this solution performs slightly faster than "scalar subquery", completing in 1m17s versus 1m36s. It's a bit of a mess to show the query plans for my dataset because the
pairs
table is a view aggregating data from multiple tables (376 rows) so the plan is huge. The values
table has 2,9 million rows btw. Too bad my question got downvoted.– Floris
Nov 19 '18 at 23:52
1
1
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
in case pairs is such a small thing and aggregated from somewhere, maybe it would make sense to have a CTE for it, for it to be calculated only once. For plan optimization purposes, just create a temp table with the view contents and ANALYZE it, then use it instead of the view.
– Alexey Bashtanov
Nov 20 '18 at 0:00
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53382788%2fpostgresql-aggregate-union-intersection-and-set-differences%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
I don't understand what the "pairwise union" is supposed to be. Can you explain the logic what that is
5
fora,b
?– a_horse_with_no_name
Nov 19 '18 at 21:43
The number of unique values if you combine all the values from a (1,2,3) and b (1,4,5). Union = (1,2,3,4,5) = 5 values. Intersection is (1) = 1 value, left diff (2,3) = 2 values, right diff (4,5) = 2 values.
– Floris
Nov 19 '18 at 21:45