Getting data over time grouped by latest related_id for each interval
up vote
0
down vote
favorite
I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.
Example:
Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this
users
id, email
reports
id, user_id, date
report_totals
id, report_id, errors, alerts
Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this
reports
1, 1, 2018-1-1
2, 2, 2018-1-1
3, 1, 2018-1-4
4, 1, 2018-2-1
5, 1, 2018-3-1
6, 2, 2018-3-1
report_totals
1, 1, 5, 5
2, 2, 3, 0
3, 3, 2, 0
4, 4, 10, 2
5, 5, 30, 15
6, 6, 1, 2
I want to write a query to return results that look like this
date, errors, alerts
2018-1-1, 5, 0
2018-2-1, 13, 2
2018-3-1, 31, 17
The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.
Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.
mysql
add a comment |
up vote
0
down vote
favorite
I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.
Example:
Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this
users
id, email
reports
id, user_id, date
report_totals
id, report_id, errors, alerts
Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this
reports
1, 1, 2018-1-1
2, 2, 2018-1-1
3, 1, 2018-1-4
4, 1, 2018-2-1
5, 1, 2018-3-1
6, 2, 2018-3-1
report_totals
1, 1, 5, 5
2, 2, 3, 0
3, 3, 2, 0
4, 4, 10, 2
5, 5, 30, 15
6, 6, 1, 2
I want to write a query to return results that look like this
date, errors, alerts
2018-1-1, 5, 0
2018-2-1, 13, 2
2018-3-1, 31, 17
The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.
Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.
mysql
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.
Example:
Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this
users
id, email
reports
id, user_id, date
report_totals
id, report_id, errors, alerts
Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this
reports
1, 1, 2018-1-1
2, 2, 2018-1-1
3, 1, 2018-1-4
4, 1, 2018-2-1
5, 1, 2018-3-1
6, 2, 2018-3-1
report_totals
1, 1, 5, 5
2, 2, 3, 0
3, 3, 2, 0
4, 4, 10, 2
5, 5, 30, 15
6, 6, 1, 2
I want to write a query to return results that look like this
date, errors, alerts
2018-1-1, 5, 0
2018-2-1, 13, 2
2018-3-1, 31, 17
The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.
Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.
mysql
I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.
Example:
Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this
users
id, email
reports
id, user_id, date
report_totals
id, report_id, errors, alerts
Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this
reports
1, 1, 2018-1-1
2, 2, 2018-1-1
3, 1, 2018-1-4
4, 1, 2018-2-1
5, 1, 2018-3-1
6, 2, 2018-3-1
report_totals
1, 1, 5, 5
2, 2, 3, 0
3, 3, 2, 0
4, 4, 10, 2
5, 5, 30, 15
6, 6, 1, 2
I want to write a query to return results that look like this
date, errors, alerts
2018-1-1, 5, 0
2018-2-1, 13, 2
2018-3-1, 31, 17
The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.
Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.
mysql
mysql
asked Nov 14 at 22:43
MattM
346
346
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
2
down vote
It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN
and Derived Tables. I will break down the query and try to explain it in steps.
Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id
and first date of a month
. This can be done within the query itself.
We can get all the unique user_id
values from the users
table. And, the start dates of all the reporting months can be determined using the following query.
SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month
Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.
Now, we can get all possible combinations using CROSS JOIN
:
(SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN
users AS u
Now, we can use Correlated Subquery to determine errors
and alerts
for every row in the above-generated table. We will look for the latest row from the report_totals
table where the user_id
matches, and month of the reporting being less than or equal to the current month. For errors
, the subquery would look as follows:
SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1 ON r1.id = rt1.report_id
WHERE r1.user_id = u.id AND
MONTH(r1.date) <= all_mon.month
ORDER BY r1.date DESC LIMIT 1
A similar subquery would be used for determining the alerts
.
Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY
on the month (first date of the month), and compute the SUM()
on alerts
and errors
for all the users.
Final (and complete) query would look as follows:
SELECT dt.date,
Sum(dt.errors) AS errors,
Sum(dt.alerts) AS alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS errors,
(SELECT rt1.alerts
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
View on DB Fiddle
Result:
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
Edit 1: First Optimization
I did not like the usage of Two similar correlated subqueries to get errors
and alerts
separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat()
them into a single string, using some delimiter, such as |
. This will reduce the subqueries to be used into one.
Now, in the outermost query, we can use string functions like Substring_Index()
and Cast()
to extract the respective values out as number(s) and do Sum()
operations accordingly.
Query #2
SELECT dt.date,
Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
errors,
Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT Concat(rt1.errors, '|', rt1.alerts)
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS error_alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
Result
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
View on DB Fiddle
add a comment |
up vote
0
down vote
I think below query will work :
select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
2
down vote
It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN
and Derived Tables. I will break down the query and try to explain it in steps.
Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id
and first date of a month
. This can be done within the query itself.
We can get all the unique user_id
values from the users
table. And, the start dates of all the reporting months can be determined using the following query.
SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month
Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.
Now, we can get all possible combinations using CROSS JOIN
:
(SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN
users AS u
Now, we can use Correlated Subquery to determine errors
and alerts
for every row in the above-generated table. We will look for the latest row from the report_totals
table where the user_id
matches, and month of the reporting being less than or equal to the current month. For errors
, the subquery would look as follows:
SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1 ON r1.id = rt1.report_id
WHERE r1.user_id = u.id AND
MONTH(r1.date) <= all_mon.month
ORDER BY r1.date DESC LIMIT 1
A similar subquery would be used for determining the alerts
.
Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY
on the month (first date of the month), and compute the SUM()
on alerts
and errors
for all the users.
Final (and complete) query would look as follows:
SELECT dt.date,
Sum(dt.errors) AS errors,
Sum(dt.alerts) AS alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS errors,
(SELECT rt1.alerts
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
View on DB Fiddle
Result:
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
Edit 1: First Optimization
I did not like the usage of Two similar correlated subqueries to get errors
and alerts
separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat()
them into a single string, using some delimiter, such as |
. This will reduce the subqueries to be used into one.
Now, in the outermost query, we can use string functions like Substring_Index()
and Cast()
to extract the respective values out as number(s) and do Sum()
operations accordingly.
Query #2
SELECT dt.date,
Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
errors,
Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT Concat(rt1.errors, '|', rt1.alerts)
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS error_alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
Result
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
View on DB Fiddle
add a comment |
up vote
2
down vote
It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN
and Derived Tables. I will break down the query and try to explain it in steps.
Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id
and first date of a month
. This can be done within the query itself.
We can get all the unique user_id
values from the users
table. And, the start dates of all the reporting months can be determined using the following query.
SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month
Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.
Now, we can get all possible combinations using CROSS JOIN
:
(SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN
users AS u
Now, we can use Correlated Subquery to determine errors
and alerts
for every row in the above-generated table. We will look for the latest row from the report_totals
table where the user_id
matches, and month of the reporting being less than or equal to the current month. For errors
, the subquery would look as follows:
SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1 ON r1.id = rt1.report_id
WHERE r1.user_id = u.id AND
MONTH(r1.date) <= all_mon.month
ORDER BY r1.date DESC LIMIT 1
A similar subquery would be used for determining the alerts
.
Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY
on the month (first date of the month), and compute the SUM()
on alerts
and errors
for all the users.
Final (and complete) query would look as follows:
SELECT dt.date,
Sum(dt.errors) AS errors,
Sum(dt.alerts) AS alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS errors,
(SELECT rt1.alerts
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
View on DB Fiddle
Result:
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
Edit 1: First Optimization
I did not like the usage of Two similar correlated subqueries to get errors
and alerts
separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat()
them into a single string, using some delimiter, such as |
. This will reduce the subqueries to be used into one.
Now, in the outermost query, we can use string functions like Substring_Index()
and Cast()
to extract the respective values out as number(s) and do Sum()
operations accordingly.
Query #2
SELECT dt.date,
Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
errors,
Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT Concat(rt1.errors, '|', rt1.alerts)
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS error_alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
Result
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
View on DB Fiddle
add a comment |
up vote
2
down vote
up vote
2
down vote
It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN
and Derived Tables. I will break down the query and try to explain it in steps.
Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id
and first date of a month
. This can be done within the query itself.
We can get all the unique user_id
values from the users
table. And, the start dates of all the reporting months can be determined using the following query.
SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month
Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.
Now, we can get all possible combinations using CROSS JOIN
:
(SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN
users AS u
Now, we can use Correlated Subquery to determine errors
and alerts
for every row in the above-generated table. We will look for the latest row from the report_totals
table where the user_id
matches, and month of the reporting being less than or equal to the current month. For errors
, the subquery would look as follows:
SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1 ON r1.id = rt1.report_id
WHERE r1.user_id = u.id AND
MONTH(r1.date) <= all_mon.month
ORDER BY r1.date DESC LIMIT 1
A similar subquery would be used for determining the alerts
.
Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY
on the month (first date of the month), and compute the SUM()
on alerts
and errors
for all the users.
Final (and complete) query would look as follows:
SELECT dt.date,
Sum(dt.errors) AS errors,
Sum(dt.alerts) AS alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS errors,
(SELECT rt1.alerts
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
View on DB Fiddle
Result:
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
Edit 1: First Optimization
I did not like the usage of Two similar correlated subqueries to get errors
and alerts
separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat()
them into a single string, using some delimiter, such as |
. This will reduce the subqueries to be used into one.
Now, in the outermost query, we can use string functions like Substring_Index()
and Cast()
to extract the respective values out as number(s) and do Sum()
operations accordingly.
Query #2
SELECT dt.date,
Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
errors,
Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT Concat(rt1.errors, '|', rt1.alerts)
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS error_alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
Result
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
View on DB Fiddle
It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN
and Derived Tables. I will break down the query and try to explain it in steps.
Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id
and first date of a month
. This can be done within the query itself.
We can get all the unique user_id
values from the users
table. And, the start dates of all the reporting months can be determined using the following query.
SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month
Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.
Now, we can get all possible combinations using CROSS JOIN
:
(SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date,
MONTH(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN
users AS u
Now, we can use Correlated Subquery to determine errors
and alerts
for every row in the above-generated table. We will look for the latest row from the report_totals
table where the user_id
matches, and month of the reporting being less than or equal to the current month. For errors
, the subquery would look as follows:
SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1 ON r1.id = rt1.report_id
WHERE r1.user_id = u.id AND
MONTH(r1.date) <= all_mon.month
ORDER BY r1.date DESC LIMIT 1
A similar subquery would be used for determining the alerts
.
Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY
on the month (first date of the month), and compute the SUM()
on alerts
and errors
for all the users.
Final (and complete) query would look as follows:
SELECT dt.date,
Sum(dt.errors) AS errors,
Sum(dt.alerts) AS alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT rt1.errors
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS errors,
(SELECT rt1.alerts
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
View on DB Fiddle
Result:
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
Edit 1: First Optimization
I did not like the usage of Two similar correlated subqueries to get errors
and alerts
separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat()
them into a single string, using some delimiter, such as |
. This will reduce the subqueries to be used into one.
Now, in the outermost query, we can use string functions like Substring_Index()
and Cast()
to extract the respective values out as number(s) and do Sum()
operations accordingly.
Query #2
SELECT dt.date,
Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
errors,
Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
alerts
FROM (SELECT all_mon.date,
u.id,
(SELECT Concat(rt1.errors, '|', rt1.alerts)
FROM report_totals AS rt1
JOIN reports AS r1
ON r1.id = rt1.report_id
WHERE r1.user_id = u.id
AND Month(r1.date) <= all_mon.month
ORDER BY r1.date DESC
LIMIT 1) AS error_alerts
FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
Month(date) AS month
FROM reports
GROUP BY month) AS all_mon
CROSS JOIN users AS u) AS dt
GROUP BY dt.date
Result
| date | errors | alerts |
| ---------- | ------ | ------ |
| 2018-01-01 | 5 | 0 |
| 2018-02-01 | 13 | 2 |
| 2018-03-01 | 31 | 17 |
View on DB Fiddle
edited Nov 17 at 12:48
answered Nov 17 at 10:58
Madhur Bhaiya
19.1k62236
19.1k62236
add a comment |
add a comment |
up vote
0
down vote
I think below query will work :
select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
add a comment |
up vote
0
down vote
I think below query will work :
select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
add a comment |
up vote
0
down vote
up vote
0
down vote
I think below query will work :
select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;
I think below query will work :
select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;
answered Nov 15 at 4:33
Vishw Patel
37119
37119
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
add a comment |
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
– MattM
Nov 15 at 15:35
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309843%2fgetting-data-over-time-grouped-by-latest-related-id-for-each-interval%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown