Getting data over time grouped by latest related_id for each interval











up vote
0
down vote

favorite












I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.



Example:



Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this



users
id, email

reports
id, user_id, date

report_totals
id, report_id, errors, alerts


Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this



reports
1, 1, 2018-1-1
2, 2, 2018-1-1
3, 1, 2018-1-4
4, 1, 2018-2-1
5, 1, 2018-3-1
6, 2, 2018-3-1

report_totals
1, 1, 5, 5
2, 2, 3, 0
3, 3, 2, 0
4, 4, 10, 2
5, 5, 30, 15
6, 6, 1, 2


I want to write a query to return results that look like this



date, errors, alerts
2018-1-1, 5, 0
2018-2-1, 13, 2
2018-3-1, 31, 17


The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.



Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.










share|improve this question


























    up vote
    0
    down vote

    favorite












    I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.



    Example:



    Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this



    users
    id, email

    reports
    id, user_id, date

    report_totals
    id, report_id, errors, alerts


    Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this



    reports
    1, 1, 2018-1-1
    2, 2, 2018-1-1
    3, 1, 2018-1-4
    4, 1, 2018-2-1
    5, 1, 2018-3-1
    6, 2, 2018-3-1

    report_totals
    1, 1, 5, 5
    2, 2, 3, 0
    3, 3, 2, 0
    4, 4, 10, 2
    5, 5, 30, 15
    6, 6, 1, 2


    I want to write a query to return results that look like this



    date, errors, alerts
    2018-1-1, 5, 0
    2018-2-1, 13, 2
    2018-3-1, 31, 17


    The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.



    Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.










    share|improve this question
























      up vote
      0
      down vote

      favorite









      up vote
      0
      down vote

      favorite











      I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.



      Example:



      Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this



      users
      id, email

      reports
      id, user_id, date

      report_totals
      id, report_id, errors, alerts


      Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this



      reports
      1, 1, 2018-1-1
      2, 2, 2018-1-1
      3, 1, 2018-1-4
      4, 1, 2018-2-1
      5, 1, 2018-3-1
      6, 2, 2018-3-1

      report_totals
      1, 1, 5, 5
      2, 2, 3, 0
      3, 3, 2, 0
      4, 4, 10, 2
      5, 5, 30, 15
      6, 6, 1, 2


      I want to write a query to return results that look like this



      date, errors, alerts
      2018-1-1, 5, 0
      2018-2-1, 13, 2
      2018-3-1, 31, 17


      The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.



      Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.










      share|improve this question













      I'm trying to write a query that retrieves cumulative results over time that only fetches the latest instance of a result set per interval for each related id.



      Example:



      Imagine there is a users table, and each user can create reports with issues in them. The issues get aggregated in an report_totals table with a sum of the issue category in them. The tables might look something like this



      users
      id, email

      reports
      id, user_id, date

      report_totals
      id, report_id, errors, alerts


      Here's the part I'm struggling with, if a user hasn't submitted a report in the current time interval, it should backfill that data with the previous interval's sum. So let's say we have data that looks like this



      reports
      1, 1, 2018-1-1
      2, 2, 2018-1-1
      3, 1, 2018-1-4
      4, 1, 2018-2-1
      5, 1, 2018-3-1
      6, 2, 2018-3-1

      report_totals
      1, 1, 5, 5
      2, 2, 3, 0
      3, 3, 2, 0
      4, 4, 10, 2
      5, 5, 30, 15
      6, 6, 1, 2


      I want to write a query to return results that look like this



      date, errors, alerts
      2018-1-1, 5, 0
      2018-2-1, 13, 2
      2018-3-1, 31, 17


      The report interval is 1 month, so it only uses the latest result from each month, for each user, sums them, and will backfill from previous intervals if there isn't a record for that user.



      Is something like this possible in MySQL and is it the right way to go about it? Thanks in advance, sorry if this has been answered before, I haven't found anything that quite accomplishes what I'm looking for.







      mysql






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 14 at 22:43









      MattM

      346




      346
























          2 Answers
          2






          active

          oldest

          votes

















          up vote
          2
          down vote













          It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN and Derived Tables. I will break down the query and try to explain it in steps.



          Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id and first date of a month. This can be done within the query itself.



          We can get all the unique user_id values from the users table. And, the start dates of all the reporting months can be determined using the following query.



          SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
          MONTH(date) AS month
          FROM reports
          GROUP BY month


          Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.



          Now, we can get all possible combinations using CROSS JOIN:



          (SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
          MONTH(date) AS month
          FROM reports
          GROUP BY month) AS all_mon
          CROSS JOIN
          users AS u


          Now, we can use Correlated Subquery to determine errors and alerts for every row in the above-generated table. We will look for the latest row from the report_totals table where the user_id matches, and month of the reporting being less than or equal to the current month. For errors, the subquery would look as follows:



          SELECT rt1.errors 
          FROM report_totals AS rt1
          JOIN reports AS r1 ON r1.id = rt1.report_id
          WHERE r1.user_id = u.id AND
          MONTH(r1.date) <= all_mon.month
          ORDER BY r1.date DESC LIMIT 1


          A similar subquery would be used for determining the alerts.



          Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY on the month (first date of the month), and compute the SUM() on alerts and errors for all the users.



          Final (and complete) query would look as follows:



          SELECT dt.date,
          Sum(dt.errors) AS errors,
          Sum(dt.alerts) AS alerts
          FROM (SELECT all_mon.date,
          u.id,
          (SELECT rt1.errors
          FROM report_totals AS rt1
          JOIN reports AS r1
          ON r1.id = rt1.report_id
          WHERE r1.user_id = u.id
          AND Month(r1.date) <= all_mon.month
          ORDER BY r1.date DESC
          LIMIT 1) AS errors,
          (SELECT rt1.alerts
          FROM report_totals AS rt1
          JOIN reports AS r1
          ON r1.id = rt1.report_id
          WHERE r1.user_id = u.id
          AND Month(r1.date) <= all_mon.month
          ORDER BY r1.date DESC
          LIMIT 1) AS alerts
          FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
          Month(date) AS month
          FROM reports
          GROUP BY month) AS all_mon
          CROSS JOIN users AS u) AS dt
          GROUP BY dt.date


          View on DB Fiddle





          Result:



          | date       | errors | alerts |
          | ---------- | ------ | ------ |
          | 2018-01-01 | 5 | 0 |
          | 2018-02-01 | 13 | 2 |
          | 2018-03-01 | 31 | 17 |




          Edit 1: First Optimization



          I did not like the usage of Two similar correlated subqueries to get errors and alerts separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat() them into a single string, using some delimiter, such as |. This will reduce the subqueries to be used into one.



          Now, in the outermost query, we can use string functions like Substring_Index() and Cast() to extract the respective values out as number(s) and do Sum() operations accordingly.



          Query #2



          SELECT dt.date,
          Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
          errors,
          Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
          alerts
          FROM (SELECT all_mon.date,
          u.id,
          (SELECT Concat(rt1.errors, '|', rt1.alerts)
          FROM report_totals AS rt1
          JOIN reports AS r1
          ON r1.id = rt1.report_id
          WHERE r1.user_id = u.id
          AND Month(r1.date) <= all_mon.month
          ORDER BY r1.date DESC
          LIMIT 1) AS error_alerts
          FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
          Month(date) AS month
          FROM reports
          GROUP BY month) AS all_mon
          CROSS JOIN users AS u) AS dt
          GROUP BY dt.date


          Result



          | date       | errors | alerts |
          | ---------- | ------ | ------ |
          | 2018-01-01 | 5 | 0 |
          | 2018-02-01 | 13 | 2 |
          | 2018-03-01 | 31 | 17 |


          View on DB Fiddle






          share|improve this answer






























            up vote
            0
            down vote













            I think below query will work :



            select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;





            share|improve this answer





















            • I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
              – MattM
              Nov 15 at 15:35













            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309843%2fgetting-data-over-time-grouped-by-latest-related-id-for-each-interval%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            2
            down vote













            It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN and Derived Tables. I will break down the query and try to explain it in steps.



            Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id and first date of a month. This can be done within the query itself.



            We can get all the unique user_id values from the users table. And, the start dates of all the reporting months can be determined using the following query.



            SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
            MONTH(date) AS month
            FROM reports
            GROUP BY month


            Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.



            Now, we can get all possible combinations using CROSS JOIN:



            (SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
            MONTH(date) AS month
            FROM reports
            GROUP BY month) AS all_mon
            CROSS JOIN
            users AS u


            Now, we can use Correlated Subquery to determine errors and alerts for every row in the above-generated table. We will look for the latest row from the report_totals table where the user_id matches, and month of the reporting being less than or equal to the current month. For errors, the subquery would look as follows:



            SELECT rt1.errors 
            FROM report_totals AS rt1
            JOIN reports AS r1 ON r1.id = rt1.report_id
            WHERE r1.user_id = u.id AND
            MONTH(r1.date) <= all_mon.month
            ORDER BY r1.date DESC LIMIT 1


            A similar subquery would be used for determining the alerts.



            Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY on the month (first date of the month), and compute the SUM() on alerts and errors for all the users.



            Final (and complete) query would look as follows:



            SELECT dt.date,
            Sum(dt.errors) AS errors,
            Sum(dt.alerts) AS alerts
            FROM (SELECT all_mon.date,
            u.id,
            (SELECT rt1.errors
            FROM report_totals AS rt1
            JOIN reports AS r1
            ON r1.id = rt1.report_id
            WHERE r1.user_id = u.id
            AND Month(r1.date) <= all_mon.month
            ORDER BY r1.date DESC
            LIMIT 1) AS errors,
            (SELECT rt1.alerts
            FROM report_totals AS rt1
            JOIN reports AS r1
            ON r1.id = rt1.report_id
            WHERE r1.user_id = u.id
            AND Month(r1.date) <= all_mon.month
            ORDER BY r1.date DESC
            LIMIT 1) AS alerts
            FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
            Month(date) AS month
            FROM reports
            GROUP BY month) AS all_mon
            CROSS JOIN users AS u) AS dt
            GROUP BY dt.date


            View on DB Fiddle





            Result:



            | date       | errors | alerts |
            | ---------- | ------ | ------ |
            | 2018-01-01 | 5 | 0 |
            | 2018-02-01 | 13 | 2 |
            | 2018-03-01 | 31 | 17 |




            Edit 1: First Optimization



            I did not like the usage of Two similar correlated subqueries to get errors and alerts separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat() them into a single string, using some delimiter, such as |. This will reduce the subqueries to be used into one.



            Now, in the outermost query, we can use string functions like Substring_Index() and Cast() to extract the respective values out as number(s) and do Sum() operations accordingly.



            Query #2



            SELECT dt.date,
            Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
            errors,
            Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
            alerts
            FROM (SELECT all_mon.date,
            u.id,
            (SELECT Concat(rt1.errors, '|', rt1.alerts)
            FROM report_totals AS rt1
            JOIN reports AS r1
            ON r1.id = rt1.report_id
            WHERE r1.user_id = u.id
            AND Month(r1.date) <= all_mon.month
            ORDER BY r1.date DESC
            LIMIT 1) AS error_alerts
            FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
            Month(date) AS month
            FROM reports
            GROUP BY month) AS all_mon
            CROSS JOIN users AS u) AS dt
            GROUP BY dt.date


            Result



            | date       | errors | alerts |
            | ---------- | ------ | ------ |
            | 2018-01-01 | 5 | 0 |
            | 2018-02-01 | 13 | 2 |
            | 2018-03-01 | 31 | 17 |


            View on DB Fiddle






            share|improve this answer



























              up vote
              2
              down vote













              It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN and Derived Tables. I will break down the query and try to explain it in steps.



              Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id and first date of a month. This can be done within the query itself.



              We can get all the unique user_id values from the users table. And, the start dates of all the reporting months can be determined using the following query.



              SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
              MONTH(date) AS month
              FROM reports
              GROUP BY month


              Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.



              Now, we can get all possible combinations using CROSS JOIN:



              (SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
              MONTH(date) AS month
              FROM reports
              GROUP BY month) AS all_mon
              CROSS JOIN
              users AS u


              Now, we can use Correlated Subquery to determine errors and alerts for every row in the above-generated table. We will look for the latest row from the report_totals table where the user_id matches, and month of the reporting being less than or equal to the current month. For errors, the subquery would look as follows:



              SELECT rt1.errors 
              FROM report_totals AS rt1
              JOIN reports AS r1 ON r1.id = rt1.report_id
              WHERE r1.user_id = u.id AND
              MONTH(r1.date) <= all_mon.month
              ORDER BY r1.date DESC LIMIT 1


              A similar subquery would be used for determining the alerts.



              Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY on the month (first date of the month), and compute the SUM() on alerts and errors for all the users.



              Final (and complete) query would look as follows:



              SELECT dt.date,
              Sum(dt.errors) AS errors,
              Sum(dt.alerts) AS alerts
              FROM (SELECT all_mon.date,
              u.id,
              (SELECT rt1.errors
              FROM report_totals AS rt1
              JOIN reports AS r1
              ON r1.id = rt1.report_id
              WHERE r1.user_id = u.id
              AND Month(r1.date) <= all_mon.month
              ORDER BY r1.date DESC
              LIMIT 1) AS errors,
              (SELECT rt1.alerts
              FROM report_totals AS rt1
              JOIN reports AS r1
              ON r1.id = rt1.report_id
              WHERE r1.user_id = u.id
              AND Month(r1.date) <= all_mon.month
              ORDER BY r1.date DESC
              LIMIT 1) AS alerts
              FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
              Month(date) AS month
              FROM reports
              GROUP BY month) AS all_mon
              CROSS JOIN users AS u) AS dt
              GROUP BY dt.date


              View on DB Fiddle





              Result:



              | date       | errors | alerts |
              | ---------- | ------ | ------ |
              | 2018-01-01 | 5 | 0 |
              | 2018-02-01 | 13 | 2 |
              | 2018-03-01 | 31 | 17 |




              Edit 1: First Optimization



              I did not like the usage of Two similar correlated subqueries to get errors and alerts separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat() them into a single string, using some delimiter, such as |. This will reduce the subqueries to be used into one.



              Now, in the outermost query, we can use string functions like Substring_Index() and Cast() to extract the respective values out as number(s) and do Sum() operations accordingly.



              Query #2



              SELECT dt.date,
              Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
              errors,
              Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
              alerts
              FROM (SELECT all_mon.date,
              u.id,
              (SELECT Concat(rt1.errors, '|', rt1.alerts)
              FROM report_totals AS rt1
              JOIN reports AS r1
              ON r1.id = rt1.report_id
              WHERE r1.user_id = u.id
              AND Month(r1.date) <= all_mon.month
              ORDER BY r1.date DESC
              LIMIT 1) AS error_alerts
              FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
              Month(date) AS month
              FROM reports
              GROUP BY month) AS all_mon
              CROSS JOIN users AS u) AS dt
              GROUP BY dt.date


              Result



              | date       | errors | alerts |
              | ---------- | ------ | ------ |
              | 2018-01-01 | 5 | 0 |
              | 2018-02-01 | 13 | 2 |
              | 2018-03-01 | 31 | 17 |


              View on DB Fiddle






              share|improve this answer

























                up vote
                2
                down vote










                up vote
                2
                down vote









                It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN and Derived Tables. I will break down the query and try to explain it in steps.



                Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id and first date of a month. This can be done within the query itself.



                We can get all the unique user_id values from the users table. And, the start dates of all the reporting months can be determined using the following query.



                SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
                MONTH(date) AS month
                FROM reports
                GROUP BY month


                Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.



                Now, we can get all possible combinations using CROSS JOIN:



                (SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
                MONTH(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN
                users AS u


                Now, we can use Correlated Subquery to determine errors and alerts for every row in the above-generated table. We will look for the latest row from the report_totals table where the user_id matches, and month of the reporting being less than or equal to the current month. For errors, the subquery would look as follows:



                SELECT rt1.errors 
                FROM report_totals AS rt1
                JOIN reports AS r1 ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id AND
                MONTH(r1.date) <= all_mon.month
                ORDER BY r1.date DESC LIMIT 1


                A similar subquery would be used for determining the alerts.



                Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY on the month (first date of the month), and compute the SUM() on alerts and errors for all the users.



                Final (and complete) query would look as follows:



                SELECT dt.date,
                Sum(dt.errors) AS errors,
                Sum(dt.alerts) AS alerts
                FROM (SELECT all_mon.date,
                u.id,
                (SELECT rt1.errors
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS errors,
                (SELECT rt1.alerts
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS alerts
                FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
                Month(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN users AS u) AS dt
                GROUP BY dt.date


                View on DB Fiddle





                Result:



                | date       | errors | alerts |
                | ---------- | ------ | ------ |
                | 2018-01-01 | 5 | 0 |
                | 2018-02-01 | 13 | 2 |
                | 2018-03-01 | 31 | 17 |




                Edit 1: First Optimization



                I did not like the usage of Two similar correlated subqueries to get errors and alerts separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat() them into a single string, using some delimiter, such as |. This will reduce the subqueries to be used into one.



                Now, in the outermost query, we can use string functions like Substring_Index() and Cast() to extract the respective values out as number(s) and do Sum() operations accordingly.



                Query #2



                SELECT dt.date,
                Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
                errors,
                Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
                alerts
                FROM (SELECT all_mon.date,
                u.id,
                (SELECT Concat(rt1.errors, '|', rt1.alerts)
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS error_alerts
                FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
                Month(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN users AS u) AS dt
                GROUP BY dt.date


                Result



                | date       | errors | alerts |
                | ---------- | ------ | ------ |
                | 2018-01-01 | 5 | 0 |
                | 2018-02-01 | 13 | 2 |
                | 2018-03-01 | 31 | 17 |


                View on DB Fiddle






                share|improve this answer














                It is a tricky problem, but not unsolvable using MySQL :-) It may be solved in less verbose query and possibly performant manner using Window functions with Frames, available in MySQL version 8.0.2 and above. However, we can also solve this problem using Correlated Subqueries, using a mix of CROSS JOIN and Derived Tables. I will break down the query and try to explain it in steps.



                Since you want to consider previous months' reporting values, even if there is no reporting done in the current month, so our first step would be to generate a "Master Table" which basically contains all possible combinations of user_id and first date of a month. This can be done within the query itself.



                We can get all the unique user_id values from the users table. And, the start dates of all the reporting months can be determined using the following query.



                SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
                MONTH(date) AS month
                FROM reports
                GROUP BY month


                Now, there may be a possibility that there is no reporting done for a particular month at all. In that case, you will rather have to use a Master Calendar table. However, for practical purposes, it will be extremely rare to have no reporting done at all in a whole month.



                Now, we can get all possible combinations using CROSS JOIN:



                (SELECT MIN(DATE_FORMAT(date, '%Y-%m-01')) AS date, 
                MONTH(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN
                users AS u


                Now, we can use Correlated Subquery to determine errors and alerts for every row in the above-generated table. We will look for the latest row from the report_totals table where the user_id matches, and month of the reporting being less than or equal to the current month. For errors, the subquery would look as follows:



                SELECT rt1.errors 
                FROM report_totals AS rt1
                JOIN reports AS r1 ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id AND
                MONTH(r1.date) <= all_mon.month
                ORDER BY r1.date DESC LIMIT 1


                A similar subquery would be used for determining the alerts.



                Finally, we will take this complete result-set as a Derived Table, and do a GROUP BY on the month (first date of the month), and compute the SUM() on alerts and errors for all the users.



                Final (and complete) query would look as follows:



                SELECT dt.date,
                Sum(dt.errors) AS errors,
                Sum(dt.alerts) AS alerts
                FROM (SELECT all_mon.date,
                u.id,
                (SELECT rt1.errors
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS errors,
                (SELECT rt1.alerts
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS alerts
                FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
                Month(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN users AS u) AS dt
                GROUP BY dt.date


                View on DB Fiddle





                Result:



                | date       | errors | alerts |
                | ---------- | ------ | ------ |
                | 2018-01-01 | 5 | 0 |
                | 2018-02-01 | 13 | 2 |
                | 2018-03-01 | 31 | 17 |




                Edit 1: First Optimization



                I did not like the usage of Two similar correlated subqueries to get errors and alerts separately. But, this is a limitation with MySQL, that it does not allow more than one operand in such subqueries. So instead, as a hack, we can Concat() them into a single string, using some delimiter, such as |. This will reduce the subqueries to be used into one.



                Now, in the outermost query, we can use string functions like Substring_Index() and Cast() to extract the respective values out as number(s) and do Sum() operations accordingly.



                Query #2



                SELECT dt.date,
                Sum(Cast(Substring_index(dt.error_alerts, '|', 1) AS UNSIGNED)) AS
                errors,
                Sum(Cast(Substring_index(dt.error_alerts, '|', -1) AS UNSIGNED)) AS
                alerts
                FROM (SELECT all_mon.date,
                u.id,
                (SELECT Concat(rt1.errors, '|', rt1.alerts)
                FROM report_totals AS rt1
                JOIN reports AS r1
                ON r1.id = rt1.report_id
                WHERE r1.user_id = u.id
                AND Month(r1.date) <= all_mon.month
                ORDER BY r1.date DESC
                LIMIT 1) AS error_alerts
                FROM (SELECT Min(Date_format(date, '%Y-%m-01')) AS date,
                Month(date) AS month
                FROM reports
                GROUP BY month) AS all_mon
                CROSS JOIN users AS u) AS dt
                GROUP BY dt.date


                Result



                | date       | errors | alerts |
                | ---------- | ------ | ------ |
                | 2018-01-01 | 5 | 0 |
                | 2018-02-01 | 13 | 2 |
                | 2018-03-01 | 31 | 17 |


                View on DB Fiddle







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 17 at 12:48

























                answered Nov 17 at 10:58









                Madhur Bhaiya

                19.1k62236




                19.1k62236
























                    up vote
                    0
                    down vote













                    I think below query will work :



                    select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;





                    share|improve this answer





















                    • I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                      – MattM
                      Nov 15 at 15:35

















                    up vote
                    0
                    down vote













                    I think below query will work :



                    select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;





                    share|improve this answer





















                    • I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                      – MattM
                      Nov 15 at 15:35















                    up vote
                    0
                    down vote










                    up vote
                    0
                    down vote









                    I think below query will work :



                    select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;





                    share|improve this answer












                    I think below query will work :



                    select r.date,sum(rt.alerts),sum(rt.errors) from reports r join report_totals rt on r.id = rt.report_id group by r.date;






                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 15 at 4:33









                    Vishw Patel

                    37119




                    37119












                    • I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                      – MattM
                      Nov 15 at 15:35




















                    • I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                      – MattM
                      Nov 15 at 15:35


















                    I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                    – MattM
                    Nov 15 at 15:35






                    I don't think you read my full requirements. I need only 1 result per interval, and it should only grab the latest report per user. That query fulfilled none of those requirements. Here is a sqlfiddle. Please look at my desired output vs the output from your query. sqlfiddle.com/#!9/c142ba/1
                    – MattM
                    Nov 15 at 15:35




















                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309843%2fgetting-data-over-time-grouped-by-latest-related-id-for-each-interval%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    How to change which sound is reproduced for terminal bell?

                    Can I use Tabulator js library in my java Spring + Thymeleaf project?

                    Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents