Day N retention in BigQuery, error message: Invalid time zone












I'm trying to calculate Day N retention on a dataset in Google BigQuery. The table consists of one month of data from a mobile app, and I want to find out how many users returned each day. I am using Standard SQL. So far the code I have is:

    SELECT date(d1.eventDate) as dt,
    COUNT(distinct d1.userID) as total_users,
    COUNT(distinct d2.userID) as retained_users
    FROM `dataset` as d1
    LEFT JOIN `dataset` as d2 ON
    d1.userID = d2.userID
    AND date(d1.eventDate) = date(datetime(d2.eventDate, '-1 day'))
    GROUP BY 1
    ORDER BY 1


When I try to execute it, I get the error message:

    Error: Invalid time zone: -1 day [invalidQuery]

My table structure is:

    eventDate               | UserID
    2016-05-06 00:00:00 UTC | 100000
    2016-05-06 00:00:00 UTC | 200000
    2016-05-06 00:00:00 UTC | 300000

What should I be using instead of '-1 day'?







Tags: sql datetime timezone google-bigquery standard-sql






asked Nov 19 '18 at 21:52 by SophieVMC
























          2 Answers

TIMESTAMP_SUB would fix the query as written, although it might not be good enough as a solution for performance reasons. At least it gets you the 1-day offset (subtracting INTERVAL -24 HOUR shifts d2's timestamp forward by a day, so each date is matched against the same user's activity on the previous day):

    SELECT date(d1.created_at) as dt,
    COUNT(distinct d1.actor.id) as total_users,
    COUNT(distinct d2.actor.id) as retained_users
    FROM `githubarchive.month.201810` as d1
    LEFT JOIN `githubarchive.month.201810` as d2 ON
    d1.actor.id = d2.actor.id
    AND date(d1.created_at) = date(TIMESTAMP_SUB(d2.created_at, INTERVAL -24 HOUR))
    GROUP BY 1
    ORDER BY 1
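
For context on the error itself: in BigQuery Standard SQL the optional second argument of DATETIME(timestamp_expression, time_zone) is a time zone name, which is why '-1 day' is rejected as an invalid time zone. As a minimal sketch in the question's own notation, assuming eventDate is a TIMESTAMP column and keeping the question's placeholder table name `dataset`, the offset can be applied with DATE_SUB after converting to a date. Note that this follows the direction the question's '-1 day' suggests (d2 falls on the day after d1), whereas subtracting INTERVAL -24 HOUR as in the query above pairs each date with the previous day.

```sql
-- Sketch only: `dataset` is the question's placeholder table name,
-- and eventDate is assumed to be a TIMESTAMP column.
SELECT DATE(d1.eventDate) AS dt,
  COUNT(DISTINCT d1.userID) AS total_users,
  COUNT(DISTINCT d2.userID) AS retained_users
FROM `dataset` AS d1
LEFT JOIN `dataset` AS d2
  ON d1.userID = d2.userID
  -- d2 must fall on the calendar day after d1
  AND DATE(d1.eventDate) = DATE_SUB(DATE(d2.eventDate), INTERVAL 1 DAY)
GROUP BY 1
ORDER BY 1
```

Swapping DATE_SUB for DATE_ADD flips the direction of the comparison.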


To improve performance, do some de-duping before the JOIN:

    SELECT day as dt,
    COUNT(distinct d1.id) as total_users,
    COUNT(distinct d2.id) as retained_users
    FROM (
      SELECT DISTINCT actor.id, DATE(created_at) day
      FROM `githubarchive.month.201810`
    ) as d1
    LEFT JOIN (
      SELECT DISTINCT actor.id, DATE(TIMESTAMP_SUB(created_at, INTERVAL -24 HOUR)) day
      FROM `githubarchive.month.201810`
    ) as d2
    USING (id, day)
    GROUP BY 1
    ORDER BY 1
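
The de-duplicated join can likely be extended from a 1-day offset to an arbitrary Day N. The following is a sketch under assumptions rather than part of the original answer: it uses the question's column names (userID, eventDate), the placeholder table name `project.dataset.table`, a hypothetical CTE name `activity`, and N = 7 as an example value. It counts a user as retained for a given day if the same user is active exactly N days later.

```sql
-- Sketch only: Day 7 retention; change the interval to the N you need.
-- `project.dataset.table` is a placeholder and `activity` is a hypothetical CTE name.
WITH activity AS (
  -- One row per user per active day, so the join stays small
  SELECT DISTINCT userID, DATE(eventDate) AS day
  FROM `project.dataset.table`
)
SELECT
  a.day,
  COUNT(DISTINCT a.userID) AS total_users,
  COUNT(DISTINCT b.userID) AS retained_users
FROM activity AS a
LEFT JOIN activity AS b
  ON a.userID = b.userID
  -- b is the same user's activity exactly 7 days after a
  AND b.day = DATE_ADD(a.day, INTERVAL 7 DAY)
GROUP BY a.day
ORDER BY a.day
```

Days within the last N days of the data window will show zero retained users simply because the later activity is not in the table yet.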


answered Nov 19 '18 at 21:57 by Felipe Hoffa (edited Nov 19 '18 at 22:02)

























The query below is for BigQuery Standard SQL and is further optimized to avoid JOINs altogether by using analytic functions instead:

    #standardSQL
    SELECT
      day,
      COUNT(1) total_users,
      COUNTIF(delta = 1) retained_users
    FROM (
      SELECT
        day, id,
        DATE_DIFF(day, LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) delta
      FROM (
        SELECT DISTINCT
          DATE(created_at) day,
          actor.id
        FROM `githubarchive.month.201810`
      )
    )
    GROUP BY day
    ORDER BY day


Or, using the original question's notation:

    #standardSQL
    SELECT
      day,
      COUNT(1) total_users,
      COUNTIF(delta = 1) retained_users
    FROM (
      SELECT
        day, userID,
        DATE_DIFF(day, LAG(day) OVER(PARTITION BY userID ORDER BY day), DAY) delta
      FROM (
        SELECT DISTINCT
          DATE(eventDate) day,
          userID
        FROM `project.dataset.table`
      )
    )
    GROUP BY day
    ORDER BY day
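
To see what delta holds, here is a small sketch on three hypothetical inline rows for a single user (the sample dates and the CTE name `activity` are made up for illustration). LAG(day) returns the user's previous active day, and DATE_DIFF turns that into a gap in days.

```sql
-- Sketch only: three hypothetical activity rows for one user.
WITH activity AS (
  SELECT 100000 AS userID, DATE '2016-05-06' AS day UNION ALL
  SELECT 100000, DATE '2016-05-07' UNION ALL
  SELECT 100000, DATE '2016-05-09'
)
SELECT
  userID,
  day,
  -- NULL on the user's first active day, otherwise days since the previous active day
  DATE_DIFF(day, LAG(day) OVER (PARTITION BY userID ORDER BY day), DAY) AS delta
FROM activity
ORDER BY day
```

Here delta is NULL for 2016-05-06 (no earlier activity), 1 for 2016-05-07, and 2 for 2016-05-09, so only 2016-05-07 satisfies delta = 1 and counts toward retained_users.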






answered Nov 19 '18 at 22:59 by Mikhail Berlyant





























