Day N retention in BigQuery, error message: Invalid time zone

I'm trying to calculate Day N retention on a dataset in Google BigQuery. The table holds one month of data from a mobile app, and I want to find out how many users returned each day. I am using Standard SQL. So far the code I have is

SELECT date(d1.eventDate) as dt,
       COUNT(distinct d1.userID) as total_users,
       COUNT(distinct d2.userID) as retained_users
FROM `dataset` as d1
LEFT JOIN `dataset` as d2
  ON d1.userID = d2.userID
  AND date(d1.eventDate) = date(datetime(d2.eventDate, '-1 day'))
GROUP BY 1
ORDER BY 1

When I try to execute I get the error message

  Error: Invalid time zone: -1 day [invalidQuery]

My table structure is

eventDate               | UserID
2016-05-06 00:00:00 UTC | 100000
2016-05-06 00:00:00 UTC | 200000
2016-05-06 00:00:00 UTC | 300000

What should I be using instead of '-1 day'?

asked Nov 19 '18 at 21:52

SophieVMC

sql datetime timezone google-bigquery standard-sql

2 Answers

TIMESTAMP_SUB would fix the query as written, though it may not be a good enough solution for performance reasons. At least it gets you the one-day subtraction:

SELECT date(d1.created_at) as dt,
       COUNT(distinct d1.actor.id) as total_users,
       COUNT(distinct d2.actor.id) as retained_users
FROM `githubarchive.month.201810` as d1
LEFT JOIN `githubarchive.month.201810` as d2
  ON d1.actor.id = d2.actor.id
  AND date(d1.created_at) = date(TIMESTAMP_SUB(d2.created_at, INTERVAL -24 HOUR))
GROUP BY 1
ORDER BY 1
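
On the error itself: in Standard SQL the second argument of DATETIME(timestamp, ...) is a time zone name, which is why '-1 day' is rejected as an invalid time zone. Date arithmetic goes through functions such as DATE_SUB or TIMESTAMP_SUB instead. Note also that TIMESTAMP_SUB with a negative interval, as in the query above, moves the timestamp forward, so that join matches d2 rows from the day before d1. The sketch below is one possible variant, assuming the question's intent is that d2 falls one day after d1; the inline `events` rows are made up for illustration and stand in for the real table.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the question's table.
WITH events AS (
  SELECT TIMESTAMP '2016-05-06 00:00:00+00' AS eventDate, 100000 AS userID UNION ALL
  SELECT TIMESTAMP '2016-05-07 00:00:00+00', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-06 00:00:00+00', 200000
)
SELECT
  DATE(d1.eventDate) AS dt,
  COUNT(DISTINCT d1.userID) AS total_users,
  COUNT(DISTINCT d2.userID) AS retained_users
FROM events AS d1
LEFT JOIN events AS d2
  ON d1.userID = d2.userID
  -- Assumption: a user is "retained" for dt if they also appear on dt + 1 day.
  AND DATE(d1.eventDate) = DATE_SUB(DATE(d2.eventDate), INTERVAL 1 DAY)
GROUP BY 1
ORDER BY 1
```

With these three rows, 2016-05-06 should show 2 total users and 1 retained user, and 2016-05-07 should show 1 and 0. Working on DATE values with DATE_SUB avoids any time zone argument altogether.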

To improve performance, do some de-duping before the JOIN:

SELECT day as dt,
       COUNT(distinct d1.id) as total_users,
       COUNT(distinct d2.id) as retained_users
FROM (
  SELECT DISTINCT actor.id, DATE(created_at) day
  FROM `githubarchive.month.201810`
) as d1
LEFT JOIN (
  SELECT DISTINCT actor.id, DATE(TIMESTAMP_SUB(created_at, INTERVAL -24 HOUR)) day
  FROM `githubarchive.month.201810`
) as d2
USING (id, day)
GROUP BY 1
ORDER BY 1

edited Nov 19 '18 at 22:02

answered Nov 19 '18 at 21:57

Felipe Hoffa

The query below is for BigQuery Standard SQL and is further optimized: it avoids JOINs and uses analytic functions instead.

#standardSQL
SELECT
  day,
  COUNT(1) total_users,
  COUNTIF(delta = 1) retained_users
FROM (
  SELECT
    day, id,
    DATE_DIFF(day, LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) delta
  FROM (
    SELECT DISTINCT
      DATE(created_at) day,
      actor.id
    FROM `githubarchive.month.201810`
  )
)
GROUP BY day
ORDER BY day

Or, using the original question's notation:

#standardSQL
SELECT
  day,
  COUNT(1) total_users,
  COUNTIF(delta = 1) retained_users
FROM (
  SELECT
    day, userID,
    DATE_DIFF(day, LAG(day) OVER(PARTITION BY userID ORDER BY day), DAY) delta
  FROM (
    SELECT DISTINCT
      DATE(eventDate) day,
      userID
    FROM `project.dataset.table`
  )
)
GROUP BY day
ORDER BY day
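
To see what the inner subquery feeds into COUNTIF, here is a small sketch over made-up inline rows (the `events` CTE is hypothetical and only stands in for `project.dataset.table`). It lists each user's active days together with the previous active day and the resulting delta.

```sql
#standardSQL
-- Hypothetical inline rows; user 100000 is active on three days, user 200000 on one.
WITH events AS (
  SELECT TIMESTAMP '2016-05-06 00:00:00+00' AS eventDate, 100000 AS userID UNION ALL
  SELECT TIMESTAMP '2016-05-07 00:00:00+00', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-09 00:00:00+00', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-06 00:00:00+00', 200000
)
SELECT
  userID,
  day,
  -- Previous active day for the same user, NULL on the user's first day.
  LAG(day) OVER (PARTITION BY userID ORDER BY day) AS previous_day,
  -- Gap in days between this active day and the previous one.
  DATE_DIFF(day, LAG(day) OVER (PARTITION BY userID ORDER BY day), DAY) AS delta
FROM (
  -- One row per user per active day, as in the queries above.
  SELECT DISTINCT DATE(eventDate) AS day, userID
  FROM events
)
ORDER BY userID, day
```

Only rows with delta = 1 count as retained, and they are attributed to the day the user came back rather than to the earlier day. A user's first active day has no previous day, so delta is NULL there and the row adds to total_users only.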

answered Nov 19 '18 at 22:59

Mikhail Berlyant
