Day N retention in BigQuery, error message: Invalid time zone
I'm trying to calculate Day N retention on a dataset in Google BigQuery. The table consists of one month of data from a mobile app, and I want to find out how many users returned each day. I am using Standard SQL. So far the code I have is:
SELECT date(d1.eventDate) as dt,
  COUNT(distinct d1.userID) as total_users,
  COUNT(distinct d2.userID) as retained_users
FROM `dataset` as d1
LEFT JOIN `dataset` as d2 ON
  d1.userID = d2.userID
  AND date(d1.eventDate) = date(datetime(d2.eventDate, '-1 day'))
GROUP BY 1
ORDER BY 1
When I try to execute I get the error message
Error: Invalid time zone: -1 day [invalidQuery]
My table structure is
eventDate | UserID |
2016-05-06 00:00:00 UTC | 100000 |
2016-05-06 00:00:00 UTC | 200000 |
2016-05-06 00:00:00 UTC | 300000 |
What should I be using instead of '-1 day'?
sql datetime timezone google-bigquery standard-sql
asked Nov 19 '18 at 21:52 by SophieVMC
2 Answers
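TIMESTAMP_SUB would fix the query as written, although for performance reasons it might not be good enough as a solution. At least it gets you the 1-day shift. Note that subtracting a negative interval, as done here, moves d2 forward by 24 hours, so each day is matched against the previous day's activity; use INTERVAL 24 HOUR to shift the other way: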
SELECT DATE(d1.created_at) AS dt,
  COUNT(DISTINCT d1.actor.id) AS total_users,
  COUNT(DISTINCT d2.actor.id) AS retained_users
FROM `githubarchive.month.201810` AS d1
LEFT JOIN `githubarchive.month.201810` AS d2 ON
  d1.actor.id = d2.actor.id
  AND DATE(d1.created_at) = DATE(TIMESTAMP_SUB(d2.created_at, INTERVAL -24 HOUR))
GROUP BY 1
ORDER BY 1
To improve performance, do some de-duping before the JOIN:
SELECT day AS dt,
  COUNT(DISTINCT d1.id) AS total_users,
  COUNT(DISTINCT d2.id) AS retained_users
FROM (
  SELECT DISTINCT actor.id, DATE(created_at) day
  FROM `githubarchive.month.201810`
) AS d1
LEFT JOIN (
  SELECT DISTINCT actor.id, DATE(TIMESTAMP_SUB(created_at, INTERVAL -24 HOUR)) day
  FROM `githubarchive.month.201810`
) AS d2
USING (id, day)
GROUP BY 1
ORDER BY 1
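
As for why the original query fails: the optional second argument of DATETIME(timestamp, time_zone) is a time zone name, so '-1 day' gets parsed as a time zone. Applied to the table from the question, the fix might look like the sketch below. The inline `events` rows are made-up sample data standing in for the real table, and the join direction follows the question's intent (d2 is one day after d1), hence the positive interval.

```sql
#standardSQL
WITH events AS (
  -- Hypothetical sample rows; replace this CTE with the real table.
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC' AS eventDate, 100000 AS userID UNION ALL
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC', 200000 UNION ALL
  SELECT TIMESTAMP '2016-05-07 00:00:00 UTC', 100000
)
SELECT
  DATE(d1.eventDate) AS dt,
  COUNT(DISTINCT d1.userID) AS total_users,
  COUNT(DISTINCT d2.userID) AS retained_users
FROM events AS d1
LEFT JOIN events AS d2
  ON d1.userID = d2.userID
  -- Shift d2 back one day so it lines up with d1's date.
  AND DATE(d1.eventDate) = DATE(TIMESTAMP_SUB(d2.eventDate, INTERVAL 1 DAY))
GROUP BY 1
ORDER BY 1
```

With these three sample rows, 2016-05-06 should come out with 2 total users and 1 retained user (100000 returns on the 7th), and 2016-05-07 with 1 total user and 0 retained.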

The query below is for BigQuery Standard SQL and is further optimized to avoid JOINs altogether by using analytic functions:
#standardSQL
SELECT
  day,
  COUNT(1) total_users,
  COUNTIF(delta = 1) retained_users
FROM (
  SELECT
    day, id,
    DATE_DIFF(day, LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) delta
  FROM (
    SELECT DISTINCT
      DATE(created_at) day,
      actor.id
    FROM `githubarchive.month.201810`
  )
)
GROUP BY day
ORDER BY day
Or, using the original question's notation:
#standardSQL
SELECT
  day,
  COUNT(1) total_users,
  COUNTIF(delta = 1) retained_users
FROM (
  SELECT
    day, userID,
    DATE_DIFF(day, LAG(day) OVER(PARTITION BY userID ORDER BY day), DAY) delta
  FROM (
    SELECT DISTINCT
      DATE(eventDate) day,
      userID
    FROM `project.dataset.table`
  )
)
GROUP BY day
ORDER BY day
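
One caveat worth spelling out: LAG only looks at the user's immediately preceding active day, so the query above covers N = 1. Checking `delta = N` for a larger N would only catch users with no activity in between. For a general Day N, one possible sketch falls back to a join on the de-duplicated (userID, day) pairs. The `events` rows and the choice of N = 3 are assumptions for illustration; the direction matches the query above (users active on `day` who were also active N days earlier).

```sql
#standardSQL
WITH events AS (
  -- Hypothetical sample rows; replace this CTE with the real table.
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC' AS eventDate, 100000 AS userID UNION ALL
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC', 200000 UNION ALL
  SELECT TIMESTAMP '2016-05-07 00:00:00 UTC', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-09 00:00:00 UTC', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-09 00:00:00 UTC', 200000
),
daily AS (
  -- One row per user per active day, so the join stays small.
  SELECT DISTINCT DATE(eventDate) AS day, userID
  FROM events
)
SELECT
  d1.day,
  COUNT(DISTINCT d1.userID) AS total_users,
  COUNT(DISTINCT d2.userID) AS retained_users
FROM daily AS d1
LEFT JOIN daily AS d2
  ON d1.userID = d2.userID
  -- N = 3 here; change the literal for a different Day N.
  AND d2.day = DATE_SUB(d1.day, INTERVAL 3 DAY)
GROUP BY 1
ORDER BY 1
```

On this sample data, 2016-05-09 should show 2 total users and 2 retained users (both were active on the 6th), while the other days show 0 retained. With the literal set to 1, the result should agree with the LAG version.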
TIMESTAMP_SUB answer: answered Nov 19 '18 at 21:57, edited Nov 19 '18 at 22:02, by Felipe Hoffa
Analytic-function answer: answered Nov 19 '18 at 22:59 by Mikhail Berlyant