Generating time series reports
I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
mysql aggregate-functions datestamp
|
show 4 more comments
I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
mysql aggregate-functions datestamp
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12
|
show 4 more comments
I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
mysql aggregate-functions datestamp
I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)
to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.
I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).
EDIT
It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:
SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)
It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?
It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.
mysql aggregate-functions datestamp
mysql aggregate-functions datestamp
edited Nov 16 '18 at 18:38
miken32
23.5k84672
23.5k84672
asked Nov 16 '18 at 12:46
Daniel Protopopov
2,76721526
2,76721526
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12
|
show 4 more comments
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12
|
show 4 more comments
1 Answer
1
active
oldest
votes
Let's say you have a gigantic table measure
with two columns datestamp
and temp
.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc
in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc
. In this case trunc(t)
returns the beginning of the six-minute period in which t
occurs. So, trunc('1942-12-07 08:45:17')
gives 1942-12-07 08:42:00
).
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6
.
You'll need different trunc()
functions for hours, etc.
The trunc()
function for daily summaries is DATE(datestamp)
.
For monthly summaries it is LAST_DAY(datestamp)
. For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data areTIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53338217%2fgenerating-time-series-reports%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Let's say you have a gigantic table measure
with two columns datestamp
and temp
.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc
in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc
. In this case trunc(t)
returns the beginning of the six-minute period in which t
occurs. So, trunc('1942-12-07 08:45:17')
gives 1942-12-07 08:42:00
).
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6
.
You'll need different trunc()
functions for hours, etc.
The trunc()
function for daily summaries is DATE(datestamp)
.
For monthly summaries it is LAST_DAY(datestamp)
. For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data areTIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52
add a comment |
Let's say you have a gigantic table measure
with two columns datestamp
and temp
.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc
in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc
. In this case trunc(t)
returns the beginning of the six-minute period in which t
occurs. So, trunc('1942-12-07 08:45:17')
gives 1942-12-07 08:42:00
).
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6
.
You'll need different trunc()
functions for hours, etc.
The trunc()
function for daily summaries is DATE(datestamp)
.
For monthly summaries it is LAST_DAY(datestamp)
. For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data areTIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52
add a comment |
Let's say you have a gigantic table measure
with two columns datestamp
and temp
.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc
in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc
. In this case trunc(t)
returns the beginning of the six-minute period in which t
occurs. So, trunc('1942-12-07 08:45:17')
gives 1942-12-07 08:42:00
).
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6
.
You'll need different trunc()
functions for hours, etc.
The trunc()
function for daily summaries is DATE(datestamp)
.
For monthly summaries it is LAST_DAY(datestamp)
. For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
Let's say you have a gigantic table measure
with two columns datestamp
and temp
.
Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc
in a moment.
SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)
That works for any reasonable definition of trunc
. In this case trunc(t)
returns the beginning of the six-minute period in which t
occurs. So, trunc('1942-12-07 08:45:17')
gives 1942-12-07 08:42:00
).
Here's a query that works for every six minute interval.
SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1
This uses inbuilt date arithmetic rather than unix timestamp arithmetic.
You can use a stored function to make this easier to read.
DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;
Then your query will say
SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)
If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6
.
You'll need different trunc()
functions for hours, etc.
The trunc()
function for daily summaries is DATE(datestamp)
.
For monthly summaries it is LAST_DAY(datestamp)
. For example,
SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)
yields a month-by-month summary.
edited Nov 16 '18 at 20:54
answered Nov 16 '18 at 18:34
O. Jones
59.4k971106
59.4k971106
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data areTIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52
add a comment |
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data areTIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08
It's certainly true that grouping by month has the flaw that months are different lengths. If your data are
TIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.– O. Jones
Nov 16 '18 at 20:52
It's certainly true that grouping by month has the flaw that months are different lengths. If your data are
TIMESTAMPS
and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.– O. Jones
Nov 16 '18 at 20:52
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53338217%2fgenerating-time-series-reports%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53
The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56
here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09
@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10
"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12