Generating time series reports












3














I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like



GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)


to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.



I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).



EDIT



It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:



SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)


It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?



It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.










share|improve this question
























  • Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
    – mickmackusa
    Nov 16 '18 at 12:53










  • The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
    – Daniel Protopopov
    Nov 16 '18 at 12:56










  • here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
    – Raymond Nijland
    Nov 16 '18 at 13:09












  • @RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
    – Daniel Protopopov
    Nov 16 '18 at 13:10












  • "the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
    – Raymond Nijland
    Nov 16 '18 at 13:12


















3














I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like



GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)


to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.



I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).



EDIT



It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:



SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)


It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?



It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.










share|improve this question
























  • Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
    – mickmackusa
    Nov 16 '18 at 12:53










  • The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
    – Daniel Protopopov
    Nov 16 '18 at 12:56










  • here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
    – Raymond Nijland
    Nov 16 '18 at 13:09












  • @RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
    – Daniel Protopopov
    Nov 16 '18 at 13:10












  • "the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
    – Raymond Nijland
    Nov 16 '18 at 13:12
















3












3








3


1





I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like



GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)


to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.



I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).



EDIT



It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:



SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)


It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?



It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.










share|improve this question















I'm trying to work out how to create a solution that will allow me to query a table that has a timestamp, and in return get a time series data. The request consists of start/end date & time, granularity type (minute, hour, day, week, month and year) and granularity value. Having tried to use in a query something like



GROUP BY ROUND(UNIX_TIMESTAMP(created_at) DIV 60)


to get the results per one minute, or DIV 300 for every five minutes is fine. The problem lies further up for calculating months and years' seconds which will be inaccurate. I've stumbled upon the generate_series in PGSQL (MySQL alternative) and am stuck trying to tie them together. How do I calculate a count of rows, for example, for two days, on a 15 minute granularity? It's a complex question that I'll probably have to break down further.



I have already visited #1 and #2, but they are incomplete.
To me it seems that rounding will only be allowed to certain level and I'd have to restrict it (i.e .for 2 months period there cannot be hourly breakdown).



EDIT



It gave me the wrong impression - I would not have to calculate monthly figures based on seconds using the query like:



SELECT DATE_FORMAT(MIN(created_at),'%d/%m/%Y %H:%i:%s' as date,
COUNT(*) AS count FROM guests
GROUP BY ROUND(UNIX_TIMESTAMP(created_at) / 300)


It's only going to do grouping based on minimum value. But the question still stands - is the best approach really to go through the time period using granularity value and "slice" the data that way without loosing too much accuracy?



It seems that the only approach is to run sub-queries for a set of data (i.e. for a period of two months, generate 15 minute intervals timestamps, group the data into them and produce an aggregate) without dividing the original timestamp to produce the rounded approximation.







mysql aggregate-functions datestamp






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 16 '18 at 18:38









miken32

23.5k84672




23.5k84672










asked Nov 16 '18 at 12:46









Daniel Protopopov

2,76721526




2,76721526












  • Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
    – mickmackusa
    Nov 16 '18 at 12:53










  • The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
    – Daniel Protopopov
    Nov 16 '18 at 12:56










  • here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
    – Raymond Nijland
    Nov 16 '18 at 13:09












  • @RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
    – Daniel Protopopov
    Nov 16 '18 at 13:10












  • "the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
    – Raymond Nijland
    Nov 16 '18 at 13:12




















  • Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
    – mickmackusa
    Nov 16 '18 at 12:53










  • The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
    – Daniel Protopopov
    Nov 16 '18 at 12:56










  • here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
    – Raymond Nijland
    Nov 16 '18 at 13:09












  • @RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
    – Daniel Protopopov
    Nov 16 '18 at 13:10












  • "the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
    – Raymond Nijland
    Nov 16 '18 at 13:12


















Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53




Can you please include some sample data in an sqlfiddle or something, then express your desired output. Context improves clarity/understanding.
– mickmackusa
Nov 16 '18 at 12:53












The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56




The data is stored through timestamps, it would be difficult to provide with proper example as there may be thousands of records. I'll edit the question with possible examples.
– Daniel Protopopov
Nov 16 '18 at 12:56












here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09






here is a basic structure of a MySQL number generator what you can use a subquery db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/0 .. There are some out there but using a MySQL user variable is the most easy.. MySQL 8.0 is even more easy because it's supports recursive CTE's which can also be used to generate a number list... generating a date serie with it db-fiddle.com/f/mVJSpFHja4unD4fAsihADv/3
– Raymond Nijland
Nov 16 '18 at 13:09














@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10






@RaymondNijland, the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data.
– Daniel Protopopov
Nov 16 '18 at 13:10














"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12






"the question is not how to generate the time series, but how to use it in conjunction with data that has timestamps to produce accurate time series data." Ok in that case read "Why should I provide an MCVE for what seems to me to be a very simple SQL query?" meta.stackoverflow.com/questions/333952/…
– Raymond Nijland
Nov 16 '18 at 13:12














1 Answer
1






active

oldest

votes


















1














Let's say you have a gigantic table measure with two columns datestamp and temp.



Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.



  SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)


That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00).



Here's a query that works for every six minute interval.



  SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1


This uses inbuilt date arithmetic rather than unix timestamp arithmetic.



You can use a stored function to make this easier to read.



DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;


Then your query will say



  SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)


If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6.



You'll need different trunc() functions for hours, etc.



The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,



  SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)


yields a month-by-month summary.






share|improve this answer























  • I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
    – Daniel Protopopov
    Nov 16 '18 at 19:08










  • It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
    – O. Jones
    Nov 16 '18 at 20:52











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53338217%2fgenerating-time-series-reports%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














Let's say you have a gigantic table measure with two columns datestamp and temp.



Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.



  SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)


That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00).



Here's a query that works for every six minute interval.



  SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1


This uses inbuilt date arithmetic rather than unix timestamp arithmetic.



You can use a stored function to make this easier to read.



DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;


Then your query will say



  SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)


If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6.



You'll need different trunc() functions for hours, etc.



The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,



  SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)


yields a month-by-month summary.






share|improve this answer























  • I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
    – Daniel Protopopov
    Nov 16 '18 at 19:08










  • It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
    – O. Jones
    Nov 16 '18 at 20:52
















1














Let's say you have a gigantic table measure with two columns datestamp and temp.



Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.



  SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)


That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00).



Here's a query that works for every six minute interval.



  SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1


This uses inbuilt date arithmetic rather than unix timestamp arithmetic.



You can use a stored function to make this easier to read.



DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;


Then your query will say



  SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)


If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6.



You'll need different trunc() functions for hours, etc.



The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,



  SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)


yields a month-by-month summary.






share|improve this answer























  • I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
    – Daniel Protopopov
    Nov 16 '18 at 19:08










  • It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
    – O. Jones
    Nov 16 '18 at 20:52














1












1








1






Let's say you have a gigantic table measure with two columns datestamp and temp.



Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.



  SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)


That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00).



Here's a query that works for every six minute interval.



  SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1


This uses inbuilt date arithmetic rather than unix timestamp arithmetic.



You can use a stored function to make this easier to read.



DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;


Then your query will say



  SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)


If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6.



You'll need different trunc() functions for hours, etc.



The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,



  SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)


yields a month-by-month summary.






share|improve this answer














Let's say you have a gigantic table measure with two columns datestamp and temp.



Let's say you want to see the temperature every six minutes (10x per hour) for the last week. You can do this sort of thing. We'll get to defining trunc in a moment.



  SELECT trunc(datestamp) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY trunc(datestamp)
ORDER BY trunc(datestamp)


That works for any reasonable definition of trunc. In this case trunc(t) returns the beginning of the six-minute period in which t occurs. So, trunc('1942-12-07 08:45:17') gives 1942-12-07 08:42:00).



Here's a query that works for every six minute interval.



  SELECT DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6) datestamp,
AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD 6)
ORDER BY 1


This uses inbuilt date arithmetic rather than unix timestamp arithmetic.



You can use a stored function to make this easier to read.



DELIMITER $$
DROP FUNCTION IF EXISTS TRUNC_N_MINUTES$$
CREATE
FUNCTION TRUNC_N_MINUTES(datestamp DATETIME, n INT)
RETURNS DATETIME DETERMINISTIC NO SQL
COMMENT 'truncate to N minute boundary. For example,
TRUNCATE_N_MINUTES(sometime, 15) gives the nearest
preceding quarter hour'
RETURN DATE_FORMAT(datestamp,'%Y-%m-%d %H:00') +
INTERVAL (MINUTE(datestamp) -
MINUTE(datestamp) MOD n) MINUTE$$
DELIMITER ;


Then your query will say



  SELECT TRUNC_N_MINUTES(datestamp, 6) datestamp, AVG(temp) temp
FROM measure
WHERE datestamp >= CURDATE() - INVERVAL 7 DAY
GROUP BY TRUNC_N_MINUTES(datestamp, 6)
ORDER BY TRUNC_N_MINUTES(datestamp, 6)


If you want to summarize by 5, 10, 15, or minute boundaries (three items per hour) simply use that number in place of 6.



You'll need different trunc() functions for hours, etc.



The trunc() function for daily summaries is DATE(datestamp).
For monthly summaries it is LAST_DAY(datestamp). For example,



  SELECT LAST_DAY(datestamp) month_ending, AVG(temp) temp
FROM measure
GROUP BY LAST_DAY(datestamp)
ORDER BY LAST_DAY(datestamp)


yields a month-by-month summary.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 16 '18 at 20:54

























answered Nov 16 '18 at 18:34









O. Jones

59.4k971106




59.4k971106












  • I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
    – Daniel Protopopov
    Nov 16 '18 at 19:08










  • It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
    – O. Jones
    Nov 16 '18 at 20:52


















  • I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
    – Daniel Protopopov
    Nov 16 '18 at 19:08










  • It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
    – O. Jones
    Nov 16 '18 at 20:52
















I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08




I’ll give it a try, however my main concern is due to inconsistent number of days (and lower timeframes) that’s done when doing any calculations with the time stamp. Because of that I tend to believe that records with time level granularity (and up to week) can be done with the time stamp, while date based ones (day and above) have to be done with grouping by those metrics and calculating aggregate of that.
– Daniel Protopopov
Nov 16 '18 at 19:08












It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52




It's certainly true that grouping by month has the flaw that months are different lengths. If your data are TIMESTAMPS and you are in a jurisdiction with daylight saving time, you'll get anomalous results on the changeover days.
– O. Jones
Nov 16 '18 at 20:52


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53338217%2fgenerating-time-series-reports%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

How to change which sound is reproduced for terminal bell?

Can I use Tabulator js library in my java Spring + Thymeleaf project?

Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents