Plotting the duration of exceedances of a pollutant standard in ggplot2












I have a data frame with a Date column and a column of calculated concentrations of a parameter. I am trying to make a time-series plot that shows all the concentrations as scatter points, plus a horizontal line at the standard for the pollutant (which is 500). I can do this with no problem. The problem I am having is plotting a line that shows the duration of the exceedances (> 500). I can't seem to find anything that answers my question. I would appreciate any guidance.



Sample Data:



df<-structure(list(Date_Time = structure(c(1480093200, 1482660000, 
1395651343, 1329823800, 1326929400, 1331233200, 1490130000, 1476138600,
1474070400, 1489393800, 1483272000, 1393515068, 1480471200, 1332680400,
1471226400, 1470853800, 1396124591, 1496250000, 1394581991, 1438177553,
1332108000, 1493051400, 1475949600, 1491024600, 1488832200, 1473697800,
1475404200, 1488511800, 1490212800, 1477040400, 1494793740, 1389346885,
1473933600, 1390611191, 1486551600, 1476475200, 1473593400, 1388854543,
1327012200, 1493611140), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Calculated_TDS = c(271.3692, 634.3604, 634.246, 219.546,
674.286, 169.21, 506.118, 452.6932, 314.8412, 4640.3052,
358.0844, 734.918, 97.71, 460.358, 385.998, 283.9532, 370.554,
309.2356, 296.766, 137.079616, 24.494, 383.996, 321.2476,
784.6248, 642.1396, 1320.7032, 213.254, 462.1884, 547.6452,
376.274, 195.1216, 595.35, 320.1608, 411.166, 882.5512, 288.5292,
533.574, 1000.326, 124.022, 256.6116)), row.names = c(NA,
-40L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Date_Time",
"Calculated_TDS"))


Code:



library(tidyverse)

test_df<-df%>%
mutate(greater = Calculated_TDS > 500)%>%
group_by(Date_Time,Calculated_TDS)%>%
summarize(n_greater = sum(greater), duration = length(Date_Time))


plot<-ggplot() +
geom_point(data = test_df , aes(x = Date_Time, y = Calculated_TDS))+
geom_line(data= test_df,aes(x=Date_Time, y = duration),stat="identity")+
geom_hline(aes(yintercept = 500,color="red"),size=1.3)

plot


I know what I have doesn't make sense but I don't understand how to find the duration of exceedances.



[Image: sketch of the expected output]

























  • FYI, your code didn't run. Can you draw your expected output in Paint or Word?

    – Tung
    Nov 19 '18 at 17:49











  • Are you looking for something similar to flow duration curve? stackoverflow.com/questions/52831687/…

    – Tung
    Nov 19 '18 at 17:51











  • @Tung fixed code sorry, should work now. That's not quite what I want. I will try to draw expected output

    – NBE
    Nov 19 '18 at 17:53











  • @Tung I included expected output. The lines should be the time duration of each event that went over the standard of 500.

    – NBE
    Nov 19 '18 at 18:04











  • Can you pls define for us what an "event" is? Let's say a reading is over 500, as is the case about 35% of the time in your sample. When should the duration start and when should it end? As long as the readings continue to be over 500? In your drawing it looks like there is grouping that includes readings before and after the > 500 readings.

    – Jon Spring
    Nov 19 '18 at 18:31


















r ggplot2 dplyr time-series






asked Nov 19 '18 at 17:29 by NBE, edited Nov 19 '18 at 18:03













2 Answers
Andrew Lavers' answer is a good one. An alternative approach is to make groups for your geom_line and subset data within that geom.



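test_df$group <- paste0("Group_", cumsum(c(1, diff(test_df$n_greater) != 0)))
# diff() returns n - 1 gaps (here in hours), so pad the last row with NA
gap_hours <- c(as.numeric(diff(test_df$Date_Time), units = "hours"), NA)
test_df$duration <- ifelse(test_df$n_greater == 1, gap_hours, 0)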

# A tibble: 40 x 5
# Groups:   Date_Time [?]
   Date_Time           Calculated_TDS n_greater duration group  
   <dttm>                       <dbl>     <int>    <dbl> <chr>  
 1 2012-01-18 23:30:00          674.          1      23  Group_1
 2 2012-01-19 22:30:00          124.          0       0  Group_2
 3 2012-02-21 11:30:00          220.          0       0  Group_2
 4 2012-03-08 19:00:00          169.          0       0  Group_2
 5 2012-03-18 22:00:00           24.5         0       0  Group_2
 6 2012-03-25 13:00:00          460.          0       0  Group_2
 7 2014-01-04 16:55:43         1000.          1     137. Group_3
 8 2014-01-10 09:41:25          595.          1     351. Group_3
 9 2014-01-25 00:53:11          411.          0       0  Group_4
10 2014-02-27 15:31:08          735.          1     296. Group_5
# ... with 30 more rows

ggplot() +
  geom_point(data = test_df, aes(x = Date_Time, y = Calculated_TDS)) +
  # Connect only the readings above 500, one line per consecutive run
  geom_line(data = subset(test_df, Calculated_TDS > 500),
            aes(x = Date_Time, y = Calculated_TDS, group = group)) +
  geom_hline(yintercept = 500, color = "red", size = 1.3)


[Image: resulting plot]


























  • thanks for your answer. Is there a way to make a new column saying how long each exceedance lasted?

    – NBE
    Nov 19 '18 at 19:06






  • You can use diff, see my edits.

    – Anonymous coward
    Nov 19 '18 at 21:37
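If an "event" is taken to mean a run of consecutive readings above 500 (an assumption, since the thread never settles on a definition), one rough way to get a duration per event is to collapse each run to its first and last timestamp. The sketch below uses made-up readings rather than the asker's data, and the names exceeds, run_id, event_start, event_end and duration_hours are hypothetical. geom_segment() then draws one horizontal bar per event along the standard.

```r
library(dplyr)
library(ggplot2)

# Made-up readings for illustration only (not the asker's data)
readings <- tibble(
  Date_Time = as.POSIXct(c("2018-01-01 00:00:00", "2018-01-01 06:00:00",
                           "2018-01-01 12:00:00", "2018-01-02 00:00:00",
                           "2018-01-03 00:00:00", "2018-01-03 12:00:00"),
                         tz = "UTC"),
  Calculated_TDS = c(300, 600, 700, 400, 800, 200)
)

events <- readings %>%
  arrange(Date_Time) %>%
  mutate(exceeds = Calculated_TDS > 500,
         # Same idea as the group column above: a new id each time the flag flips
         run_id = cumsum(c(1, diff(exceeds) != 0))) %>%
  filter(exceeds) %>%
  group_by(run_id) %>%
  summarise(event_start = min(Date_Time),
            event_end = max(Date_Time),
            n_readings = n(),
            duration_hours = as.numeric(difftime(event_end, event_start,
                                                 units = "hours")))

p <- ggplot(readings, aes(x = Date_Time, y = Calculated_TDS)) +
  geom_point() +
  geom_hline(yintercept = 500, color = "red") +
  # One bar per event, spanning its first to last reading above 500
  geom_segment(data = events,
               aes(x = event_start, xend = event_end, y = 500, yend = 500),
               inherit.aes = FALSE, color = "blue")
```

An event with a single reading gets a duration of 0 under this definition, because the data say nothing about when the concentration actually crossed the standard between samples.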



















I am not quite sure what you want, but here is a starting point. The idea is simply to make separate columns for the values that exceed the standard and the values below it, each filled with NA where it does not apply, and then plot them.
The exceedance points and line are colored red and the points below the standard are colored blue. Note that color = "red" should be outside aes() for the horizontal line. Only use color inside aes() when the color should vary with the value.






library(tidyverse)



test_df <- df %>%
  mutate(greater = Calculated_TDS > 500,
         exceed_value = if_else(greater, Calculated_TDS, NA_real_),
         below_value = if_else(greater, NA_real_, Calculated_TDS))

plot <- ggplot(data = test_df, aes(x = Date_Time)) +
  geom_point(aes(y = exceed_value), color = "red") +
  geom_point(aes(y = below_value), color = "blue") +
  geom_line(aes(y = exceed_value), color = "red") +
  geom_hline(aes(yintercept = 500), color = "red", size = 1.3)

print(plot)
#> Warning in as.POSIXlt.POSIXct(x): unknown timezone 'zone/tz/2018g.1.0/
#> zoneinfo/America/New_York'
#> Warning: Removed 26 rows containing missing values (geom_point).
#> Warning: Removed 14 rows containing missing values (geom_point).
#> Warning: Removed 4 rows containing missing values (geom_path).
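To illustrate the point about aes(), here is a small sketch on toy data (hypothetical values and a hypothetical exceeds column, not the asker's series). The point color depends on the data, so it is mapped inside aes() and given explicit colors with scale_color_manual(). The horizontal line has one fixed color, so it is set outside aes().

```r
library(ggplot2)

# Toy data for illustration only
toy <- data.frame(
  Date_Time = as.POSIXct("2018-01-01", tz = "UTC") + 3600 * 0:5,
  Calculated_TDS = c(300, 600, 700, 400, 800, 200)
)
toy$exceeds <- toy$Calculated_TDS > 500

p <- ggplot(toy, aes(x = Date_Time, y = Calculated_TDS)) +
  # Color varies with the data: map it inside aes()
  geom_point(aes(color = exceeds)) +
  scale_color_manual(values = c(`FALSE` = "blue", `TRUE` = "red"),
                     name = "Above 500") +
  # Color is a constant: set it outside aes()
  geom_hline(yintercept = 500, color = "red")
```

This avoids the two NA-padded columns, at the cost of a legend that may or may not be wanted.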







  • Thanks for your answer. Is there a way to get how long each event lasted in a new column?

    – NBE
    Nov 19 '18 at 19:01
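One possible way to get such a column, assuming an event is a run of consecutive readings above 500 and its length is the time from the first to the last reading in that run. This is only a sketch on toy data; event_id and event_hours are hypothetical names, and rows below the standard are left as NA.

```r
library(dplyr)

# Toy data for illustration only
toy <- tibble(
  Date_Time = as.POSIXct("2018-01-01", tz = "UTC") +
    3600 * c(0, 6, 12, 24, 48, 60),
  Calculated_TDS = c(300, 600, 700, 400, 800, 200)
)

with_duration <- toy %>%
  arrange(Date_Time) %>%
  mutate(greater = Calculated_TDS > 500,
         # New id whenever the flag changes from the previous row
         event_id = cumsum(greater != lag(greater, default = first(greater)))) %>%
  group_by(event_id) %>%
  # Every row of an event carries that event's total length in hours
  mutate(event_hours = if_else(
    greater,
    as.numeric(difftime(max(Date_Time), min(Date_Time), units = "hours")),
    NA_real_
  )) %>%
  ungroup()
```

With irregular sampling like in the question, these lengths are only lower bounds, since the concentration may have crossed 500 at any time between two readings.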











answered Nov 19 '18 at 18:49 by Anonymous coward, edited Nov 19 '18 at 21:39













answered Nov 19 '18 at 18:42 by Andrew Lavers, edited Nov 19 '18 at 21:09












