I want to categorise entries in a dataframe, e.g. 1970, 1971, 1972 turn into “70s”













That's what I have:



numbers  any_data  year  period
1        ab        1974  <NA>
2        cd        1975  <NA>
3        ef        1985  <NA>
4        gh        1960  <NA>
5        ij        1955  <NA>


...and that is what I want:



numbers  any_data  year  period
1        ab        1974  "70s"
2        cd        1975  "70s"
3        ef        1985  "80s"
4        gh        1960  "older"
5        ij        1955  "older"


I could use a for-loop checking every single entry in the year-column, but there should be a smarter and faster way using apply or similar functions. Unfortunately, I can't figure that out.



# Build the example data frame
n <- c(1, 2, 3, 4, 5)
a <- c("ab", "cd", "ef", "gh", "ij")
y <- c(1974, 1975, 1985, 1960, 1955)
df <- data.frame(numbers = n, any_data = a, year = y, period = NA)

# Make period a factor so that only these three labels are allowed
df$period <- factor(df$period, levels = c("70s", "80s", "older"))

# Check every year one by one
for (i in seq_along(df$year)) {
  if ((df$year[i] > 1969) && (df$year[i] < 1980)) {
    df$period[i] <- "70s"
  }
  # and so on
}

df


That's slow and ugly. Any better ideas?










  • Read help("cut").
    – Roland
    Nov 20 '18 at 12:02

  • If your problem got solved please choose an answer.
    – Andre Elrico
    Nov 22 '18 at 11:11
















r apply categories






edited Nov 20 '18 at 12:11 by kath

asked Nov 20 '18 at 11:59 by user9204439








1 Answer

A good, readable, and general way would be to use dplyr's case_when (see ?case_when).



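library(dplyr)  # provides case_when() and the %>% pipe

# Two-digit year (1974 -> 74); conditions are checked top to bottom
df$period <-
  (df$year %% 100) %>% {dplyr::case_when(. >= 80 ~ "80s",
                                         . >= 70 ~ "70s",
                                         TRUE    ~ "older")}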

# [1] "70s" "70s" "80s" "older" "older"


Another way, using ifelse (see ?ifelse):



# The third digit of the year gives the decade (1974 -> 7)
dec <- as.integer(substr(df$year, 3, 3))

df$period <-
  ifelse(dec > 6, paste0(dec, "0s"), "older")

# [1] "70s" "70s" "80s" "older" "older"
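
The decade digit can also be computed arithmetically instead of through a string. The following is only a sketch under assumptions that are not in the answer above: the vector `years` and the cutoff of 1970 for "older" are illustrative, and integer division is swapped in for `substr`.

```r
# Illustrative input, same years as in the question
years <- c(1974, 1975, 1985, 1960, 1955)

# (years %/% 10) %% 10 extracts the decade digit, e.g. 1974 -> 197 -> 7
decade <- (years %/% 10) %% 10 * 10

# Two-digit label such as "70s"; %02d keeps a leading zero for the 2000s
label <- sprintf("%02ds", decade)

# Everything before 1970 is treated as "older" (assumed cutoff)
period <- ifelse(years >= 1970, label, "older")
period
```

Comparing against the full year rather than the decade digit avoids labelling years from 2000 onwards as "older" by accident.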


Or use cut (see ?cut), as Roland suggests:



# right = FALSE makes the intervals [70, 80) and [80, Inf)
df$period <-
  cut(df$year %% 100, breaks = c(-Inf, 70, 80, Inf),
      labels = c("older", "70s", "80s"), right = FALSE)

#[1] 70s 70s 80s older older
#Levels: older 70s 80s
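
If the data may contain years outside the 1900s, one possible variation is to cut on the full year instead of the last two digits. This is a hedged sketch, not the answer's method: the break points, the upper bound of 1990, and the extra value 1993 are assumptions added for illustration.

```r
# Years from the question plus one hypothetical out-of-range value (1993)
years <- c(1974, 1975, 1985, 1960, 1955, 1993)

# Intervals are [-Inf, 1970), [1970, 1980), [1980, 1990) because right = FALSE
period <- cut(years,
              breaks = c(-Inf, 1970, 1980, 1990),
              labels = c("older", "70s", "80s"),
              right  = FALSE)

# Values that fall outside all intervals become NA instead of a wrong label
as.character(period)
```

Leaving the last break finite means a year such as 1993 shows up as NA, which makes missing categories visible rather than silently filing them under "80s".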





  • Thanks, Andre! For my problem cut works best.

    – user9204439
    Nov 21 '18 at 17:16











edited Nov 20 '18 at 12:52

answered Nov 20 '18 at 12:08 by Andre Elrico












