I want to categorise entries in a data frame, e.g. so that 1970, 1971 and 1972 turn into “70s”.
That's what I have:
numbers any_data year period
1 ab 1974 <NA>
2 cd 1975 <NA>
3 ef 1985 <NA>
4 gh 1960 <NA>
5 ij 1955 <NA>
...and that is what I want:
numbers any_data year period
1 ab 1974 "70s"
2 cd 1975 "70s"
3 ef 1985 "80s"
4 gh 1960 "older"
5 ij 1955 "older"
I could use a for loop that checks every entry in the year column, but there should be a smarter and faster way using apply or similar functions. Unfortunately, I can't figure out how.
n <- c(1, 2, 3, 4, 5)
a <- c("ab", "cd", "ef", "gh", "ij")
y <- c(1974, 1975, 1985, 1960, 1955)
df <- data.frame(numbers = n, any_data = a, year = y, period = NA)
df$period <- factor(df$period, levels = c("70s", "80s", "older"))
for (i in seq_along(df$year)) {
  if ((df$year[i] > 1969) && (df$year[i] < 1980)) {
    df$period[i] <- "70s"
  }
  # and so on
}
df
That's slow and ugly. Any better ideas?
Tags: r, apply, categories
Read help("cut").
– Roland, Nov 20 '18 at 12:02
If your problem got solved, please choose an answer.
– Andre Elrico, Nov 22 '18 at 11:11
edited Nov 20 '18 at 12:11 by kath
asked Nov 20 '18 at 11:59 by user9204439
1 Answer
A good, readable and general way is to use dplyr's case_when():
# Conditions are checked in order; the first one that is TRUE wins
df$period <- dplyr::case_when(df$year >= 1980 ~ "80s",
                              df$year >= 1970 ~ "70s",
                              TRUE            ~ "older")
# [1] "70s" "70s" "80s" "older" "older"
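If you would rather not depend on dplyr, the same classification can be sketched in base R with logical indexing: give every row the fallback label first, then overwrite the rows that fall into each range. This is a minimal sketch using the sample data from the question; like case_when(), it works on whole columns at once instead of looping row by row.

```r
# Sample data from the question
df <- data.frame(numbers  = 1:5,
                 any_data = c("ab", "cd", "ef", "gh", "ij"),
                 year     = c(1974, 1975, 1985, 1960, 1955))

# Fallback label for every row (recycled to the full column)
df$period <- "older"

# Overwrite only the rows whose year falls in each range
df$period[df$year >= 1970 & df$year < 1980] <- "70s"
df$period[df$year >= 1980] <- "80s"

df$period
# [1] "70s"   "70s"   "80s"   "older" "older"
```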
Another way, using ifelse():
# Assumes all years are in the 1900s: the third digit is the decade (1974 -> 7)
dec <- as.integer(substr(df$year, 3, 3))
df$period <- ifelse(dec > 6, paste0(dec, "0s"), "older")
# [1] "70s" "70s" "80s" "older" "older"
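The substr() trick reads the decade digit from the text form of the year, so years from 2000 onwards would all become "older". A sketch of an arithmetic alternative, assuming the same sample data: integer division rounds each year down to the start of its decade, and sprintf() builds a two-digit label, so 2005 would become "00s". The 1970 cutoff for "older" follows the question.

```r
df <- data.frame(numbers  = 1:5,
                 any_data = c("ab", "cd", "ef", "gh", "ij"),
                 year     = c(1974, 1975, 1985, 1960, 1955))

# Round each year down to the start of its decade: 1974 -> 1970
decade <- (df$year %/% 10) * 10

# Two-digit decade label from the last two digits: 1970 -> "70s", 2000 -> "00s"
label <- sprintf("%02ds", as.integer(decade %% 100))

# Decades before 1970 are grouped as "older"
df$period <- ifelse(decade >= 1970, label, "older")
df$period
```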
Or cut(), as Roland suggests:
# right = FALSE gives the intervals (-Inf, 1970), [1970, 1980), [1980, Inf)
df$period <- cut(df$year, breaks = c(-Inf, 1970, 1980, Inf),
                 labels = c("older", "70s", "80s"), right = FALSE)
#[1] 70s 70s 80s older older
#Levels: older 70s 80s
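If more decades are needed, the breaks and labels for cut() can be generated instead of typed out by hand. A minimal sketch, assuming decades from 1970 up to 2020 (the upper bound is an arbitrary choice for illustration):

```r
df <- data.frame(numbers  = 1:5,
                 any_data = c("ab", "cd", "ef", "gh", "ij"),
                 year     = c(1974, 1975, 1985, 1960, 1955))

# Decade start years: 1970, 1980, ..., 2020 (2020 is an assumed upper bound)
starts <- seq(1970, 2020, by = 10)

# Labels "70s", "80s", "90s", "00s", "10s", "20s" from the last two digits
decade_labels <- sprintf("%02ds", as.integer(starts %% 100))

# right = FALSE makes the intervals [1970, 1980), [1980, 1990), ...;
# everything below 1970 falls into the first interval, labelled "older"
df$period <- cut(df$year,
                 breaks = c(-Inf, starts, Inf),
                 labels = c("older", decade_labels),
                 right  = FALSE)
df$period
```

Because cut() returns a factor, decades that do not occur in the data still appear as levels; droplevels(df$period) removes them if that matters.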
Thanks, Andre! For my problem, cut works best.
– user9204439, Nov 21 '18 at 17:16
edited Nov 20 '18 at 12:52
answered Nov 20 '18 at 12:08 by Andre Elrico