Creating dummies with apply in R
I have data about different study strategies for individuals, stored in columns labeled Strategy_A, Strategy_B, and Strategy_C. The strategies are coded 1-15. Because each student can list up to 3 strategies, I want to create one dummy per strategy code (e.g. Strategy1, Strategy2, etc.) that indicates whether the student listed that strategy in any of the three columns.
Example Data
ID = c(1, 2, 3, 4, 5)
Strategy_A = c(10, 12, 13, 1, 2)
Strategy_B = c(1, 2, 1, 4, 5)
Strategy_C = c(2, 3, 6, 8, 15)
all = data.frame(ID, Strategy_A, Strategy_B, Strategy_C)
I thought about using apply with a small wrapper function around dummy_cols() from the fastDummies package.
library(fastDummies)
dummies = function(x){
  dummy_cols(x)
}
new = apply(all[, -1], 2, dummies)
new = as.data.frame(new)
However, this creates separate dummies for each column (StrategyA_1, StrategyA_2, StrategyA_3, ...) rather than combining them across columns into Strategy1, Strategy2, Strategy3, and so on. Any ideas how to fix this?
r apply
asked 5 hours ago
Student
please describe the output you expected.
– Darren Tsai
5 hours ago
Sounds like you're interested in creating a dummy for each combination of the three variables? In which case you might need to create another variable that combines them and then create the dummy variables from that.
– Cleland
5 hours ago
2 Answers
After a small transformation of all, you can use dummy.data.frame() from dummies (you can also use dummy_cols() from fastDummies) and then aggregate per ID.
library(dummies)
# Stack the three strategy columns into one long column, repeating ID for each
all <- data.frame(ID = rep(all$ID, 3),
                  Strategy = c(all$Strategy_A, all$Strategy_B, all$Strategy_C))
# One dummy per strategy code; fastDummies::dummy_cols(all, "Strategy") is an
# alternative, but it also keeps the original Strategy column
all <- dummy.data.frame(all, "Strategy")
# Collapse to one row per ID; sums are 0 or 1 unless a student repeats a strategy
aggregate(. ~ ID, all, sum)
# output
ID Strategy1 Strategy2 Strategy3 Strategy4 Strategy5 Strategy6 Strategy8 Strategy10 Strategy12 Strategy13 Strategy15
1 1 1 1 0 0 0 0 0 1 0 0 0
2 2 0 1 1 0 0 0 0 0 1 0 0
3 3 1 0 0 0 0 1 0 0 0 1 0
4 4 1 0 0 1 0 0 1 0 0 0 0
5 5 0 1 0 0 1 0 0 0 0 0 1
edited 4 hours ago
answered 4 hours ago
ANG
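The reshape-then-count idea in the answer above can also be sketched in base R alone, which may help if the dummies package is not available. This is only an illustration under a few assumptions: the data frame is named strategies here (rather than all, to avoid masking base::all), and strategy_cols, long, counts, and dummies are made-up names, not part of any package.

```r
# Example data from the question, under a different name
strategies <- data.frame(
  ID         = c(1, 2, 3, 4, 5),
  Strategy_A = c(10, 12, 13, 1, 2),
  Strategy_B = c(1, 2, 1, 4, 5),
  Strategy_C = c(2, 3, 6, 8, 15)
)

strategy_cols <- c("Strategy_A", "Strategy_B", "Strategy_C")

# Stack the strategy columns into long format: one row per (ID, strategy code)
long <- data.frame(
  ID       = rep(strategies$ID, times = length(strategy_cols)),
  Strategy = unlist(strategies[strategy_cols], use.names = FALSE)
)

# Cross-tabulate ID against strategy code; each cell counts how often it was listed
counts <- table(long$ID, long$Strategy)

# Turn the counts into a data frame of 0/1 indicators
# (the > 0 check guards against a student listing the same code twice)
dummies <- as.data.frame.matrix(counts)
dummies[] <- lapply(dummies, function(n) as.integer(n > 0))
names(dummies) <- paste0("Strategy", colnames(counts))

# Re-attach the IDs, taken from the table's row labels so the order matches
dummies <- cbind(ID = as.numeric(rownames(counts)), dummies)
dummies
```

Only strategy codes that actually occur in the data get a column, as in the output above. If a column for every code from 1 to 15 is needed, one option is to convert long$Strategy to a factor with levels 1:15 before calling table().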
Here is a tidyverse approach.
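library(tidyverse)
new <- all %>%
  gather(key, value, -ID) %>%        # long format: one row per ID and strategy column
  mutate(key = NULL, num = 1) %>%    # drop the source column name, flag each listed strategy
  spread(value, num)                 # one column per strategy code, NA where not listed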
# ID 1 2 3 4 5 6 8 10 12 13 15
# 1 1 1 1 NA NA NA NA NA 1 NA NA NA
# 2 2 NA 1 1 NA NA NA NA NA 1 NA NA
# 3 3 1 NA NA NA NA 1 NA NA NA 1 NA
# 4 4 1 NA NA 1 NA NA 1 NA NA NA NA
# 5 5 NA 1 NA NA 1 NA NA NA NA NA 1
new[is.na(new)] <- 0
new
# ID 1 2 3 4 5 6 8 10 12 13 15
# 1 1 1 1 0 0 0 0 0 1 0 0 0
# 2 2 0 1 1 0 0 0 0 0 1 0 0
# 3 3 1 0 0 0 0 1 0 0 0 1 0
# 4 4 1 0 0 1 0 0 1 0 0 0 0
# 5 5 0 1 0 0 1 0 0 0 0 0 1
answered 4 hours ago
Darren Tsai
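Since gather() and spread() are superseded in current tidyr, the same idea could be written with pivot_longer() and pivot_wider(). The following is a sketch rather than part of the original answer; it assumes tidyr 1.0.0 or later, renames the data frame to strategies, and uses the made-up column names slot, strategy, and listed.

```r
library(dplyr)
library(tidyr)

# Example data from the question, under a different name
strategies <- data.frame(
  ID         = c(1, 2, 3, 4, 5),
  Strategy_A = c(10, 12, 13, 1, 2),
  Strategy_B = c(1, 2, 1, 4, 5),
  Strategy_C = c(2, 3, 6, 8, 15)
)

dummies <- strategies %>%
  # long format: one row per ID and strategy column
  pivot_longer(-ID, names_to = "slot", values_to = "strategy") %>%
  # drop the column of origin and any repeated (ID, strategy) pairs
  distinct(ID, strategy) %>%
  # sort so the new columns come out in numeric order
  arrange(strategy) %>%
  mutate(listed = 1L) %>%
  # one column per strategy code, filling unlisted strategies with 0
  pivot_wider(
    names_from   = strategy,
    values_from  = listed,
    names_prefix = "Strategy",
    values_fill  = list(listed = 0L)
  ) %>%
  arrange(ID)

dummies
```

Compared with the spread() version, values_fill removes the need for the separate is.na() replacement step, and names_prefix gives the columns the Strategy1, Strategy2, ... names asked for in the question.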