Creating dummies with apply in R












I have data on the study strategies individuals use, stored in columns labeled Strategy_A, Strategy_B, and Strategy_C. The strategies are coded 1-15. Because each student can list up to 3 strategies, I want to create one dummy per strategy (e.g. Strategy1, Strategy2, etc.) that indicates whether the student listed it in any of the three columns.



Example Data



    ID = c(1, 2, 3, 4, 5)
    Strategy_A = c(10, 12, 13, 1, 2)
    Strategy_B = c(1, 2, 1, 4, 5)
    Strategy_C = c(2, 3, 6, 8, 15)
    all = data.frame(ID, Strategy_A, Strategy_B, Strategy_C)


I thought about using apply with a small wrapper around dummy_cols() from the fastDummies package.



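    library(fastDummies)

    # wrapper so dummy_cols() can be passed to apply()
    dummies = function(x) {
      dummy_cols(x)
    }

    # apply over the strategy columns (everything except ID)
    new = apply(all[, -1], 2, dummies)
    new = as.data.frame(new)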


However, this creates separate dummies for each column (StrategyA_1, StrategyA_2, StrategyA_3, ...) rather than one set of dummies across all three columns (Strategy1, Strategy2, Strategy3, ...). Any ideas how to fix this?










  • Please describe the output you expected.
    – Darren Tsai
    5 hours ago










  • Sounds like you're interested in creating a dummy for each combination of the three variables? In that case you might need to create another variable that combines them and then create the dummy variables from that.
    – Cleland
    5 hours ago















r apply






asked 5 hours ago









Student

2 Answers
After reshaping the data frame all into long format (one row per ID and listed strategy), you can use dummy.data.frame() from the dummies package (or dummy_cols() from fastDummies) and then aggregate per ID.



    library(dummies)

    # stack the three strategy columns into a single Strategy column (long format)
    long <- data.frame(ID = rep(all$ID, 3),
                       Strategy = c(all$Strategy_A, all$Strategy_B, all$Strategy_C))
    long <- dummy.data.frame(long, "Strategy") # or fastDummies::dummy_cols(long, "Strategy")
    # each dummy sums to 0 or 1 per ID, as long as no student lists the same strategy twice
    aggregate(. ~ ID, long, sum)

    # output
    #   ID Strategy1 Strategy2 Strategy3 Strategy4 Strategy5 Strategy6 Strategy8 Strategy10 Strategy12 Strategy13 Strategy15
    # 1  1         1         1         0         0         0         0         0          1          0          0          0
    # 2  2         0         1         1         0         0         0         0          0          1          0          0
    # 3  3         1         0         0         0         0         1         0          0          0          1          0
    # 4  4         1         0         0         1         0         0         1          0          0          0          0
    # 5  5         0         1         0         0         1         0         0          0          0          0          1
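
For comparison, the same kind of table can probably be built in base R without any dummy-variable package. This is only a minimal sketch under a few assumptions: the data frame is named strategies here (a name not used in the question, picked to avoid masking base::all()), the codes run from 1 to 15 as the question states, and a student counts as using a strategy if it appears in any of the three columns. Unlike the output above, it also produces all-zero columns for codes nobody listed.

```r
# Example data from the question; `strategies` is a hypothetical name.
strategies <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Strategy_A = c(10, 12, 13, 1, 2),
  Strategy_B = c(1, 2, 1, 4, 5),
  Strategy_C = c(2, 3, 6, 8, 15)
)

codes <- 1:15                           # all possible strategy codes
picked <- as.matrix(strategies[, -1])   # one row per student, ID dropped

# For each code, flag the students who listed it in any of the three columns.
dummies <- sapply(codes, function(s) {
  as.integer(rowSums(picked == s, na.rm = TRUE) > 0)
})
colnames(dummies) <- paste0("Strategy", codes)

result <- cbind(strategies["ID"], dummies)
result
```

Because the flag is "listed at least once", a student who enters the same strategy in two columns still gets a 1 rather than a 2.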






edited 4 hours ago
answered 4 hours ago
ANG
























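Here is a tidyverse approach: gather the strategy columns into long format, flag each listed strategy with 1, spread the strategy codes back out into columns, and replace the resulting NAs with 0.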



    library(tidyverse)

    # reshape to long format, drop the column-name key, and flag each listed strategy
    new <- all %>%
      gather(key, value, -ID) %>%
      mutate(key = NULL, num = 1) %>%
      spread(value, num) # one column per strategy code; unlisted codes are NA

    #   ID  1  2  3  4  5  6  8 10 12 13 15
    # 1  1  1  1 NA NA NA NA NA  1 NA NA NA
    # 2  2 NA  1  1 NA NA NA NA NA  1 NA NA
    # 3  3  1 NA NA NA NA  1 NA NA NA  1 NA
    # 4  4  1 NA NA  1 NA NA  1 NA NA NA NA
    # 5  5 NA  1 NA NA  1 NA NA NA NA NA  1

    # replace the NAs with 0 to get proper dummies
    new[is.na(new)] <- 0
    new

    #   ID 1 2 3 4 5 6 8 10 12 13 15
    # 1  1 1 1 0 0 0 0 0  1  0  0  0
    # 2  2 0 1 1 0 0 0 0  0  1  0  0
    # 3  3 1 0 0 0 0 1 0  0  0  1  0
    # 4  4 1 0 0 1 0 0 1  0  0  0  0
    # 5  5 0 1 0 0 1 0 0  0  0  0  1
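
In more recent versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same idea might be written roughly as follows. Treat this as a sketch with some assumptions: it needs tidyr 1.0.0 or later, the data frame is called strategies rather than all, and the intermediate column names slot, strategy and flag are made up for illustration. The distinct() step is an addition that guards against a student listing the same strategy twice.

```r
library(dplyr)
library(tidyr)

# Example data from the question; `strategies` is a hypothetical name.
strategies <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Strategy_A = c(10, 12, 13, 1, 2),
  Strategy_B = c(1, 2, 1, 4, 5),
  Strategy_C = c(2, 3, 6, 8, 15)
)

wide <- strategies %>%
  # long format: one row per student and listed strategy
  pivot_longer(-ID, names_to = "slot", values_to = "strategy") %>%
  # keep each student/strategy pair once, dropping the slot column
  distinct(ID, strategy) %>%
  # sort so the dummy columns come out in ascending code order
  arrange(strategy) %>%
  mutate(flag = 1L) %>%
  # one column per strategy code, 0 where the student did not list it
  pivot_wider(names_from = strategy, values_from = flag,
              names_prefix = "Strategy", values_fill = list(flag = 0L)) %>%
  arrange(ID)

wide
```

Filling with 0 inside pivot_wider() avoids the separate NA replacement step, and names_prefix gives column names such as Strategy1 instead of bare numbers.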






answered 4 hours ago
Darren Tsai






























                       
