R data table rolling unique [closed]












I have been looking for a way to count the unique values of a column in a data.table in a rolling fashion. I found rollmean, but nothing like a rollunique.



What is good practice for achieving this with data.table?



Thanks










closed as off-topic by alistaire, MLavoie, EdChum, Unheilig, greg-449 Nov 19 '18 at 11:02


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – alistaire, MLavoie, EdChum

If this question can be reworded to fit the rules in the help center, please edit the question.


















      r data.table rolling






      asked Nov 19 '18 at 1:22









      jsilva99





          1 Answer
          1) An option is to use zoo::rollapply



          sample data:



          library(data.table)
          set.seed(0L)   # reproducible sample
          sz <- 1e5L     # number of rows
          winsz <- 5L    # rolling window width
          DT <- data.table(ID=sample(letters, sz, replace=TRUE))


          sample usage with zoo::rollapplyr:



          DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA_integer_)]


          2) Another option is to write your own windowing:



          # trailing window of up to winsz rows; the first winsz-1 rows are set to NA
          DT[, numUniq2 := replace(
            sapply(1:.N, function(n) uniqueN(ID[max(n-winsz+1, 1L):n])),
            .I < winsz,
            NA_integer_)]


          3) Another option is to use data.table::shift



          # shift() builds lags 0..winsz-1 as columns; uniqueN is then applied per row
          DT[, numUniq3 := replace(
            apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
            1L:.N < winsz,
            NA_integer_)]


          output:



                  ID numUniq numUniq2 numUniq3
               1:  x      NA       NA       NA
               2:  g      NA       NA       NA
               3:  j      NA       NA       NA
               4:  o      NA       NA       NA
               5:  x       4        4        4
              ---
           99996:  k       4        4        4
           99997:  a       4        4        4
           99998:  f       4        4        4
           99999:  z       4        4        4
          100000:  c       5        5        5
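
To make the window semantics concrete, here is a minimal base R sketch of the same trailing-window count on a short vector. The helper name `roll_uniq` is hypothetical and not part of the answer or of data.table; it only illustrates what the three columns above compute.

```r
# Hypothetical helper (not from the answer): count distinct values in a
# trailing window of width k; positions with fewer than k values get NA.
roll_uniq <- function(x, k) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n >= k) {
    for (i in k:n) {
      # window covers positions i-k+1 .. i
      out[i] <- length(unique(x[(i - k + 1L):i]))
    }
  }
  out
}

# First six IDs: the five from the output above plus one extra "x"
x <- c("x", "g", "j", "o", "x", "x")
roll_uniq(x, 5L)
# first four entries are NA, then 4 and 4
```

Row 5 sees x, g, j, o, x, which contains four distinct letters, matching the 4 shown in the output.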


          Benchmarking



          timing code:



          microbenchmark::microbenchmark(
            zooRoll=DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA)],
            sapply=DT[, numUniq2 := replace(
              vapply(1L:.N, function(n) uniqueN(ID[max(n-winsz+1L, 1L):n]), integer(1L)),
              1L:.N < winsz,
              NA_integer_)],
            shift=DT[, numUniq3 := replace(
              apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
              1L:.N < winsz,
              NA_integer_)],
            times=3L)


          timings:



          Unit: seconds
              expr      min       lq     mean   median       uq      max neval
           zooRoll 1.723915 1.774423 1.837433 1.824931 1.894191 1.963451     3
            sapply 1.214608 1.224971 1.230763 1.235333 1.238840 1.242348     3
             shift 1.188266 1.234769 1.266852 1.281272 1.306145 1.331018     3
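
All three timed options recompute the distinct count from scratch for every window. A different technique, not from the answer and not benchmarked here, keeps a running tally of how often each value occurs in the current window and updates it as the window slides. The sketch below is plain base R. The name `roll_uniq_incremental` is hypothetical, and it assumes the input has no NA values.

```r
# Hypothetical helper (not from the answer): sliding-window distinct count
# using per-value tallies. Assumes x contains no NA values.
roll_uniq_incremental <- function(x, k) {
  f <- factor(x)
  codes <- as.integer(f)          # integer code per value
  n <- length(codes)
  counts <- integer(nlevels(f))   # occurrences of each code in the window
  distinct <- 0L                  # number of codes with a non-zero tally
  out <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    ci <- codes[i]
    if (counts[ci] == 0L) distinct <- distinct + 1L
    counts[ci] <- counts[ci] + 1L
    if (i > k) {
      # the value at position i-k leaves the window
      cj <- codes[i - k]
      counts[cj] <- counts[cj] - 1L
      if (counts[cj] == 0L) distinct <- distinct - 1L
    }
    if (i >= k) out[i] <- distinct
  }
  out
}

roll_uniq_incremental(c("x", "g", "j", "o", "x", "x"), 5L)
```

Each step does a fixed amount of work regardless of the window width, so this may be worth trying for wide windows. Whether it beats the vectorised options in practice would need measuring, since an interpreted R loop has its own overhead.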


          see also:




          • Is there a _fast_ way to run a rolling regression inside data.table?


          • R data.table sliding window







                edited Nov 19 '18 at 2:51

























                answered Nov 19 '18 at 1:32









                chinsoon12















