R data table rolling unique [closed]












-2














I have been looking for a way to count the unique values of a column in a data.table in a rolling fashion. I found rollmean, but nothing like a rollunique.



What is good practice for achieving this with data.table?



Thanks























closed as off-topic by alistaire, MLavoie, EdChum, Unheilig, greg-449 Nov 19 '18 at 11:02


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – alistaire, MLavoie, EdChum

If this question can be reworded to fit the rules in the help center, please edit the question.


















      r data.table rolling
















      asked Nov 19 '18 at 1:22









      jsilva99

























          1 Answer


















          2














          1) An option is to use zoo::rollapply



          sample data:



          library(data.table)
          set.seed(0L)
          sz <- 1e5L
          winsz <- 5L
          DT <- data.table(ID=sample(letters, sz, replace=TRUE))


          sample usage using zoo::rollapplyr (the right-aligned variant of rollapply):



          DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA_integer_)]


          2) Another option is to write your own windowing:



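          # trailing window ending at row n, clipped at the first row;
          # rows without a full window are set to NA afterwards
          DT[, numUniq2 := replace(
              vapply(seq_len(.N), function(n) uniqueN(ID[max(n - winsz + 1L, 1L):n]), integer(1L)),
              seq_len(.N) < winsz,
              NA_integer_)]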
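If the rolling count has to restart within each group, the same windowing can be run with `by`. This is only a sketch on made-up data: the `grp` column and the values are assumptions, not part of the question. Note that it masks the incomplete windows with `seq_len(.N)` rather than `.I`, because under `by` the `.I` symbol holds row positions in the whole table, not within the group.

```r
library(data.table)

winsz <- 3L
# hypothetical grouped data: two groups of four rows each
DT <- data.table(
  grp = rep(c("a", "b"), each = 4L),
  ID  = c("x", "y", "x", "x", "p", "p", "q", "r")
)

DT[, numUniq := {
  # count distinct IDs in the trailing window ending at each row of the group
  out <- vapply(seq_len(.N),
                function(n) uniqueN(ID[max(n - winsz + 1L, 1L):n]),
                integer(1L))
  # rows that do not yet have a full window within the group become NA
  replace(out, seq_len(.N) < winsz, NA_integer_)
}, by = grp]

DT$numUniq
# NA NA 2 2 NA NA 2 3
```

Each group starts again with `winsz - 1` NA rows, so windows never span two groups.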


          3) Another option is to use data.table::shift



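          # shift() returns one lagged copy of ID per offset 0..winsz-1;
          # each row of the resulting table is one window
          DT[, numUniq3 := replace(
              apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
              1L:.N < winsz,
              NA_integer_)]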


          output:



                  ID numUniq numUniq2 numUniq3
               1:  x      NA       NA       NA
               2:  g      NA       NA       NA
               3:  j      NA       NA       NA
               4:  o      NA       NA       NA
               5:  x       4        4        4
              ---
           99996:  k       4        4        4
           99997:  a       4        4        4
           99998:  f       4        4        4
           99999:  z       4        4        4
          100000:  c       5        5        5
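To check by hand what these columns contain, here is a minimal base R sketch of the same right-aligned window on a six-element vector. The helper name `roll_uniqueN` is made up for this illustration and is not part of data.table or zoo.

```r
# hypothetical helper: number of distinct values in the trailing window of
# width k ending at each position; NA where the window is incomplete
roll_uniqueN <- function(x, k) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n >= k) {
    for (i in k:n) {
      out[i] <- length(unique(x[(i - k + 1L):i]))
    }
  }
  out
}

x <- c("a", "b", "a", "c", "c", "c")
res <- roll_uniqueN(x, 3L)
res
# NA NA 2 3 2 1
```

Position 3 sees a, b, a (2 distinct values), position 4 sees b, a, c (3), and so on, which is the same bookkeeping as in the first five rows of the output above.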


          Benchmarking



          timing code:



          microbenchmark::microbenchmark(
              zooRoll=DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA)],
              sapply=DT[, numUniq2 := replace(
                  vapply(1L:.N, function(n) uniqueN(ID[max(n-winsz+1L, 1L):n]), integer(1L)),
                  1L:.N < winsz,
                  NA_integer_)],
              shift=DT[, numUniq3 := replace(
                  apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
                  1L:.N < winsz,
                  NA_integer_)],
              times=3L)


          timings:



          Unit: seconds
              expr      min       lq     mean   median       uq      max neval
           zooRoll 1.723915 1.774423 1.837433 1.824931 1.894191 1.963451     3
            sapply 1.214608 1.224971 1.230763 1.235333 1.238840 1.242348     3
             shift 1.188266 1.234769 1.266852 1.281272 1.306145 1.331018     3
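All three timed variants recompute the distinct count from scratch for every window. A different idea, which is not in the benchmark above and which I have not timed, is to keep a running tally per value and update it as the window slides by one element. The sketch below is base R only; the function name `roll_uniqueN_incremental` is an assumption made for this example.

```r
# hypothetical helper: sliding-window distinct count with a running tally
roll_uniqueN_incremental <- function(x, k) {
  n <- length(x)
  codes <- match(x, unique(x))        # map each value to an integer code
  counts <- integer(max(codes, 0L))   # occurrences of each code in the window
  out <- rep(NA_integer_, n)
  nuniq <- 0L                         # distinct values currently in the window
  for (i in seq_len(n)) {
    ci <- codes[i]
    if (counts[ci] == 0L) nuniq <- nuniq + 1L   # value enters the window
    counts[ci] <- counts[ci] + 1L
    if (i > k) {
      cj <- codes[i - k]                        # element sliding out
      counts[cj] <- counts[cj] - 1L
      if (counts[cj] == 0L) nuniq <- nuniq - 1L
    }
    if (i >= k) out[i] <- nuniq                 # full window only
  }
  out
}

x <- c("a", "b", "a", "c", "c", "c")
res <- roll_uniqueN_incremental(x, 3L)
res
# NA NA 2 3 2 1
```

The work per row does not depend on the window width, so this may pay off for wide windows, but an interpreted R loop has its own overhead. Benchmark it on your own data before relying on it.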


          see also:




          • Is there a _fast_ way to run a rolling regression inside data.table?


          • R data.table sliding window













































                edited Nov 19 '18 at 2:51

























                answered Nov 19 '18 at 1:32









                chinsoon12















