R data table rolling unique [closed]
I have been looking for a way to count the unique values of a column in a data.table in a rolling fashion. I found rollmean, but nothing like rollunique.
What is a good practice for achieving this using data.table?
Thanks
r data.table rolling
closed as off-topic by alistaire, MLavoie, EdChum, Unheilig, greg-449 Nov 19 '18 at 11:02
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – alistaire, MLavoie, EdChum
If this question can be reworded to fit the rules in the help center, please edit the question.
asked Nov 19 '18 at 1:22
jsilva99
1 Answer
1) One option is to use zoo::rollapplyr (the right-aligned variant of zoo::rollapply)
sample data:
library(data.table)
set.seed(0L)
sz <- 1e5L
winsz <- 5L
DT <- data.table(ID=sample(letters, sz, replace=TRUE))
sample usage with zoo::rollapplyr:
DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA_integer_)]
2) Another option is to write your own windowing:
DT[, numUniq2 := replace(
    sapply(1:.N, function(n) uniqueN(ID[max(n-winsz+1, 1L):n])),
    .I < winsz,
    NA_integer_)]
3) Another option is to use data.table::shift
DT[, numUniq3 := replace(
    apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
    1L:.N < winsz,
    NA_integer_)]
output:
        ID numUniq numUniq2 numUniq3
     1:  x      NA       NA       NA
     2:  g      NA       NA       NA
     3:  j      NA       NA       NA
     4:  o      NA       NA       NA
     5:  x       4        4        4
    ---
 99996:  k       4        4        4
 99997:  a       4        4        4
 99998:  f       4        4        4
 99999:  z       4        4        4
100000:  c       5        5        5
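To make the hand-rolled windowing in option 2 easier to check, here is a minimal base R sketch on a small vector. The helper name roll_unique_n is hypothetical (it is not part of data.table or zoo), and it assumes a trailing window of fixed width k with NA for positions that do not yet have a full window, which is the behaviour shown in the output above.

```r
# Hypothetical helper, not from data.table or zoo:
# count distinct values in a trailing window of width k.
roll_unique_n <- function(x, k) {
  n <- length(x)
  out <- rep(NA_integer_, n)          # incomplete windows stay NA
  if (n >= k) {
    for (i in k:n) {
      # window covers positions (i - k + 1) .. i
      out[i] <- length(unique(x[(i - k + 1L):i]))
    }
  }
  out
}

x <- c("a", "b", "a", "c", "c", "c")
res <- roll_unique_n(x, 3L)
print(res)
# windows: (a,b,a) -> 2, (b,a,c) -> 3, (a,c,c) -> 2, (c,c,c) -> 1
```

Inside a data.table the same helper could be called as DT[, numUniq := roll_unique_n(ID, winsz)], optionally with by= to restart the window per group.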
Benchmarking
timing code:
microbenchmark::microbenchmark(
    zooRoll=DT[, numUniq := zoo::rollapplyr(ID, winsz, uniqueN, fill=NA)],
    sapply=DT[, numUniq2 := replace(
        vapply(1L:.N, function(n) uniqueN(ID[max(n-winsz+1L, 1L):n]), integer(1L)),
        1L:.N < winsz,
        NA_integer_)],
    shift=DT[, numUniq3 := replace(
        apply(setDT(shift(ID, 0L:(winsz-1L))), 1L, uniqueN),
        1L:.N < winsz,
        NA_integer_)],
    times=3L)
timings:
Unit: seconds
    expr      min       lq     mean   median       uq      max neval
 zooRoll 1.723915 1.774423 1.837433 1.824931 1.894191 1.963451     3
  sapply 1.214608 1.224971 1.230763 1.235333 1.238840 1.242348     3
   shift 1.188266 1.234769 1.266852 1.281272 1.306145 1.331018     3
see also:
Is there a _fast_ way to run a rolling regression inside data.table?
R data.table sliding window
edited Nov 19 '18 at 2:51
answered Nov 19 '18 at 1:32
chinsoon12