R - Block sampling: generate new unique IDs after sampling?

I have data that are grouped into blocks, or clusters. I would like to generate a number of bootstrap samples for model evaluation with this data, where the blocks/clusters are sampled with replacement. However, this puts me in a bit of a dilemma when it comes to the analysis portion, because I have repeats of the block/cluster identifier.

For example, say my data looks like this:

set.seed(1)

test <- data.frame(block = rep(1:10, each = 5), matrix(rnorm(150), ncol = 3))

In practice I will be performing a number of bootstrap samples, but for didactic purposes let's say I only want a single new dataset, where I have randomly selected IDs with replacement from the original dataset, above, as follows:

library(data.table)

test <- as.data.table(test)

setkey(test, 'block')

random.block <- sample(unique(test$block), size=10, replace=TRUE)

random.sample <- test[J(random.block), allow.cartesian=TRUE]

This works as intended: it creates a new dataset of the same size as the original dataset, but where the blocks have been randomly sampled with replacement.

The problem is this: in the original dataset, each block has exactly 5 observations (for the record, in my real dataset the number of observations per block varies). In the new dataset, each sampled block still has 5 observations, but because I sampled with replacement, several of those blocks now share the same ID number.

In the new dataset, if I try to run any sort of analysis that is stratified by or contingent upon the block identification number (e.g. something as simple as the average of the X variables per block, or more complicated analyses like a mixed model with a random effect on block), it treats the repetitions of a block ID as a single block. So instead of, say, 3 different blocks of size 5, it gives me one block of size 15. This can have profound effects on the analysis, not to mention the interpretation of any results.
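The effect can be seen with a quick count of rows per block ID. This is only a sketch on the toy data above; `block.sizes` is an illustrative name that is not part of the original code, and the exact counts depend on the random draw.

```r
library(data.table)

set.seed(1)
test <- data.frame(block = rep(1:10, each = 5), matrix(rnorm(150), ncol = 3))
test <- as.data.table(test)
setkey(test, 'block')

random.block <- sample(unique(test$block), size = 10, replace = TRUE)
random.sample <- test[J(random.block), allow.cartesian = TRUE]

# Rows per block ID in the resampled data. An ID drawn k times appears
# as one block of 5 * k rows instead of k separate blocks of 5 rows.
block.sizes <- random.sample[, .N, by = block]
print(block.sizes)

# Number of draws versus number of distinct IDs that survive grouping.
length(random.block)
nrow(block.sizes)
```

Whenever the second number is smaller than the first, at least one block has been silently merged with a copy of itself.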

The question I have: how could I assign a new unique block ID in my randomly sampled dataset, so that after sampling with replacement each draw of a block has its own identifier and is treated in the final analysis as a separate block rather than as part of a single larger block? I can think of ad hoc ways of doing this (e.g. if each block has the same number of observations), but nothing simple or generalizable.

asked Nov 20 '18 at 19:24

Ryan Simmons

1 Answer
I think the best way would be to create a data.table with an index based on the key. You can then merge based on the key:

library(data.table)

set.seed(1)

test <- data.frame(block = rep(1:10, each = 5), matrix(rnorm(150), ncol = 3))

test

test <- as.data.table(test)

setkey(test, 'block')

random.block <- sample(unique(test$block), size=10, replace=TRUE)

random.sample.orig <- test[J(random.block), allow.cartesian=TRUE]

So instead of joining on the bare vector, you create a table that pairs each sampled block with a new sequential id:

rand.tab <- data.table(block=random.block, id=seq_along(random.block))

Then join it against test and, if you need to, overwrite block with the new id:

random.sample <- test[J(rand.tab), allow.cartesian=TRUE]

random.sample[,block := id]

random.sample[,id := NULL]
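As a quick sanity check on the relabelling, the sketch below counts rows per new ID. It makes two assumptions that are not in the code above: the join is written with `on = "block"` rather than the keyed `J()` form, and the original ID is kept in a column named `orig.block` (an illustrative name) instead of being overwritten, which can be handy for tracing a draw back to its source block.

```r
library(data.table)

set.seed(1)
test <- data.frame(block = rep(1:10, each = 5), matrix(rnorm(150), ncol = 3))
test <- as.data.table(test)

random.block <- sample(unique(test$block), size = 10, replace = TRUE)
rand.tab <- data.table(block = random.block, id = seq_along(random.block))

# Join on the block column; each row of rand.tab pulls in its 5 matching rows.
random.sample <- test[rand.tab, on = "block", allow.cartesian = TRUE]

# Keep the source block as orig.block and use the sequential id as block.
setnames(random.sample, c("block", "id"), c("orig.block", "block"))

# Every draw is now its own block: 10 IDs with 5 rows each.
random.sample[, .N, by = block]
```

Grouping by `block` now yields one group per draw, while `orig.block` still shows which draws are copies of the same original block.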

To check that it contains the same rows as your original version:

all(random.sample$X1 == random.sample.orig$X1 &
  random.sample$X2 == random.sample.orig$X2 &
  random.sample$X3 == random.sample.orig$X3)
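Since the question mentions many bootstrap samples and blocks of varying size, the same idea could be wrapped in a small function. This is a rough sketch, not a tested implementation: `block.bootstrap`, `block.col`, `new.id` and the `toy` data are all hypothetical names, and it assumes the input has no existing column called `new.id`.

```r
library(data.table)

# Hypothetical helper: draw one block bootstrap sample from dt, resampling
# the distinct values of block.col with replacement and relabelling each draw.
block.bootstrap <- function(dt, block.col = "block") {
  ids <- unique(dt[[block.col]])
  # Sample positions rather than values, so a single numeric ID is not
  # expanded to 1:ID by sample().
  drawn <- ids[sample.int(length(ids), replace = TRUE)]
  lookup <- data.table(orig = drawn, new.id = seq_along(drawn))
  setnames(lookup, "orig", block.col)
  # Each draw pulls in all rows of its block, whatever the block size.
  out <- dt[lookup, on = block.col, allow.cartesian = TRUE]
  # Replace the block column with the sequential id of the draw.
  out[, (block.col) := new.id][, new.id := NULL]
  out[]
}

# Toy data with unequal block sizes (2, 3 and 4 rows).
set.seed(42)
toy <- data.table(block = rep(c("a", "b", "c"), times = c(2, 3, 4)),
                  x = rnorm(9))

# Three bootstrap replicates, returned as a list of data.tables.
boot <- replicate(3, block.bootstrap(toy), simplify = FALSE)
```

Each element of `boot` has as many distinct block IDs as there were draws, and the input table is left unchanged because the join creates a new table before any column is modified.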

answered Nov 20 '18 at 20:14

rookie


Nice solution! Simple and elegant. I had figured the solution would look something like that, but hadn't progressed beyond a rather messy/crude do loop. Nice that the solution is in data.table. Thanks!

– Ryan Simmons
Nov 21 '18 at 1:01

My first selected answer :) Yes the data.table package is excellent.

– rookie
Nov 21 '18 at 10:23
