Shift values in columns up if there are missing values above
I have a data frame like this:
df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))
which looks like this:
id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600
I would like to shift values in columns up if there are missing values above (by group). The result should look like this:
id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600
I have no idea how to do this. Any thoughts?
r dataframe
add a comment |
I have a data frame like this:
df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))
which looks like this:
id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600
I would like to shift values in columns up if there are missing values above (by group). The result should look like this:
id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600
I have no idea how to do this. Any thoughts?
r dataframe
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
1
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
1
Are your numbers really of classcharacter
?
– LAP
Jan 11 at 7:44
add a comment |
I have a data frame like this:
df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))
which looks like this:
id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600
I would like to shift values in columns up if there are missing values above (by group). The result should look like this:
id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600
I have no idea how to do this. Any thoughts?
r dataframe
I have a data frame like this:
df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))
which looks like this:
id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600
I would like to shift values in columns up if there are missing values above (by group). The result should look like this:
id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600
I have no idea how to do this. Any thoughts?
r dataframe
r dataframe
edited Jan 11 at 10:25
Ronak Shah
35.2k103856
35.2k103856
asked Jan 11 at 7:07
RuilinRuilin
442
442
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
1
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
1
Are your numbers really of classcharacter
?
– LAP
Jan 11 at 7:44
add a comment |
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
1
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
1
Are your numbers really of classcharacter
?
– LAP
Jan 11 at 7:44
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
1
1
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
1
1
Are your numbers really of class
character
?– LAP
Jan 11 at 7:44
Are your numbers really of class
character
?– LAP
Jan 11 at 7:44
add a comment |
4 Answers
4
active
oldest
votes
Here is a rough concept using data.table
that can be refined:
library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
In the last step, specify the.SDcols
and you can use!Reduce('&', .SD)
to create the logical vector
– akrun
Jan 11 at 11:04
Also if it is all the columns except theid
you could do smth like:out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.
– snoram
Jan 11 at 11:06
add a comment |
Don't think this is the most efficient way to do it but one option
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split
the dataframe into list of dataframe for every id
and for each dataframe we remove the NA
values using na.omit
and use cbind.fill
to fill the values with NA
and finally merge the list of dataframes back into one using rbind
with do.call
.
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
add a comment |
Here is an option with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', order
the other column based on the NA values, then create an index to remove the rows where all the elements are NA
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
Or using the same logic with tidyverse
. Grouped by 'id', change the order
or elements in all other column with mutate_all
by order
ing on the logical vector (is.na(column)
) and keep the rows having at least one non-NA (filter_at
)
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6
use that index to change the order of values
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA
1
Thank you for the beautiful answer. Could you please explain this part `mutate_all(funs(.[order(is.na(.))])) %>%
in this answer
– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
add a comment |
here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :
df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)
# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54141835%2fshift-values-in-columns-up-if-there-are-missing-values-above%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Here is a rough concept using data.table
that can be refined:
library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
In the last step, specify the.SDcols
and you can use!Reduce('&', .SD)
to create the logical vector
– akrun
Jan 11 at 11:04
Also if it is all the columns except theid
you could do smth like:out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.
– snoram
Jan 11 at 11:06
add a comment |
Here is a rough concept using data.table
that can be refined:
library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
In the last step, specify the.SDcols
and you can use!Reduce('&', .SD)
to create the logical vector
– akrun
Jan 11 at 11:04
Also if it is all the columns except theid
you could do smth like:out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.
– snoram
Jan 11 at 11:06
add a comment |
Here is a rough concept using data.table
that can be refined:
library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
Here is a rough concept using data.table
that can be refined:
library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}
setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]
id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600
answered Jan 11 at 7:45
snoramsnoram
6,692832
6,692832
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
In the last step, specify the.SDcols
and you can use!Reduce('&', .SD)
to create the logical vector
– akrun
Jan 11 at 11:04
Also if it is all the columns except theid
you could do smth like:out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.
– snoram
Jan 11 at 11:06
add a comment |
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
In the last step, specify the.SDcols
and you can use!Reduce('&', .SD)
to create the logical vector
– akrun
Jan 11 at 11:04
Also if it is all the columns except theid
you could do smth like:out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.
– snoram
Jan 11 at 11:06
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!
– Ruilin
Jan 11 at 10:20
1
1
In the last step, specify the
.SDcols
and you can use !Reduce('&', .SD)
to create the logical vector– akrun
Jan 11 at 11:04
In the last step, specify the
.SDcols
and you can use !Reduce('&', .SD)
to create the logical vector– akrun
Jan 11 at 11:04
Also if it is all the columns except the
id
you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.– snoram
Jan 11 at 11:06
Also if it is all the columns except the
id
you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)]
.– snoram
Jan 11 at 11:06
add a comment |
Don't think this is the most efficient way to do it but one option
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split
the dataframe into list of dataframe for every id
and for each dataframe we remove the NA
values using na.omit
and use cbind.fill
to fill the values with NA
and finally merge the list of dataframes back into one using rbind
with do.call
.
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
add a comment |
Don't think this is the most efficient way to do it but one option
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split
the dataframe into list of dataframe for every id
and for each dataframe we remove the NA
values using na.omit
and use cbind.fill
to fill the values with NA
and finally merge the list of dataframes back into one using rbind
with do.call
.
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
add a comment |
Don't think this is the most efficient way to do it but one option
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split
the dataframe into list of dataframe for every id
and for each dataframe we remove the NA
values using na.omit
and use cbind.fill
to fill the values with NA
and finally merge the list of dataframes back into one using rbind
with do.call
.
Don't think this is the most efficient way to do it but one option
library(rowr)
df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1
# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600
We split
the dataframe into list of dataframe for every id
and for each dataframe we remove the NA
values using na.omit
and use cbind.fill
to fill the values with NA
and finally merge the list of dataframes back into one using rbind
with do.call
.
edited Jan 11 at 10:24
answered Jan 11 at 7:47
Ronak ShahRonak Shah
35.2k103856
35.2k103856
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
add a comment |
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!
– Ruilin
Jan 11 at 10:13
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.
– Ronak Shah
Jan 11 at 10:25
add a comment |
Here is an option with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', order
the other column based on the NA values, then create an index to remove the rows where all the elements are NA
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
Or using the same logic with tidyverse
. Grouped by 'id', change the order
or elements in all other column with mutate_all
by order
ing on the logical vector (is.na(column)
) and keep the rows having at least one non-NA (filter_at
)
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6
use that index to change the order of values
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA
1
Thank you for the beautiful answer. Could you please explain this part `mutate_all(funs(.[order(is.na(.))])) %>%
in this answer
– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
add a comment |
Here is an option with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', order
the other column based on the NA values, then create an index to remove the rows where all the elements are NA
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
Or using the same logic with tidyverse
. Grouped by 'id', change the order
or elements in all other column with mutate_all
by order
ing on the logical vector (is.na(column)
) and keep the rows having at least one non-NA (filter_at
)
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6
use that index to change the order of values
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA
1
Thank you for the beautiful answer. Could you please explain this part `mutate_all(funs(.[order(is.na(.))])) %>%
in this answer
– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
add a comment |
Here is an option with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', order
the other column based on the NA values, then create an index to remove the rows where all the elements are NA
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
Or using the same logic with tidyverse
. Grouped by 'id', change the order
or elements in all other column with mutate_all
by order
ing on the logical vector (is.na(column)
) and keep the rows having at least one non-NA (filter_at
)
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6
use that index to change the order of values
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA
Here is an option with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), grouped by 'id', order
the other column based on the NA values, then create an index to remove the rows where all the elements are NA
library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600
Or using the same logic with tidyverse
. Grouped by 'id', change the order
or elements in all other column with mutate_all
by order
ing on the logical vector (is.na(column)
) and keep the rows having at least one non-NA (filter_at
)
library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600
Ordering a vector/column based on logical indexing is simple.
v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6
use that index to change the order of values
v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA
edited Jan 11 at 11:19
answered Jan 11 at 10:26
akrunakrun
402k13193266
402k13193266
1
Thank you for the beautiful answer. Could you please explain this part `mutate_all(funs(.[order(is.na(.))])) %>%
in this answer
– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
add a comment |
1
Thank you for the beautiful answer. Could you please explain this part `mutate_all(funs(.[order(is.na(.))])) %>%
in this answer
– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
1
1
Thank you for the beautiful answer. Could you please explain this part `
mutate_all(funs(.[order(is.na(.))])) %>%
in this answer– amrrs
Jan 11 at 11:13
Thank you for the beautiful answer. Could you please explain this part `
mutate_all(funs(.[order(is.na(.))])) %>%
in this answer– amrrs
Jan 11 at 11:13
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
@amrrs Updated the answer. Hopefully, it helps
– akrun
Jan 11 at 11:19
add a comment |
here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :
df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)
# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600
add a comment |
here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :
df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)
# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600
add a comment |
here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :
df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)
# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600
here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :
df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)
# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600
answered Jan 14 at 13:12
Moody_MudskipperMoody_Mudskipper
21.8k32864
21.8k32864
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54141835%2fshift-values-in-columns-up-if-there-are-missing-values-above%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Should "NA" be NA (i.e not a string?)
– Khaynes
Jan 11 at 7:23
1
Edit your question. NA is currently a string.
– NelsonGon
Jan 11 at 7:26
1
Are your numbers really of class
character
?– LAP
Jan 11 at 7:44