Shift values in columns up if there are missing values above












8















I have a data frame like this:



df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))


which looks like this:



  id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600


I would like to shift values in columns up if there are missing values above (by group). The result should look like this:



  id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600


I have no idea how to do this. Any thoughts?










share|improve this question

























  • Should "NA" be NA (i.e not a string?)

    – Khaynes
    Jan 11 at 7:23








  • 1





    Edit your question. NA is currently a string.

    – NelsonGon
    Jan 11 at 7:26






  • 1





    Are your numbers really of class character?

    – LAP
    Jan 11 at 7:44
















8















I have a data frame like this:



df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))


which looks like this:



  id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600


I would like to shift values in columns up if there are missing values above (by group). The result should look like this:



  id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600


I have no idea how to do this. Any thoughts?










share|improve this question

























  • Should "NA" be NA (i.e not a string?)

    – Khaynes
    Jan 11 at 7:23








  • 1





    Edit your question. NA is currently a string.

    – NelsonGon
    Jan 11 at 7:26






  • 1





    Are your numbers really of class character?

    – LAP
    Jan 11 at 7:44














8












8








8








I have a data frame like this:



df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))


which looks like this:



  id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600


I would like to shift values in columns up if there are missing values above (by group). The result should look like this:



  id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600


I have no idea how to do this. Any thoughts?










share|improve this question
















I have a data frame like this:



df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B"),
var1 = c("100", "200", "300", NA, NA, NA, NA, NA,
"100", "200", "300", NA, NA, NA),
var2 = c("100", NA, NA, "400", "500", "600", NA, NA,
NA, NA, NA, "400", NA, NA),
var3 = c("200", NA, NA, NA, NA, NA, "700", "800",
"500", NA, NA, NA, "500", "600"))


which looks like this:



  id var1 var2 var3
A 100 100 200
A 200 <NA> <NA>
A 300 <NA> <NA>
A <NA> 400 <NA>
A <NA> 500 <NA>
A <NA> 600 <NA>
A <NA> <NA> 700
A <NA> <NA> 800
B 100 <NA> 500
B 200 <NA> <NA>
B 300 <NA> <NA>
B <NA> 400 <NA>
B <NA> <NA> 500
B <NA> <NA> 600


I would like to shift values in columns up if there are missing values above (by group). The result should look like this:



  id var1 var2 var3
A 100 100 200
A 200 400 700
A 300 500 800
A <NA> 600 <NA>
B 100 400 500
B 200 <NA> 500
B 300 <NA> 600


I have no idea how to do this. Any thoughts?







r dataframe






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 11 at 10:25









Ronak Shah

35.2k103856




35.2k103856










asked Jan 11 at 7:07









RuilinRuilin

442




442













  • Should "NA" be NA (i.e not a string?)

    – Khaynes
    Jan 11 at 7:23








  • 1





    Edit your question. NA is currently a string.

    – NelsonGon
    Jan 11 at 7:26






  • 1





    Are your numbers really of class character?

    – LAP
    Jan 11 at 7:44



















  • Should "NA" be NA (i.e not a string?)

    – Khaynes
    Jan 11 at 7:23








  • 1





    Edit your question. NA is currently a string.

    – NelsonGon
    Jan 11 at 7:26






  • 1





    Are your numbers really of class character?

    – LAP
    Jan 11 at 7:44

















Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23







Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23






1




1





Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26





Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26




1




1





Are your numbers really of class character?

– LAP
Jan 11 at 7:44





Are your numbers really of class character?

– LAP
Jan 11 at 7:44












4 Answers
4






active

oldest

votes


















5














Here is a rough concept using data.table that can be refined:



library(data.table)
# Helper function:
shift_up <- function(x) {
n <- length(x)
x <- x[!is.na(x)]
length(x) <- n
x
}

setDT(df)
df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]

id var1 var2 var3
1: A 100 100 200
2: A 200 400 700
3: A 300 500 800
4: A <NA> 600 <NA>
5: B 100 400 500
6: B 200 <NA> 500
7: B 300 <NA> 600





share|improve this answer
























  • Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

    – Ruilin
    Jan 11 at 10:20






  • 1





    In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

    – akrun
    Jan 11 at 11:04











  • Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

    – snoram
    Jan 11 at 11:06



















4














Don't think this is the most efficient way to do it but one option



library(rowr)

df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
}))
names(df1) <- names(df)
df1


# id var1 var2 var3
#A.1 A 100 100 200
#A.2 A 200 400 700
#A.3 A 300 500 800
#A.4 A <NA> 600 <NA>
#B.1 B 100 400 500
#B.2 B 200 <NA> 500
#B.3 B 300 <NA> 600


We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.






share|improve this answer


























  • Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

    – Ruilin
    Jan 11 at 10:13











  • @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

    – Ronak Shah
    Jan 11 at 10:25





















3














Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA



library(data.table)
df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
# id var1 var2 var3
#1: A 100 100 200
#2: A 200 400 700
#3: A 300 500 800
#4: A <NA> 600 <NA>
#5: B 100 400 500
#6: B 200 <NA> 500
#7: B 300 <NA> 600




Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)



library(tidyverse)
df %>%
group_by(id) %>%
mutate_all(funs(.[order(is.na(.))])) %>%
filter_at(vars(var1:var3), any_vars(!is.na(.)))
# A tibble: 7 x 4
# Groups: id [2]
# id var1 var2 var3
# <fct> <fct> <fct> <fct>
#1 A 100 100 200
#2 A 200 400 700
#3 A 300 500 800
#4 A <NA> 600 <NA>
#5 B 100 400 500
#6 B 200 <NA> 500
#7 B 300 <NA> 600




Ordering a vector/column based on logical indexing is simple.



v1 <- c(1:3, NA, 5, NA, 7)
order(is.na(v1)) #gives the index of order
#[1] 1 2 3 5 7 4 6


use that index to change the order of values



v1[order(is.na(v1))]
#[1] 1 2 3 5 7 NA NA





share|improve this answer





















  • 1





    Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

    – amrrs
    Jan 11 at 11:13













  • @amrrs Updated the answer. Hopefully, it helps

    – akrun
    Jan 11 at 11:19



















0














here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :



df <- lapply(df,as.character)
. <- lapply(split(df,df$id),lapply, na.omit)
. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
df <- do.call(rbind,lapply(., do.call, what = data.frame))
df <- lapply(df, factor)

# id var1 var2 var3
# A.1 A 100 100 200
# A.2 A 200 400 700
# A.3 A 300 500 800
# A.4 A <NA> 600 <NA>
# B.1 B 100 400 500
# B.2 B 200 <NA> 500
# B.3 B 300 <NA> 600





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54141835%2fshift-values-in-columns-up-if-there-are-missing-values-above%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    5














    Here is a rough concept using data.table that can be refined:



    library(data.table)
    # Helper function:
    shift_up <- function(x) {
    n <- length(x)
    x <- x[!is.na(x)]
    length(x) <- n
    x
    }

    setDT(df)
    df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]

    id var1 var2 var3
    1: A 100 100 200
    2: A 200 400 700
    3: A 300 500 800
    4: A <NA> 600 <NA>
    5: B 100 400 500
    6: B 200 <NA> 500
    7: B 300 <NA> 600





    share|improve this answer
























    • Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

      – Ruilin
      Jan 11 at 10:20






    • 1





      In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

      – akrun
      Jan 11 at 11:04











    • Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

      – snoram
      Jan 11 at 11:06
















    5














    Here is a rough concept using data.table that can be refined:



    library(data.table)
    # Helper function:
    shift_up <- function(x) {
    n <- length(x)
    x <- x[!is.na(x)]
    length(x) <- n
    x
    }

    setDT(df)
    df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]

    id var1 var2 var3
    1: A 100 100 200
    2: A 200 400 700
    3: A 300 500 800
    4: A <NA> 600 <NA>
    5: B 100 400 500
    6: B 200 <NA> 500
    7: B 300 <NA> 600





    share|improve this answer
























    • Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

      – Ruilin
      Jan 11 at 10:20






    • 1





      In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

      – akrun
      Jan 11 at 11:04











    • Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

      – snoram
      Jan 11 at 11:06














    5












    5








    5







    Here is a rough concept using data.table that can be refined:



    library(data.table)
    # Helper function:
    shift_up <- function(x) {
    n <- length(x)
    x <- x[!is.na(x)]
    length(x) <- n
    x
    }

    setDT(df)
    df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]

    id var1 var2 var3
    1: A 100 100 200
    2: A 200 400 700
    3: A 300 500 800
    4: A <NA> 600 <NA>
    5: B 100 400 500
    6: B 200 <NA> 500
    7: B 300 <NA> 600





    share|improve this answer













    Here is a rough concept using data.table that can be refined:



    library(data.table)
    # Helper function:
    shift_up <- function(x) {
    n <- length(x)
    x <- x[!is.na(x)]
    length(x) <- n
    x
    }

    setDT(df)
    df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]

    id var1 var2 var3
    1: A 100 100 200
    2: A 200 400 700
    3: A 300 500 800
    4: A <NA> 600 <NA>
    5: B 100 400 500
    6: B 200 <NA> 500
    7: B 300 <NA> 600






    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Jan 11 at 7:45









    snoramsnoram

    6,692832




    6,692832













    • Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

      – Ruilin
      Jan 11 at 10:20






    • 1





      In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

      – akrun
      Jan 11 at 11:04











    • Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

      – snoram
      Jan 11 at 11:06



















    • Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

      – Ruilin
      Jan 11 at 10:20






    • 1





      In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

      – akrun
      Jan 11 at 11:04











    • Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

      – snoram
      Jan 11 at 11:06

















    Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

    – Ruilin
    Jan 11 at 10:20





    Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

    – Ruilin
    Jan 11 at 10:20




    1




    1





    In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

    – akrun
    Jan 11 at 11:04





    In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

    – akrun
    Jan 11 at 11:04













    Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

    – snoram
    Jan 11 at 11:06





    Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

    – snoram
    Jan 11 at 11:06













    4














    Don't think this is the most efficient way to do it but one option



    library(rowr)

    df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
    }))
    names(df1) <- names(df)
    df1


    # id var1 var2 var3
    #A.1 A 100 100 200
    #A.2 A 200 400 700
    #A.3 A 300 500 800
    #A.4 A <NA> 600 <NA>
    #B.1 B 100 400 500
    #B.2 B 200 <NA> 500
    #B.3 B 300 <NA> 600


    We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.






    share|improve this answer


























    • Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

      – Ruilin
      Jan 11 at 10:13











    • @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

      – Ronak Shah
      Jan 11 at 10:25


















    4














    Don't think this is the most efficient way to do it but one option



    library(rowr)

    df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
    }))
    names(df1) <- names(df)
    df1


    # id var1 var2 var3
    #A.1 A 100 100 200
    #A.2 A 200 400 700
    #A.3 A 300 500 800
    #A.4 A <NA> 600 <NA>
    #B.1 B 100 400 500
    #B.2 B 200 <NA> 500
    #B.3 B 300 <NA> 600


    We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.






    share|improve this answer


























    • Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

      – Ruilin
      Jan 11 at 10:13











    • @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

      – Ronak Shah
      Jan 11 at 10:25
















    4












    4








    4







    Don't think this is the most efficient way to do it but one option



    library(rowr)

    df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
    }))
    names(df1) <- names(df)
    df1


    # id var1 var2 var3
    #A.1 A 100 100 200
    #A.2 A 200 400 700
    #A.3 A 300 500 800
    #A.4 A <NA> 600 <NA>
    #B.1 B 100 400 500
    #B.2 B 200 <NA> 500
    #B.3 B 300 <NA> 600


    We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.






    share|improve this answer















    Don't think this is the most efficient way to do it but one option



    library(rowr)

    df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {
    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))
    }))
    names(df1) <- names(df)
    df1


    # id var1 var2 var3
    #A.1 A 100 100 200
    #A.2 A 200 400 700
    #A.3 A 300 500 800
    #A.4 A <NA> 600 <NA>
    #B.1 B 100 400 500
    #B.2 B 200 <NA> 500
    #B.3 B 300 <NA> 600


    We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jan 11 at 10:24

























    answered Jan 11 at 7:47









    Ronak ShahRonak Shah

    35.2k103856




    35.2k103856













    • Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

      – Ruilin
      Jan 11 at 10:13











    • @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

      – Ronak Shah
      Jan 11 at 10:25





















    • Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

      – Ruilin
      Jan 11 at 10:13











    • @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

      – Ronak Shah
      Jan 11 at 10:25



















    Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

    – Ruilin
    Jan 11 at 10:13





    Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

    – Ruilin
    Jan 11 at 10:13













    @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

    – Ronak Shah
    Jan 11 at 10:25







    @Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

    – Ronak Shah
    Jan 11 at 10:25













    3














    Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA



    library(data.table)
    df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
    df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
    # id var1 var2 var3
    #1: A 100 100 200
    #2: A 200 400 700
    #3: A 300 500 800
    #4: A <NA> 600 <NA>
    #5: B 100 400 500
    #6: B 200 <NA> 500
    #7: B 300 <NA> 600




    Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)



    library(tidyverse)
    df %>%
    group_by(id) %>%
    mutate_all(funs(.[order(is.na(.))])) %>%
    filter_at(vars(var1:var3), any_vars(!is.na(.)))
    # A tibble: 7 x 4
    # Groups: id [2]
    # id var1 var2 var3
    # <fct> <fct> <fct> <fct>
    #1 A 100 100 200
    #2 A 200 400 700
    #3 A 300 500 800
    #4 A <NA> 600 <NA>
    #5 B 100 400 500
    #6 B 200 <NA> 500
    #7 B 300 <NA> 600




    Ordering a vector/column based on logical indexing is simple.



    v1 <- c(1:3, NA, 5, NA, 7)
    order(is.na(v1)) #gives the index of order
    #[1] 1 2 3 5 7 4 6


    use that index to change the order of values



    v1[order(is.na(v1))]
    #[1] 1 2 3 5 7 NA NA





    share|improve this answer





















    • 1





      Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

      – amrrs
      Jan 11 at 11:13













    • @amrrs Updated the answer. Hopefully, it helps

      – akrun
      Jan 11 at 11:19
















    3














    Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA



    library(data.table)
    df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
    df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
    # id var1 var2 var3
    #1: A 100 100 200
    #2: A 200 400 700
    #3: A 300 500 800
    #4: A <NA> 600 <NA>
    #5: B 100 400 500
    #6: B 200 <NA> 500
    #7: B 300 <NA> 600




    Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)



    library(tidyverse)
    df %>%
    group_by(id) %>%
    mutate_all(funs(.[order(is.na(.))])) %>%
    filter_at(vars(var1:var3), any_vars(!is.na(.)))
    # A tibble: 7 x 4
    # Groups: id [2]
    # id var1 var2 var3
    # <fct> <fct> <fct> <fct>
    #1 A 100 100 200
    #2 A 200 400 700
    #3 A 300 500 800
    #4 A <NA> 600 <NA>
    #5 B 100 400 500
    #6 B 200 <NA> 500
    #7 B 300 <NA> 600




    Ordering a vector/column based on logical indexing is simple.



    v1 <- c(1:3, NA, 5, NA, 7)
    order(is.na(v1)) #gives the index of order
    #[1] 1 2 3 5 7 4 6


    use that index to change the order of values



    v1[order(is.na(v1))]
    #[1] 1 2 3 5 7 NA NA





    share|improve this answer





















    • 1





      Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

      – amrrs
      Jan 11 at 11:13













    • @amrrs Updated the answer. Hopefully, it helps

      – akrun
      Jan 11 at 11:19














    3












    3








    3







    Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA



    library(data.table)
    df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
    df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
    # id var1 var2 var3
    #1: A 100 100 200
    #2: A 200 400 700
    #3: A 300 500 800
    #4: A <NA> 600 <NA>
    #5: B 100 400 500
    #6: B 200 <NA> 500
    #7: B 300 <NA> 600




    Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)



    library(tidyverse)
    df %>%
    group_by(id) %>%
    mutate_all(funs(.[order(is.na(.))])) %>%
    filter_at(vars(var1:var3), any_vars(!is.na(.)))
    # A tibble: 7 x 4
    # Groups: id [2]
    # id var1 var2 var3
    # <fct> <fct> <fct> <fct>
    #1 A 100 100 200
    #2 A 200 400 700
    #3 A 300 500 800
    #4 A <NA> 600 <NA>
    #5 B 100 400 500
    #6 B 200 <NA> 500
    #7 B 300 <NA> 600




    Ordering a vector/column based on logical indexing is simple.



    v1 <- c(1:3, NA, 5, NA, 7)
    order(is.na(v1)) #gives the index of order
    #[1] 1 2 3 5 7 4 6


    use that index to change the order of values



    v1[order(is.na(v1))]
    #[1] 1 2 3 5 7 NA NA





    share|improve this answer















    Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA



    library(data.table)
    df1 <- setDT(df)[, lapply(.SD, function(x) x[order(is.na(x))]), id]
    df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]
    # id var1 var2 var3
    #1: A 100 100 200
    #2: A 200 400 700
    #3: A 300 500 800
    #4: A <NA> 600 <NA>
    #5: B 100 400 500
    #6: B 200 <NA> 500
    #7: B 300 <NA> 600




    Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)



    library(tidyverse)
    df %>%
    group_by(id) %>%
    mutate_all(funs(.[order(is.na(.))])) %>%
    filter_at(vars(var1:var3), any_vars(!is.na(.)))
    # A tibble: 7 x 4
    # Groups: id [2]
    # id var1 var2 var3
    # <fct> <fct> <fct> <fct>
    #1 A 100 100 200
    #2 A 200 400 700
    #3 A 300 500 800
    #4 A <NA> 600 <NA>
    #5 B 100 400 500
    #6 B 200 <NA> 500
    #7 B 300 <NA> 600




    Ordering a vector/column based on logical indexing is simple.



    v1 <- c(1:3, NA, 5, NA, 7)
    order(is.na(v1)) #gives the index of order
    #[1] 1 2 3 5 7 4 6


    use that index to change the order of values



    v1[order(is.na(v1))]
    #[1] 1 2 3 5 7 NA NA






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Jan 11 at 11:19

























    answered Jan 11 at 10:26









    akrunakrun

    402k13193266




    402k13193266








    • 1





      Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

      – amrrs
      Jan 11 at 11:13













    • @amrrs Updated the answer. Hopefully, it helps

      – akrun
      Jan 11 at 11:19














    • 1





      Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

      – amrrs
      Jan 11 at 11:13













    • @amrrs Updated the answer. Hopefully, it helps

      – akrun
      Jan 11 at 11:19








    1




    1





    Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

    – amrrs
    Jan 11 at 11:13







    Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

    – amrrs
    Jan 11 at 11:13















    @amrrs Updated the answer. Hopefully, it helps

    – akrun
    Jan 11 at 11:19





    @amrrs Updated the answer. Hopefully, it helps

    – akrun
    Jan 11 at 11:19











    0














    here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :



    df <- lapply(df,as.character)
    . <- lapply(split(df,df$id),lapply, na.omit)
    . <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
    df <- do.call(rbind,lapply(., do.call, what = data.frame))
    df <- lapply(df, factor)

    # id var1 var2 var3
    # A.1 A 100 100 200
    # A.2 A 200 400 700
    # A.3 A 300 500 800
    # A.4 A <NA> 600 <NA>
    # B.1 B 100 400 500
    # B.2 B 200 <NA> 500
    # B.3 B 300 <NA> 600





    share|improve this answer




























      0














      here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :



      df <- lapply(df,as.character)
      . <- lapply(split(df,df$id),lapply, na.omit)
      . <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
      df <- do.call(rbind,lapply(., do.call, what = data.frame))
      df <- lapply(df, factor)

      # id var1 var2 var3
      # A.1 A 100 100 200
      # A.2 A 200 400 700
      # A.3 A 300 500 800
      # A.4 A <NA> 600 <NA>
      # B.1 B 100 400 500
      # B.2 B 200 <NA> 500
      # B.3 B 300 <NA> 600





      share|improve this answer


























        0












        0








        0







        here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :



        df <- lapply(df,as.character)
        . <- lapply(split(df,df$id),lapply, na.omit)
        . <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
        df <- do.call(rbind,lapply(., do.call, what = data.frame))
        df <- lapply(df, factor)

        # id var1 var2 var3
        # A.1 A 100 100 200
        # A.2 A 200 400 700
        # A.3 A 300 500 800
        # A.4 A <NA> 600 <NA>
        # B.1 B 100 400 500
        # B.2 B 200 <NA> 500
        # B.3 B 300 <NA> 600





        share|improve this answer













        here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :



        df <- lapply(df,as.character)
        . <- lapply(split(df,df$id),lapply, na.omit)
        . <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))
        df <- do.call(rbind,lapply(., do.call, what = data.frame))
        df <- lapply(df, factor)

        # id var1 var2 var3
        # A.1 A 100 100 200
        # A.2 A 200 400 700
        # A.3 A 300 500 800
        # A.4 A <NA> 600 <NA>
        # B.1 B 100 400 500
        # B.2 B 200 <NA> 500
        # B.3 B 300 <NA> 600






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Jan 14 at 13:12









        Moody_MudskipperMoody_Mudskipper

        21.8k32864




        21.8k32864






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54141835%2fshift-values-in-columns-up-if-there-are-missing-values-above%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            How to change which sound is reproduced for terminal bell?

            Can I use Tabulator js library in my java Spring + Thymeleaf project?

            Title Spacing in Bjornstrup Chapter, Removing Chapter Number From Contents