Shift values in columns up if there are missing values above

I have a data frame like this:

df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 

                    "B", "B", "B", "B", "B", "B"),

             var1 = c("100", "200", "300", NA, NA, NA, NA, NA,

                      "100", "200", "300", NA, NA, NA), 

             var2 = c("100", NA, NA, "400", "500", "600", NA, NA,

                      NA, NA, NA, "400", NA, NA),

             var3 = c("200", NA, NA, NA, NA, NA, "700", "800",

                      "500", NA, NA, NA, "500", "600"))

which looks like this:

  id var1 var2 var3

   A  100  100  200

   A  200 <NA> <NA>

   A  300 <NA> <NA>

   A <NA>  400 <NA>

   A <NA>  500 <NA>

   A <NA>  600 <NA>

   A <NA> <NA>  700

   A <NA> <NA>  800

   B  100 <NA>  500

   B  200 <NA> <NA>

   B  300 <NA> <NA>

   B <NA>  400 <NA>

   B <NA> <NA>  500

   B <NA> <NA>  600

I would like to shift values in columns up if there are missing values above (by group). The result should look like this:

  id var1 var2 var3

   A  100  100  200

   A  200  400  700

   A  300  500  800

   A <NA>  600 <NA>

   B  100  400  500

   B  200 <NA>  500

   B  300 <NA>  600

I have no idea how to do this. Any thoughts?

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23

1

Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26

1

Are your numbers really of class character?

– LAP
Jan 11 at 7:44

add a comment |

I have a data frame like this:

df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 

                    "B", "B", "B", "B", "B", "B"),

             var1 = c("100", "200", "300", NA, NA, NA, NA, NA,

                      "100", "200", "300", NA, NA, NA), 

             var2 = c("100", NA, NA, "400", "500", "600", NA, NA,

                      NA, NA, NA, "400", NA, NA),

             var3 = c("200", NA, NA, NA, NA, NA, "700", "800",

                      "500", NA, NA, NA, "500", "600"))

which looks like this:

  id var1 var2 var3

   A  100  100  200

   A  200 <NA> <NA>

   A  300 <NA> <NA>

   A <NA>  400 <NA>

   A <NA>  500 <NA>

   A <NA>  600 <NA>

   A <NA> <NA>  700

   A <NA> <NA>  800

   B  100 <NA>  500

   B  200 <NA> <NA>

   B  300 <NA> <NA>

   B <NA>  400 <NA>

   B <NA> <NA>  500

   B <NA> <NA>  600

I would like to shift values in columns up if there are missing values above (by group). The result should look like this:

  id var1 var2 var3

   A  100  100  200

   A  200  400  700

   A  300  500  800

   A <NA>  600 <NA>

   B  100  400  500

   B  200 <NA>  500

   B  300 <NA>  600

I have no idea how to do this. Any thoughts?

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23

1

Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26

1

Are your numbers really of class character?

– LAP
Jan 11 at 7:44

add a comment |

I have a data frame like this:

df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 

                    "B", "B", "B", "B", "B", "B"),

             var1 = c("100", "200", "300", NA, NA, NA, NA, NA,

                      "100", "200", "300", NA, NA, NA), 

             var2 = c("100", NA, NA, "400", "500", "600", NA, NA,

                      NA, NA, NA, "400", NA, NA),

             var3 = c("200", NA, NA, NA, NA, NA, "700", "800",

                      "500", NA, NA, NA, "500", "600"))

which looks like this:

  id var1 var2 var3

   A  100  100  200

   A  200 <NA> <NA>

   A  300 <NA> <NA>

   A <NA>  400 <NA>

   A <NA>  500 <NA>

   A <NA>  600 <NA>

   A <NA> <NA>  700

   A <NA> <NA>  800

   B  100 <NA>  500

   B  200 <NA> <NA>

   B  300 <NA> <NA>

   B <NA>  400 <NA>

   B <NA> <NA>  500

   B <NA> <NA>  600

I would like to shift values in columns up if there are missing values above (by group). The result should look like this:

  id var1 var2 var3

   A  100  100  200

   A  200  400  700

   A  300  500  800

   A <NA>  600 <NA>

   B  100  400  500

   B  200 <NA>  500

   B  300 <NA>  600

I have no idea how to do this. Any thoughts?

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

I have a data frame like this:

df <- data.frame(id = c("A", "A", "A", "A", "A", "A", "A", "A", 

                    "B", "B", "B", "B", "B", "B"),

             var1 = c("100", "200", "300", NA, NA, NA, NA, NA,

                      "100", "200", "300", NA, NA, NA), 

             var2 = c("100", NA, NA, "400", "500", "600", NA, NA,

                      NA, NA, NA, "400", NA, NA),

             var3 = c("200", NA, NA, NA, NA, NA, "700", "800",

                      "500", NA, NA, NA, "500", "600"))

which looks like this:

  id var1 var2 var3

   A  100  100  200

   A  200 <NA> <NA>

   A  300 <NA> <NA>

   A <NA>  400 <NA>

   A <NA>  500 <NA>

   A <NA>  600 <NA>

   A <NA> <NA>  700

   A <NA> <NA>  800

   B  100 <NA>  500

   B  200 <NA> <NA>

   B  300 <NA> <NA>

   B <NA>  400 <NA>

   B <NA> <NA>  500

   B <NA> <NA>  600

I would like to shift values in columns up if there are missing values above (by group). The result should look like this:

  id var1 var2 var3

   A  100  100  200

   A  200  400  700

   A  300  500  800

   A <NA>  600 <NA>

   B  100  400  500

   B  200 <NA>  500

   B  300 <NA>  600

I have no idea how to do this. Any thoughts?

r dataframe

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

edited Jan 11 at 10:25

Ronak Shah

35.2k103856

asked Jan 11 at 7:07

Ruilin

442

asked Jan 11 at 7:07

Ruilin

442

asked Jan 11 at 7:07

Ruilin

442

Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23

1

Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26

1

Are your numbers really of class character?

– LAP
Jan 11 at 7:44

add a comment |

Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23

1

Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26

1

Are your numbers really of class character?

– LAP
Jan 11 at 7:44

Should "NA" be NA (i.e not a string?)

– Khaynes
Jan 11 at 7:23

Edit your question. NA is currently a string.

– NelsonGon
Jan 11 at 7:26

Are your numbers really of class character?

– LAP
Jan 11 at 7:44

add a comment |

4 Answers
4

active

oldest

votes

Here is a rough concept using data.table that can be refined:

library(data.table)

# Helper function:

shift_up <- function(x) {

  n <- length(x)

  x <- x[!is.na(x)]

  length(x) <- n

  x

}



setDT(df)

df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]



   id var1 var2 var3

1:  A  100  100  200

2:  A  200  400  700

3:  A  300  500  800

4:  A <NA>  600 <NA>

5:  B  100  400  500

6:  B  200 <NA>  500

7:  B  300 <NA>  600

answered Jan 11 at 7:45

snoram

6,692832

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

1

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

add a comment |

Don't think this is the most efficient way to do it but one option

library(rowr)



df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {

    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))

}))

names(df1) <- names(df)

df1





#    id   var1   var2   var3

#A.1  A    100    100    200

#A.2  A    200    400    700

#A.3  A    300    500    800

#A.4  A   <NA>    600   <NA>

#B.1  B    100    400    500

#B.2  B    200   <NA>    500

#B.3  B    300   <NA>    600

We split the dataframe into list of dataframe for every id and for each dataframe we remove the NA values using na.omit and use cbind.fill to fill the values with NA and finally merge the list of dataframes back into one using rbind with do.call.

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

add a comment |

Here is an option with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'id', order the other column based on the NA values, then create an index to remove the rows where all the elements are NA

library(data.table)

df1 <- setDT(df)[,  lapply(.SD, function(x) x[order(is.na(x))]), id]

df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]

#   id var1 var2 var3

#1:  A  100  100  200

#2:  A  200  400  700

#3:  A  300  500  800

#4:  A <NA>  600 <NA>

#5:  B  100  400  500

#6:  B  200 <NA>  500

#7:  B  300 <NA>  600

Or using the same logic with tidyverse. Grouped by 'id', change the order or elements in all other column with mutate_all by ordering on the logical vector (is.na(column)) and keep the rows having at least one non-NA (filter_at)

library(tidyverse)

df %>% 

   group_by(id) %>% 

   mutate_all(funs(.[order(is.na(.))])) %>% 

   filter_at(vars(var1:var3), any_vars(!is.na(.)))

# A tibble: 7 x 4

# Groups:   id [2]

#  id    var1  var2  var3 

#  <fct> <fct> <fct> <fct>

#1 A     100   100   200  

#2 A     200   400   700  

#3 A     300   500   800  

#4 A     <NA>  600   <NA> 

#5 B     100   400   500  

#6 B     200   <NA>  500  

#7 B     300   <NA>  600

Ordering a vector/column based on logical indexing is simple.

v1 <- c(1:3, NA, 5, NA, 7)

order(is.na(v1)) #gives the index of order

#[1] 1 2 3 5 7 4 6

use that index to change the order of values

v1[order(is.na(v1))]

#[1]  1  2  3  5  7 NA NA

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

1

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

add a comment |

here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :

df <- lapply(df,as.character)

. <- lapply(split(df,df$id),lapply, na.omit)

. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))

df <- do.call(rbind,lapply(., do.call, what = data.frame))

df <- lapply(df, factor)



#     id var1 var2 var3

# A.1  A  100  100  200

# A.2  A  200  400  700

# A.3  A  300  500  800

# A.4  A <NA>  600 <NA>

# B.1  B  100  400  500

# B.2  B  200 <NA>  500

# B.3  B  300 <NA>  600

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54141835%2fshift-values-in-columns-up-if-there-are-missing-values-above%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Here is a rough concept using data.table that can be refined:

library(data.table)

# Helper function:

shift_up <- function(x) {

  n <- length(x)

  x <- x[!is.na(x)]

  length(x) <- n

  x

}



setDT(df)

df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]



   id var1 var2 var3

1:  A  100  100  200

2:  A  200  400  700

3:  A  300  500  800

4:  A <NA>  600 <NA>

5:  B  100  400  500

6:  B  200 <NA>  500

7:  B  300 <NA>  600

answered Jan 11 at 7:45

snoram

6,692832

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

1

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

add a comment |

Here is a rough concept using data.table that can be refined:

library(data.table)

# Helper function:

shift_up <- function(x) {

  n <- length(x)

  x <- x[!is.na(x)]

  length(x) <- n

  x

}



setDT(df)

df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]



   id var1 var2 var3

1:  A  100  100  200

2:  A  200  400  700

3:  A  300  500  800

4:  A <NA>  600 <NA>

5:  B  100  400  500

6:  B  200 <NA>  500

7:  B  300 <NA>  600

answered Jan 11 at 7:45

snoram

6,692832

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

1

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

add a comment |

Here is a rough concept using data.table that can be refined:

library(data.table)

# Helper function:

shift_up <- function(x) {

  n <- length(x)

  x <- x[!is.na(x)]

  length(x) <- n

  x

}



setDT(df)

df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]



   id var1 var2 var3

1:  A  100  100  200

2:  A  200  400  700

3:  A  300  500  800

4:  A <NA>  600 <NA>

5:  B  100  400  500

6:  B  200 <NA>  500

7:  B  300 <NA>  600

answered Jan 11 at 7:45

snoram

6,692832

Here is a rough concept using data.table that can be refined:

library(data.table)

# Helper function:

shift_up <- function(x) {

  n <- length(x)

  x <- x[!is.na(x)]

  length(x) <- n

  x

}



setDT(df)

df[, lapply(.SD, shift_up), id][!(is.na(var1) & is.na(var2) & is.na(var3))]



   id var1 var2 var3

1:  A  100  100  200

2:  A  200  400  700

3:  A  300  500  800

4:  A <NA>  600 <NA>

5:  B  100  400  500

6:  B  200 <NA>  500

7:  B  300 <NA>  600

answered Jan 11 at 7:45

snoram

6,692832

answered Jan 11 at 7:45

snoram

6,692832

answered Jan 11 at 7:45

snoram

6,692832

answered Jan 11 at 7:45

snoram

6,692832

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

1

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

add a comment |

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

1

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

Hi snoram! Thanks for helping me with this! Your codes work out well! I just want to know if there is a simple way to write "[!(is.na(var1) & is.na(var2) & is.na(var3))]". In my actual data frame, I have over 30 variables. It would be great if I don't have to write this one by one. Thanks again!

– Ruilin
Jan 11 at 10:20

In the last step, specify the .SDcols and you can use !Reduce('&', .SD) to create the logical vector

– akrun
Jan 11 at 11:04

Also if it is all the columns except the id you could do smth like: out <- df[, lapply(.SD, shift_up), id]; out[rowSums(is.na(out[,!"id"])) < (length(out) - 1)].

– snoram
Jan 11 at 11:06

add a comment |

Don't think this is the most efficient way to do it but one option

library(rowr)



df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {

    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))

}))

names(df1) <- names(df)

df1





#    id   var1   var2   var3

#A.1  A    100    100    200

#A.2  A    200    400    700

#A.3  A    300    500    800

#A.4  A   <NA>    600   <NA>

#B.1  B    100    400    500

#B.2  B    200   <NA>    500

#B.3  B    300   <NA>    600

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

add a comment |

Don't think this is the most efficient way to do it but one option

library(rowr)



df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {

    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))

}))

names(df1) <- names(df)

df1





#    id   var1   var2   var3

#A.1  A    100    100    200

#A.2  A    200    400    700

#A.3  A    300    500    800

#A.4  A   <NA>    600   <NA>

#B.1  B    100    400    500

#B.2  B    200   <NA>    500

#B.3  B    300   <NA>    600

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

add a comment |

Don't think this is the most efficient way to do it but one option

library(rowr)



df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {

    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))

}))

names(df1) <- names(df)

df1





#    id   var1   var2   var3

#A.1  A    100    100    200

#A.2  A    200    400    700

#A.3  A    300    500    800

#A.4  A   <NA>    600   <NA>

#B.1  B    100    400    500

#B.2  B    200   <NA>    500

#B.3  B    300   <NA>    600

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

Don't think this is the most efficient way to do it but one option

library(rowr)



df1 <- do.call(rbind, lapply(split(df, df$id), function(x) {

    data.frame(id = x$id[1], do.call(cbind.fill,c(sapply(x[-1], na.omit),fill = NA)))

}))

names(df1) <- names(df)

df1





#    id   var1   var2   var3

#A.1  A    100    100    200

#A.2  A    200    400    700

#A.3  A    300    500    800

#A.4  A   <NA>    600   <NA>

#B.1  B    100    400    500

#B.2  B    200   <NA>    500

#B.3  B    300   <NA>    600

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

edited Jan 11 at 10:24

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

answered Jan 11 at 7:47

Ronak Shah

35.2k103856

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

add a comment |

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

Hi Ronak! Thanks for the excellent code! Just wondering if there is a way to keep the column names unchanged? In my actual data frame, the columns names are years; it will be very confusing if I mess it up. Thanks again for your help!

– Ruilin
Jan 11 at 10:13

@Ruilin yes, you can save the output into different object and rename it using the original dataframe. Updated the answer with that change.

– Ronak Shah
Jan 11 at 10:25

add a comment |

library(data.table)

df1 <- setDT(df)[,  lapply(.SD, function(x) x[order(is.na(x))]), id]

df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]

#   id var1 var2 var3

#1:  A  100  100  200

#2:  A  200  400  700

#3:  A  300  500  800

#4:  A <NA>  600 <NA>

#5:  B  100  400  500

#6:  B  200 <NA>  500

#7:  B  300 <NA>  600

library(tidyverse)

df %>% 

   group_by(id) %>% 

   mutate_all(funs(.[order(is.na(.))])) %>% 

   filter_at(vars(var1:var3), any_vars(!is.na(.)))

# A tibble: 7 x 4

# Groups:   id [2]

#  id    var1  var2  var3 

#  <fct> <fct> <fct> <fct>

#1 A     100   100   200  

#2 A     200   400   700  

#3 A     300   500   800  

#4 A     <NA>  600   <NA> 

#5 B     100   400   500  

#6 B     200   <NA>  500  

#7 B     300   <NA>  600

Ordering a vector/column based on logical indexing is simple.

v1 <- c(1:3, NA, 5, NA, 7)

order(is.na(v1)) #gives the index of order

#[1] 1 2 3 5 7 4 6

use that index to change the order of values

v1[order(is.na(v1))]

#[1]  1  2  3  5  7 NA NA

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

1

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

add a comment |

library(data.table)

df1 <- setDT(df)[,  lapply(.SD, function(x) x[order(is.na(x))]), id]

df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]

#   id var1 var2 var3

#1:  A  100  100  200

#2:  A  200  400  700

#3:  A  300  500  800

#4:  A <NA>  600 <NA>

#5:  B  100  400  500

#6:  B  200 <NA>  500

#7:  B  300 <NA>  600

library(tidyverse)

df %>% 

   group_by(id) %>% 

   mutate_all(funs(.[order(is.na(.))])) %>% 

   filter_at(vars(var1:var3), any_vars(!is.na(.)))

# A tibble: 7 x 4

# Groups:   id [2]

#  id    var1  var2  var3 

#  <fct> <fct> <fct> <fct>

#1 A     100   100   200  

#2 A     200   400   700  

#3 A     300   500   800  

#4 A     <NA>  600   <NA> 

#5 B     100   400   500  

#6 B     200   <NA>  500  

#7 B     300   <NA>  600

Ordering a vector/column based on logical indexing is simple.

v1 <- c(1:3, NA, 5, NA, 7)

order(is.na(v1)) #gives the index of order

#[1] 1 2 3 5 7 4 6

use that index to change the order of values

v1[order(is.na(v1))]

#[1]  1  2  3  5  7 NA NA

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

1

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

add a comment |

library(data.table)

df1 <- setDT(df)[,  lapply(.SD, function(x) x[order(is.na(x))]), id]

df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]

#   id var1 var2 var3

#1:  A  100  100  200

#2:  A  200  400  700

#3:  A  300  500  800

#4:  A <NA>  600 <NA>

#5:  B  100  400  500

#6:  B  200 <NA>  500

#7:  B  300 <NA>  600

library(tidyverse)

df %>% 

   group_by(id) %>% 

   mutate_all(funs(.[order(is.na(.))])) %>% 

   filter_at(vars(var1:var3), any_vars(!is.na(.)))

# A tibble: 7 x 4

# Groups:   id [2]

#  id    var1  var2  var3 

#  <fct> <fct> <fct> <fct>

#1 A     100   100   200  

#2 A     200   400   700  

#3 A     300   500   800  

#4 A     <NA>  600   <NA> 

#5 B     100   400   500  

#6 B     200   <NA>  500  

#7 B     300   <NA>  600

Ordering a vector/column based on logical indexing is simple.

v1 <- c(1:3, NA, 5, NA, 7)

order(is.na(v1)) #gives the index of order

#[1] 1 2 3 5 7 4 6

use that index to change the order of values

v1[order(is.na(v1))]

#[1]  1  2  3  5  7 NA NA

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

library(data.table)

df1 <- setDT(df)[,  lapply(.SD, function(x) x[order(is.na(x))]), id]

df1[df1[,!Reduce(`&`, lapply(.SD, is.na)), .SDcols = var1:var3]]

#   id var1 var2 var3

#1:  A  100  100  200

#2:  A  200  400  700

#3:  A  300  500  800

#4:  A <NA>  600 <NA>

#5:  B  100  400  500

#6:  B  200 <NA>  500

#7:  B  300 <NA>  600

library(tidyverse)

df %>% 

   group_by(id) %>% 

   mutate_all(funs(.[order(is.na(.))])) %>% 

   filter_at(vars(var1:var3), any_vars(!is.na(.)))

# A tibble: 7 x 4

# Groups:   id [2]

#  id    var1  var2  var3 

#  <fct> <fct> <fct> <fct>

#1 A     100   100   200  

#2 A     200   400   700  

#3 A     300   500   800  

#4 A     <NA>  600   <NA> 

#5 B     100   400   500  

#6 B     200   <NA>  500  

#7 B     300   <NA>  600

Ordering a vector/column based on logical indexing is simple.

v1 <- c(1:3, NA, 5, NA, 7)

order(is.na(v1)) #gives the index of order

#[1] 1 2 3 5 7 4 6

use that index to change the order of values

v1[order(is.na(v1))]

#[1]  1  2  3  5  7 NA NA

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

edited Jan 11 at 11:19

answered Jan 11 at 10:26

akrun

402k13193266

answered Jan 11 at 10:26

akrun

402k13193266

answered Jan 11 at 10:26

akrun

402k13193266

1

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

add a comment |

1

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

Thank you for the beautiful answer. Could you please explain this part ` mutate_all(funs(.[order(is.na(.))])) %>% in this answer

– amrrs
Jan 11 at 11:13

@amrrs Updated the answer. Hopefully, it helps

– akrun
Jan 11 at 11:19

add a comment |

here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :

df <- lapply(df,as.character)

. <- lapply(split(df,df$id),lapply, na.omit)

. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))

df <- do.call(rbind,lapply(., do.call, what = data.frame))

df <- lapply(df, factor)



#     id var1 var2 var3

# A.1  A  100  100  200

# A.2  A  200  400  700

# A.3  A  300  500  800

# A.4  A <NA>  600 <NA>

# B.1  B  100  400  500

# B.2  B  200 <NA>  500

# B.3  B  300 <NA>  600

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

add a comment |

here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :

df <- lapply(df,as.character)

. <- lapply(split(df,df$id),lapply, na.omit)

. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))

df <- do.call(rbind,lapply(., do.call, what = data.frame))

df <- lapply(df, factor)



#     id var1 var2 var3

# A.1  A  100  100  200

# A.2  A  200  400  700

# A.3  A  300  500  800

# A.4  A <NA>  600 <NA>

# B.1  B  100  400  500

# B.2  B  200 <NA>  500

# B.3  B  300 <NA>  600

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

add a comment |

here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :

df <- lapply(df,as.character)

. <- lapply(split(df,df$id),lapply, na.omit)

. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))

df <- do.call(rbind,lapply(., do.call, what = data.frame))

df <- lapply(df, factor)



#     id var1 var2 var3

# A.1  A  100  100  200

# A.2  A  200  400  700

# A.3  A  300  500  800

# A.4  A <NA>  600 <NA>

# B.1  B  100  400  500

# B.2  B  200 <NA>  500

# B.3  B  300 <NA>  600

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

here is a base solution, if your real case doesn't feature factors you can skip the first and last lines :

df <- lapply(df,as.character)

. <- lapply(split(df,df$id),lapply, na.omit)

. <- lapply(., function(x) lapply(x, `length<-`, max(lengths(x[-1]))))

df <- do.call(rbind,lapply(., do.call, what = data.frame))

df <- lapply(df, factor)



#     id var1 var2 var3

# A.1  A  100  100  200

# A.2  A  200  400  700

# A.3  A  300  500  800

# A.4  A <NA>  600 <NA>

# B.1  B  100  400  500

# B.2  B  200 <NA>  500

# B.3  B  300 <NA>  600

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

answered Jan 14 at 13:12

Moody_Mudskipper

21.8k32864

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfrgtkky