Cfrgtkky

Question

I have a matrix:

m <- matrix(c(

  1,    1,    1,    0,    0,    0,

  0,    0,    0,    0,    0,    0,

  3,    0,    0,    0,    0,    0,

  3,    0,    0,    0,    0,    2,

  3,    0,    0,    0,    0,    0,

  3,    0,    0,    0,    2,    2),

  ncol = 6, byrow = TRUE)



     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2 # <- island 3, value 2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    2    2 # <- island  4, also value 2

In this matrix, there are four 'islands', i.e. non-zero values separated by zeros:

(1) an island composed of three 1's, (2) four 3's, (3) one 2, and (4) two 2's.

Thus, two islands are composed of the value 2. I want to identify such 'duplicate' islands and change the values of one of the 'islands' (either will do) to the next available number (4 in this case):

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    4    4

Not clear (I see 3 'islands', -- why is the next value 4?, ...). Please explain it more and give an expected output as well. Also make sure your examples are reproducible — Nov 20 '18 at 10:47
There are 4 islands: one made up of 1's, one made up of 3's and 2 made up of 2's. 4 is a number that hasn't already been taken and is one more than the maximum number already used. — Nov 20 '18 at 10:49
As I said in the original question, change either of the islands labelled '2' to '4' but not both. — Nov 20 '18 at 10:52

score 2 · Accepted Answer · 2018-11-20 14:51:24Z

Fun question! Let's take a more involved case

(M <- matrix(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 3, 0, 2, 

               0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 2, 0, 2), 6, 6))

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    1

# [2,]    0    0    0    0    0    0

# [3,]    3    0    3    3    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    2    2

Here's a graph-based solution.

library(igraph)

# Indices of nonzero matrix elements

idx <- which(M != 0, arr.ind = TRUE)

# Adjacency matrix for matrix entries

# Two entries are adjacent if their column or row number differs by one

# Also, due to idx, an implicit condition is also that the two entries are the same

adj <- 1 * (as.matrix(dist(idx, method = "manhattan")) == 1)

# Creating loops as to take into account singleton islands

diag(adj) <- 1

# A corresponding graphs

g <- graph_from_adjacency_matrix(adj, mode = "undirected")

# Connected components of this graph

cmps <- clusters(g)

# Going over unique values of M

for(i in 1:max(M)) {

  # Islands of value i

  un <- unique(cmps$membership[M[idx] == i])

  # More than one island?

  if(length(un) > 1)

    # If so, let's go over islands 2, 3, ...

    for(cmp in un[-1])

      # ... and replace corresponding matrix entries by max(M) + 1

      M[idx[cmps$membership == cmp, , drop = FALSE]] <- max(M) + 1

}



M

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    4

# [2,]    0    0    0    0    0    0

# [3,]    3    0    7    7    0    0

# [4,]    3    0    0    0    0    6

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    5    5

Also note that using adj alone we could find all the islands if we could find its permutation leading to a block-diagonal matrix with the maximal number of blocks. Then each block would correspond to an island. However, I couldn't find an R implementation of a relevant procedure.

I was trying to use a depth first search, this is perfect.

– user3651829
Nov 20 '18 at 13:22 — Nov 20 '18 at 13:22

score 1 · Accepted Answer · 2018-11-20 19:47:43Z

'Islands' of non-zero values can be identified by raster::clump*. Then use data.table convenience functions to identify which values should be updated.

library(raster)

library(data.table)



# get index of non-zero values. re-order to match the clump order

ix <- which(m != 0, arr.ind = TRUE)

ix <- ix[order(ix[ , "row"]), ]



# get clumps

cl <- clump(raster(m))

cl_ix <- cl@data@values



# put stuff in a data.table and order by x

d <- data.table(ix, x = m[ix], cl_ix = cl_ix[!is.na(cl_ix)])

setorder(d, x, cl_ix)



# for each x, create a counter of runs of clump index

d[ , g := rleid(cl_ix), by = x]



# for 'duplicated' runs...

# ...add to x based on runs of x and clump index runs

d[g > 1, x := max(d$x) + rleid(x, g)]



# update matrix

m2 <- m

m2[as.matrix(d[ , .(row, col)])] <- d$x



m

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    1

# [2,]    0    0    0    0    0    0

# [3,]    3    0    3    3    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    2    2



m2

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    4

# [2,]    0    0    0    0    0    0

# [3,]    3    0    7    7    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    5    0    0    0

# [6,]    3    0    0    0    6    6

*Note that the clump function requires that the igraph package is available.

score 0 · Accepted Answer · 2018-11-20 11:20:58Z

It was harder than I thought becase of the "not both" condition, I achieved the result with a while loop for now, we'll se if it can be improved:

(basically we move by row and check if the island is found, if so we end our research)

# some useful variables

i=1 # row counter

counter=0 # check if island is found

max_m <- max(m) #finds the max value in the matrix, to fill



while(counter == 0) {



  if (any(m[i, ] == 2)) { # check if we find the island in the row, otherwise skip

    row <- m[i, ]

    row[row == 2] <- max_m + 1 # here we change the value

    m[i, ] <- row

    counter <- counter + 1

  }



  i = i + 1 # we move up one row

  #cat("row number: ", i, "n") # sanity check to see if it was an infinite loop

}

m

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    0

# [2,]    0    0    0    0    0    0

# [3,]    3    0    0    0    0    0

# [4,]    3    0    0    0    0    4

# [5,]    3    0    0    0    0    0

# [6,]    3    0    0    0    2    2

This is far from perfect, because we move by rows, so if the first island is across a column we would change only the first value.

Example of unexpected result:

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    0

# [2,]    0    0    0    0    0    0

# [3,]    3    0    0    0    0    0

# [4,]    3    0    0    0    0    4

# [5,]    3    0    0    0    0    2 # problem here

# [6,]    3    0    0    0    0    0

Data used:

m <- matrix(c(rep(1, 3),

              rep(0, 9),

              3, 

              rep(0, 5),

              3,

              rep(0, 4),

              2,

              3,

              rep(0, 5),

              3,

              rep(0,3),

              rep(2, 2)),ncol=6,nrow=6, byrow = T)

score 0 · Accepted Answer · 2018-11-20 13:20:19Z

This can easily be achieved with package TraMineR.

islander <- function(mat) {

  require(TraMineR)

  rows.mat.seq <- seqdef(mat)  # seeks all sequences in rows 

  cols.mat.seq <- seqdef(t(mat))  # tranposed version

  rows <- seqpm(rows.mat.seq, 22)$MIndex  # seeks for sub sequence 2-2 in rows

  cols <- seqpm(cols.mat.seq, 22)$MIndex  # seeks for sub sequence 2-2 in columns

  if (length(cols) == 0) {  # the row case

    mat[rows, which(mat[rows, ] == 2)] <- 4

    return(mat)

  } else {  # the column case

    mat[which(mat[, cols] == 2), cols] <- 4

    return(mat)

  }

}

Yielding

> islander(row.mat)

...

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    4    4



> islander(col.mat)

...

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    0

[5,]    3    0    0    0    0    4

[6,]    3    0    0    2    0    4

Note: If your island is longer, you need to adept the code, e.g. for length of island is 3 do seqpm(., 222). It is certainly possible to implement the consideration of all cases into the function.

Data

row.mat <- structure(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 

                   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2), .Dim = c(6L, 

                                                                                      6L))

col.mat <- structure(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 

                    0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2), .Dim = c(6L, 

                                                                                       6L))



> row.mat

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    2    2

> col.mat

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    0

[5,]    3    0    0    0    0    2

[6,]    3    0    0    2    0    2

score 2 · Accepted Answer · 2018-11-20 14:51:24Z

Fun question! Let's take a more involved case

(M <- matrix(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 3, 0, 2, 

               0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 2, 0, 2), 6, 6))

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    1

# [2,]    0    0    0    0    0    0

# [3,]    3    0    3    3    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    2    2

Here's a graph-based solution.

library(igraph)

# Indices of nonzero matrix elements

idx <- which(M != 0, arr.ind = TRUE)

# Adjacency matrix for matrix entries

# Two entries are adjacent if their column or row number differs by one

# Also, due to idx, an implicit condition is also that the two entries are the same

adj <- 1 * (as.matrix(dist(idx, method = "manhattan")) == 1)

# Creating loops as to take into account singleton islands

diag(adj) <- 1

# A corresponding graphs

g <- graph_from_adjacency_matrix(adj, mode = "undirected")

# Connected components of this graph

cmps <- clusters(g)

# Going over unique values of M

for(i in 1:max(M)) {

  # Islands of value i

  un <- unique(cmps$membership[M[idx] == i])

  # More than one island?

  if(length(un) > 1)

    # If so, let's go over islands 2, 3, ...

    for(cmp in un[-1])

      # ... and replace corresponding matrix entries by max(M) + 1

      M[idx[cmps$membership == cmp, , drop = FALSE]] <- max(M) + 1

}



M

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    4

# [2,]    0    0    0    0    0    0

# [3,]    3    0    7    7    0    0

# [4,]    3    0    0    0    0    6

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    5    5

Also note that using adj alone we could find all the islands if we could find its permutation leading to a block-diagonal matrix with the maximal number of blocks. Then each block would correspond to an island. However, I couldn't find an R implementation of a relevant procedure.

I was trying to use a depth first search, this is perfect.

– user3651829
Nov 20 '18 at 13:22 — Nov 20 '18 at 13:22

score 1 · Accepted Answer · 2018-11-20 19:47:43Z

'Islands' of non-zero values can be identified by raster::clump*. Then use data.table convenience functions to identify which values should be updated.

library(raster)

library(data.table)



# get index of non-zero values. re-order to match the clump order

ix <- which(m != 0, arr.ind = TRUE)

ix <- ix[order(ix[ , "row"]), ]



# get clumps

cl <- clump(raster(m))

cl_ix <- cl@data@values



# put stuff in a data.table and order by x

d <- data.table(ix, x = m[ix], cl_ix = cl_ix[!is.na(cl_ix)])

setorder(d, x, cl_ix)



# for each x, create a counter of runs of clump index

d[ , g := rleid(cl_ix), by = x]



# for 'duplicated' runs...

# ...add to x based on runs of x and clump index runs

d[g > 1, x := max(d$x) + rleid(x, g)]



# update matrix

m2 <- m

m2[as.matrix(d[ , .(row, col)])] <- d$x



m

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    1

# [2,]    0    0    0    0    0    0

# [3,]    3    0    3    3    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    2    0    0    0

# [6,]    3    0    0    0    2    2



m2

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    4

# [2,]    0    0    0    0    0    0

# [3,]    3    0    7    7    0    0

# [4,]    3    0    0    0    0    2

# [5,]    3    0    5    0    0    0

# [6,]    3    0    0    0    6    6

*Note that the clump function requires that the igraph package is available.

score 0 · Accepted Answer · 2018-11-20 11:20:58Z

It was harder than I thought becase of the "not both" condition, I achieved the result with a while loop for now, we'll se if it can be improved:

(basically we move by row and check if the island is found, if so we end our research)

# some useful variables

i=1 # row counter

counter=0 # check if island is found

max_m <- max(m) #finds the max value in the matrix, to fill



while(counter == 0) {



  if (any(m[i, ] == 2)) { # check if we find the island in the row, otherwise skip

    row <- m[i, ]

    row[row == 2] <- max_m + 1 # here we change the value

    m[i, ] <- row

    counter <- counter + 1

  }



  i = i + 1 # we move up one row

  #cat("row number: ", i, "n") # sanity check to see if it was an infinite loop

}

m

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    0

# [2,]    0    0    0    0    0    0

# [3,]    3    0    0    0    0    0

# [4,]    3    0    0    0    0    4

# [5,]    3    0    0    0    0    0

# [6,]    3    0    0    0    2    2

This is far from perfect, because we move by rows, so if the first island is across a column we would change only the first value.

Example of unexpected result:

#      [,1] [,2] [,3] [,4] [,5] [,6]

# [1,]    1    1    1    0    0    0

# [2,]    0    0    0    0    0    0

# [3,]    3    0    0    0    0    0

# [4,]    3    0    0    0    0    4

# [5,]    3    0    0    0    0    2 # problem here

# [6,]    3    0    0    0    0    0

Data used:

m <- matrix(c(rep(1, 3),

              rep(0, 9),

              3, 

              rep(0, 5),

              3,

              rep(0, 4),

              2,

              3,

              rep(0, 5),

              3,

              rep(0,3),

              rep(2, 2)),ncol=6,nrow=6, byrow = T)

score 0 · Accepted Answer · 2018-11-20 13:20:19Z

This can easily be achieved with package TraMineR.

islander <- function(mat) {

  require(TraMineR)

  rows.mat.seq <- seqdef(mat)  # seeks all sequences in rows 

  cols.mat.seq <- seqdef(t(mat))  # tranposed version

  rows <- seqpm(rows.mat.seq, 22)$MIndex  # seeks for sub sequence 2-2 in rows

  cols <- seqpm(cols.mat.seq, 22)$MIndex  # seeks for sub sequence 2-2 in columns

  if (length(cols) == 0) {  # the row case

    mat[rows, which(mat[rows, ] == 2)] <- 4

    return(mat)

  } else {  # the column case

    mat[which(mat[, cols] == 2), cols] <- 4

    return(mat)

  }

}

Yielding

> islander(row.mat)

...

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    4    4



> islander(col.mat)

...

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    0

[5,]    3    0    0    0    0    4

[6,]    3    0    0    2    0    4

Note: If your island is longer, you need to adept the code, e.g. for length of island is 3 do seqpm(., 222). It is certainly possible to implement the consideration of all cases into the function.

Data

row.mat <- structure(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 

                   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2), .Dim = c(6L, 

                                                                                      6L))

col.mat <- structure(c(1, 0, 3, 3, 3, 3, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 

                    0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2), .Dim = c(6L, 

                                                                                       6L))



> row.mat

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    2

[5,]    3    0    0    0    0    0

[6,]    3    0    0    0    2    2

> col.mat

     [,1] [,2] [,3] [,4] [,5] [,6]

[1,]    1    1    1    0    0    0

[2,]    0    0    0    0    0    0

[3,]    3    0    0    0    0    0

[4,]    3    0    0    0    0    0

[5,]    3    0    0    0    0    2

[6,]    3    0    0    2    0    2

搜尋此網誌

Cfrgtkky

Identify duplicate islands in a matrix and change their values

4 Answers
4

Your Answer

Post as a guest

4 Answers
4

4 Answers
4

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

Identify duplicate islands in a matrix and change their values

4 Answers 4

Your Answer

Sign up or log in

Post as a guest

Post as a guest

4 Answers 4

4 Answers 4

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

4 Answers
4

4 Answers
4

4 Answers
4