How to select values from a matrix based on a specific condition in R
8.3 years ago
MAPK ★ 2.0k

Hi Guys,

I have a large matrix, shown below as mymatrix. For each position, I would like to get only those nucleotides that have values (i.e. the ones that are not NA), sorted in decreasing order, as either a list or a matrix. For example, I want the result in one of these formats:

In the form of a matrix:

pos 1611111     T(17)  C(1)
pos 99022222    G(24)  A(3)


or in the form of a list:

pos 1611111
T                    C
17                   1

pos 99022222
G                    A
24                   3


and so forth. Thank you.

mymatrix

pos        A   C   G   T   N
1611111    NA  1   NA  17  NA
99022222   3   NA  24  NA  NA
99092333   NA  5   NA  91  NA
233232333  2   22  NA  NA  NA

R • 2.1k views

How large of a matrix are we talking here? An efficient solution might be needed if it is too large. Otherwise, this problem is relatively easy and I will answer it when you reply.


It's a fairly large matrix. Thank you!


What dimensions? dim(mymatrix)


Right now my matrix is 6023 by 8.


OK, that's not too bad. I will write a quick answer for it in a sec.


Thank you, I would really appreciate that!

8.3 years ago
Steven Lakin ★ 1.8k

This is by no means pretty code, but it should work for you: it writes the result in tab-delimited format to a file in your working directory. You can then read that file back into R using read.table() with sep set to "\t". I didn't include the decreasing order since it is late here, but with a little imagination you could probably add it.

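transformMyMatrix <- function(mymatrix, outputFile) {
  # Each row of mymatrix becomes one tab-delimited line in outputFile
  for (i in seq_len(nrow(mymatrix))) {
    # First field is the position label, e.g. "pos(1611111)"
    temp <- paste0("pos(", mymatrix[i, "pos"], ")")
    # Add "NUCLEOTIDE(count)" for every non-NA column after pos
    for (j in 2:ncol(mymatrix)) {
      if (!is.na(mymatrix[i, j])) {
        # colnames() works for both matrices and data frames
        temp <- c(temp, paste0(colnames(mymatrix)[j], "(", mymatrix[i, j], ")"))
      }
    }
    # append=TRUE adds to outputFile, so delete any old copy before re-running
    write.table(t(as.matrix(temp)), file = outputFile, sep = "\t", append = TRUE,
                quote = FALSE, row.names = FALSE, col.names = FALSE)
  }
}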


Then call the function:

transformMyMatrix(mymatrix, "outputFile.txt")


For example, here is what I get with that:

mymatrix
        pos  A  C  G  T  N
1   1611111 NA  1 NA 17 NA
2  99022222  3 NA 24 NA NA
3  99092333 NA  5 NA 91 NA
4 233232333  2 22 NA NA NA

transformMyMatrix(mymatrix, outputFile="newMatrix.txt")
newMatrix <- read.table("newMatrix.txt", sep="\t")
newMatrix
              V1   V2    V3
1   pos(1611111) C(1) T(17)
2  pos(99022222) A(3) G(24)
3  pos(99092333) C(5) T(91)
4 pos(233232333) A(2) C(22)
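
For the decreasing order that the question asks for, here is one possible sketch rather than a tested drop-in. It assumes mymatrix is a data frame (or matrix) with a "pos" column followed by numeric count columns, as in the example data. The names sortedCounts, asList and asLines are made up for illustration. It relies on the fact that sort() drops NAs by default, so filtering and ordering happen in one step.

```r
# Example data copied from the question; assumed here to be a data frame
mymatrix <- data.frame(
  pos = c(1611111, 99022222, 99092333, 233232333),
  A = c(NA, 3, NA, 2),
  C = c(1, NA, 5, 22),
  G = c(NA, 24, NA, NA),
  T = c(17, NA, 91, NA),
  N = rep(NA_real_, 4)
)

# Hypothetical helper: one named vector per position,
# NAs dropped and the largest count first
sortedCounts <- function(x) {
  counts <- as.matrix(x[, colnames(x) != "pos", drop = FALSE])
  result <- lapply(seq_len(nrow(counts)), function(i) {
    # sort() removes NAs by default (na.last = NA) and keeps the names
    sort(counts[i, ], decreasing = TRUE)
  })
  # format() avoids scientific notation in the position labels
  names(result) <- paste("pos", format(x[, "pos"], scientific = FALSE, trim = TRUE))
  result
}

# List form: asList[["pos 1611111"]] is a named vector with T first, then C
asList <- sortedCounts(mymatrix)

# Matrix-like form: one tab-delimited line per position
asLines <- vapply(names(asList), function(p) {
  v <- asList[[p]]
  fields <- if (length(v) > 0) paste0(names(v), "(", v, ")") else character(0)
  paste(c(p, fields), collapse = "\t")
}, character(1))
```

asList covers the list format from the question, and writeLines(asLines, "outputFile.txt") would write the one-line-per-position format to a file in a single call instead of appending row by row.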


Be aware that the rows of the output file can have different numbers of fields. If you read it back into R as a data frame, the shorter rows have to be padded with empty or NA entries (read.table() does this when fill=TRUE is set), so you might have to address that.
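
As a rough illustration of the uneven-column issue, the sketch below reads a ragged tab-delimited file in two ways. The file contents are invented for the example and the names tmp, nf, asFrame and raggedList are hypothetical. The first option pads short rows into a rectangular data frame, using count.fields() so that the widest row determines the number of columns. The second keeps each line as its own vector, so nothing is padded.

```r
# Hypothetical ragged file: rows with 3, 2 and 4 tab-separated fields
tmp <- tempfile(fileext = ".txt")
writeLines(c("pos(1)\tC(1)\tT(17)",
             "pos(2)\tA(3)",
             "pos(3)\tA(2)\tC(22)\tG(5)"), tmp)

# Option 1: rectangular data frame; fill = TRUE pads the short rows
nf <- max(count.fields(tmp, sep = "\t"))   # fields in the widest row
asFrame <- read.table(tmp, sep = "\t", fill = TRUE,
                      col.names = paste0("V", seq_len(nf)),
                      stringsAsFactors = FALSE)

# Option 2: list with one character vector per line, no padding needed
raggedList <- strsplit(readLines(tmp), "\t", fixed = TRUE)
```

The list version is usually easier to work with when the number of nucleotides per position varies, since there are no filler cells to strip out afterwards.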


Thank you so much!