Removing genes with 0 standard deviation
8.5 years ago
zizigolu ★ 4.3k

Friends,

I have a normalized microarray file with about 22000 genes in rows and 100 samples in columns. I want to remove the genes with a standard deviation of 0, so I tried the following.

In RStudio I ran:

setwd("E:/normalization")
RMA <- read.delim("E:/normalization/RMA.txt", header=FALSE)
mycounts <- read.table("RMA.txt", sep="\t", header=TRUE)
Mat_sd <-apply(mycounts, 1,sd)
ids <- which(Mat_sd<0.1)
mycounts <- mycounts[-ids,]
write.table(mycounts, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

But the output file is empty: it contains only the sample names in the header and no rows at all. What is wrong with the code above? I also tried sd < 1, but again got no rows.

R software-error • 5.4k views

Why is the "ids" variable negative?


It's not negative. Indexing with a negative integer vector removes those elements. The OP wanted to remove rows with sd < 0.1, which is the opposite of your solution.


Sorry Michael,

could you tell me a solution? I really can't work out what I should do.

8.5 years ago
Michael 54k

Don't use which for negative indexing. What you are doing is, in principle, like saying:

matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]

Most likely your condition is all FALSE or NA. You would expect to get the complete matrix in that case, but which(FALSE) yields integer(0), and -integer(0) is still integer(0). So you get an empty matrix where you should get the full matrix.

Use indexing with logical vectors instead, and note that your rows might contain NAs. Here are two correct alternatives:

mycounts.filtered <- mycounts[ apply(mycounts, 1, sd, na.rm=TRUE) >= 0.1,]
mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
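
To make the pitfall concrete, here is a minimal sketch on a toy matrix. The matrix m, its gene and sample names, and the variable names are made up for illustration; the explicit !is.na() guard is an optional extra, not part of the answer above.

```r
# Toy matrix standing in for the expression table (hypothetical data)
m <- matrix(c(1, 1, 1,
              1, 2, 3,
              5, 5, 5,
              2, 4, 8), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

sds <- apply(m, 1, sd, na.rm = TRUE)

# Pitfall: when no row matches, which() returns integer(0),
# and m[-integer(0), ] selects zero rows instead of all rows.
none <- which(sds < 0)            # a standard deviation is never negative
nrow(m[-none, , drop = FALSE])    # 0 rows, although nothing should be removed

# Logical indexing keeps exactly the rows where the condition is TRUE.
keep <- !is.na(sds) & sds >= 0.1
filtered <- m[keep, , drop = FALSE]
rownames(filtered)                # "gene2" "gene4"
```

To drop only genes whose standard deviation is exactly 0, as in the title of the question, the condition sds > 0 could be used in place of sds >= 0.1.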

Thank you so much.

I ran the following:

matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]
     [,1] [,2]

mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

but the output file is still empty.


And what are these warnings? By the way, you wrote the wrong table. If you just copy-paste code and it doesn't work, you at least have to understand it a little:

mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts.filtered, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
#-------------^^^^^^^^^^^^^^^^^----- should be mycounts.filtered, not mycounts2 as in your script

what does head(mycounts) give?


Michael,

thank you so much. The file is no longer empty, and the number of rows dropped from 32550 to 21272, which means some genes have been removed.


The warnings came from the non-numeric identifiers in the first column of your data, which produce the "NAs introduced by coercion" warnings. And you have now removed the first column, am I right? ;)
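
One way to avoid the coercion warnings altogether is to keep the identifiers out of the numeric data when reading the file. The following is only a sketch: the inline text stands in for RMA.txt, and its column names and values are invented, so the real file may be laid out differently.

```r
# Small stand-in for RMA.txt (hypothetical contents): first column holds probe IDs
txt <- "ID\ts1\ts2\ts3\np1\t5.0\t5.0\t5.0\np2\t4.1\t6.3\t7.9\np3\t8.2\t8.2\t8.2"

# row.names = 1 moves the IDs into the row names, leaving only numeric columns
expr <- read.table(text = txt, sep = "\t", header = TRUE, row.names = 1)
stopifnot(all(sapply(expr, is.numeric)))

# With purely numeric columns, sd() has nothing to coerce
sds <- apply(expr, 1, sd, na.rm = TRUE)
expr.filtered <- expr[!is.na(sds) & sds >= 0.1, , drop = FALSE]
rownames(expr.filtered)   # "p2"
```

With a real file, read.table("RMA.txt", sep = "\t", header = TRUE, row.names = 1) would play the same role, assuming the identifiers are unique and sit in the first column.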


Thank you. The output file contained an extra column before the identifier column, which I removed manually in Excel.


mycounts[,-1] would have accomplished the same...
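
The extra leading column was probably the row names, which write.table adds by default. As a sketch on a made-up data frame (df and the temporary output file are illustrative only), either of the following avoids the manual step in Excel:

```r
# Hypothetical two-row table with an identifier column
df <- data.frame(ID = c("p1", "p2"), s1 = c(5.0, 4.1), s2 = c(5.2, 6.3))

out <- tempfile(fileext = ".txt")
# row.names = FALSE stops write.table from adding a leading column of row names
write.table(df, file = out, sep = "\t", quote = FALSE, row.names = FALSE)
header <- readLines(out)[1]   # "ID\ts1\ts2"

# Dropping the first column inside R, as suggested above
numeric_only <- df[, -1]
names(numeric_only)           # "s1" "s2"
```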


Thank you; editing the file manually really is not easy.

8.5 years ago

Try re-loading your data:

mycounts <- read.table("RMA.txt", sep="\t", header=TRUE)
mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),]
write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

This is even more wrong than the OP's code. Don't use which to index. Hint: what happens if none of the rows satisfies the condition?


Actually, I don't know much about this; my supervisor asked me to remove such genes.


Sorry, you mean I should not use ids <- which(Mat_sd<0.1)? Then how do I remove those rows?


Thank you, but the file is still empty.

After running mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),] I get:

There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In var(if (is.vector(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion

Try sd(mycounts2, na.rm=TRUE); then any NAs will be ignored.

You should also inspect your data to make sure the NAs really should be NAs and that there were no read-in errors. Commands like head(data), tail(data), and str(data) should help with that.
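
Note that data in those commands is a placeholder for your own object, such as mycounts. Typed literally, head(data) prints the source of R's built-in data() function instead. A minimal inspection sketch, using an invented three-row stand-in for the real table:

```r
# Hypothetical stand-in for the table read from RMA.txt
mycounts <- data.frame(ID = c("p1", "p2", "p3"),
                       s1 = c(5.0, 4.1, 8.2),
                       s2 = c(5.0, 6.3, 8.2))

head(mycounts)   # first rows: do the columns look as expected?
str(mycounts)    # column types: which columns are not numeric?

# Names of columns that would trigger coercion warnings in sd()
col_classes <- sapply(mycounts, class)
non_numeric <- names(col_classes)[col_classes != "numeric"]

# Number of missing values per column
na_per_column <- colSums(is.na(mycounts))
```

Here non_numeric contains only the identifier column, which is the one to move into the row names or drop before computing standard deviations.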


Thanks, but it is still empty.

sd(mycounts2, na.rm=TRUE)
[1] NA

> head(data)

1 function (..., list = character(), package = NULL, lib.loc = NULL, 
2     verbose = getOption("verbose"), envir = .GlobalEnv)            
3 {                                                                  
4     fileExt <- function(x) {                                       
5         db <- grepl("\\\\.[^.]+\\\\.(gz|bz2|xz)$", x)              
6         ans <- sub(".*\\\\.", "", x)                               
> 
> 