Question: removing the genes with 0 standard deviation
0
gravatar for A
5.0 years ago by
A3.9k
A3.9k wrote:

friends,

i have about 22000 genes in rows and 100 samples in column in my microarray normalizd file. i was going to remove genes with 0 standard deviation then i did like below,

i opened rstudio,

setwd("E:/normalization")

RMA <- read.delim("E:/normalization/RMA.txt", header=FALSE)

mycounts <- read.table("RMA.txt", sep="\t", header=TRUE)

Mat_sd <-apply(mycounts, 1,sd)

ids <- which(Mat_sd<0.1)

 mycounts <- mycounts[-ids,]

write.table(mycounts, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

but the output file is empty, means i have only samples name in column and nothing in rows anymore...then what was my fault in the above code?? even i tried for sd<1 but again empty rows.

myposts R software error • 2.7k views
ADD COMMENTlink modified 5.0 years ago by Michael Dondrup47k • written 5.0 years ago by A3.9k

why is the "ids" variable negative?

ADD REPLYlink written 5.0 years ago by informatics bot620

it's not negative. indexing with negative integer vectors removes those. op wanted to remove rows with sd < 0.1, that's the opposite of your solution.

ADD REPLYlink written 5.0 years ago by Michael Dondrup47k

Sorry Michael,

won't you tell me a solution? because really i can't get what i should do

ADD REPLYlink written 5.0 years ago by A3.9k
4
gravatar for Michael Dondrup
5.0 years ago by
Bergen, Norway
Michael Dondrup47k wrote:

Don't negative index by using which, what you are doing is in principle like saying:

matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]

Most likely your condition is all FALSE or NA. While you expect to get the complete matrix in that case, which(FALSE) yields integer(0) and - integer(0) is still integer(0). So you get an empty matrix in case you should get the full matrix.

You can use indexing with logical vectors, and note that your rows might contain NA's here are correct alternative solutions:

mycounts.filtered <- mycounts[ apply(mycounts, 1, sd, na.rm=TRUE) >= 0.1,]
mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
ADD COMMENTlink modified 11 months ago by RamRS30k • written 5.0 years ago by Michael Dondrup47k

thank you so much

I did like below

matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]
     [,1] [,2]

mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

but the output file is empty as already

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k
1

And what are these warnings? Btw you wrote the wrong table, if you just want to copy-paste the code that doesn't work, at least you have to understand it a little:

mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts.filtered, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
#-------------^^^^^^^^^^^^^^^^^----- should be mycounts.filtered, not mycounts like in your script

what does head(mycounts) give?

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by Michael Dondrup47k

Michael,

thank you extend to the world, the file is not empty anymore and the rows decreased from 32550 to 21272, means some genes have been removed

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k
1

And the warnings came from non-numeric identifiers in the first column of your data, which result in NAs introduced by coercion warnings, and you have now removed the first column, am I right ;)

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by Michael Dondrup47k

thank you, the output file was contain an extra column before identifier column that I removed that in excel manually

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k
1

mycounts[,-1] would have accomplished the same...

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by Michael Dondrup47k

thank you, really it is not easy to edit the file manually

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k
1
gravatar for informatics bot
5.0 years ago by
United States
informatics bot620 wrote:

try re-loading your data,

mycounts <- read.table("RMA.txt", sep="\t", header=TRUE)
mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),]
write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
ADD COMMENTlink modified 11 months ago by RamRS30k • written 5.0 years ago by informatics bot620
1

This is even more wrong then the op, don't use which to index, hint, what happens if none of the rows satisfies condition?

ADD REPLYlink written 5.0 years ago by Michael Dondrup47k

actually I don't know anything, my supervisor asked me to remove such a genes

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k

sorry you mean I should not use ids <- which(Mat_sd<0.1), then how to remove those rows?

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k

thank you, but file is empty yet

after mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),]

There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In var(if (is.vector(x)) x else as.double(x), na.rm = na.rm) :
  NAs introduced by coercion
ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k

Try sd(mycounts2, na.rm=TRUE) and then any NAs in the column var will be ignored.

You should also check out your data to make sure the NA's should be NA's and there haven't been read in errors, commands like head(data), tail(data), and str(data) should help with that.

ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by informatics bot620

thanks but empty yet

sd(mycounts2, na.rm=TRUE)
[1] NA

> head(data)

1 function (..., list = character(), package = NULL, lib.loc = NULL, 
2     verbose = getOption("verbose"), envir = .GlobalEnv)            
3 {                                                                  
4     fileExt <- function(x) {                                       
5         db <- grepl("\\\\.[^.]+\\\\.(gz|bz2|xz)$", x)              
6         ans <- sub(".*\\\\.", "", x)                               
> 
> 
ADD REPLYlink modified 11 months ago by RamRS30k • written 5.0 years ago by A3.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1675 users visited in the last hour