Question: removing the genes with 0 standard deviation
0
A3.9k wrote:

friends,

i have about 22000 genes in rows and 100 samples in column in my microarray normalizd file. i was going to remove genes with 0 standard deviation then i did like below,

i opened rstudio,

setwd("E:/normalization")

Mat_sd <-apply(mycounts, 1,sd)

ids <- which(Mat_sd<0.1)

mycounts <- mycounts[-ids,]

write.table(mycounts, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)

but the output file is empty, means i have only samples name in column and nothing in rows anymore...then what was my fault in the above code?? even i tried for sd<1 but again empty rows.

myposts R software error • 2.7k views
modified 5.0 years ago by Michael Dondrup47k • written 5.0 years ago by A3.9k

why is the "ids" variable negative?

it's not negative. indexing with negative integer vectors removes those. op wanted to remove rows with sd < 0.1, that's the opposite of your solution.

Sorry Michael,

won't you tell me a solution? because really i can't get what i should do

4
Michael Dondrup47k wrote:

Don't negative index by using `which`, what you are doing is in principle like saying:

``````matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]
``````

Most likely your condition is all `FALSE` or `NA`. While you expect to get the complete matrix in that case, `which(FALSE)` yields `integer(0)` and `- integer(0)` is still `integer(0)`. So you get an empty matrix in case you should get the full matrix.

You can use indexing with logical vectors, and note that your rows might contain NA's here are correct alternative solutions:

``````mycounts.filtered <- mycounts[ apply(mycounts, 1, sd, na.rm=TRUE) >= 0.1,]
mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
``````

thank you so much

I did like below

``````matrix(1,nrow=2, ncol=2)[-which(c(FALSE,FALSE)), ]
[,1] [,2]

mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
``````

but the output file is empty as already

1

And what are these warnings? Btw you wrote the wrong table, if you just want to copy-paste the code that doesn't work, at least you have to understand it a little:

``````mycounts.filtered <- mycounts[ ! apply(mycounts, 1, sd, na.rm=TRUE) < 0.1,]
There were 50 or more warnings (use warnings() to see the first 50)
> write.table(mycounts.filtered, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
#-------------^^^^^^^^^^^^^^^^^----- should be mycounts.filtered, not mycounts like in your script
``````

what does `head(mycounts)` give?

Michael,

thank you extend to the world, the file is not empty anymore and the rows decreased from 32550 to 21272, means some genes have been removed

1

And the warnings came from non-numeric identifiers in the first column of your data, which result in `NAs introduced by coercion` warnings, and you have now removed the first column, am I right ;)

thank you, the output file was contain an extra column before identifier column that I removed that in excel manually

1

`mycounts[,-1]` would have accomplished the same...

thank you, really it is not easy to edit the file manually

1
informatics bot620 wrote:

``````mycounts <- read.table("RMA.txt", sep="\t", header=TRUE)
mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),]
write.table(mycounts2, file = "RMAsd.txt", dec = ".", sep = "\t", quote = FALSE)
``````
1

This is even more wrong then the op, don't use which to index, hint, what happens if none of the rows satisfies condition?

actually I don't know anything, my supervisor asked me to remove such a genes

sorry you mean I should not use `ids <- which(Mat_sd<0.1)`, then how to remove those rows?

thank you, but file is empty yet

after `mycounts2<-mycounts[which(apply(mycounts, 1, sd)<1),]`

``````There were 50 or more warnings (use warnings() to see the first 50)

> warnings()
Warning messages:
1: In var(if (is.vector(x)) x else as.double(x), na.rm = na.rm) :
NAs introduced by coercion
``````

Try `sd(mycounts2, na.rm=TRUE)` and then any NAs in the column var will be ignored.

You should also check out your data to make sure the NA's should be NA's and there haven't been read in errors, commands like `head(data)`, `tail(data)`, and `str(data)` should help with that.

thanks but empty yet

``````sd(mycounts2, na.rm=TRUE)
 NA

1 function (..., list = character(), package = NULL, lib.loc = NULL,
2     verbose = getOption("verbose"), envir = .GlobalEnv)
3 {
4     fileExt <- function(x) {
5         db <- grepl("\\\\.[^.]+\\\\.(gz|bz2|xz)\$", x)
6         ans <- sub(".*\\\\.", "", x)
>
>
``````