I have an Affymetrix gene-level expression matrix (genes in rows, sample IDs in columns) and tried to quantify the variation of the expressed genes using the coefficient of variation (CV). However, when I plotted the values I found some unusual CVs, and I suspect something is wrong with the gene expression data. Here is how I computed CV in R:
    SD <- apply(eset_HTA20, 1, sd)   # per-gene SD across samples (genes in rows)
    CV <- sqrt(exp(SD^2) - 1)        # log-normal CV formula applied to that SD
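For reference, the second line is the coefficient of variation of a log-normal variable, CV = sqrt(exp(σ²) − 1), where σ is the standard deviation of the natural logarithm of the data. Expression values from RMA-style preprocessing are usually on the log2 scale; whether that applies to `eset_HTA20` is an assumption here, since the post does not say. If it does, the SD would need rescaling by ln(2) before the formula is applied. The sketch below uses a hypothetical helper `lognormal_cv()` and a made-up toy matrix to show the difference:

```r
# Hypothetical helper (not from the post): CV of a log-normal variable
# computed from the SD of log-transformed values.
# log_base = base of the log used on the data: exp(1) for ln, 2 for log2.
lognormal_cv <- function(sd_log, log_base = exp(1)) {
  sd_ln <- sd_log * log(log_base)  # convert the SD to the natural-log scale
  sqrt(exp(sd_ln^2) - 1)
}

# Made-up log2 expression values: genes in rows, samples in columns
expr_log2 <- rbind(
  geneA = c(8.0, 8.1, 7.9, 8.0),
  geneB = c(5.0, 7.0, 5.5, 7.5)
)
sd_log2 <- apply(expr_log2, 1, sd)

cv_as_posted <- lognormal_cv(sd_log2)                # treats the log2 SD as an ln SD
cv_log2      <- lognormal_cv(sd_log2, log_base = 2)  # rescales by ln(2) first
print(round(cbind(sd_log2, cv_as_posted, cv_log2), 4))
```

The rescaling only changes the units of the SD; it does not by itself bound CV by 1.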
When I checked the range of the CV values, however, I found something strange:
    > summary(CV)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     0.04753  0.12946  0.16494  0.20181  0.22925 15.00777
I expected max(CV) to be at most 1, but I got 15.00777, which makes me think something is wrong with the gene expression data. The data were already preprocessed (background-corrected and normalized), so I don't know where this problem comes from.
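On the expected range: the formula CV = sqrt(exp(SD^2) - 1) used above is the CV of a log-normal variable, and it is not bounded by 1. Writing σ for the value passed in as SD, a short calculation gives the threshold above which CV exceeds 1 and the SD implied by the reported maximum:

```latex
\mathrm{CV} = \sqrt{e^{\sigma^2} - 1} > 1
  \iff e^{\sigma^2} > 2
  \iff \sigma > \sqrt{\ln 2} \approx 0.833

\sigma = \sqrt{\ln\!\left(1 + \mathrm{CV}^2\right)}
       = \sqrt{\ln\!\left(1 + 15.00777^2\right)} \approx 2.33
```

So the maximum of 15.00777 corresponds to a gene whose SD across samples is about 2.33 in the units of `eset_HTA20`, and any gene with SD above about 0.833 gets CV > 1.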
Why I use CV: I want to measure the variation of the expressed genes and keep the ones that show high variation, but the range of CV values here does not look reasonable.
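Since the goal is to keep the most variable genes: sqrt(exp(s^2) - 1) is strictly increasing for s >= 0, so ranking genes by this CV gives the same order as ranking them by SD, and a CV cutoff c is equivalent to an SD cutoff of sqrt(log(1 + c^2)). A sketch on simulated data (the matrix, k = 10 and the cutoff are arbitrary choices for illustration, not values from the post):

```r
set.seed(1)
# Simulated log-scale matrix: 100 genes x 8 samples; gene i has true SD sds[i]
n_genes <- 100
sds  <- seq(0.1, 2, length.out = n_genes)
expr <- matrix(rnorm(n_genes * 8, mean = 8, sd = sds), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

sd_g <- apply(expr, 1, sd)
cv_g <- sqrt(exp(sd_g^2) - 1)

# Top-k genes by CV and by SD: same genes, same order
k <- 10
top_by_cv <- names(sort(cv_g, decreasing = TRUE))[seq_len(k)]
top_by_sd <- names(sort(sd_g, decreasing = TRUE))[seq_len(k)]
print(identical(top_by_cv, top_by_sd))

# A CV cutoff translates into an equivalent SD cutoff
cv_cut <- 1
sd_cut <- sqrt(log(1 + cv_cut^2))
print(identical(which(cv_g > cv_cut), which(sd_g > sd_cut)))
```

In other words, with this formula the CV filter is just an SD filter on a transformed scale.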
How can I track down this problem? Why do I get CV values greater than one, and how can I correct this? Is there a strategy or idea for fixing the underlying problem?
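One way to start tracking this down is to list the genes whose CV exceeds 1 together with their SD and expression range, and to check the overall scale of the matrix. The sketch below uses a hypothetical helper `inspect_high_cv()` and a made-up two-gene matrix standing in for `eset_HTA20`; with the real data one would pass the actual matrix and the `CV` vector computed above.

```r
# Hypothetical helper: tabulate genes with CV above a threshold,
# sorted from the highest CV down. `expr` has genes in rows.
inspect_high_cv <- function(expr, cv, threshold = 1) {
  hits <- which(cv > threshold)
  sub  <- expr[hits, , drop = FALSE]
  out  <- data.frame(
    gene = rownames(expr)[hits],
    cv   = unname(cv[hits]),
    sd   = apply(sub, 1, sd),
    min  = apply(sub, 1, min),
    max  = apply(sub, 1, max),
    row.names = NULL
  )
  out[order(out$cv, decreasing = TRUE), , drop = FALSE]
}

# Made-up toy matrix standing in for eset_HTA20 (genes in rows)
expr_toy <- rbind(
  stable   = c(8.0, 8.2, 7.9, 8.1),
  switched = c(3.0, 3.2, 10.5, 11.0)
)
cv_toy <- sqrt(exp(apply(expr_toy, 1, sd)^2) - 1)
high <- inspect_high_cv(expr_toy, cv_toy)
print(high)

# Overall scale check (a rough heuristic, not a rule): log2-scale array
# values rarely exceed about 16, whereas unlogged intensities run into the thousands.
print(summary(as.vector(expr_toy)))
```

Genes that end up in this table can then be examined individually, for example to see whether they are expressed in only some of the samples or whether the spread comes from particular arrays.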