Question

Getting rid of noise in gene expression

1


7.7 years ago

ghunt ▴ 10

Hi, I am working with a data set containing gene expression values for cancer patients, and I have been told that the data can be noisy. The expression values range from 0 to 20, and there are close to 2000 patients. There are close to 50K expression values per patient, each identified by an Illumina ID.

What would be the best way to filter out noise caused by errors in the Illumina sequencing technique? Is there a general technique for removing noise?

Thanks.

genome noise-removal • 2.9k views

updated 7.7 years ago by informatics bot ▴ 760 • written 7.7 years ago by ghunt ▴ 10

0


If the data contains values from 0 to 20 and an "illumina id", it is not sequencing data. It is most likely microarray data.

7.7 years ago by igor 13k

score 2 · Answer 1 · 2016-07-23

There are many ways to reduce noise in RNA-seq gene expression data. I personally have found the following approach useful when dealing with heterogeneous tissue and >100 samples.

1.) Remove genes with low gene expression.

2.) Remove samples that lack adequate sequencing depth (my lab usually sequences to at least 8 million mapped reads per sample).

3.) Remove outlier samples based on how many standard deviations they lie from the mean on a PCA/MDS plot.

4.) Use R packages such as PEER and sva/ComBat to remove batch effects from the data.

5.) Profile your data with tools such as WGCNA, and see if any individual samples are driving nonsensical modules that don't relate to biology.
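
For what it's worth, the filtering in steps 1 to 3 could be sketched roughly as below. This is only an illustration in plain Python, not the exact procedure described above: the function names, the toy data, and every threshold (`min_count`, `min_samples`, `min_depth`, `n_sd`) are assumptions made for the example. It also assumes count data stored as a dict mapping gene ID to a list of per-sample counts, and that the PCA/MDS coordinates passed to `flag_outliers` have already been computed elsewhere. The depth filter only makes sense for sequencing counts, not for microarray intensities.

```python
from statistics import mean, stdev


def low_depth_samples(counts, sample_ids, min_depth):
    """Return IDs of samples whose total count (column sum) is below min_depth."""
    depths = [sum(row[j] for row in counts.values()) for j in range(len(sample_ids))]
    return [s for s, d in zip(sample_ids, depths) if d < min_depth]


def drop_samples(counts, sample_ids, to_drop):
    """Remove the given samples (columns) from every gene row."""
    keep = [j for j, s in enumerate(sample_ids) if s not in to_drop]
    kept_ids = [sample_ids[j] for j in keep]
    kept_counts = {gene: [row[j] for j in keep] for gene, row in counts.items()}
    return kept_counts, kept_ids


def filter_low_expression(counts, min_count=10, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples."""
    return {
        gene: row
        for gene, row in counts.items()
        if sum(1 for c in row if c >= min_count) >= min_samples
    }


def flag_outliers(scores, sample_ids, n_sd=3.0):
    """Flag samples whose score (e.g. on one PCA/MDS axis) lies more than
    n_sd standard deviations from the mean of all scores."""
    mu, sd = mean(scores), stdev(scores)
    if sd == 0:
        return []
    return [s for s, x in zip(sample_ids, scores) if abs(x - mu) > n_sd * sd]


# Toy data: rows are genes, columns are samples (hypothetical values).
sample_ids = ["s1", "s2", "s3", "s4"]
counts = {
    "geneA": [120, 90, 3, 150],
    "geneB": [0, 1, 0, 2],
    "geneC": [40, 55, 1, 60],
}

# Depth is computed on the raw counts; 100 is a toy threshold
# (the answer above suggests something like 8 million for real data).
shallow = low_depth_samples(counts, sample_ids, min_depth=100)
counts, sample_ids = drop_samples(counts, sample_ids, shallow)
counts = filter_low_expression(counts, min_count=10, min_samples=2)
print(shallow, sorted(counts))  # ['s3'] ['geneA', 'geneC']
```

In practice you would call `flag_outliers` once per principal component of interest and inspect the flagged samples before dropping them, since a sample that sits far out on a PCA plot can reflect real biology rather than noise.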