How can I select highly variable genes from the RNASeq gene dataset?
3.9 years ago

How can I select only the genes above a certain variance threshold? I know approaches such as computing Z or t statistics, deriving p-values, and then correcting for FWER or FDR. But are there modern, easy-to-use techniques available in R, backed by solid references, that I can apply to my data? To be specific, I have an RNA-Seq FPKM dataset.

Regards


What's wrong with the typical Z and t statistics with FDR correction?


See the great, well-explained answer below.

3.9 years ago
Michael

There is no need to cite simple statistics such as variance or Z statistics. However, you might consider the median absolute deviation (MAD) as a robust alternative to variance; as far as I know, it is commonly used as a filtering step in network analysis. If you do the filtering in R or another software package, you can add a sentence like "All statistical analyses were done in R (R Core Team, 2018)." You might also use more advanced differential expression methods such as limma or DESeq2. If you use a particular package, cite that package.
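To illustrate why MAD is called robust, here is a minimal sketch in plain Python (in R, the built-in `mad()` does the same job and `apply(expr, 1, mad)` gives one value per gene of a genes x samples matrix). The two genes and their values are made up for illustration: one is stable across six samples, the other is identical except for a single outlier sample. The scaling constant 1.4826 matches the default of R's `mad()`.

```python
from statistics import median, pvariance


def mad(values, constant=1.4826):
    """Median absolute deviation: constant * median(|x - median(x)|).

    The default constant mirrors R's mad(); it makes the MAD comparable to the
    standard deviation for normally distributed data. Use constant=1 for the raw MAD.
    """
    m = median(values)
    return constant * median([abs(v - m) for v in values])


# Hypothetical expression values for two genes across six samples.
stable = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0]
with_outlier = [5.0, 5.2, 4.8, 5.1, 4.9, 50.0]  # one extreme sample

for name, values in [("stable", stable), ("with_outlier", with_outlier)]:
    print(name, "variance:", round(pvariance(values), 4), "MAD:", round(mad(values), 4))
```

The single outlier inflates the variance by several orders of magnitude, while the MAD barely moves, so a variance filter would keep this gene mainly because of one sample.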

In general, there is no single best or authoritative way of filtering prior to downstream analyses.

In particular, I do not know what your intended downstream analysis is, so there is not much more I can recommend at this stage.

R Core Team (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Caveat: do not use FPKM; use CPM or TPM instead (this has been discussed here many times).
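If only FPKM values are available, they can be rescaled to TPM within each sample, because TPM_i = FPKM_i / sum_j(FPKM_j) * 10^6. CPM, on the other hand, is computed from raw read counts and cannot be recovered from FPKM alone. A minimal sketch for one sample, assuming the sample is stored as a dict of gene ID to FPKM (the gene IDs below are hypothetical):

```python
def fpkm_to_tpm(fpkm_sample):
    """Rescale one sample's FPKM values to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, so the TPMs of each sample sum to one million.
    Apply this separately to every sample (column) of the expression matrix.
    """
    total = sum(fpkm_sample.values())
    if total <= 0:
        raise ValueError("sample has no positive FPKM values")
    return {gene: value / total * 1e6 for gene, value in fpkm_sample.items()}


# Hypothetical single sample with three genes.
sample = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
tpm = fpkm_to_tpm(sample)
print(tpm)
```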


My objective is to do co-expression analysis after filtering out genes with low variance. Is it clearer now what I want to do?


For co-expression analysis, you might use MAD with a certain threshold (e.g. MAD > 2), but remember that such thresholds are necessarily arbitrary.
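A MAD-threshold filter could look roughly like the sketch below. It assumes a dict of gene ID to per-sample TPM values and, as an assumption not stated in the answer, computes the MAD on log2(x + 1) values, since the scale on which "MAD > 2" is applied changes which genes pass. The function name `select_variable_genes` and the gene data are hypothetical.

```python
from math import log2
from statistics import median


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled like R's mad() by default."""
    m = median(values)
    return constant * median([abs(v - m) for v in values])


def select_variable_genes(expr, threshold=2.0, log_transform=True):
    """Return gene IDs whose MAD across samples exceeds `threshold`.

    expr: dict mapping gene ID -> list of expression values (one per sample).
    log_transform: if True (an assumed, common choice), use log2(x + 1) values.
    The result is sorted by decreasing MAD, most variable genes first.
    """
    scores = {}
    for gene, values in expr.items():
        vals = [log2(v + 1) for v in values] if log_transform else list(values)
        score = mad(vals)
        if score > threshold:
            scores[gene] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical TPM values for three genes across six samples.
expr = {
    "geneA": [1, 2, 1, 2, 1, 2],             # low, nearly constant
    "geneB": [1, 10, 100, 1000, 5, 500],     # highly variable
    "geneC": [50, 52, 48, 51, 49, 50],       # high but stable
}
print(select_variable_genes(expr))
```

Because the cutoff is arbitrary, an alternative is to keep a fixed number of the top-ranked genes from the sorted list and check that the co-expression results are stable when that number changes.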
