Question

Statistical analysis of multiple groups of RNA-Seq data

11.2 years ago

ygowtha

Hello

I am trying to do some statistical analysis on RNA-Seq data. I have 5 different treatments, with biological duplicates for each condition, and I want to test for differential gene expression among the treatments. Since pairwise testing of the data at the initial stages would lead to a larger number of false discoveries, what is the best approach for the statistical analysis of multiple groups? Would an ANOVA be a good method to start with? And which bioinformatics package could I use for this analysis (edgeR or baySeq)?

Thanks

Tags: RNA-Seq, statistics

Edited 3.8 years ago by Ram

Do you have controls for the treatments? What's the species? And why do you think false discovery would be a problem for pairwise testing? You mentioned edgeR: there you can adjust your p-values for multiple testing and choose the false discovery rate you are willing to accept for the whole dataset. This matters because you initially get a separate p-value for each gene, so if every single gene has a 5% probability of being called a false positive and you are testing thousands of genes, you will get a lot of false discoveries.
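
A minimal sketch of the Benjamini-Hochberg (BH) false discovery rate adjustment this comment refers to, written in Python because the thread itself contains no code. The function name `benjamini_hochberg` and the five p-values are hypothetical illustrations; this is not edgeR's own code, although in R the same adjustment is available as `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR), in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest; taking the running
    # minimum keeps the adjusted values monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pvalues[i] * m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes, for illustration only.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
print(adj)
```

Genes whose adjusted value falls below the chosen FDR (for example 0.05) are called; with thousands of genes the adjustment is what keeps the expected proportion of false discoveries among the calls near that level.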

Biomonika (Noolean) · 11.2 years ago

Thank you for your reply, Noolean. The cells are Chinese hamster ovary (CHO) cell lines: we used two cell lines, with 5 treatments in all. We have about 26,000 genes, so even with a p-value cutoff of 0.01 we would expect about 260 false positives by chance alone, and that is for just one pairwise comparison. Since we have several pairwise comparisons to make, this would add up to a sizable number of false positives. Because we have multiple groups, I was wondering whether one-way ANOVA is the way to go. If we look at the counts for a gene across all 5 conditions at once, is there a better chance of keeping false positives low?

I am new to this RNA-Seq game. Thank you for being patient.
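
The arithmetic behind this concern, as a small Python sketch. It assumes the numbers from the comment (about 26,000 genes, a per-gene cutoff of 0.01, 5 treatments) and the worst case in which no gene is truly differentially expressed, so every significant call is a false positive; the variable names are illustrative only.

```python
from math import comb

n_genes = 26_000
alpha = 0.01          # per-gene p-value cutoff mentioned in the comment
n_treatments = 5

# If no gene were truly differentially expressed, each test at cutoff alpha
# would still come out "significant" with probability alpha, so the expected
# number of false positives is n_genes * alpha per comparison.
false_pos_per_comparison = n_genes * alpha

# All pairwise comparisons among 5 treatments: C(5, 2) = 10.
n_pairwise = comb(n_treatments, 2)
false_pos_all_pairwise = n_pairwise * false_pos_per_comparison

# One omnibus (ANOVA-like) test per gene means 1 test per gene instead of 10,
# but it is still 26,000 tests, so FDR adjustment is still needed.
false_pos_omnibus = n_genes * alpha

print(false_pos_per_comparison, n_pairwise, false_pos_all_pairwise, false_pos_omnibus)
```

An omnibus test therefore reduces the number of tests per gene, but it does not remove the need for FDR control across genes, and by itself it does not say which treatments differ.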

ygowtha · 11.2 years ago (edited 5.5 years ago by Ram)

11.2 years ago

Michael Love

In addition to the ANOVA-style test mentioned above, I would recommend looking at a PCA or MDS plot to get a sense of the distances between samples across conditions. See the MDS plot examples in the edgeR vignette or the PCA examples in the DESeq2 vignette.
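
A rough Python sketch of the sample-level PCA suggested here, assuming a genes × samples count matrix. The `log_cpm` helper is a simplified log2 counts-per-million (edgeR's `cpm(..., log = TRUE)` treats the prior count somewhat differently), the helper names are hypothetical, and the simulated counts (5 treatments × 2 replicates, 200 genes of which 40 respond) are invented for illustration. This is not a reimplementation of edgeR's `plotMDS` or DESeq2's `plotPCA`.

```python
import numpy as np


def log_cpm(counts, prior_count=1.0):
    """Simplified log2 counts-per-million; counts is genes x samples."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)                    # total reads per sample
    cpm = (counts + prior_count) / lib_sizes * 1e6    # broadcasts over columns
    return np.log2(cpm)


def pca_scores(expr, n_components=2):
    """Project samples (columns of expr) onto the top principal components."""
    centered = expr - expr.mean(axis=1, keepdims=True)   # center each gene
    # SVD of the samples x genes matrix; U * S gives sample coordinates.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    var_explained = s**2 / np.sum(s**2)
    return scores, var_explained[:n_components]


# Hypothetical simulated data: 5 treatments x 2 biological replicates.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 2)
base = rng.uniform(50, 500, size=(200, 1))            # per-gene baseline mean
effect = np.ones((200, 5))
effect[:40] = rng.uniform(0.25, 4.0, size=(40, 5))    # first 40 genes respond
counts = rng.poisson(base * effect[:, groups])

scores, ve = pca_scores(log_cpm(counts))
for g, (pc1, pc2) in zip(groups, scores):
    print(f"treatment {g}: PC1={pc1:7.2f}  PC2={pc2:7.2f}")
print("variance explained:", ve)
```

Plotting PC1 against PC2 and colouring points by treatment shows whether replicates cluster together and which treatments are far apart, which helps interpret the results of a multi-group test.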

Ram · Accepted Answer · 2014-05-02

edgeR would work for you (it's not the only option, but it's one I've used successfully for these types of experiments). It handles more complex experimental designs better than running a large collection of separate pairwise tests.

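In the manual, check out the sections "More complex experiments (GLM functionality)" and "An ANOVA-like test for any differences". In GLM mode, you can give it an arbitrary experimental design with multiple treatments, replicates, and batches, and it will model each gene's counts from all samples at once, so every test uses the replicate information from the whole experiment rather than only the two groups being compared.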
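
To make the "ANOVA-like test" idea concrete, here is a Python sketch on log-scale expression values: build a design matrix for 5 treatments (an intercept plus four treatment indicators, like R's `~ group`), fit a full and an intercept-only linear model for every gene, and compare them with an F-test. This is a swapped-in, simplified technique: an ordinary least-squares F-test, not edgeR's negative binomial GLM likelihood-ratio or quasi-likelihood test. In edgeR the analogous step is, as far as I know, testing several coefficients at once (for example `glmLRT(fit, coef = 2:5)`). The helper names `design_matrix` and `anova_like_test` and the two-gene example are hypothetical.

```python
import numpy as np
from scipy import stats


def design_matrix(groups, n_groups):
    """Intercept plus one indicator per non-reference group (like R's ~group)."""
    groups = np.asarray(groups)
    X = np.zeros((groups.size, n_groups))
    X[:, 0] = 1.0                                  # intercept = reference group 0
    for g in range(1, n_groups):
        X[:, g] = (groups == g).astype(float)      # treatment g vs reference
    return X


def anova_like_test(expr, groups, n_groups):
    """Per-gene F-test of 'any group differs'; expr is genes x samples."""
    Y = np.asarray(expr, dtype=float).T            # samples x genes
    n = Y.shape[0]
    X_full = design_matrix(groups, n_groups)
    X_null = X_full[:, :1]                         # intercept-only model

    def rss(X):
        # Least-squares fit for all genes at once; one RSS per gene.
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        return np.sum(resid**2, axis=0)

    rss_full, rss_null = rss(X_full), rss(X_null)
    df1 = X_full.shape[1] - X_null.shape[1]        # coefficients tested (4)
    df2 = n - X_full.shape[1]                      # residual degrees of freedom
    f_stat = ((rss_null - rss_full) / df1) / (rss_full / df2)
    p_values = stats.f.sf(f_stat, df1, df2)
    return f_stat, p_values


# Hypothetical log-expression for 2 genes: 5 treatments x 2 replicates.
groups = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
expr = np.array([
    [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0],   # no treatment effect
    [5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 8.0, 8.1, 4.9, 5.0],   # treatment 3 is up
])
f, p = anova_like_test(expr, groups, 5)
print("F:", f)
print("p:", p)
```

With two replicates per treatment the residual degrees of freedom are only 10 - 5 = 5, which is one reason count-based tools that share information across genes (such as edgeR) are preferred for real data. A batch factor, such as the two CHO cell lines, would go into both the full and the null design as extra columns so that only the treatment coefficients are tested, and the resulting p-values still need FDR adjustment across genes.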