I am trying to do some statistical analysis on RNA-Seq data. I have 5 different treatments with biological duplicates for each condition, and I want to test for differential gene expression among the treatments. Since pairwise testing at this initial stage would lead to a larger number of false discoveries, what is the best approach for statistical analysis across multiple groups? Would an ANOVA be a good method to start with? And which bioinformatics package could I use for this analysis (edgeR or baySeq)?
Do you have controls for the treatments? What's the species? And why do you think false discoveries should be a problem for pairwise testing? You mentioned edgeR: there you can adjust your p-values for multiple testing and choose the false discovery rate you are willing to accept for the whole dataset. Initially you get a separate p-value for each gene, so if every gene without a real change has a 5% chance of being called significant and you are testing thousands of genes, you will get a lot of false discoveries; the adjustment is what corrects for this.
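To illustrate what "adjusting your p-values" means, here is a minimal pure-Python sketch of the Benjamini-Hochberg (BH) procedure, which as far as I know is the default adjustment in edgeR's topTags (adjust.method="BH") and in R's p.adjust(method="BH"). This is not edgeR's own code; the function name benjamini_hochberg is hypothetical and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input.

    Illustrative sketch only; in practice use the adjustment built into
    your DE package (e.g. edgeR's topTags).
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        scaled = pvalues[i] * m / rank
        running_min = min(running_min, scaled)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Each p-value is scaled by m/rank, so with thousands of genes a raw p-value has to be much smaller to survive at a chosen FDR such as 0.05.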
Thank you for your reply, Noolean. The cells are Chinese hamster ovary (CHO) cell lines. We used two cell lines, with 5 treatments in total. We have about 26,000 genes, so even at a p-value cutoff of 0.01 we would expect about 260 false positives (26,000 × 0.01) if none of the genes were truly changed, and that is for just one pairwise comparison. Since we have several pairwise comparisons to make, the false positives would add up. Because we have multiple groups, I was wondering whether a one-way ANOVA is the way to go. If we look at the counts for a gene across all 5 conditions at once, is there a lower chance of false positives?
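The arithmetic in the question can be sketched as below. It assumes the 5 treatments are compared all-against-all, that every gene is truly unchanged (the worst case), and that raw, unadjusted p-values are thresholded at 0.01; the helper name expected_false_positives is hypothetical.

```python
from math import comb


def expected_false_positives(n_genes, alpha, n_tests_per_gene=1):
    """Expected number of false positives when every null hypothesis is
    true and raw p-values are thresholded at alpha (no adjustment)."""
    return n_genes * n_tests_per_gene * alpha


n_genes = 26000
alpha = 0.01
n_treatments = 5

# Number of distinct treatment pairs: C(5, 2)
n_pairs = comb(n_treatments, 2)

per_comparison = expected_false_positives(n_genes, alpha)
all_pairwise = expected_false_positives(n_genes, alpha, n_pairs)
# An omnibus (ANOVA-like) test runs one test per gene across all groups
omnibus = expected_false_positives(n_genes, alpha)

print(n_pairs, per_comparison, all_pairwise, omnibus)
```

Under these assumptions that is 10 comparisons, about 260 expected false positives per comparison and about 2,600 across all of them, versus about 260 for a single omnibus test per gene. The omnibus test lowers the number of tests, not the error rate of each test, and it does not tell you which groups differ, so a multiple-testing adjustment such as BH is still needed either way.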
I am new to this RNA-Seq game. Thank you for being patient.