Question: Selecting the consensus/overlapped genes from the DE study
6 months ago by
glady240 wrote:

Hello everyone, I have 9 human RNA-seq samples, each with 3 replicates. I mapped the reads with STAR and quantified them with RSEM. The differential expression (DE) analysis was performed with EBSeq, DESeq2 and limma. To keep our downstream analysis as stringent as possible, we decided to select the genes that overlap between these 3 methods. We got a good overlap (57.3%) between them.
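For readers who want to reproduce this kind of consensus selection, here is a minimal sketch in Python. The gene names are made up, and the overlap percentage is computed as intersection over union, which is an assumption: the post does not say how the 57.3% was calculated.

```python
def overlap_summary(gene_lists):
    """Return the consensus gene set and the overlap as a percentage of the union.

    gene_lists maps a tool name to the list of genes that tool called DE.
    """
    sets = {tool: set(genes) for tool, genes in gene_lists.items()}
    consensus = set.intersection(*sets.values())  # genes called by every tool
    union = set.union(*sets.values())             # genes called by at least one tool
    pct = 100.0 * len(consensus) / len(union) if union else 0.0
    return consensus, pct


# Hypothetical DE calls, for illustration only.
calls = {
    "EBSeq": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "DESeq2": ["GENE_A", "GENE_B", "GENE_C", "GENE_E"],
    "limma": ["GENE_A", "GENE_B", "GENE_F"],
}

consensus, pct = overlap_summary(calls)
print(sorted(consensus), round(pct, 1))
```

Whatever denominator is used (union, or the size of one tool's list), it is worth stating it in the methods, since the same lists give different percentages under different definitions.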

My questions are: 1) Can we go ahead with the overlapping genes? Is this scientifically sound?

2) Or should I just select one of the three tools and go ahead with it? If yes, then why?


DESeq2 and limma-voom are, in my experience, the most reliable tools. Taking the overlap between different methods will mainly select for genes that are more strongly DE.

ADD REPLYlink written 6 months ago by Martombo2.4k

Is it okay to go ahead with the overlaps? I ask because, even though I'm getting a good intersection (65%) between limma and DESeq2, the two tools normalize read counts differently.

I hope this doesn't create a problem for the reviewers.

ADD REPLYlink written 6 months ago by glady240

Well, if you have genes with significant changes (statistical and/or in expression), then almost all of the methods will pick them up. It is for genes in the twilight zone that the choice of method matters. Some methods are sensitive for certain kinds of studies and others for other kinds. Look at the manuscripts in your field, see which method is most used (and effective), and use that. In addition, using different methods is one thing, and getting accepted by the scientific community is another.

ADD REPLYlink written 6 months ago by cpad011211k

He has only three biological replicates for each treatment, so there is a good chance a reasonable proportion of his results are in the twilight zone.

ADD REPLYlink written 6 months ago by h.mon24k

Most of the genes are in the twilight zone. The intersection between the three is somewhere around 58%, while the intersection between limma and DESeq2 is 65%.

ADD REPLYlink written 6 months ago by glady240
6 months ago by
h.mon24k wrote:

My statistical skills are just rudimentary, so take the advice below with a grain of salt:

Although there are several papers using "ensemble" methods for various tasks and showing they perform better than any single tool, I am not aware whether this has already been done for RNA-seq. My feeling is that such a method would alter the nominal FDR and statistical power (if you knew your statistical power beforehand) in non-obvious ways. This may or may not be a problem, depending on what you want to do downstream.
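To make the point about FDR and power concrete, here is a toy calculation in Python. It assumes the tools make their calls independently of each other, which is almost certainly false for tools run on the same counts, so treat it as one extreme rather than a prediction. The function name and all the numbers (power, per-gene false positive rate, fraction of null genes) are hypothetical.

```python
def intersection_rates(power, alpha, pi0, n_tools):
    """Power and FDR of an intersection rule under an independence assumption.

    power   : probability that a single tool calls a truly DE gene
    alpha   : probability that a single tool calls a non-DE gene
    pi0     : fraction of genes that are truly non-DE
    n_tools : number of tools that must all agree
    """
    joint_power = power ** n_tools           # all tools must detect a true DE gene
    joint_alpha = alpha ** n_tools           # all tools must wrongly call a null gene
    true_pos = (1.0 - pi0) * joint_power     # expected fraction of genes: true calls
    false_pos = pi0 * joint_alpha            # expected fraction of genes: false calls
    total = true_pos + false_pos
    fdr = false_pos / total if total > 0 else 0.0
    return joint_power, fdr


# Hypothetical values, chosen only to show the direction of the effect.
power_1, fdr_1 = intersection_rates(power=0.8, alpha=0.05, pi0=0.9, n_tools=1)
power_3, fdr_3 = intersection_rates(power=0.8, alpha=0.05, pi0=0.9, n_tools=3)
print(power_1, fdr_1)
print(power_3, fdr_3)
```

Under independence the intersection lowers both the power and the FDR well below the single-tool values. At the other extreme, tools that agree perfectly would leave both unchanged. Real tools sit somewhere in between, which is why the resulting error rates are hard to state.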

Did you check the literature to see if EBSeq, DESeq2 and limma are good tools, i.e., whether they control the false positive rate as reported and have good sensitivity? There is no point in including a tool that calls incorrect results.

My suggestion would be to use one tool, chosen before performing the analysis. Now that you have already run three tools, you risk p-hacking by choosing the most "interesting" or "biologically plausible" results. If you want to choose one tool now, either do that randomly, or review the literature and choose the best tool according to it, not according to your results at hand.


Thank you for your reply.

DESeq2 and limma are good tools; they produce lower rates of false positives than the others. This is according to the literature, not just my results, although I have observed the same in my data.

ADD REPLYlink written 6 months ago by glady240

Personally, for RNA-seq, having looked at the methods behind each 'tool', I don't feel comfortable using any method other than DESeq2. It's an intelligent method by an intelligent group of people that does better than any other at modelling biases that exist in RNA-seq. For microarray, I only use limma, which is the supreme method in that realm.

As such, I would just use DESeq2 and set cut-offs for fold change and FDR-adjusted P value accordingly.
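As a rough illustration of applying such cut-offs, here is a Python sketch that filters an exported results table. The column names log2FoldChange and padj are the ones DESeq2 uses in its results table; the function name, the threshold values and the example rows are hypothetical and should be chosen for your own study.

```python
import math


def filter_de(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes passing both an absolute log2 fold change and an adjusted P value cut-off.

    rows is a list of dicts with keys "gene", "log2FoldChange" and "padj".
    Rows with a missing or NaN value are skipped (DESeq2 can report NA for padj).
    """
    kept = []
    for row in rows:
        lfc = row["log2FoldChange"]
        padj = row["padj"]
        if lfc is None or padj is None:
            continue
        if math.isnan(lfc) or math.isnan(padj):
            continue
        if abs(lfc) >= lfc_cutoff and padj < padj_cutoff:
            kept.append(row["gene"])
    return kept


# Made-up rows, for illustration only.
results = [
    {"gene": "GENE_A", "log2FoldChange": 2.3, "padj": 0.001},
    {"gene": "GENE_B", "log2FoldChange": -1.5, "padj": 0.03},
    {"gene": "GENE_C", "log2FoldChange": 0.4, "padj": 0.0001},    # fails fold change
    {"gene": "GENE_D", "log2FoldChange": 3.1, "padj": 0.2},       # fails adjusted P
    {"gene": "GENE_E", "log2FoldChange": 1.8, "padj": float("nan")},
]

selected = filter_de(results)
print(selected)
```

Fixing the cut-offs before looking at the gene list helps avoid the selection bias discussed earlier in the thread.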

That said, it's not bad science to just take the consensus of the lists from different tools. Just make it clear in your methods what you are doing. Also, be aware of your own internal biases when doing this, to which h.mon has alluded.

ADD REPLYlink modified 6 months ago • written 6 months ago by Kevin Blighe39k

Are you trying to validate results for further analyses, so you want the most stringent set? Then I believe it is fine to take the intersection of the results from (reliable) tools. If your only concern is the reviewers, then it would be simpler to stick to the results of one tool.

ADD REPLYlink written 6 months ago by h.mon24k

Yes, you are right. I wanted to keep the results as stringent as I can for the downstream analysis.

ADD REPLYlink written 6 months ago by glady240