Question: RNA-seq differential expression analysis: no significant FDR but significant GO enrichment
guillaume.rbt wrote:

Hi all,

I'm currently running an RNA-seq differential expression analysis in which no gene is significant at FDR < 0.05. (I'm working on human tumor biopsy data, with 111 samples.)

However, when I perform GO enrichment analysis on the top hits (p-value < 0.05, |logFC| > 1), I get significantly enriched pathways, which seem consistent from a biological point of view.
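To make the situation concrete, here is a minimal sketch in plain Python of the Benjamini-Hochberg adjustment, showing how genes with p < 0.05 can coexist with no gene at FDR < 0.05. The p-values are made up for illustration and `bh_adjust` is a hypothetical helper name, not something taken from the analysis described here.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up example: 10 genes with nominal p < 0.05 among 1000 tested genes.
pvals = [0.004, 0.008, 0.012, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045, 0.049]
pvals += [0.5] * 990

adj = bh_adjust(pvals)
n_nominal = sum(p < 0.05 for p in pvals)  # genes passing the raw p-value cutoff
n_fdr = sum(q < 0.05 for q in adj)        # genes passing the FDR cutoff
print(n_nominal, n_fdr)
```

With these made-up numbers, ten genes pass the raw cutoff and none passes the FDR cutoff, because ten small p-values out of 1000 tests is fewer than would be expected by chance alone.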

This brings me to two questions:

  • Could those results be relevant? Would the biological signal detected by GO enrichment in the top hits counteract the fact that no gene is detected as significant?
  • If so, how could I illustrate those findings? I've tried heatmaps on a subset of genes belonging to a specific pathway but, as the expression difference between the two studied conditions is rather small, the samples do not cluster by condition in the figure. (See below for an example of the type of expression pattern I get for one gene in the two studied conditions.)

[Figure: expression pattern of one gene in the two studied conditions]

Thank you in advance for any input.

modified 22 days ago • written 22 days ago by guillaume.rbt

Could you please explain how you did your DGE analysis? Could you also explain your experimental design (how many groups, and how many samples per group)?

written 22 days ago by Nicolas Rosewick

I study the difference in gene expression between a group of responders (n=60) and a group of non-responders (n=51) to a treatment. My dataset combines data from 3 different studies, so I've corrected for between-study variation by including the study as a confounding factor in my design (I use limma/voom).

written 22 days ago by guillaume.rbt

Have you run and interpreted a PCA of your data?

I think this is important, since if your samples are not well separated, the DE analysis will fail. In some cases, it could be worth discarding some of your samples based on the PCA.

modified 21 days ago • written 22 days ago by Antonio R. Franco

Yes, I ran a PCA before doing my differential expression analysis. There was no clustering of the responder and non-responder groups, but there was clustering by study.

written 22 days ago by guillaume.rbt

Did you combine independent datasets into the same statistical analysis? If so, what you see is normal and expected; it is called a batch effect. What do you mean by study?

modified 22 days ago • written 22 days ago by ATpoint

Yes, I mean that there is a batch effect, which should be corrected for by the design I've used.

modified 21 days ago • written 21 days ago by guillaume.rbt

I do not think this is possible or a good approach. If you really have three independent studies and the studies coincide with the groups you use in your design, there is no way to distinguish biological effects from batch effects. You would need replicates of all conditions in each study. Are the three studies at least identical in terms of sample preparation (same RNA preparation regime, same sample prep kit, which is probably the most important factor, etc.), or is this completely different?

modified 21 days ago • written 21 days ago by ATpoint

Unfortunately, the details of RNA preparation are not given for two of the datasets that I've used. I know that a strong batch effect is present, which is why I'm cautious with the results.

I've tried other ways of correcting the batch effect (using the limma function removeBatchEffect before the differential expression test, and also analysing each dataset independently and then doing a meta-analysis of p-values with Stouffer's test). When I cross-check the results of each method, I get similar results, with the same seemingly relevant biological signal.
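For readers unfamiliar with the per-study meta-analysis mentioned above, here is a minimal sketch of Stouffer's method for one gene, in plain Python. It is an illustration under stated assumptions, not the exact procedure used here: the two-sided p-values are first converted to one-sided ones using the sign of the logFC (so that opposite directions in different studies do not reinforce each other), and studies are weighted by the square root of their sample size. The p-values, logFCs and per-study sample sizes are made up, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def to_one_sided(two_sided_p, log_fc):
    """One-sided p-value for the 'higher in responders' direction (assumed convention)."""
    return two_sided_p / 2.0 if log_fc > 0 else 1.0 - two_sided_p / 2.0


def stouffer(one_sided_pvalues, weights=None):
    """Combine one-sided p-values from independent studies.

    Returns (combined Z, combined one-sided p-value).
    P-values must lie strictly between 0 and 1.
    """
    if weights is None:
        weights = [1.0] * len(one_sided_pvalues)
    z_scores = [_NORM.inv_cdf(1.0 - p) for p in one_sided_pvalues]
    z = sum(w * zi for w, zi in zip(weights, z_scores))
    z /= sqrt(sum(w * w for w in weights))
    return z, 1.0 - _NORM.cdf(z)


# Made-up values for one gene in three studies.
two_sided = [0.04, 0.10, 0.20]
log_fcs = [1.1, 0.8, 0.6]
sizes = [40, 36, 35]  # hypothetical per-study sample sizes

one_sided = [to_one_sided(p, fc) for p, fc in zip(two_sided, log_fcs)]
z_comb, p_comb = stouffer(one_sided, weights=[sqrt(n) for n in sizes])
print(z_comb, p_comb)
```

When the direction is consistent across studies, the combined p-value is smaller than any single one; a gene that goes up in one study and down in another ends up with a combined Z near zero.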

written 21 days ago by guillaume.rbt

So does each dataset contain both of the groups you are analyzing (responders and non-responders), or are the non-responders from one study and the responders from another?

written 21 days ago by ATpoint

Fortunately, all datasets contain both responder and non-responder samples.

written 21 days ago by guillaume.rbt

The PCA will show you how much dispersion you have in your data in general terms. If the data are not clearly separated into clusters, I would expect a weak DE result.

written 21 days ago by Antonio R. Franco

Hi,

Be aware that a DGE analysis looks at differences at the gene level. If, for instance, the severity of the cancer development comes from a higher amount of erroneously spliced mRNA transcripts in one sample group, the summarized gene expression stays the same, because every read associated with the same gene is counted as a hit.

Generally, I would think that in most cases there probably are differentially expressed genes in a comparison between two groups. However, if your two groups are very heterogeneous, with larger differences between individuals of the same group than between the groups, then you will get no significantly differentially expressed genes in your analysis. If this could be the case, I would ask the provider of the data: Were all the samples prepared at the same facility? Do you have the same gender distribution in both groups? Do the donors come from the same region? Were the samples collected at roughly the same stage of cancer progression? If not, you have to include this information when defining the design for the DGE analysis software.

Edit: I didn't see your comment before posting. So I would only ask whether the difference between the groups could potentially come from alternative splicing.

modified 22 days ago • written 22 days ago by caggtaagtat

I do indeed have larger differences between individuals of the same group than between the groups, hence I wasn't expecting highly significant differences in my results. I hadn't thought about the possibility of differential alternative splicing between the groups. Thanks for the idea, I will dig into that!

written 22 days ago by guillaume.rbt

I have encountered this situation quite a few times and tend to accept and report results based on GO/pathway enrichment significance. One argument is that DE p-value < 0.05 with FDR > 0.05 does not mean that your full set of genes is insignificant. Rather, your set of genes is likely to include some proportion of false positives, which may actually be filtered out by the enrichment analysis. The second argument is more abstract: you gain "bits of small evidence" in your DE, while enrichment provides you with the "big picture". Having said that, I would check the workflow by submitting several random sets from your gene list and making sure you do not always end up with enriched terms.
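The random-set check suggested above could be sketched roughly as follows, in plain Python. This is only an illustration under assumptions: enrichment is scored with a one-sided hypergeometric (over-representation) test for a single term, random sets are drawn from the full list of tested genes, and all gene IDs, set sizes and function names are made up rather than taken from any particular GO tool.

```python
import random
from math import comb


def hypergeom_pvalue(n_universe, n_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without replacement."""
    upper = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def random_set_control(universe, term_genes, set_size, n_draws=200, alpha=0.05, seed=0):
    """Fraction of random gene sets of size set_size that look enriched for the term."""
    rng = random.Random(seed)
    term = set(term_genes)
    hits = 0
    for _ in range(n_draws):
        drawn = rng.sample(universe, set_size)
        overlap = sum(g in term for g in drawn)
        if hypergeom_pvalue(len(universe), len(term), set_size, overlap) < alpha:
            hits += 1
    return hits / n_draws


# Made-up data: 2000 tested genes, one term annotating 50 of them,
# and 100 top hits of which 12 belong to the term.
universe = [f"g{i}" for i in range(2000)]
term_genes = universe[:50]

observed_p = hypergeom_pvalue(len(universe), len(term_genes), 100, 12)
false_alarm_rate = random_set_control(universe, term_genes, set_size=100)
print(observed_p, false_alarm_rate)
```

If random sets of the same size came out "enriched" much more often than the chosen alpha, that would point to a problem in the workflow (for example a badly chosen gene universe) rather than to biology.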

written 22 days ago by jomo018

I had the same idea, that the GO enrichment could act as a filter for false positives. Thanks for the tip about testing with random sets of genes, I will check that.

written 22 days ago by guillaume.rbt