Question: QuasiSeq vs. DESeq2 results
0
gravatar for Assa Yeroslaviz
5.7 years ago by
Assa Yeroslaviz1.4k
Munich
Assa Yeroslaviz1.4k wrote:

I have a different question but to the same topic, so I hope i don't need to start a new question.

I have analysed the same data now both with the two-step normalization using DESeq2 but also with the erccdashboard package.

In the DESeq analysis i have got almost 400 genes with an adjusted p-value<=0.05. Not one of these genes has a log2FC>0!

This is because most of the genes with a positive log2FC has 'NA'  in the padj column (~90 from ~6000 genes with log2FC>0 are !='NA' in the results list. All the other genes are NA).

If I am doing the DESeq2 analysis without the independent filtering (by setting it in the results function to FALSE), I am getting only 12 genes with an adjusted p-value<=0.05.

As I was wondering why that is, I have also tested the erccdashboard with its QuasiSeq normalisation and differential expression analysis procedure. Here I am getting quite a different picture.

I have "only" 1720 genes with significant adjusted p-value <=0.05, but around 50% are positive (log2FC>0). Most of the statistically significant down-regulated genes are common to DESeq2 and erccdashboard, but none of the up-regulated genes in QuasiSeq.

Is there a reasonable explanation for such behaviour?

How come QuasiSeq calculate a significant p-values for different genes? Are the p-values in the erccdashbaord result files being adjusted for multiple testing hypothesis?

 

thansk

Assa

ADD COMMENTlink modified 5.6 years ago by Biostar ♦♦ 20 • written 5.7 years ago by Assa Yeroslaviz1.4k
I think what is happening in your analysis with deseq2 is that the threshold for independent filtering is set unreasonably high. This happens if in general there is not much differential expression in your data. deseq2 tries to maximize the number of DEGs by excluding, in this case, a high proportion of genes from testing (having so a lighter multiple test correction). this is usually something you don't want. you said you performed a deseq2 analysis without independent filtering. that's the right way to improve your results, but you should also manually exclude some of the lowly expressed genes. otherwise, as you said, you get very few significant results. anyway, does the MA plot look good?
ADD REPLYlink written 5.7 years ago by Martombo2.7k

well,...

the MA plots looks similar, both with the tendency to be below 0. This happens whether I apply the independent filtering or not in the results function. here are the two plots, the first is with the ind. filtering, the second without.

 

I will try to run a manually independent filtering by removing the lowest 30% quantile or by removing all the genes with a summed read counts below 50/100 etc. I need to play a bit with the parameters to get better results.

ADD REPLYlink written 5.7 years ago by Assa Yeroslaviz1.4k
ok this situation is quite peculiar and not really similar to what I suggested before. was this produced with deseq2 size factor normalization or did you use some custom normalization factors? is a widespread downregulation something you expect in your data?
ADD REPLYlink modified 5.7 years ago • written 5.7 years ago by Martombo2.7k

Yes to both your questions.

We have expected a global down-regulation of transcription. We have added Spike-ins (ERCC) exactly to prove that fact. For that reason, I have first calculated the size Factors for the ERCC subset of the data and than used these values for the global analysis of all the genes (without ERCC). This is what gave me this behaviour).

ADD REPLYlink written 5.7 years ago by Assa Yeroslaviz1.4k
2

If you expect global down-regulation, I would also recommend you turn off the beta prior, by setting betaPrior=FALSE when you call DESeq(). The beta prior is useful for most experiments for producing LFC which are robust across experiments, but it relies on zero-centered prior distribution for the LFC.

Another recommendation is to use LFC thresholds (see our paper and vignette), instead of LFC = 0 for the null. Thresholds are useful for defining meaningful sets, when the null is clearly false for most genes.

ADD REPLYlink written 5.7 years ago by Michael Love2.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 952 users visited in the last hour
_