Question: Ballgown finds few DE genes compared to DESeq
Asked by corend, 2.7 years ago:

I have RNA-seq data with 2 conditions and 3 replicates per condition.

I ran the New Tuxedo pipeline and also created some read count tables with prepDE.

I analysed differentially expressed genes with Ballgown and DESeq2.

With thresholds of 1 for log2FoldChange and 0.01 for padj in DESeq2: 14400/32000 genes (45%) are DE.

With a pval threshold of 0.01 in Ballgown: 3678/32000 genes (about 11%) are DE. Even with no fold-change threshold, the number of DE genes is much lower. In Ballgown, what is the difference between qval and pval? Which one corresponds to padj in DESeq2?

I expect many DE genes because the conditions are biologically very different (testis vs ovary, same species).

Why is there such a large difference between the two tools?

Tags: rna-seq, deseq2, ballgown

Two things: 1) these are two completely different statistical frameworks; 2) you used two different p-value cutoffs. I hope the one you used for Ballgown is FDR-adjusted. If so, why 0.05 there and 0.01 in DESeq?

Reply by ATpoint, 2.7 years ago
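To compare the two tools like for like, the same kind of cutoff (an adjusted p-value plus a fold-change threshold) has to be applied to both result tables. Below is a minimal base-R sketch. It assumes DESeq2-style columns (padj, log2FoldChange) and ballgown-style columns from stattest(getFC = TRUE) (qval, fc), with fc assumed to be on the linear scale. The helper count_de() and the toy tables are hypothetical.

```r
# Hypothetical helper: count genes passing the SAME adjusted p-value and
# |log2 fold change| cutoffs, whatever tool produced the table.
count_de <- function(padj, log2fc, alpha = 0.01, lfc = 1) {
  # NA adjusted p-values (DESeq2 can report these) count as not significant
  keep <- !is.na(padj) & padj < alpha & abs(log2fc) >= lfc
  sum(keep)
}

# Toy DESeq2-style table (invented numbers)
deseq_res <- data.frame(padj = c(0.001, 0.2, NA, 0.005),
                        log2FoldChange = c(2.5, 3.0, 1.2, -0.5))

# Toy ballgown-style table; fc is ASSUMED to be a linear-scale fold change,
# so it is log2-transformed before applying the fold-change cutoff
bg_res <- data.frame(qval = c(0.004, 0.02, 0.3, 0.008),
                     fc = c(4, 0.25, 1.1, 0.4))

n_deseq <- count_de(deseq_res$padj, deseq_res$log2FoldChange)
n_bg    <- count_de(bg_res$qval, log2(bg_res$fc))
c(DESeq2 = n_deseq, ballgown = n_bg)
```

With real data, the same function would be applied to as.data.frame(results(dds)) on the DESeq2 side and to results_genes on the Ballgown side.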

I agree with point 1), but even if the statistical methods differ, shouldn't I expect approximately the same results?

For point 2), I have edited my post, thanks!

Reply by corend, 2.7 years ago

Hi, I am wondering whether you have solved this issue. I have the same problem: Ballgown gives significantly fewer DE genes than DESeq2. It might not be due to FPKM, since my TopHat-Cufflinks-Cuffdiff pipeline produces results similar to DESeq2. Thank you!

Reply by shenwei13760, 2.4 years ago

Please use Add comment rather than the answer field for comments. Is there any specific reason you use ballgown rather than DESeq2 or edgeR?

Reply by ATpoint, 2.4 years ago

corend, if you could follow up with ATpoint, that would be great. Also, one should never expect that these programs produce the same results.

Reply by Kevin Blighe, 2.4 years ago

In addition (and sorry for reviving a zombie post), I think the correct way to perform the analysis with Ballgown would be to set libadjust to FALSE. Otherwise, you get FPKM values (which already normalize, in some way, for the sequencing depth of each run) that are then adjusted again for library size using, quoting the manual, "the sum of the sample’s log expression measurements below the 75th percentile of those measurements".

Reply by Fabio Marroni, 12 months ago
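The library-size statistic quoted from the manual can be sketched in base R as below. This only illustrates the quoted description. The log2(x + 1) transform and the inclusion of zero values are assumptions, and ballgown's own implementation may differ in these details. lib_size_stat() is a hypothetical helper. The commented stattest() call only shows where libadjust = FALSE would go; it needs a real ballgown object bg.

```r
# Hypothetical helper illustrating the quoted description: for each sample
# (column), sum the log expression values that fall below that sample's
# 75th percentile. ASSUMPTIONS: log2(x + 1) transform, zeros included.
lib_size_stat <- function(expr) {
  apply(expr, 2, function(x) {
    lx <- log2(x + 1)
    q75 <- quantile(lx, 0.75)
    sum(lx[lx < q75])
  })
}

# Toy FPKM matrix: 4 genes x 2 samples (invented values)
fpkm_mat <- matrix(c(0, 1, 3, 15,
                     0, 2, 6, 30),
                   ncol = 2,
                   dimnames = list(paste0("gene", 1:4), c("s1", "s2")))
lib_size_stat(fpkm_mat)

# Where the suggested setting would go (requires a real ballgown object `bg`;
# not run here):
# results_genes <- stattest(bg, feature = "gene", covariate = "Tissue",
#                           getFC = TRUE, meas = "FPKM", libadjust = FALSE)
```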
Answer by Kevin Blighe (Republic of Ireland), 2.7 years ago:

pval is the nominal p-value. qval is the adjusted p-value; adjusted p-values are also known as q-values (not many people know this). So qval is the column that corresponds to padj in DESeq2.
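Here is a small illustration, with made-up p-values, of how nominal and adjusted p-values relate, using base R's p.adjust() with the Benjamini-Hochberg method. It assumes that ballgown's qval and DESeq2's padj (whose default pAdjustMethod is "BH") are both BH-type adjustments. DESeq2 also filters low-count genes before adjusting, so its padj will not match exactly.

```r
# Made-up nominal p-values for five genes
pval <- c(0.001, 0.008, 0.012, 0.04, 0.2)

# Benjamini-Hochberg adjustment (ASSUMED to match how qval/padj are built)
qval <- p.adjust(pval, method = "BH")
qval

sum(pval < 0.01)  # genes called with a nominal p-value cutoff
sum(qval < 0.01)  # genes called with an adjusted p-value (FDR) cutoff
```

Because a BH-adjusted value is never smaller than the nominal p-value, filtering Ballgown on qval instead of pval would shorten its list rather than lengthen it.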

Ballgown may be using FPKM data when conducting the differential expression analysis. FPKM is not suitable for this purpose. Please confirm the type of normalisation that you used in Ballgown.
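One reason FPKM is a poor input for DE testing is that FPKM = count * 1e9 / (gene length * total mapped reads) gives the same value for 10 reads in a shallow library and for 1,000 reads in a library 100 times deeper. The second measurement is far less noisy. Count-based tools such as DESeq2 model the raw counts and can use that difference. The toy sketch below uses invented numbers. fpkm() is a hypothetical helper, and the noise estimate is a simple Poisson approximation.

```r
# Hypothetical helper: FPKM = count * 1e9 / (gene length in bp * total mapped reads)
fpkm <- function(count, length_bp, total_reads) {
  count * 1e9 / (length_bp * total_reads)
}

# The same 2 kb gene in a shallow library and in a 100x deeper one
shallow <- fpkm(count = 10,   length_bp = 2000, total_reads = 1e6)
deep    <- fpkm(count = 1000, length_bp = 2000, total_reads = 1e8)
c(shallow = shallow, deep = deep)  # identical FPKM values

# Approximate relative noise of the underlying counts (Poisson: sd = sqrt(mean))
rel_noise <- 1 / sqrt(c(shallow = 10, deep = 1000))
rel_noise  # the shallow measurement is ten times noisier
```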

When you ran DESeq2, did you use the lfcShrink() function? See the code that I posted here: A: DESeq2 Appropriate Settings for Poorly Clustering Samples?


I am not sure I understand what you mean by:

may be sing FPKM data when conducting the differential expression analysis

I used this command in Ballgown:

results_genes = stattest(bg, feature="gene",
                         covariate="Tissue", getFC=TRUE,
                         meas="FPKM")

In DESeq2, I didn't use lfcShrink(), but I did use betaPrior=TRUE. In your post, if I understand correctly, you say that lfcShrink() is useful when replicates group poorly?

Reply by corend, 2.6 years ago

Sorry, that was a typo on my part. I meant to write:

Ballgown may be using FPKM data when conducting the differential expression analysis

Based on the code that you've provided, it is indeed using FPKM. Just so you are aware, FPKM is not suitable for differential expression analysis. This may indirectly contribute to the discrepancy that you see.

Oh, the use of lfcShrink() is not confined to cases where replicates group poorly. If you have used betaPrior=TRUE, then do not worry about lfcShrink(), for now. lfcShrink is part of the latest updates to the DESeq2 package.

Apart from everything that I've already mentioned, differences across differential expression analysis tools should be expected.

Reply by Kevin Blighe, 2.6 years ago

Thanks a lot. I will keep using DESeq2 for DE analysis and use Ballgown to get an idea of gene expression levels in FPKM. You can switch this to an answer!

Reply by corend, 2.6 years ago