Question: DESeq2 log2FoldChange vs Salmon log2 TPM
liartom2 (10) wrote, 8 weeks ago:

Hi! I am having trouble analyzing the output of a Salmon-tximport-DESeq2 pipeline. Naturally, I used counts for the differential expression analysis, and then I used mean TPM values to look at other aspects, such as the median expression level of some subset of genes. One peculiar thing: when I plot log2 TPM treated vs. log2 TPM untreated and color the dots by whether they were identified as differentially expressed (log2FoldChange > 1 or < -1, and adjusted p < 0.05 in the DESeq2 output), I see an asymmetry of up- and down-regulated genes relative to the x=y line. Can someone please explain to me why this happens? Here is the resulting plot. I can attach some of the R code that I used, too.

Tags: rna-seq, salmon, deseq2, R
Modified 8 weeks ago by Antonio R. Franco (4.0k) • written 8 weeks ago by liartom2 (10)

This is interesting. According to the plot, a lot of the highly expressed genes are down-regulated. If you think that's not biology, then the DESeq normalization was off. Did you use the default one?

Reply written 8 weeks ago by Asaf (5.6k)

No, we think this can actually occur because of biology.

Reply written 8 weeks ago by liartom2 (10)

Then I think what you see is the effect of the different normalizations. Try plotting the normalized counts from DESeq (basically an MA-plot) and see if the picture is more balanced. My guess is that it should be.
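To illustrate how two normalizations can disagree, here is a small base-R sketch on made-up counts. The six genes, their values, and the equal-gene-length simplification (so the TPM-like values are just counts per million) are all assumptions for illustration. The median-of-ratios step follows the idea behind DESeq2's default size factors, but it is a hand-rolled version, not the package code.

```r
# Toy data (hypothetical): one sample per condition, equal gene lengths assumed.
# Only g6, a highly expressed gene, truly changes (4-fold down).
untreated <- c(g1 = 100, g2 = 200, g3 = 300, g4 = 400, g5 = 500, g6 = 8000)
treated   <- c(g1 = 100, g2 = 200, g3 = 300, g4 = 400, g5 = 500, g6 = 2000)
counts <- cbind(untreated, treated)

# TPM-like scaling: every library is forced to sum to one million.
tpm_like <- t(t(counts) / colSums(counts)) * 1e6

# Median-of-ratios size factors: median ratio of each sample to the
# per-gene geometric mean across samples.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(x) exp(median(log(x) - log_geo_means)))
norm_counts <- t(t(counts) / size_factors)

# Log2 ratios treated/untreated under each scaling.
lfc_tpm  <- log2(tpm_like[, "treated"] / tpm_like[, "untreated"])
lfc_norm <- log2(norm_counts[, "treated"] / norm_counts[, "untreated"])
print(round(cbind(lfc_tpm, lfc_norm), 2))
```

In this toy case the five unchanged genes sit at a log2 ratio of 0 under median-of-ratios scaling, but are pushed to about +1.44 under the TPM-like scaling, because the drop in one dominant gene inflates everyone else's share of the library. A shift like that would move the whole cloud off the x=y line in a log2 TPM scatter plot.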

Reply written 8 weeks ago by Asaf (5.6k)

Yeah, I did an MA-plot and it looks OK, but the thing is that I am using TPMs for subsequent analysis and now I'm not sure whether they're total garbage. DESeq doesn't produce normalized counts for each condition, though, so I can't use those either. I'm also kind of curious what normalization they used for this logFC. Based on their tutorials and article, I figured that only genes with low read counts should be affected by their normalization algorithm.

Reply modified 8 weeks ago • written 8 weeks ago by liartom2 (10)

You're confusing two things: normalization and dispersion estimation. Normalization brings all libraries to a comparable level by scaling the read counts with a normalization factor that is different for each library and can be determined by several methods (DESeq2 divides the counts by a per-sample size factor). I guess the method used by DESeq and the one used for computing TPM were different. You can get the normalized counts from DESeq2 using counts(dds, normalized=TRUE), and you can use the rlog function if you really want to work with log values.
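A minimal sketch of how per-condition values could be derived from the normalized counts. The dataset here is simulated with makeExampleDESeqDataSet as a stand-in for a real tximport-derived object, and the names cond_means and log_means, as well as the pseudocount of 1, are arbitrary choices for illustration.

```r
library(DESeq2)

# Simulated stand-in for a real dataset (assumption): 1000 genes, 6 samples,
# with a 'condition' factor that has levels A and B.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- estimateSizeFactors(dds)

# Raw counts scaled by each library's normalization.
norm_counts <- counts(dds, normalized = TRUE)

# Mean normalized count per condition: one column per factor level.
cond <- colData(dds)$condition
cond_means <- sapply(levels(cond), function(lv) {
  rowMeans(norm_counts[, cond == lv, drop = FALSE])
})

# Log2 with a pseudocount (arbitrary choice) for a condition-vs-condition scatter.
log_means <- log2(cond_means + 1)
head(log_means)
```

Plotting the two columns of log_means against each other gives the same kind of scatter as the log2 TPM plot, but on the scale DESeq2 itself uses for testing.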

Reply written 8 weeks ago by Asaf (5.6k)

Thank you, Asaf! You explained normalization, which I already knew, but said nothing about dispersion estimation. Could you please elaborate a little? I am really confused. Can I use these normalized counts to compare expression instead of TPMs?

Reply written 8 weeks ago by liartom2 (10)

You can use the normalized expression, but it's best to use the DESeq results directly. You can read about dispersion estimation in the DESeq2 manual and paper. In short, it's their way of estimating the "noise" of each gene.
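As a rough sketch of what that "noise" looks like in practice: after running DESeq(), each gene carries a dispersion estimate, and under the negative binomial model the variance of a count with mean mu and dispersion alpha is mu + alpha * mu^2. The simulated dataset and the variable names below are assumptions for illustration.

```r
library(DESeq2)

# Simulated stand-in for a real dataset (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds)

# Final per-gene dispersion estimates (may be NA for genes with all-zero counts).
alpha <- dispersions(dds)
mu <- mcols(dds)$baseMean

# Variance implied by the negative binomial model: mu + alpha * mu^2.
# The first term is Poisson-like counting noise, the second is extra variability.
implied_var <- mu + alpha * mu^2
summary(alpha)

# Dispersion vs. mean, with the fitted trend and the final estimates.
plotDispEsts(dds)
```

So dispersion is a per-gene variability parameter used in the test, separate from the library-level scaling done by normalization.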

Reply written 8 weeks ago by Asaf (5.6k)

You can read about the dispersion estimation in DESeq2 manual and paper

Thanks! I will

it's best if you used the DESeq results directly

Yeah, maybe, but don't you want to include all the other genes when you analyze, let's say, ChIP-seq and RNA-seq together, and not only the 1000 that are differentially expressed? Or do you mean that I should use only log2FC and baseMean (I still don't understand what this is for) from the DESeq output to test hypotheses?

Reply written 8 weeks ago by liartom2 (10)

Basically, baseMean and log2FC (with its SE) give you all you need to know.
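A short sketch of pulling these values for every gene, not only the significant ones. The data are simulated (an assumption), the comparison is whatever results() picks by default for the example design, and the thresholds are the ones from the question. The 1.96 multiplier gives only an approximate 95% interval.

```r
library(DESeq2)

# Simulated stand-in for a real dataset (assumption); betaSD = 1 adds true differences.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
dds <- DESeq(dds)

# One row per gene, significant or not.
res <- results(dds)

# baseMean is the mean of the normalized counts over all samples.
bm <- rowMeans(counts(dds, normalized = TRUE))
print(all.equal(unname(bm), res$baseMean))

# Approximate 95% interval for each log2 fold change from its standard error.
lfc_lower <- res$log2FoldChange - 1.96 * res$lfcSE
lfc_upper <- res$log2FoldChange + 1.96 * res$lfcSE

# Classification used in the question: |log2FC| > 1 and adjusted p < 0.05.
is_de <- !is.na(res$padj) & res$padj < 0.05 & abs(res$log2FoldChange) > 1
table(is_de)
```

Since res keeps all genes, baseMean and log2FoldChange can be joined to other per-gene data (for example ChIP-seq signal) without restricting to the differentially expressed subset.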

Reply written 8 weeks ago by Asaf (5.6k)