Question

normalized FPKM matrix file EdgeR and GO Mapping

10.0 years ago

Biogeek ▴ 470

Can someone provide me with a clear cut answer to the below:

Can I use the normalised FPKM matrix file, which feeds into edgeR from the RSEM abundance counts, to annotate the top X sequences up-regulated/down-regulated between 2 conditions in Blast2GO Pro?

I have been reading that FPKM values are unreliable. Is that so? What I am currently doing is working with the above-mentioned normalised FPKM matrix file in Excel, filtering out sequences present between conditions to try to group the up-regulated/down-regulated sequences with respect to their GO mapping and annotations in Blast2GO Pro. I work out the fold change, then the log2 value. Would this be bullshit? Whilst I have wonderful heat maps generated by edgeR to show differential expression, can I be confident in using the same FPKM values that generated those heat maps for the annotation analysis? Someone with a bit of knowledge about this, please help me.
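For concreteness, the spreadsheet step described above (fold change, then log2) could be sketched in Python roughly as follows. The transcript IDs, the FPKM values and the pseudocount of 1 are made up for illustration; they are not from my data.

```python
import math

def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """Return log2(B/A); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((fpkm_b + pseudocount) / (fpkm_a + pseudocount))

# Hypothetical FPKM values: transcript -> (condition A, condition B)
fpkm = {
    "comp1_c0_seq1": (3.0, 15.0),
    "comp2_c0_seq1": (7.0, 1.0),
    "comp3_c0_seq1": (0.0, 0.0),
}

log2fc = {tid: log2_fold_change(a, b) for tid, (a, b) in fpkm.items()}

# Print transcripts from most up-regulated to most down-regulated
for tid, value in sorted(log2fc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tid}\t{value:+.2f}")
```

A fold change worked out this way only ranks transcripts; it carries no p-value or FDR.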

I have used the Trinity/RSEM/edgeR pipeline. Alternative methods are welcome, but I need somewhat speedy replies.

Any help appreciated. Thanks.

FPKM RNA-Seq differential-exp GOmapping • 6.7k views

updated 2.6 years ago by Ram 43k • written 10.0 years ago by Biogeek ▴ 470

Are you just using edgeR for heatmaps (in which case, why bother, just use heatmap.2) or are you trying to use it for statistics too?

10.0 years ago by Devon Ryan 104k

I am using EdgeR for heat maps, but I also want statistics from it. I want to find the top 1000 upregulated genes between 2 conditions, then once I find these, I want to feed the sequences into Blast2GOPro via the Trinity .fasta file I already have with the sequences in it.
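To make the plan concrete, here is a minimal Python sketch of picking the top N up-regulated transcripts from a differential-expression table and pulling their sequences out of a FASTA file. The column names (`id`, `logFC`, `FDR`), the FDR cutoff and the toy data are assumptions for illustration; the real output files may be laid out differently.

```python
import csv
import io

def top_upregulated(de_table, n, max_fdr=0.05):
    """Return IDs of the n transcripts with the largest positive logFC passing the FDR cutoff."""
    rows = csv.DictReader(de_table, delimiter="\t")
    hits = [(float(r["logFC"]), r["id"]) for r in rows
            if float(r["FDR"]) <= max_fdr and float(r["logFC"]) > 0]
    hits.sort(reverse=True)  # largest logFC first
    return [tid for _, tid in hits[:n]]

def read_fasta(handle):
    """Yield (id, sequence) pairs; the id is the header text up to the first whitespace."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

# Toy inputs standing in for the DE results table and the Trinity FASTA file
de_text = (
    "id\tlogFC\tlogCPM\tPValue\tFDR\n"
    "t1\t4.2\t5.0\t1e-8\t1e-6\n"
    "t2\t-3.1\t4.0\t1e-6\t1e-4\n"
    "t3\t1.5\t3.0\t0.2\t0.6\n"
    "t4\t2.7\t6.0\t1e-5\t1e-3\n"
)
fasta_text = ">t1 len=8\nACGTACGT\n>t2 len=4\nGGCC\n>t4 len=6\nTTGA\nCA\n"

wanted = top_upregulated(io.StringIO(de_text), n=2)
keep = set(wanted)
selected = {tid: seq for tid, seq in read_fasta(io.StringIO(fasta_text)) if tid in keep}
```

With real files, the `io.StringIO(...)` objects would be replaced by `open(path)` handles, and `selected` would be written back out as FASTA for Blast2GO.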

10.0 years ago by Biogeek ▴ 470

I posted a new comment while you were writing this reply, so I'll just refer to it below.

10.0 years ago by Devon Ryan 104k

Or would it make more sense to use the EdgeR.de.results files?

10.0 years ago by Biogeek ▴ 470

BTW, regarding using edgeR (or limma/voom or DESeq(2)) with FPKM/RPKM values, I'll just link to one of Gordon Smyth's many replies on the subject from the Bioconductor mailing list.
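As a rough illustration of why count-based tools want the raw counts (a sketch using the standard FPKM definition with made-up numbers, not something taken from the linked reply): two transcripts can have the same FPKM while being supported by very different numbers of fragments, and that count magnitude is what a count model uses to judge how noisy a measurement is.

```python
def fpkm(count, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical transcripts in a library of 1 million mapped fragments
short_low = fpkm(count=5, length_bp=500, total_mapped_fragments=1_000_000)
long_high = fpkm(count=500, length_bp=50_000, total_mapped_fragments=1_000_000)

print(short_low, long_high)  # same FPKM, but 5 versus 500 supporting fragments
```

Once the values are converted to FPKM, the difference between 5 and 500 supporting fragments is no longer visible.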

10.0 years ago by Devon Ryan 104k

Ram · Answer 1 · 2014-05-12

edgeR needs raw counts, not FPKM values. In general, using RPKM/FPKM values is relatively standard practice: there may be factors that influence them, but I would consider that to be something like a batch effect. I think there will always be some confounding factors between experiments (such as differences in sample preparation). However, I don't think this is the most important issue in your specific context:

edgeR will sometimes give funky results (which is why I don't use edgeR). If you compare the count-based edgeR analysis with an analysis based on FPKM (or just the fold-change values calculated from FPKM), then you will probably notice differences. However, I think these are most likely the fault of the edgeR calculation rather than the FPKM calculation. For example, consider the fold-changes for this gene:

https://drive.google.com/file/d/0B1xpw6_kQMKucTZ4cGpKQmptOTg/edit?pli=1

FYI, this is referenced from this blog post:

http://cdwscience.blogspot.com/2013/11/rna-seq-differential-expression.html

You say you are using Trinity/RSEM/edgeR followed by some sort of GO analysis. I am a bit confused, because Trinity is for de novo assembly, and I would always recommend a direct alignment when possible. I would use de novo assembly when a reference genome hasn't been characterized, but I would imagine such a genome would be unlikely to have GO terms. In other words, my general answer is that GO enrichment following analysis of FPKM expression values should be OK (I do this routinely), but it sounds like your strategy may have other issues that would make GO enrichment problematic.