Question

Stringtie: use htseq-count or prepDE.py to extract reads?


6.2 years ago

heyang • 0

Hi All,

I am currently working on the downstream analysis of my RNA-seq data. I have been using StringTie and its prepDE.py script to extract read counts for differential expression (DE) analysis with DESeq2, basically following http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual. I am also thinking of comparing the DE results with edgeR.

I came across a tutorial (https://github.com/griffithlab/rnaseq_tutorial/wiki/Expression) in which htseq-count is run on the alignments instead to produce raw counts for edgeR.

I have read questions on the forum saying that the two outputs differ.
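For context, one reason the two outputs can differ: according to the StringTie manual, prepDE.py does not count reads directly but back-calculates them from each transcript's coverage as coverage * transcript_len / read_len (with -l, the average read length, defaulting to 75), whereas htseq-count counts aligned reads that overlap each feature. A minimal Python sketch of that back-calculation follows. The function name and the rounding up are assumptions made for illustration, not a copy of the script.

```python
import math


def coverage_to_count(coverage: float, transcript_len: int, read_len: int = 75) -> int:
    """Estimate a read count from StringTie-style per-base coverage.

    Follows the formula given in the StringTie manual:
        reads_per_transcript = coverage * transcript_len / read_len
    Rounding up to an integer is an assumption of this sketch.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    return math.ceil(coverage * transcript_len / read_len)


# Hypothetical transcript: 1500 bp long, covered 10x on average, 75 bp reads.
example_count = coverage_to_count(10.0, 1500, 75)
```

Because the estimate depends on the assumed read length and on StringTie's coverage values, it will not generally match a direct count of alignments per gene.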

So, my question is: which read counts should I use?

Many thanks!

Stringtie RNA-Seq DESeq2 edgeR • 5.2k views

updated 6.2 years ago by ATpoint 88k • written 6.2 years ago by heyang • 0


I will put a question back to you: why did you choose to use StringTie? Presumably you were interested in de novo transcriptome assembly (via HISAT2 / StringTie)?

Results will of course differ between the two approaches, but I would expect the real hits to be found in both datasets. It is the other genes, those on the fringes of expression and/or statistical significance, that will differ.
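One way to check this on your own data is to compare the lists of significant genes from the two count sources. The following is only a sketch: the function name and the gene identifiers are hypothetical, and it assumes you have already exported the significant gene IDs from each analysis.

```python
def compare_hits(hits_a, hits_b):
    """Summarize the overlap between two collections of significant gene IDs."""
    a, b = set(hits_a), set(hits_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),   # called significant with both count sources
        "only_a": sorted(a - b),    # significant with the first source only
        "only_b": sorted(b - a),    # significant with the second source only
        # Jaccard index: 1.0 means identical lists, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Made-up gene IDs, purely for illustration.
prepde_hits = ["geneA", "geneB", "geneC"]
htseq_hits = ["geneB", "geneC", "geneD"]
summary = compare_hits(prepde_hits, htseq_hits)
```

Genes that appear in only one list are the ones worth inspecting for low counts or borderline adjusted p-values.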

6.2 years ago by Kevin Blighe 89k

score 5 · Answer 1 · 2019-05-01

I recommend using tximport to correct for length bias between transcripts of the same gene. It is from the same developer as DESeq2, and the two are fully integrated. You might also consider using salmon for quantification rather than classical alignment. Salmon features an elaborate way of dealing with multimappers and can correct for GC bias. Also, save yourself some time and do not start comparing edgeR and DESeq2. A proper comparison is not straightforward and requires extensive knowledge of exactly how the two tools perform the analysis in order to make the comparison fair, adequate, and reproducible. It is better to read benchmarking papers or blog posts like this one from the DESeq2 developer:

https://mikelove.wordpress.com/2016/09/28/deseq2-or-edger/

Both tools are established and perform well. For most users it comes down to choosing the one you feel more comfortable with. In any case, try to validate important genes that you use to make a hypothesis with either independent experiments or published datasets from a comparable setup if possible.
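To illustrate the length correction mentioned at the start of this answer: when a gene switches between a short and a long isoform, its gene-level count changes even if the number of molecules does not. The sketch below computes an abundance-weighted average transcript length per gene and sample, which is the kind of quantity used as a length offset. It is a simplified illustration, not tximport's code. The function name, the input layout, and the fallback for unexpressed genes are all assumptions.

```python
def weighted_gene_length(transcripts):
    """Abundance-weighted mean transcript length for one gene in one sample.

    `transcripts` is a list of (length, abundance) pairs, one per isoform,
    where abundance is e.g. a TPM value.
    """
    if not transcripts:
        raise ValueError("gene has no transcripts")
    total = sum(abundance for _, abundance in transcripts)
    if total == 0:
        # Assumption of this sketch: fall back to the plain mean length
        # when the gene is not expressed at all.
        return sum(length for length, _ in transcripts) / len(transcripts)
    return sum(length * abundance for length, abundance in transcripts) / total


# Hypothetical gene with a 1000 bp and a 3000 bp isoform in two samples.
sample1 = weighted_gene_length([(1000, 90.0), (3000, 10.0)])  # mostly short isoform
sample2 = weighted_gene_length([(1000, 10.0), (3000, 90.0)])  # mostly long isoform
```

In this made-up example the gene's average length differs between the two samples, so equal molecule numbers would yield different raw counts unless the length difference is accounted for.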

score 2 · Answer 2 · 2019-05-01


6.2 years ago

Kristoffer Vitting-Seerup ★ 4.2k

I wrote a section in my vignette about considerations for quantifying RNA-seq data that might be useful.
