Question

HTSeq count versus summarizeOverlaps, mismatch of exon counts

7.4 years ago

Gama313 ▴ 120

Hello everybody,

I am a newbie and I've recently started analysing RNA-Seq data. I used the following two approaches:

HTSeq-count:

    dexseq_annotation.py;
    dexseq_count.py;

summarizeOverlaps:

    exonicParts= (txdb, aggregateGenes=FALSE);
    se=summarizeOverlaps(exonicParts,bamfiles,mode="Union", ignore.strand=TRUE, singleEnd=TRUE, fragments=FALSE, inter.feature=FALSE)

I did this to obtain exon-level counts as input to DEXSeq for assessing exon usage. The problem is that I got very different counts depending on which program I used: in particular, summarizeOverlaps counts far more reads as uniquely mapped (I got 18,000,000 total unique reads for HTSeq vs 42,000,000 apparently unique reads for summarizeOverlaps). Could somebody explain why this happens? Thank you in advance.

Filippo

RNA-Seq • 1.7k views

Updated 7.4 years ago by GouthamAtla 12k • written 7.4 years ago by Gama313 ▴ 120

I think I might be able to help, but could you please reformat your message so that the list is easier to read?

7.4 years ago by Matteo Schiavinato ★ 3.6k

score 0 · Answer 1 · 2016-12-06

7.4 years ago

Devon Ryan 104k

Do not reinvent the wheel by using R; as you've noticed, you're likely to get the wrong results. The DEXSeq scripts are known to produce the correct results, so do not use anything else.

You get different results in R because you're counting different alignments in a different manner.
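To make "different alignments in a different manner" concrete, here is a toy sketch in plain Python. It is not the real code of either tool. The bins, alignments, and function names are made up for illustration. The filtered variant assumes a minimum mapping quality of 10 and an "ambiguous if more than one gene" rule, which is my understanding of the dexseq_count.py defaults and should be checked against the script itself. The unfiltered variant mimics what counting with inter.feature=FALSE and no BAM filtering amounts to: every alignment record adds one count to every bin it overlaps.

```python
from collections import Counter

# Hypothetical exon bins: (name, start, end), half-open, named "gene:bin".
BINS = [
    ("geneA:E001", 0, 100),
    ("geneA:E002", 100, 200),
    ("geneB:E001", 180, 300),
]

# Hypothetical alignments: (start, end, mapq).
ALIGNMENTS = [
    (10, 60, 60),    # inside geneA:E001
    (80, 130, 60),   # spans geneA:E001 and geneA:E002
    (150, 195, 60),  # overlaps bins from two different genes
    (210, 260, 0),   # inside geneB:E001, but MAPQ 0 (e.g. a multi-mapper)
    (220, 270, 60),  # inside geneB:E001
]


def overlapping_bins(start, end):
    """Names of all bins sharing at least one base with [start, end)."""
    return [name for name, s, e in BINS if start < e and s < end]


def count_every_overlap(alignments):
    """No filtering: each alignment adds one count to every overlapped bin."""
    counts = Counter()
    for start, end, _mapq in alignments:
        for name in overlapping_bins(start, end):
            counts[name] += 1
    return counts


def count_filtered(alignments, min_mapq=10):
    """Assumed stricter rules: drop low-MAPQ and multi-gene alignments."""
    counts = Counter()
    for start, end, mapq in alignments:
        if mapq < min_mapq:
            counts["_lowaqual"] += 1
            continue
        hits = overlapping_bins(start, end)
        genes = {name.split(":")[0] for name in hits}
        if not genes:
            counts["_empty"] += 1
        elif len(genes) > 1:
            counts["_ambiguous"] += 1
        else:
            for name in hits:
                counts[name] += 1
    return counts


def exon_total(counts):
    """Sum of per-bin counts, ignoring the special '_' categories."""
    return sum(n for name, n in counts.items() if not name.startswith("_"))


loose = count_every_overlap(ALIGNMENTS)
strict = count_filtered(ALIGNMENTS)
print(len(ALIGNMENTS), exon_total(loose), exon_total(strict))  # 5 7 4
```

With only five alignments, the unfiltered total (7) already exceeds the number of alignments, while the filtered total (4) falls below it. Summing a count matrix therefore does not give a number of "unique reads" under either scheme, and the two totals are not expected to agree.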
