Question

multiBamCov or htseq-count to count reads per feature?

1

11.9 years ago

Nicolas Rosewick 10k

Hi,

I'm wondering what the best method is to extract the number of reads for each feature in a GTF (or GFF, BED, ...) file. I tried htseq-count and multiBamCov, but they gave me different results. It seems that multiBamCov counts all the reads (completely and partially aligned) associated with each exon, which means that many reads are counted two or more times.

After running a DE analysis (DESeq) on both read count matrices (one from htseq-count, one from multiBamCov), the results are quite surprising.

Genes differentially expressed at adjusted p-value < 0.05:

multiBamCov: 123 genes
htseq-count: 880 genes
Intersection: 118 genes

So which one should I use? Is it possible to make multiBamCov stricter? Or is there another tool in bedtools that I could use instead?

Thanks,

N.

htseq read counts • 4.6k views

updated 11.9 years ago by Ryan Dale 5.0k • written 11.9 years ago by Nicolas Rosewick 10k

score 0 · Answer 1 · 2012-06-14

0

11.9 years ago

Nicolas Rosewick 10k

edit >nothing..

11.9 years ago by Nicolas Rosewick 10k

0

Give people some time... It's been less than a day since your question was posted, and people around the world are working in different time zones.

11.9 years ago by Leonor Palmeira 3.8k

0

Sorry for being too hasty.

11.9 years ago by Nicolas Rosewick 10k

score 0 · Answer 2 · 2012-06-15

htseq-count tends to be more selective about which reads are counted. For example, it won't count ambiguous reads, according to the rules detailed at http://www-huber.embl.de/users/anders/HTSeq/doc/count.html. htseq-count also does not count multimappers (reads with BAM flag 0x100, i.e. secondary alignments). Depending on your data, these differences could greatly influence the final results.
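To make the difference concrete, here is a toy sketch in plain Python. It is not the code of either tool. The features, reads, and function names are all made up for illustration. The "loose" counter credits a read to every feature it overlaps and sums per gene, which is roughly the behaviour described in the question. The "strict" counter imitates the idea behind htseq-count's union mode: a read is counted only if all the features it overlaps belong to one gene, and reads with flag 0x100 are skipped. The `__ambiguous`, `__no_feature` and `__skipped_secondary` keys are just labels used in this sketch.

```python
from collections import Counter

SECONDARY = 0x100  # SAM flag bit for a secondary alignment

# Hypothetical exons: (gene_id, start, end), half-open, all on one chromosome.
FEATURES = [
    ("geneA", 100, 200),
    ("geneA", 300, 400),
    ("geneB", 380, 500),  # overlaps the second geneA exon
]

# Hypothetical alignments: (read_name, start, end, flag).
READS = [
    ("r1", 120, 170, 0),          # inside geneA only
    ("r2", 390, 440, 0),          # overlaps geneA and geneB
    ("r3", 450, 499, 0),          # inside geneB only
    ("r4", 150, 199, SECONDARY),  # secondary alignment inside geneA
    ("r5", 600, 650, 0),          # overlaps nothing
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def count_every_overlap(features, reads):
    """Loose counting: +1 for every (read, feature) overlap, summed per gene."""
    counts = Counter()
    for _name, r_start, r_end, _flag in reads:
        for gene, f_start, f_end in features:
            if overlaps(r_start, r_end, f_start, f_end):
                counts[gene] += 1
    return counts


def count_union_strict(features, reads):
    """Strict counting: skip secondary alignments, count a read only if it hits one gene."""
    counts = Counter()
    for _name, r_start, r_end, flag in reads:
        if flag & SECONDARY:
            counts["__skipped_secondary"] += 1
            continue
        genes = {g for g, s, e in features if overlaps(r_start, r_end, s, e)}
        if len(genes) == 1:
            counts[genes.pop()] += 1
        elif not genes:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


if __name__ == "__main__":
    print("loose :", dict(count_every_overlap(FEATURES, READS)))
    print("strict:", dict(count_union_strict(FEATURES, READS)))
```

With this toy data the loose counter gives geneA = 3 and geneB = 2, because r2 is credited to both genes and the secondary alignment r4 is kept. The strict counter gives geneA = 1 and geneB = 1, and reports r2 as ambiguous. Differences of this kind, multiplied over a real data set, could plausibly change which genes pass a DE threshold.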