Question

HTseq-Count: Long processing time

1

2.0 years ago

Sarah ▴ 10

Hi everyone, I'm processing BAM files using htseq-count and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). It takes 75 minutes to count this pair; is that normal? Thanks.

htseq-count --max-reads-in-buffer=24000000000 -s no -r pos -t exon -i gene_id -f bam Sample1_Aligned.sortedByCoord.out.bam Homo_sapiens.GRCh38.106.gtf > Sample1-output_basename.counts

htseq linux • 1.8k views

Updated 20 months ago by Ram 43k • written 2.0 years ago by Sarah ▴ 10

score 3 · Answer 1 · 2022-04-17

3

2.0 years ago

ATpoint 82k

The original paper stated that on a normal laptop (back in 2015) it can process 600,000 read pairs per minute, which would mean 50M read pairs (I assume you have 50 million pairs) take about 83 min (50,000,000 / 600,000 ≈ 83), so yes, that's normal. A much faster alternative is featureCounts from the Subread package.

0

Thank you. I used featureCounts as you suggested. However, the output count.txt is a text file. How can I read it into R? Do you have any suggestions?

Reply • 2.0 years ago by Sarah ▴ 10

0

read.delim
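
To expand on that a little: featureCounts writes a tab-delimited table whose first line is a comment beginning with `#`, followed by a header row (Geneid, Chr, Start, End, Strand, Length, then one count column per BAM file). Below is a minimal sketch of reading such a file with `read.delim`. The gene IDs and count values are made up for illustration, and the small file is written to a temporary path only so the example runs on its own; with real data you would point `read.delim` at your count.txt instead.

```r
# Build a tiny stand-in for featureCounts output so the example is runnable.
# All gene IDs and numbers here are hypothetical, not real results.
counts_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Program:featureCounts; Command:(elided)",
  paste("Geneid", "Chr", "Start", "End", "Strand", "Length",
        "Sample1_Aligned.sortedByCoord.out.bam", sep = "\t"),
  paste("geneA", "1", "100", "500", "+", "401", "12", sep = "\t"),
  paste("geneB", "2", "200", "900", "-", "701", "0", sep = "\t")
), counts_file)

# read.delim() expects tab-separated text with a header row.
# comment.char = "#" skips the leading "# Program:..." line;
# check.names = FALSE keeps the BAM file names unchanged as column names.
fc <- read.delim(counts_file, comment.char = "#", check.names = FALSE)

# Drop the six annotation columns to get a genes x samples count matrix,
# using the gene IDs as row names.
count_matrix <- as.matrix(fc[, -(1:6), drop = FALSE])
rownames(count_matrix) <- fc$Geneid

print(count_matrix)
```

The resulting `count_matrix` is in the shape that most downstream count-based tools expect (genes in rows, samples in columns). If you have several samples in one featureCounts run, the same code applies unchanged, since every column after the sixth is a count column.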