htseq-count: long processing time
2.0 years ago
Sarah ▴ 10

Hi everyone, I'm processing BAM files with htseq-count, and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). Counting this sample takes 75 minutes; is that normal? Thanks.

htseq-count --max-reads-in-buffer=24000000000 -s no -r pos -t exon -i gene_id -f bam Sample1_Aligned.sortedByCoord.out.bam Homo_sapiens.GRCh38.106.gtf > Sample1-output_basename.counts
htseq linux • 1.8k views
2.0 years ago
ATpoint 82k

The original paper stated that on a normal laptop (back in 2015) htseq-count can process about 600,000 read pairs per minute. At that rate, 50 million read pairs (I assume you mean 50 million pairs) would take 50,000,000 / 600,000 ≈ 83 minutes, so yes, 75 minutes is normal. A much faster alternative is featureCounts from the Subread package.
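
For reference, a rough featureCounts counterpart of the htseq-count call in the question might look like the sketch below. The function name, the thread count, and the output file name in the usage line are placeholders, not anything from the thread. It also assumes a Subread release that has `--countReadPairs` (2.0.2 or later; older releases count fragments with `-p` alone). The two tools resolve ambiguous reads differently, so the counts should be close but not necessarily identical.

```shell
# Sketch only: the call is wrapped in a function so nothing runs until invoked.
# Arguments: <coordinate-sorted BAM> <GTF annotation> <output table>
count_with_featurecounts() {
    # -p --countReadPairs : paired-end input, count fragments rather than reads
    # -s 0                : unstranded, matching "-s no" in the htseq-count call
    # -t exon -g gene_id  : same feature type and ID attribute as "-t exon -i gene_id"
    # -T 4                : number of threads (placeholder value)
    featureCounts \
        -p --countReadPairs \
        -s 0 \
        -t exon -g gene_id \
        -T 4 \
        -a "$2" \
        -o "$3" \
        "$1"
}

# Usage (output name is a placeholder):
# count_with_featurecounts Sample1_Aligned.sortedByCoord.out.bam Homo_sapiens.GRCh38.106.gtf count.txt
```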

Thank you. I used featureCounts as you suggested. However, the output, count.txt, is a plain text file. How can I read it into R? Do you have any suggestions?

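Use read.delim. The file is tab-delimited and its first line is a "#" comment, so something like read.delim("count.txt", comment.char = "#") should work.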
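
If only the gene IDs and counts are needed, the table can be trimmed before it goes into R. The sketch below assumes the usual featureCounts layout: a leading "#" comment line, then the columns Geneid, Chr, Start, End, Strand, Length, followed by one count column per BAM file. The function name and the trimmed file name are made up for illustration.

```shell
# Hypothetical helper: drop the '#' comment line and the annotation columns
# (Chr, Start, End, Strand, Length), keeping Geneid plus the count columns.
simplify_counts() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Usage (file names are placeholders):
# simplify_counts count.txt > count.simple.txt
```

The trimmed file has a plain header row and no comment line, so read.delim can load it without extra arguments.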

featureCounts is so much faster! Thank you, ATpoint!
