Does HTseq remove duplicates marked by Picard tools?
7.9 years ago

It is not clear to me whether HTSeq counts duplicate reads (duplicates marked with Picard tools) as a single read. Does anyone have any information on this?

P.S. I am aware of the post "How Does Htseq Handle Duplicated Rna-Seq Reads?", but the answer there seems a bit vague to me.

htseq RNA-Seq • 3.3k views
7.9 years ago

It is generally not advisable to remove PCR duplicates in RNA-seq analysis. As Devon explained in the post you referenced: "Particularly with highly expressed genes, you're quite likely to observe what would otherwise be termed PCR duplicates that aren't actually duplicates". Coming back to your question: because people normally don't remove PCR duplicates from RNA-seq BAM files, HTSeq has not been coded to ignore reads flagged as PCR duplicates by Picard or any other tool, so a flagged read is treated like any other read. If you don't want duplicate reads in your analysis, you can filter them out and feed only the non-duplicate reads to HTSeq. In the command below, -F 1024 makes samtools skip reads whose duplicate flag (0x400) is set, and -r pos tells htseq-count that paired-end input is sorted by position:

samtools view -F 1024 input.bam | htseq-count -r 'pos'  -  Reference.gtf > output_counts.txt
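
To make it clearer what the -F 1024 filter does, here is a minimal Python sketch of the same bit test applied to the FLAG column of SAM text. It is only an illustration, not how samtools or HTSeq implement it; the function names and the example records are made up for this post.

```python
DUPLICATE_FLAG = 0x400  # 1024: the "PCR or optical duplicate" bit of the SAM FLAG field


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Yield header lines and alignments whose duplicate bit is not set.

    Roughly what `samtools view -h -F 1024` would keep (hypothetical helper).
    """
    for line in sam_lines:
        if line.startswith("@"):  # header lines are passed through unchanged
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if not is_duplicate(flag):
            yield line


# Made-up records for illustration only (not real data).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t1024\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",  # duplicate
    "read3\t1040\tchr1\t200\t60\t50M\t*\t0\t0\t*\t*",  # duplicate + reverse strand (1024 + 16)
    "read4\t16\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*",
]

kept = list(drop_duplicates(example))
kept_names = [l.split("\t")[0] for l in kept if not l.startswith("@")]
print(kept_names)
```

Because the test is a bitwise AND, a read is dropped whenever the duplicate bit is set, whatever other flag bits (strand, pairing, and so on) are also set.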
