Question: Does HTseq remove duplicates marked by Picard tools?
5.2 years ago, jollier.liu (United States) wrote:

It is not clear to me whether HTSeq counts duplicate reads (marked as duplicates with Picard tools) as a single read, or counts each of them separately. Does anyone have any information on this?

P.S. I am aware of the post "How Does Htseq Handle Duplicated Rna-Seq Reads?", but the answer there seems a bit vague to me.

rna-seq htseq • 2.3k views
5.2 years ago, Ashutosh Pandey wrote:

Removing PCR duplicates is generally not advisable in RNA-seq analysis. As Devon explained in the post that you referenced above: "Particularly with highly expressed genes, you're quite likely to observe what would otherwise be termed PCR duplicates that aren't actually duplicates".

Coming back to your question: because people normally don't remove PCR duplicates from RNA-seq BAM files, HTSeq has not been coded to ignore reads flagged as PCR duplicates by Picard or any other tool, so flagged reads are counted like any other read. If you don't want duplicate reads in your analysis, you can filter them out with samtools (`-F 1024` excludes every read that has the duplicate flag, 0x400, set) and feed the remaining reads to HTSeq as shown below:

samtools view -h -F 1024 input.bam | htseq-count -r pos - Reference.gtf > output_counts.txt
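
To make it clearer what the `1024` filter does, here is a minimal pure-shell sketch of the same idea. It is only an illustration, not how samtools is implemented: `drop_duplicates` is a hypothetical helper name, and it assumes plain SAM text on stdin with FLAG in the second tab-separated column. A record is dropped when the duplicate bit (1024, i.e. 0x400) is set in its FLAG, whatever other bits are set alongside it.

```shell
# Hypothetical helper (not part of samtools or HTSeq): reads SAM text on stdin
# and prints every line except alignment records flagged as duplicates.
drop_duplicates() {
    tab=$(printf '\t')
    while IFS= read -r line; do
        case $line in
            # Header lines start with '@' and are passed through unchanged.
            @*) printf '%s\n' "$line"; continue ;;
        esac
        rest=${line#*"$tab"}     # strip the first column (QNAME)
        flag=${rest%%"$tab"*}    # what remains up to the next tab is FLAG
        # Keep the record only if the duplicate bit (1024 = 0x400) is not set.
        if [ $(( flag & 1024 )) -eq 0 ]; then
            printf '%s\n' "$line"
        fi
    done
}

# Example use on a SAM stream (file name is a placeholder):
#   drop_duplicates < input.sam
```

Because the test is a bitwise AND, a read with FLAG 1040 (1024 + 16, a reverse-strand duplicate) is dropped as well, while a read with FLAG 16 is kept. For real data the samtools command above is the better choice, since it also handles BAM input directly.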