Question: Remove duplicate reads before variant calling?
O.rka wrote, 5 weeks ago:

I have a very large FASTQ file that I want to call variants on. Would it make sense to remove duplicate reads before mapping with bowtie2? Afterwards I was going to call variants with freebayes.

What effect will removing duplicate reads have on my resulting variant file compared to keeping all of the reads?

Will I have lower confidence variant calls?

modified 5 weeks ago by i.sudbery • written 5 weeks ago by O.rka

Define "very large fastq file". IMO it's not a good idea to remove duplicate reads before alignment. Instead, mark duplicate reads in the aligned file (BAM), e.g. with Picard MarkDuplicates, and then call variants on the duplicate-marked BAM file.

modified 5 weeks ago • written 5 weeks ago by Nicolas Rosewick
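To make "mark, don't remove" concrete, here is a simplified Python sketch of position-based duplicate marking on aligned reads. It is not Picard's implementation. It assumes single-end reads whose unclipped 5' coordinate has already been computed, treats reads sharing chromosome, 5' position and strand as duplicates, keeps the one with the highest summed base quality, and only sets a flag instead of deleting anything. `AlignedRead` and `mark_duplicates` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical minimal record for a single-end aligned read."""
    name: str
    chrom: str
    five_prime_pos: int  # assumed: unclipped 5' coordinate, already computed
    reverse: bool        # strand of the alignment
    quals: list          # per-base quality scores
    is_duplicate: bool = False


def mark_duplicates(reads):
    """Flag (not remove) all but one read per (chrom, 5' position, strand)."""
    groups = {}
    for read in reads:
        key = (read.chrom, read.five_prime_pos, read.reverse)
        groups.setdefault(key, []).append(read)
    for group in groups.values():
        # Keep the read with the highest summed base quality; flag the rest.
        best = max(group, key=lambda r: sum(r.quals))
        for read in group:
            read.is_duplicate = read is not best
    return reads


reads = [
    AlignedRead("r1", "chr1", 1000, False, [30] * 5),
    AlignedRead("r2", "chr1", 1000, False, [20] * 5),  # same start and strand as r1
    AlignedRead("r3", "chr1", 1050, False, [30] * 5),
]
mark_duplicates(reads)
print([r.name for r in reads if r.is_duplicate])  # ['r2']
```

Because duplicates stay in the file and are only flagged, a downstream caller can decide to ignore them. The real tool also takes mate positions into account for read pairs, which this sketch does not.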
i.sudbery (Sheffield, UK) wrote, 5 weeks ago:

Removing duplicates before alignment (using something like tally) will save you time on mapping and shouldn't affect the results. However, you will still have to remove duplicates after alignment with Picard MarkDuplicates, because two reads can be PCR duplicates without having exactly the same sequence (for example, when one copy carries a sequencing error).
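A minimal Python sketch of why exact-sequence deduplication before alignment is not enough. It assumes the reads are already available as plain sequence strings (FASTQ parsing omitted) and collapses only exact matches, in the spirit of pre-alignment collapsing; it is not tally's implementation, and `collapse_identical` is a hypothetical name.

```python
from collections import Counter


def collapse_identical(sequences):
    """Collapse reads with exactly identical sequences into counts."""
    return Counter(sequences)


# Hypothetical example: three PCR copies of one fragment; the third carries
# a single sequencing error (C -> G at index 5).
sequences = [
    "ACGTACGTACGT",
    "ACGTACGTACGT",
    "ACGTAGGTACGT",
]
collapsed = collapse_identical(sequences)
print(len(collapsed))  # 2: the error-carrying copy survives as a "unique" read
```

The two identical copies collapse, but the copy with one error does not. After alignment it would still map to the same position as the others, so only a position-based step such as MarkDuplicates would catch it.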

To be honest, unless your FASTQ is VERY big, alignment is rarely the most time-consuming part of variant calling, so it may make sense to align everything and then run MarkDuplicates, since you have to do that step anyway.

written 5 weeks ago by i.sudbery