Remove duplicate reads before variant calling?
4.3 years ago
O.rka ▴ 720

I have a very large fastq file that I want to do variant calling on. Would it make sense to remove duplicate reads before mapping using bowtie2? After this I was going to use freebayes.

What effect will removing duplicate reads have on my resulting variant file compared to keeping all of the reads?

Will I have lower confidence variant calls?

snp • 2.1k views

Define "very large fastq file". In my opinion it's not a good idea to remove duplicate reads before alignment. Instead, mark duplicate reads on the aligned BAM file (e.g. using Picard MarkDuplicates) and then call variants on the duplicate-marked BAM file.

4.3 years ago

Removing duplicates before alignment (using something like tally) will save you time on the mapping and shouldn't affect the results. However, you will still have to remove duplicates after alignment using Picard MarkDuplicates, because two reads can be PCR duplicates without having exactly the same sequence (for example, when one copy carries a sequencing error).
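
To illustrate that last point, here is a toy Python sketch. The reads, positions and function names are all made up for illustration, and keying on (position, strand) is only a simplified stand-in for what a position-based tool does after alignment; it is not how Picard is actually implemented.

```python
# Toy reads: (name, sequence, mapped_position, strand).
# Positions are hypothetical stand-ins for what an aligner would report.
reads = [
    ("r1", "ACGTACGTAC", 100, "+"),
    ("r2", "ACGTACGTAC", 100, "+"),  # exact copy of r1
    ("r3", "ACGTACGAAC", 100, "+"),  # same fragment as r1, one sequencing error
    ("r4", "TTGACCGTAA", 250, "-"),  # unrelated fragment
]


def dedup_by_sequence(reads):
    """Pre-alignment style: keep the first read seen for each exact sequence."""
    seen = set()
    kept = []
    for read in reads:
        sequence = read[1]
        if sequence not in seen:
            seen.add(sequence)
            kept.append(read)
    return kept


def dedup_by_position(reads):
    """Post-alignment style (simplified): keep the first read per (position, strand)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read[2], read[3])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


seq_kept = [r[0] for r in dedup_by_sequence(reads)]
pos_kept = [r[0] for r in dedup_by_position(reads)]
print(seq_kept)  # ['r1', 'r3', 'r4']
print(pos_kept)  # ['r1', 'r4']
```

The sequence-based pass keeps r3 because it differs from r1 by one base, while the position-based pass treats it as a duplicate of r1.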

To be honest, unless your fastq is VERY big, alignment is rarely the most time-consuming part of variant calling, so it might make sense to align everything and then run MarkDuplicates, since you have to do that anyway.
