Question

How do I deal with duplicate reads when using vg to analyse WGS data?

4.2 years ago

lhomas • 0

I am analyzing WGS data with vg, following the recommendations on the vgTeam GitHub wiki pages ("Working with a whole genome variation graph" and "Whole-genome calling and genotyping"), and I am unsure how to handle duplicate reads. Does vg take care of this in the commands recommended on those pages? Or is there another vg tool I should run on the GAMs before variant calling?

Thanks in advance.

vg • 847 views

updated 4.2 years ago by glenn.hickey ▴ 520 • written 4.2 years ago by lhomas • 0

score 0 · Answer 1 · 2020-02-17

This is a great question. There is indeed no vg tool yet to mark duplicates. The only workaround, which isn't great, is to use a BAM file to detect duplicates. Please make a feature request on GitHub! We are working on some changes to replace GAM as the default format, which should soon make it possible to write such a tool more efficiently.
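To make the BAM workaround a bit more concrete, here is a minimal sketch rather than a tested pipeline. It assumes the GAM has already been projected to BAM (for example with vg surject), that an external tool such as samtools markdup or Picard MarkDuplicates has marked duplicates in that BAM, and that the result is streamed as SAM text (e.g. via samtools view -h). The function name duplicate_read_names is made up for illustration; the only fixed fact it relies on is that bit 0x400 of the SAM FLAG field means "PCR or optical duplicate".

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def duplicate_read_names(sam_lines):
    """Collect names of reads whose SAM FLAG has the duplicate bit set.

    `sam_lines` is any iterable of SAM text lines (hypothetical input:
    the output of `samtools view -h` on a duplicate-marked BAM).
    """
    names = set()
    for line in sam_lines:
        # Skip blank lines and header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        # Column 1 is QNAME (read name), column 2 is FLAG.
        if int(fields[1]) & DUPLICATE_FLAG:
            names.add(fields[0])
    return names
```

The resulting set of read names can then be used to exclude those reads from the GAM before calling. How best to do that filtering depends on your vg version, so it is left out of the sketch. Note that for paired-end data both mates share a read name, so this drops the whole pair if either mate is flagged.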