How do I deal with duplicate reads when using vg to analyse WGS data?
16 months ago
lhomas • 0

I am analyzing WGS data with vg, following the recommendations on the vgTeam GitHub wiki pages ("Working with a whole genome variation graph" and "Whole-genome calling and genotyping"), and I am unsure how to handle duplicate reads. Does vg take care of this in the commands recommended on those pages, or is there another vg tool I should run on the GAMs prior to variant calling?

Thanks in advance.

vg • 346 views
16 months ago
glenn.hickey ▴ 240

This is a great question. There is indeed no vg tool yet to mark duplicates. The only workaround, which isn't great, is to use a BAM file to detect duplicates. Please make a feature request on GitHub! We are working on some changes to replace GAM as a default format, which should soon make it possible to write such a tool more efficiently.
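
To make the BAM workaround a little more concrete, here is a minimal Python sketch of the detection half only. It assumes you already have a BAM for the same reads in which duplicates were marked by a standard duplicate-marking tool, and that you have dumped it to SAM text. The function name `duplicate_read_names` and the example records are hypothetical. The only fixed fact it relies on is that the SAM specification uses flag bit 0x400 for PCR or optical duplicates. How the resulting read names are then excluded on the GAM side is not covered here.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def duplicate_read_names(sam_lines):
    """Return the set of read names flagged as duplicates in SAM text.

    Hypothetical helper: `sam_lines` is any iterable of SAM-formatted
    lines, for example a duplicate-marked BAM dumped to text.
    """
    names = set()
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])  # QNAME and FLAG columns
        if flag & DUPLICATE_FLAG:
            names.add(name)
    return names


# Made-up records for illustration: read2 carries the duplicate bit
# (1123 = 99 + 1024), the other two reads do not.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tFFFF",
    "read3\t0\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tFFFF",
]
print(sorted(duplicate_read_names(example)))
```

The set of names could then serve as an exclusion list for the graph alignments before calling, though that step depends on tooling this thread does not describe.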
