Evaluating alignment performance in vg
2.2 years ago
cmirchan ▴ 10


I am currently trying to compare the alignment performance of vg and BWA-MEM. However, what I have found is not what I expected: vg does more or less the same as BWA in terms of the number of mapped reads and the number of correctly mapped reads. As inputs I have a VCF of ~1000 samples, a linear reference FASTA, and a VCF of a sample that is not in the large VCF. I have followed the workflows outlined in the wiki for construction, indexing, simulation and mapping.
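For concreteness, one common way to define a "correctly mapped" read is a read that aligns to the true contig within a fixed distance of its simulated position. The sketch below assumes the true and aligned positions have already been extracted into dictionaries; the function name, the dictionary layout and the 100 bp tolerance are illustrative assumptions, not something taken from vg or BWA.

```python
def count_correct(truth, mapped, tolerance=100):
    """Return (number of mapped reads, number of correctly mapped reads).

    truth  : dict of read name -> (contig, position) where the read was simulated
    mapped : dict of read name -> (contig, position) reported by the aligner;
             unmapped reads are simply absent from this dict
    tolerance : maximum distance in bp for a mapping to count as correct
                (assumed value, adjust as needed)
    """
    n_mapped = 0
    n_correct = 0
    for name, (true_contig, true_pos) in truth.items():
        hit = mapped.get(name)
        if hit is None:
            continue  # read was not mapped at all
        n_mapped += 1
        contig, pos = hit
        if contig == true_contig and abs(pos - true_pos) <= tolerance:
            n_correct += 1
    return n_mapped, n_correct


if __name__ == "__main__":
    truth = {"r1": ("chr21", 1000), "r2": ("chr21", 5000), "r3": ("chr21", 9000)}
    mapped = {"r1": ("chr21", 1010), "r2": ("chr21", 7000)}
    print(count_correct(truth, mapped))
```

Running the same function on the positions reported by each mapper gives directly comparable counts, provided both sets of positions are expressed in the same linear coordinate system.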

The script map-sim seems to do what I am trying to achieve, and I have tried to emulate the workflow laid out in the script, but some parts have me confused. I am most interested in the simulation part of the script, particularly how the sim-base, sim-ref, and hap-base graphs are constructed.

Thanks in advance,


vg vgteam • 689 views
2.1 years ago
ivar.grytten ▴ 40

I've also struggled to get the map-sim script to work -- similarly to you, I was stuck at creating the sim-base and hap-base graphs.

I figured it out in the end and have made a simple tool here, based on the map-sim script. It basically just requires an individual VCF file (e.g. the HG002 file that the vg paper used in its benchmarks) as input, and it lets you simulate reads on a graph from that individual. The simulation itself is done with some custom Python code, because I needed to simulate many reads and vg sim was too slow for that:


This code is not very well tested, but I think it should work if you get the dependencies right by running pip install on the setup file.
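To illustrate the general idea of simulating reads from an individual's haplotype sequence, here is a minimal sketch. It is not the code from the tool mentioned above: the function name, its parameters and the substitution-only error model are all assumptions made for illustration, and it works on a plain sequence string rather than on a graph.

```python
import random


def simulate_reads(haplotype, n_reads, read_length=150, error_rate=0.01, seed=0):
    """Sample reads uniformly from a haplotype sequence and add substitution errors.

    Returns a list of (read name, true start offset, read sequence) tuples.
    The true start offset can later be compared against the mapped position.
    """
    if len(haplotype) < read_length:
        raise ValueError("haplotype is shorter than the read length")
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(haplotype) - read_length + 1)
        bases = list(haplotype[start:start + read_length])
        for j, base in enumerate(bases):
            if rng.random() < error_rate:
                # substitute with a different base; indels are not modelled here
                bases[j] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((f"read{i}", start, "".join(bases)))
    return reads


if __name__ == "__main__":
    hap = "ACGTTGCAAGGCTTAACCGG" * 50
    for name, start, seq in simulate_reads(hap, 3, read_length=30):
        print(name, start, seq)
```

Keeping the true start offset with each read is what makes it possible to score mappings as correct or incorrect afterwards.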

2.0 years ago
glenn.hickey ▴ 250

toil-vg has scripts to generate haplotype sequences to simulate from. There's an (old) example here: https://github.com/vgteam/toil-vg/wiki/Chr21-simulation-experiment-on-AWS

