vg de novo calling strategy
Entering edit mode
8 months ago

Hi all,

I am not sure I understand correctly how we are supposed to use the vg pipeline when we don't have a variant set yet (in vcf format).

My objective is to use vg to investigate SVs and SNPs of mitochondrial sequences in a species complex. In the end I am interested in the coverage of each variant (ossibility of multiple mitochondria in those species). Maybe using graph genomes for mitochondria is not the best option either, but I wanted to avoid assembly issues (for example chimeras or misassembly of repetitive regions) when several mitochondria are found in one individual.

What I did so far from what I could grasp:

  • Get mitochondrial sequences from genbank and produce a starting graph using pggb (this will capture some existing variants, but not all) + index this graph
  • map PE reads with vg giraffe for hundreds of individuals
  • augment the graph for each individual
  • vg call for each individual producing an individual vcf

From there, I thought I could merge the vcfs, build a new graph containing all variation, map a second time and finally call a second time with all variation (then merge again all vcfs). However there does not seem to be a way to augment the reference graph (.vg format) using the merged vcf (vg construct only takes fasta sequences).

Am I misled in thinking I need to remap everything a second time? Is the first call enough to capture all variation in all individuals?

For example the answer to this issue suggests this is the way to proceed. Will I be missing variants by stopping at merging the first called vcfs. How do I get the same set of variants from all individuals from that point? I also need coverage of de novo variants identified in some individuals in the other individuals, which I don't get by stopping at the first call.

I will gladly take any advice or reference on this, thanks !

vg • 228 views

Login before adding your answer.

Traffic: 2190 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6