Question

vg de novo calling strategy

0

Entering edit mode

3.2 years ago

alexis.simon • 0

Hi all,

I am not sure I understand correctly how we are supposed to use the vg pipeline when we don't have a variant set yet (in vcf format).

My objective is to use vg to investigate SVs and SNPs of mitochondrial sequences in a species complex. In the end I am interested in the coverage of each variant (ossibility of multiple mitochondria in those species). Maybe using graph genomes for mitochondria is not the best option either, but I wanted to avoid assembly issues (for example chimeras or misassembly of repetitive regions) when several mitochondria are found in one individual.

What I did so far from what I could grasp:

Get mitochondrial sequences from genbank and produce a starting graph using pggb (this will capture some existing variants, but not all) + index this graph
map PE reads with vg giraffe for hundreds of individuals
augment the graph for each individual
vg call for each individual producing an individual vcf

From there, I thought I could merge the vcfs, build a new graph containing all variation, map a second time and finally call a second time with all variation (then merge again all vcfs). However there does not seem to be a way to augment the reference graph (.vg format) using the merged vcf (vg construct only takes fasta sequences).

Am I misled in thinking I need to remap everything a second time? Is the first call enough to capture all variation in all individuals?

For example the answer to this issue suggests this is the way to proceed. Will I be missing variants by stopping at merging the first called vcfs. How do I get the same set of variants from all individuals from that point? I also need coverage of de novo variants identified in some individuals in the other individuals, which I don't get by stopping at the first call.

I will gladly take any advice or reference on this, thanks !

vg • 615 views

ADD COMMENT • link 3.2 years ago by alexis.simon • 0