Question

Very few SNPs and indels identified by vg call when using a PAV-only input VCF



10 months ago

Wanglh

Hi all,

We want to identify SNP and indel variants in the output VCF file BS_graph_call.vcf, generated with the pan-genome analysis software vg.

However, fewer than 20 SNP and indel records were found in BS_graph_call.vcf. By contrast, when we called SNPs and indels between the reference Ah.genome.fa and the query BS.genome.fa with MUMmer, approximately 500,000 high-confidence variant sites were identified.
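To make record counts like these comparable, one option is to tally the data lines of a VCF by variant type. Below is a minimal Python sketch, assuming a plain or gzip/bgzip-compressed VCF. The helper names `classify`, `count_variant_types` and `count_vcf` are hypothetical, and the classification is a heuristic based only on REF/ALT lengths: a length change of `sv_min` (assumed 50 bp) or more counts as SV/PAV-sized, a smaller change as an indel, and equal-length alleles as SNPs/MNPs. It does not look at the GT field, so it counts alleles present in the file, not genotyped calls.

```python
import gzip
from collections import Counter

def classify(ref, alt_field, sv_min=50):
    """Heuristically classify one VCF record from its REF and ALT columns.

    Assumption: a length change of sv_min bp or more is treated as SV/PAV-sized,
    a smaller length change as an indel, and equal-length alleles as SNP/MNP.
    """
    alts = [a for a in alt_field.split(",") if a not in (".", "*")]
    if not alts:
        return "no_alt"  # reference-only or spanning-deletion-only record
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return "symbolic"  # e.g. <DEL> or breakend notation
    largest_change = max(abs(len(a) - len(ref)) for a in alts)
    if largest_change >= sv_min:
        return "sv"
    if largest_change > 0:
        return "indel"
    return "snp" if len(ref) == 1 else "mnp"

def count_variant_types(lines, sv_min=50):
    """Count the data lines of a VCF by heuristic variant type."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify(fields[3], fields[4], sv_min)] += 1
    return counts

def count_vcf(path, sv_min=50):
    """Open a plain or gzip/bgzip-compressed VCF and count its records by type."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_variant_types(handle, sv_min)
```

For example, `print(count_vcf("BS_graph_call.vcf"))` would show how the vg call records split between SNPs, indels and larger events under these assumptions.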

We would like to understand why vg reports so few SNPs and indels in the final BS_graph_call.vcf.

The detailed command lines are shown below.

The input file PAV.vcf.gz contains only PAV (presence-absence variation) records.

vg autoindex --workflow giraffe -r Ah.genome.fa -v PAV.vcf.gz -p Ah -t 100
vg giraffe -Z Ah.giraffe.gbz -m Ah.min -d Ah.dist -f BS_1.fq.gz -f BS_2.fq.gz -t 4 > BS.giraffe.gam
vg pack -x Ah.giraffe.gbz -g BS.giraffe.gam -Q 5 -o BS.pack -t 4
vg call -t 4 Ah.giraffe.gbz -k BS.pack > BS_graph_call.vcf

vg version v1.48.0 "Gallipoli"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by ubuntu@ip-172-31-9-38
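Because `vg autoindex` above builds the graph only from the records in PAV.vcf.gz, it may help to check how large those input alleles actually are. Below is a minimal Python sketch with hypothetical helper names `allele_length_changes` and `summarize`, which reports the spread of REF/ALT length differences. It assumes the PAVs are written as sequence-resolved alleles; symbolic alleles such as `<DEL>` are skipped because their size is not encoded in REF/ALT.

```python
import gzip
import statistics

def allele_length_changes(lines):
    """Yield |len(ALT) - len(REF)| for each sequence-resolved ALT allele."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):
            # Skip missing, spanning-deletion, symbolic and breakend alleles.
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue
            yield abs(len(alt) - len(ref))

def summarize(path):
    """Summarize allele length changes in a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        sizes = list(allele_length_changes(handle))
    if not sizes:
        return {"alleles": 0}
    return {
        "alleles": len(sizes),
        "min": min(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
        "zero_length_change": sum(1 for s in sizes if s == 0),
    }
```

For example, `print(summarize("PAV.vcf.gz"))`; a small `zero_length_change` count and a large minimum would indicate that the input graph carries essentially no SNP-sized variation.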


Updated 10 months ago by Ram • written 10 months ago by Wanglh