Question

Low variant site similarity between linear reference calls and vg call


4 months ago

PolenP ▴ 10

I have aligned my samples to a pangenome graph and to a linear reference genome. However, I only get ~20% shared SNPs between the two call sets, and only 70% concordance within these shared SNPs.

The variants against the linear reference genome were called using GATK.
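For concreteness, one way the shared-SNP fraction and the concordance among shared SNPs could be computed is sketched below. This is an illustration under stated assumptions, not necessarily how the numbers above were obtained: the function names `vcf_site_gt` and `compare_snp_vcfs` are hypothetical, both VCFs are assumed to be single-sample and to use the same contig names and coordinates, sites are matched on exact CHROM, POS, REF and ALT, and "shared" is expressed as a percentage of the union of sites.

```shell
#!/usr/bin/env bash
# Sketch only: compares two single-sample SNP VCFs (plain, gzip or bgzip).
# Assumption: both files share contig names and coordinates, which for
# vg call output means the -p reference path matches the linear reference.

# Emit "CHROM:POS:REF:ALT<TAB>GT" per record, using the first sample column.
# Phasing is ignored and diploid alleles are ordered, so 1|0 equals 0/1.
vcf_site_gt() {
    local tab=$'\t'
    gzip -cdf "$1" | awk -F'\t' '
        /^#/ || NF < 10 { next }
        {
            n = split($9, fmt, ":"); split($10, val, ":")
            gt = "."
            for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = val[i]
            gsub(/[|]/, "/", gt)
            if (split(gt, al, "/") == 2 && al[1] > al[2]) gt = al[2] "/" al[1]
            print $1 ":" $2 ":" $4 ":" $5 "\t" gt
        }' | LC_ALL=C sort -t "$tab" -k1,1 -u
}

# Print the shared-site percentage (relative to the union of sites) and the
# percentage of shared sites whose normalized genotypes are identical.
compare_snp_vcfs() {
    local a b tab=$'\t'
    a=$(mktemp); b=$(mktemp)
    vcf_site_gt "$1" > "$a"
    vcf_site_gt "$2" > "$b"
    LC_ALL=C join -t "$tab" "$a" "$b" |
        awk -F'\t' -v na="$(wc -l < "$a")" -v nb="$(wc -l < "$b")" '
            { shared++; if ($2 == $3) same++ }
            END {
                union = na + nb - shared
                printf "shared=%d union=%d shared_pct=%.1f concordant_pct=%.1f\n",
                    shared, union,
                    (union ? 100 * shared / union : 0),
                    (shared ? 100 * same / shared : 0)
            }'
    rm -f "$a" "$b"
}
```

It could be called as `compare_snp_vcfs gatk.SNPs.vcf.gz sample123.PASS.SNPs.vcf.gz`, where `gatk.SNPs.vcf.gz` is a placeholder name for the GATK call set. Exact-match keys are strict: differences in contig naming or in how multi-allelic sites are represented will count as unshared sites.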

This is how I mapped the short reads to the pangenome graph built with PGGB. The vg version is vg_v1.64.0.

1) VG GIRAFFE. Map the cleaned short reads to the graph, which was indexed with vg autoindex.

singularity exec vg_v1.64.0.sif vg giraffe \
    -p \
    -t 8 \
    -Z file.giraffe.gbz \
    -d file.dist \
    -m file.min \
    -f read1.fq.gz -f read2.fq.gz > sample123.gam

2) VG PACK

singularity exec vg_v1.64.0.sif vg pack -x file.giraffe.gbz -g file.gam --min-mapq 10 --threads 8 -o file.pack

3) VG CALL

singularity exec vg_v1.64.0.sif vg call file.giraffe.gbz -k file.pack -p path3 -p path4 --sample sample123 --genotype-snarls --all-snarls --threads 8 > sample123.vcf

bgzip sample123.vcf

4) Filter the variants

bcftools view -f PASS sample123.vcf.gz -Oz -o sample123.PASS.vcf.gz

bcftools view -v snps sample123.PASS.vcf.gz | \
    bgzip -c > sample123.PASS.SNPs.vcf.gz
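To see how much the filtering step above removes, a tally of records per FILTER value can be useful. The sketch below is only an illustration: the function name `vcf_filter_tally` is hypothetical, and its SNP test (single-base REF and single-base A/C/G/T ALT alleles) is a simplification that may not match how `bcftools view -v snps` classifies records.

```shell
#!/usr/bin/env bash
# Sketch only: count VCF records by FILTER value and a rough variant class.
# Works on plain, gzip or bgzip input. The "snp" class here is an
# approximation, not the bcftools definition.
vcf_filter_tally() {
    gzip -cdf "$1" | awk -F'\t' '
        /^#/ { next }
        {
            # Treat the record as a SNP only if REF and every ALT are one base.
            is_snp = (length($4) == 1)
            n = split($5, alt, ",")
            for (i = 1; i <= n; i++) if (alt[i] !~ /^[ACGTacgt]$/) is_snp = 0
            type = is_snp ? "snp" : "other"
            count[$7 "\t" type]++
        }
        END { for (k in count) print k "\t" count[k] }
    ' | LC_ALL=C sort
}
```

Running `vcf_filter_tally sample123.vcf.gz` on the unfiltered vg call output prints one tab-separated line per FILTER value and class, which shows how many SNP records are dropped by keeping only PASS.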

Can you suggest what may have caused the low similarity? Could it be that my vg call command is incorrect?

call vg • 402 views

updated 4 months ago by GenoMax 154k • written 4 months ago by PolenP ▴ 10