Low variant site similarity between linear reference and vg call path
11 weeks ago
PolenP • 0

I aligned my samples to a pangenome graph and, separately, to a linear reference genome. However, only ~20% of the SNPs are shared between the two call sets, and concordance within these shared SNPs is only 70%.
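For reference, the two numbers above can be computed along these lines. This is a minimal sketch, not necessarily the exact procedure used here: it assumes uncompressed, single-sample VCFs that share contig names and coordinates, treats "concordance" as identical genotypes at shared sites, and only looks at biallelic single-base SNPs. The function names `snp_keys` and `compare_snps` are made up for illustration.

```shell
# Print "CHROM:POS:REF:ALT<TAB>GT" for every biallelic single-base SNP record.
snp_keys() {
    awk -F'\t' '!/^#/ && $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ {
        split($10, f, ":")            # assumes GT is the first FORMAT field
        gt = f[1]
        gsub(/\|/, "/", gt)           # ignore phasing
        if (gt == "1/0") gt = "0/1"   # normalise allele order
        print $1 ":" $2 ":" $4 ":" $5 "\t" gt
    }' "$1" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -u
}

# Report SNP counts per file, shared sites, and shared sites with equal genotypes.
compare_snps() {
    tab=$(printf '\t')
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    snp_keys "$1" > "$tmp_a"
    snp_keys "$2" > "$tmp_b"
    LC_ALL=C join -t "$tab" "$tmp_a" "$tmp_b" |
        awk -F'\t' -v na="$(wc -l < "$tmp_a")" -v nb="$(wc -l < "$tmp_b")" '
            { shared++; if ($2 == $3) same++ }
            END { printf "sites_a=%d sites_b=%d shared=%d concordant=%d\n", na, nb, shared, same }'
    rm -f "$tmp_a" "$tmp_b"
}

# Usage (file names are placeholders):
# compare_snps gatk.snps.vcf vg.snps.vcf
```

Matching on CHROM, POS, REF and ALT only works if both VCFs report positions on the same reference sequence with identical contig names, so that is worth checking before interpreting the shared fraction.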

The variants against the linear reference genome were called with GATK.

This is how I mapped the short reads to the pangenome graph built with PGGB. The vg version is v1.64.0.

1) VG GIRAFFE. Map the cleaned short reads to the graph indexed with vg autoindex.

singularity exec vg_v1.64.0.sif vg giraffe \
    -p \
    -t 8 \
    -Z file.giraffe.gbz \
    -d file.dist \
    -m file.min \
    -f read1.fq.gz -f read2.fq.gz > sample123.gam

2) VG PACK

singularity exec vg_v1.64.0.sif vg pack -x file.giraffe.gbz -g file.gam --min-mapq 10 --threads 8 -o file.pack

3) VG CALL

singularity exec vg_v1.64.0.sif vg call file.giraffe.gbz -k file.pack -p path3 -p path4 --sample sample123 --genotype-snarls --all-snarls --threads 8 > sample123.vcf

bgzip sample123.vcf

4) Filter the variants

bcftools view -f PASS sample123.vcf.gz -Oz -o sample123.PASS.vcf.gz

bcftools view -v snps sample123.PASS.vcf.gz | \
    bgzip -c > sample123.PASS.SNPs.vcf.gz
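To see how many records each filter in step 4 removes, a tally like the one below could help. It is a rough sketch that assumes an uncompressed VCF; the function name `tally_vcf` is hypothetical, and its SNP test (single-base REF with only single-base ALT alleles) is a simplification that may not match `bcftools view -v snps` exactly.

```shell
# Count all records, PASS records, and PASS records that look like plain SNPs.
tally_vcf() {
    awk -F'\t' '!/^#/ {
        total++
        if ($7 == "PASS") {
            pass++
            # single-base REF and one or more single-base ALT alleles
            if ($4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT](,[ACGT])*$/) snp++
        }
    }
    END { printf "total=%d pass=%d pass_snp=%d\n", total, pass, snp }' "$1"
}

# Usage (decompress first, file name is a placeholder):
# tally_vcf sample123.vcf
```

A large gap between `pass` and `pass_snp` would suggest that many sites are reported as multi-base alleles rather than as individual SNP records.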

Can you suggest what might have caused the low similarity? Could my vg call command be incorrect?

call vg • 329 views