PacBio's variantCaller's vcf inconsistent with it's consensus sequence
Entering edit mode
2.1 years ago
mark.rose ▴ 50

I'm having a problem with variant calling on some PacBio RS II data. Ive aligned my reads against different small reference (~15kb) using blasr and then ran the resulting bam file through PacBio's samtlink version 6 variantCaller as follows:

variantCaller --algorithm arrow --log-file variantCaller.log --annotateGFF --reportEffectiveCoverage --noEvidenceConsensusCall lowercasereference --minCoverage 40 --coverage 100 --minMapQV 40 --minConfidence 10 --minReadScore 0.75 --minSnr 3.75 --minZScore -3.5 --minAccuracy 0.82  --numWorkers 5 -r ref.fa -o 129C02-vs-ref.fa.sort.vcf -o 129C02-vs-ref.fa.sort.gff -o 129C02-vs-ref.fa.sort.fasta -o 129C02-vs-ref.fa.sort.fastq 129C02-vs-ref.fa.sort.bam

As you can see, among my outputs are a vcf and a fasta file. It is my understanding, perhaps incorrect, that the fasta file is the consensus sequence determined by the reads and that the differences between the original reference and it should be the variants identified in the vcf file (barring the presence of larger structural variants). This does not appear to always be the case. For one reference, the vcf indicates a one 1bp deletion but if I align the reference and the consensus fasta output, two, widely separate 1-bp deletions are obseved, only one being the deletion reported in the vcf.

Can anyone explain this discrepancy?

pacbio variant variantCaller vcf consensus • 531 views

Login before adding your answer.

Traffic: 929 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6