Question: Varscan variant calll error on pacbio data
0
gravatar for mark.rose
3.0 years ago by
mark.rose30
United States
mark.rose30 wrote:

Hello

I'm trying to call variants on an alignment of pacbio data to a reference thusly:

blasr sample.bas.h5 ref.fa -sam  -clipping soft -nproc 4 -minSubreadLength 9500 > sample.sam

I then sampled this output for uniquely aligning reads and converted into a bam file

Then I performed mpileup (samtools version 0.1.19-44428cd)

samtools mpileup -BAQ0 -d 1000 -f ref.fa sub_sample.sort.bam > sub_sample.sort.bam.BAQ0d1000.mpileup

And finally, varscan (v2.3.9)

 java -jar ~/BioInfo/bin/VarScan.v2.3.9.jar mpileup2snp sub_sample.sort.bam.BAQ0d1000.mpileup  --min-var-freq 0.2 --min-avg-qual 0 --output-vcf 1 >sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf

This reports: 4 variant positions (1 SNP, 3 indel) 3 were failed by the strand-filter 1 variant positions reported (1 SNP, 0 indel)

And sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf shows:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1
ref      12869   .       C       G       .       PASS    ADP=148;WT=0;HET=1;HOM=0;NC=0   GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR    0/1:255:148:148:219:270:26.68%:7.7825E-105:10:10:219:0:269:1

sub_sample.sort.bam.BAQ0d1000.mpileup.snp.vcf for this position has:

ref      12869   c       148     ...............-1T......+1G.-1T.+1T,,,.,..*..,.+1T,,..,,..*.,g,.,.+1T,..,,+1g,.,*,+1c,.,,..+1T.,.A.,,,,,,,,,.,,+1c,,,,*.,.-1099TTGCTGATGGCTGCTCCTTTGGCTCTTGGGGATGTGGAGATTGAAATCATTGATAAATTAATCTCCATTCCGTACGTCGAAATGACATTGAGATTGATGGAGCGTTTTGGTGTGAAAGCAGAGCATTCTGATAGCTGGGACAGATTCTACATTAAGGGAGGTCAAAAATACAAGTCCCCTAAAAATGCCTATGTTGAAGGTGATGCCTCAAGCGCAAGCTATTTCTTGGCTGGTGCTGCAATTACTGGAGGGACTGTGACTGTGGAAGGTTGTGGCACCACCAGTTTGCAGGGTGATGTGAAGTTTGCTGAGGTACTGGAGATGATGGGAGCGAAGGTTACATGGACCGAGACTAGCGTAACTGTTACTGGCCCACCGCGGGAGCCATTTGGGAGGAAACACCTCAAGGCGATTGATGTCAACATGAACAAGATGCCTGATGTCGCCATGACTCTTGCTGTGGTTGCCCTCTTTGCCGATGGCCCGACAGCCATCAGAGACGTGGCTTCCTGGAGAGTAAAGGAGACCGAGAGGATGGTTGCGATCCGGACGGAGCTAACCAAGCTGGGAGCATCTGTTGAGGAAGGGCCGGACTACTGCATCATCACGCCGCCGGAGAAGCTGAACGTGACGGCGATCGACACGTACGACGACCACAGGATGGCGATGGCTTTCTCCCTTGCCGCCTGTGCCGAGGTCCCCGTCACCATCCGGGACCCTGGGTGCACCCGGAAGACCTTCCCCGACTACTTCGATGTGCTGAGCACTTTCGTCAAGAATTAAGCTCTAGAAGAAGCTTCGACGAATTTCCCCGATCGTTCAAACATTTGGCAATAAAGTTTCTTAAGATTGAATCCTGTTGCCGGTCTTGCGATGATTATCATATAATTTCTGTTGAATTACGTTAAGCATGTAATAATTAACATGTAATGCATGACGTTATTTATGAGATGGGTTTTTATGATTAGAGTCCCGCAATTATACATTTAATACGCGATAGAAAACAAAATATAGCGCGCAAACTAGGATAAATTATCGCGCCCGGTGTCATCTATGTTACTAGATCGGGAATTGCGGCCGCTCTAGAACTAGTGGATCC,.+1A.-1T.-1T,.,.,,,,.,,,,,+2cc,,,,,..,,,t,+1c.,.,,.,.,,,.,,,.t,t.,....,        //".(&//&/-+&.(///'*/$+,-*-/...#+%./-/.,%$/%-."'-/#'/.)-/-'.,,&(*#-/./#/.-(,-/..(--*.(-.*./$-/)"///(,//.-/..-.//.)'//./)"(/*$/.-/+-.%/-,.('./.'&.-*+

Now the questions.

  1. The read depth passsing my set quality quality filter (ADP) is 148, yet the number of reference reads (RD) and alternate reads (AD) is 219 and 270, respectively. How is that possible?
  2. The alternate allele frequency (FREQ) is 26.68%. How is this being calculated given the info in question 1?
  3. How is this SNP passing my --min-var-freq 0.2 threshold given the low incidence of "G" at this position as shown in the pileup?

Thanks for your help

pacbio mpileup blasr varscan • 1.1k views
ADD COMMENTlink written 3.0 years ago by mark.rose30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 786 users visited in the last hour