Question: Variant calling pacbio bacteria
0
gravatar for rmf
18 months ago by
rmf1.1k
rmf1.1k wrote:

I have PacBio reads from three bacterial genomes as three samples. They were de-novo assembled using HGAP4 and polished. One of samples was taken as reference. And subreads from each sample was mapped back to this reference for variant calling. Variant calling was performed using two workflows:

subreads -> ngmlr -> Sniffles -> VCF
subreads -> pbmm2 -> pbsv -> VCF

Both VCF files have no DP, MQ or any of the standard quality metrics. Is this normal?

  • How can we access the quality of these variants? Visual examination seems tricky due to huge number of indels in the BAM file.
  • Another point is that I couldn't find any ploidy option in any of these tools. Is that not important? I am guessing these tools are developed for diploids.

Here is the PBSV vcf (696 rows in total)

pb_501_001  8   pbsv.CNV.12 T   <CNV>   .   NearContigEnd   SVTYPE=cnv;END=132000;SVLEN=131992;SHADOWED CN  3
pb_501_001  8   pbsv.BND.pb_501_001:8-pb_501_001:5167147    T   ]pb_501_001:5167147]T   .   NearContigEnd   SVTYPE=BND;CIPOS=-7,95;MATEID=pbsv.BND.pb_501_001:5167147-pb_501_001:8;MATEDIST=5167139 GT:AD:DP    1/1:0,343:343

Here is the VCF from Sniffles (891 rows in total)

pb_501_001  1   0   N   <DUP>   .   PASS    SUPP=3;SUPP_VEC=111;SVLEN=5167155;SVTYPE=DUP;SVMETHOD=SURVIVOR1.0.7;CHR2=pb_501_001;END=5167156;CIPOS=0,0;CIEND=0,0;STRANDS=-+  GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 1/1:NA:5167155:0,136:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156   1/1:NA:5167155:0,219:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156   1/1:NA:5167155:0,100:-+:.:DUP:0:NA:NA:pb_501_001_1-pb_501_001_5167156
pb_501_001  1612    1   N   <INS>   .   PASS    SUPP=3;SUPP_VEC=111;SVLEN=321;SVTYPE=INS;SVMETHOD=SURVIVOR1.0.7;CHR2=pb_501_001;END=1612;CIPOS=0,0;CIEND=0,0;STRANDS=+- GT:PSV:LN:DR:ST:QV:TY:ID:RAL:AAL:CO 0/0:NA:321:24,2:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_1612    0/0:NA:321:48,2:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_16120/0:NA:321:19,3:+-:.:INS:1:NA:NA:pb_501_001_1612-pb_501_001_1612

These are the types of variants in the sniffles VCF.

cat sniffles.vcf | grep -v "^#" | cut -f5 | sort | uniq -c
    101 <DEL>
     86 <DUP>
     12 <DUP/INS>
    340 <INS>
    162 <INV>
    154 <INVDUP>
     36 <INV/INVDUP>

I had to use subreads because I wasn't able to generate CCS from this data. Are there any other tools to call variants (mostly SNPs) from bacterial pacbio data? I tried PBHoover, but unfortunately it just crashes when I run it.

ADD COMMENTlink modified 17 months ago by wrowell0 • written 18 months ago by rmf1.1k
0
gravatar for wrowell
17 months ago by
wrowell0
Pacific Biosciences
wrowell0 wrote:

For pbsv, DP is reported for all variant types except for CNV. You can see DP in the FORMAT field for the example you provided:

pb_501_001  8   pbsv.BND.pb_501_001:8-pb_501_001:5167147    T   ]pb_501_001:5167147]T   .   NearContigEnd   SVTYPE=BND;CIPOS=-7,95;MATEID=pbsv.BND.pb_501_001:5167147-pb_501_001:8;MATEDIST=5167139 GT:AD:DP    1/1:0,343:343

While pbsv is designed for diploid samples, you can identify variants in haploid organisms by either post filtering for GT 1/1, or by increasing the percent of supporting reads in pbsv call.

  -P,--call-min-read-perc-one-sample          Ignore calls supported by < P% of reads in every sample. [20]

You may need to experiment a bit here, but setting this to 80% or higher should give you the results you expect without sacrificing sensitivity.

ADD COMMENTlink written 17 months ago by wrowell0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1943 users visited in the last hour
_