Question: Arrow Assembly polishing Error "KeyError: 'BASECALLERVERSION'"
David_emir (India) wrote 7 months ago:

Hello All,

I am polishing an assembly produced by Falcon using Arrow. However, it fails with the error "KeyError: 'BASECALLERVERSION'". These are the steps I followed:

  1. pbalign reference.fasta falcon_draft_assembly.fa --nproc 32 quvir.sam
  2. samtools view -bS quvir.sam > quvir.bam
  3. samtools sort quvir.bam > sorted_quvir.bam
  4. samtools index sorted_quvir.bam
  5. Then ran Quiver to polish the assembly:

    quiver -j32 sorted_quvir.bam -r corrected_new_workaround.fasta -o variants.gff -o consensus_quiver.fasta --> throws an error as "KeyError: 'BASECALLERVERSION'"

  6. Tried Arrow as well, and it produces the error shown further down:

    arrow sorted_quvir.bam --referenceFilename corrected_new_workaround.fasta -o arrow-polished-consensus.fasta -o arrow-polished-consensus.gff -o arrow-polished-consensus.fastq -j 32

Please let me know where I am going wrong. The error is as follows. Please note: I am using GenomicConsensus/3.0.2.

Thanks a lot for your kind help, Sincerely, Dave

    [W::hts_idx_load2] The index file is older than the data file: /gpfs/projects/sysbio/development/denovo/2_denovo_assembly/falcon/2_arabidopsis/falcon_test_1/falcon_test_1/pbalign_test/quiver_test/sorted_quvir.bam.bai
    'BASECALLERVERSION'
    Traceback (most recent call last):
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcommand/cli/core.py", line 137, in _pacbio_main_runner
        return_code = exe_main_func(*args, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 351, in args_runner
        return tr.main()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 265, in main
        with AlignmentSet(options.inputFilename) as peekFile:
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2723, in __init__
        super(AlignmentSet, self).__init__(*files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1987, in __init__
        super(ReadSet, self).__init__(*files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 477, in __init__
        self.updateCounts()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2541, in updateCounts
        self.assertIndexed()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2371, in assertIndexed
        self._assertIndexed((IndexedBamReader, CmpH5Reader))
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1944, in _assertIndexed
        self._openFiles()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2068, in _openFiles
        resource = IndexedBamReader(location)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 388, in __init__
        super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 202, in __init__
        self._loadReadGroupInfo()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 115, in _loadReadGroupInfo
        basecallerVersion = ".".join(ds["BASECALLERVERSION"].split(".")[0:2])
    KeyError: 'BASECALLERVERSION'
    [ERROR] 'BASECALLERVERSION'
    Traceback (most recent call last):
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcommand/cli/core.py", line 137, in _pacbio_main_runner
        return_code = exe_main_func(*args, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 351, in args_runner
        return tr.main()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/GenomicConsensus/main.py", line 265, in main
        with AlignmentSet(options.inputFilename) as peekFile:
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2723, in __init__
        super(AlignmentSet, self).__init__(*files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1987, in __init__
        super(ReadSet, self).__init__(*files, **kwargs)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 477, in __init__
        self.updateCounts()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2541, in updateCounts
        self.assertIndexed()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2371, in assertIndexed
        self._assertIndexed((IndexedBamReader, CmpH5Reader))
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1944, in _assertIndexed
        self._openFiles()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2068, in _openFiles
        resource = IndexedBamReader(location)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 388, in __init__
        super(IndexedBamReader, self).__init__(fname, referenceFastaFname)
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 202, in __init__
        self._loadReadGroupInfo()
      File "/gpfs/software/genomics/GenomicConsensus/pitchfork/deployment/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 115, in _loadReadGroupInfo
        basecallerVersion = ".".join(ds["BASECALLERVERSION"].split(".")[0:2])
    KeyError: 'BASECALLERVERSION'
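
The last frame of the traceback looks up `ds["BASECALLERVERSION"]`, where `ds` appears to be a dictionary built from the read group description. The following is a minimal sketch of that failure, not pbcore's actual code: the helper name `basecaller_version`, the `KEY=VALUE;KEY=VALUE` parsing, and the sample description strings are all assumptions made for illustration. Only the final `".".join(...)` expression is copied from the traceback.

```python
def basecaller_version(ds_field):
    """Hypothetical helper: parse a 'KEY=VALUE;KEY=VALUE' description string
    and return the major.minor basecaller version, as the failing line does."""
    # Assumed parsing step; pbcore's real implementation may differ.
    ds = dict(item.split("=", 1) for item in ds_field.split(";") if "=" in item)
    # This expression is taken from the last frame of the traceback.
    return ".".join(ds["BASECALLERVERSION"].split(".")[0:2])


# Made-up description resembling a PacBio read group (illustrative values only).
pacbio_ds = "READTYPE=SUBREAD;BASECALLERVERSION=3.1.0.179493;FRAMERATEHZ=80.000000"
print(basecaller_version(pacbio_ds))

# A read group with no PacBio description, as might result from mapping a FASTA.
try:
    basecaller_version("")
except KeyError as err:
    print("KeyError:", err)
```

With a description that carries the key, the lookup succeeds; with an empty description the same expression raises the `KeyError` seen above.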


From reading https://github.com/PacificBiosciences/pitchfork/issues/316, it seems the KeyError 'BASECALLERVERSION' problem is caused by missing PacBio headers in the aligned BAM file. I've logged an issue for the pbalign and arrow case as https://github.com/PacificBiosciences/pbcore/issues/117

Reply written 4 months ago by Peter
liu3yang wrote 7 months ago:

You can use the subreads BAM file instead of the FASTA file in the first step.

whm568019240 wrote 4 months ago:

Dear David, have you solved the problem? I hit the same problem you describe. The difference is that I used blasr rather than pbalign for mapping. When I ran arrow, the error came up. Could you please tell me how to fix it? Regards, Alex


I just hit something similar, logged as https://github.com/PacificBiosciences/pbcore/issues/117

Reply written 4 months ago by Peter

It seems you can't use a BAM file made by mapping a FASTA file in this way as it is missing PacBio meta-data which is expected (e.g. the BASECALLERVERSION information). You should be able to map the raw unaligned PacBio BAM file, or use the *.subreadset.xml file which also has metadata.
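
One way to check a BAM file before running arrow is to look at its `@RG` header lines and see whether the description (`DS`) field carries the expected key. The sketch below works on header text such as the output of `samtools view -H sorted_quvir.bam`. The function name `read_groups_missing_key` and the sample header are hypothetical, and the `KEY=VALUE;KEY=VALUE` layout of the `DS` field is an assumption based on the traceback.

```python
def read_groups_missing_key(header_text, key="BASECALLERVERSION"):
    """Return the IDs of @RG header lines whose DS field lacks `key`.

    Hypothetical helper. Note that a header with no @RG lines at all
    also yields an empty list, so check that read groups exist too.
    """
    missing = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        # Assumed DS layout: semicolon-separated KEY=VALUE items.
        ds_keys = {item.split("=", 1)[0] for item in fields.get("DS", "").split(";")}
        if key not in ds_keys:
            missing.append(fields.get("ID", "?"))
    return missing


# Made-up header: rg1 has PacBio-style metadata, rg2 does not.
sample_header = "\n".join([
    "@HD\tVN:1.5\tSO:coordinate",
    "@RG\tID:rg1\tPL:PACBIO\tDS:READTYPE=SUBREAD;BASECALLERVERSION=3.1.0",
    "@RG\tID:rg2\tPL:PACBIO",
])
print(read_groups_missing_key(sample_header))
```

Any read group ID reported here would be expected to trigger the same `KeyError` when the file is opened.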

Reply written 4 months ago by Peter

Dear Peter, Thanks for your solutions!

Reply written 3 months ago by whm568019240