Non-standard Genbank format
2.2 years ago
mrmrwinter ▴ 30

Hi,

When trying to load a GenBank file into DNA Features Viewer, I get the following error message:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/File.py:72, in as_handle(handleish, mode, **kwargs)
     71 try:
---> 72     with open(handleish, mode, **kwargs) as fp:
     73         yield fp

TypeError: expected str, bytes or os.PathLike object, not TextIOWrapper

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Input In [6], in <module>
      7 # record.plot(figure_width=12);
      9 fig, (ax1, ax2) = plt.subplots(
     10     2, 1, figsize=(12, 3), sharex=True, gridspec_kw={"height_ratios": [4, 1]}
     11 )
---> 15 fullseq = SeqIO.read("flanking_genes_annotations_annotations.gb", "genbank")
     16 graphic_record = BiopythonTranslator().translate_record(fullseq)
     17 graphic_record.plot(ax=ax1, with_ruler=False, strand_in_label_threshold=4)

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/SeqIO/__init__.py:654, in read(handle, format, alphabet)
    652 iterator = parse(handle, format, alphabet)
    653 try:
--> 654     record = next(iterator)
    655 except StopIteration:
    656     raise ValueError("No records found in handle") from None

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/SeqIO/Interfaces.py:74, in SequenceIterator.__next__(self)
     72 def __next__(self):
     73     try:
---> 74         return next(self.records)
     75     except Exception:
     76         if self.should_close_stream:

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:516, in InsdcScanner.parse_records(self, handle, do_features)
    514 with as_handle(handle) as handle:
    515     while True:
--> 516         record = self.parse(handle, do_features)
    517         if record is None:
    518             break

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:499, in InsdcScanner.parse(self, handle, do_features)
    493 from Bio.GenBank.utils import FeatureValueCleaner
    495 consumer = _FeatureConsumer(
    496     use_fuzziness=1, feature_cleaner=FeatureValueCleaner()
    497 )
--> 499 if self.feed(handle, consumer, do_features):
    500     return consumer.data
    501 else:

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:465, in InsdcScanner.feed(self, handle, consumer, do_features)
    458     return False
    460 # We use the above class methods to parse the file into a simplified format.
    461 # The first line, header lines and any misc lines after the features will be
    462 # dealt with by GenBank / EMBL specific derived classes.
    463 
    464 # First line and header:
--> 465 self._feed_first_line(consumer, self.line)
    466 self._feed_header_lines(consumer, self.parse_header())
    468 # Features (common to both EMBL and GenBank):

File ~/miniconda3/envs/appraisal/envs/dna_features_viewer/lib/python3.10/site-packages/Bio/GenBank/Scanner.py:1571, in GenBankScanner._feed_first_line(self, consumer, line)
   1569     consumer.size(line.split()[-2])
   1570 else:
-> 1571     raise ValueError("Did not recognise the LOCUS line layout:\n" + line)

ValueError: Did not recognise the LOCUS line layout:
LOCUS       PGA_scaffold_21__1_                                     23-FEB-2022

From what I can find online, this is because the LOCUS line in the GenBank file is not formatted correctly, so Biopython raises an error.

This genbank file was created by exporting annotations from UGENE genome browser.

Is there a way around this? Is there an intermediate format I can convert through that will fix the file?

Thanks

biopython ugene genbank • 1.3k views
LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999

The LOCUS field contains a number of different data elements, including locus name, sequence length, molecule type, GenBank division, and modification date.

Your LOCUS line is missing some data. I am guessing that must be a part of the problem.
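To see which of those elements a given LOCUS line actually carries, a rough check like the one below may help. It is only a sketch: `describe_locus_line` and `missing_fields` are hypothetical helper names, it splits on whitespace rather than following the official column layout, and it ignores the topology field (`linear`/`circular`) that newer records also include.

```python
import re

# Matches dates such as 21-JUN-1999 (assumed date layout, as in the sample record).
DATE_RE = re.compile(r"^\d{2}-[A-Z]{3}-\d{4}$")


def describe_locus_line(line):
    """Split a LOCUS line on whitespace and guess which field is which."""
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line: %r" % line)
    rest = tokens[1:]
    fields = dict.fromkeys(
        ["name", "length", "molecule_type", "division", "date"]
    )
    # The modification date, if present, is the last token.
    if rest and DATE_RE.match(rest[-1]):
        fields["date"] = rest.pop()
    # The locus name is the first token after the LOCUS keyword.
    if rest:
        fields["name"] = rest.pop(0)
    # The length is a number followed by its unit (bp or aa).
    if len(rest) >= 2 and rest[0].isdigit() and rest[1] in ("bp", "aa"):
        fields["length"] = int(rest[0])
        rest = rest[2:]
    # Whatever follows is the molecule type, then (last) the division.
    if rest:
        fields["molecule_type"] = rest.pop(0)
    if rest:
        fields["division"] = rest.pop()
    return fields


def missing_fields(line):
    """Return the names of the fields that could not be found."""
    return [k for k, v in describe_locus_line(line).items() if v is None]


good = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
bad = "LOCUS       PGA_scaffold_21__1_                                     23-FEB-2022"
print(missing_fields(good))
print(missing_fields(bad))
```

For the line from the error message, this reports the length, molecule type, and division as missing, which matches what the parser is complaining about.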

What other formats can you export from UGENE?


Yeah it is. My LOCUS line reads:

LOCUS       SCAFFOLD                             22-FEB-2022

UGENE exports in BED, CSV, GTF, GFF, and a few other niche ones.


Depends on what you were trying to do then. Perhaps you could use GTF/GFF formats.


Agreed. I'll try to pull a .bed or .gff out and convert it to a GenBank file elsewhere. Thanks.


Perhaps fixing up the LOCUS line solves the problem; it just needs a few more fields. See the sample record:

https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
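If you want to patch the line programmatically, something along these lines might work. This is a sketch under several assumptions: `count_origin_bases` and `repair_locus_line` are hypothetical helpers, the defaults (`DNA`, `linear`, `UNK`) are placeholders you should replace with the real values, the length is counted from the ORIGIN block (so it will be 0 if the export contains annotations only), and the output is space-separated rather than column-aligned, which I believe Biopython tolerates but have not confirmed for every version.

```python
import re


def count_origin_bases(genbank_text):
    """Count sequence letters between the ORIGIN line and the closing //."""
    in_origin = False
    total = 0
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # Drop position numbers and spaces, keep only the letters.
            total += len(re.sub(r"[^A-Za-z]", "", line))
    return total


def repair_locus_line(genbank_text, molecule_type="DNA",
                      topology="linear", division="UNK"):
    """Rebuild a LOCUS line that only has a name and a date.

    The molecule type, topology and division defaults are assumptions,
    not values recovered from the file.
    """
    lines = genbank_text.splitlines()
    tokens = lines[0].split() if lines else []
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("expected a first line of the form: LOCUS name ... date")
    if "bp" in tokens or "aa" in tokens:
        return genbank_text  # a length is already present, leave it alone
    name, date = tokens[1], tokens[-1]
    length = count_origin_bases(genbank_text)
    lines[0] = "LOCUS       %s %d bp %s %s %s %s" % (
        name, length, molecule_type, topology, division, date
    )
    return "\n".join(lines) + "\n"


example = (
    "LOCUS       SCAFFOLD                             22-FEB-2022\n"
    "FEATURES             Location/Qualifiers\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
print(repair_locus_line(example).splitlines()[0])
```

You could then write the repaired text to a new file and pass that to `SeqIO.read(..., "genbank")`. Keep the original export around, since the filled-in fields are guesses.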


This turned out to be caused by a mismatch between the scaffold name and the scaffold sequence in the GenBank file.
