Hi,
I am trying to build a custom reference database for two marker genes (COI & 16S) from the full NCBI nt database downloaded from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/
I downloaded all nt files (nt.00 to nt.22), unzipped them, and am now trying to follow this tutorial https://git.metabarcoding.org/obitools/obitools3/wikis/Wolf-tutorial-with-the-OBITools3 to build a reference database with ecopcr.
However, I fail at the first step when trying to import the files:
obi import --genbank-input nt/nt.00.tar Fabian_Work/refdb
fails with
2020-03-31 18:12:25,809 [import : INFO ] obi import: imports an object (file(s), obiview, taxonomy...) into a DMS
2020-03-31 18:12:26,070 [import : INFO ] Opened file: nt/nt.00.tar
2020-03-31 18:14:21,529 [import : INFO ] Importing 269 entries
Could not import sequence id: b'Alias' (error raised: 'NoneType' object has no attribute 'group' )
Traceback (most recent call last):
File "python/obitools3/parsers/genbank.pyx", line 40, in obitools3.parsers.genbank.genbankParser
AttributeError: 'NoneType' object has no attribute 'group'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Applications/OBITools/obitools3/obi3-env/bin/obi", line 62, in <module>
config[root_config_name]['module'].run(config)
File "python/obitools3/commands/import.pyx", line 253, in obitools3.commands.import.run
File "python/obitools3/parsers/genbank.pyx", line 151, in genbankIterator_file
File "python/obitools3/parsers/genbank.pyx", line 56, in obitools3.parsers.genbank.genbankParser
IndexError: list index out of range
Also, the documentation says that
For EMBL files, you can give the path to a directory with several EMBL files.
Is that true for GenBank files, too? If not, can I somehow import them into the same DMS?
Any help is greatly appreciated.
Fabian
Are you sure this tool is designed to use the entire nt database? Can it read compressed tar files (which is what you are trying to use)? Maybe not. In OBITools there was an obiconvert function that could convert the nt files to ecoPCRdb format, but I haven't seen this function implemented in OBITools3. And I cannot get OBITools installed on macOS Catalina (it always fails with the wrong Python, regardless of which Python I use or whether I try to install it via Anaconda). If OBITools has any 32-bit code in it, it will not work on macOS Catalina.
I have a feeling that you are using the wrong input. You could get the "COI" gene sequences from NCBI here and use them as input.
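One way to test the "wrong input" suspicion before running obi import: GenBank flat files are plain text in which every record begins with a LOCUS line, whereas the files under ftp://ftp.ncbi.nlm.nih.gov/blast/db/ are preformatted BLAST database volumes, so a GenBank parser would not be expected to read them. The sketch below is only a heuristic and is not part of OBITools3; the names looks_like_genbank and inspect_input and the 64 KiB head size are made up for illustration.

```python
import tarfile

# Assumption: in a GenBank flat file, a LOCUS line appears within the first 64 KiB.
HEAD_BYTES = 64 * 1024


def looks_like_genbank(head: bytes) -> bool:
    """Heuristic: True if any line in `head` starts with b'LOCUS'."""
    return any(line.startswith(b"LOCUS") for line in head.splitlines())


def inspect_input(path: str) -> dict:
    """Map each file name (tar members, or the path itself) to the heuristic result."""
    results = {}
    if tarfile.is_tarfile(path):
        # Look inside the archive without extracting it to disk.
        with tarfile.open(path) as tar:
            for member in tar:
                if member.isfile():
                    with tar.extractfile(member) as fh:
                        results[member.name] = looks_like_genbank(fh.read(HEAD_BYTES))
    else:
        # Plain file: read only the head, never the whole file.
        with open(path, "rb") as fh:
            results[path] = looks_like_genbank(fh.read(HEAD_BYTES))
    return results
```

If something like inspect_input("nt/nt.00.tar") reports False for every member, the archive probably does not contain GenBank flat files, which would fit the parser failing on the very first entry.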
Afaik OBITools is a Python package, but the previous version required Python 2 and the new version is built for Python 3. Not sure about 32-bit code. For now, I am trying to build a ref db with the EMBL files instead (as in the tutorial) and will leave the NCBI files unless someone knows how they should be imported. Thanks for your help!