ecotag hanging when assigning taxonomy in OBITools?
5.2 years ago

Hi there,

I am currently trying to build a trnL reference library from EMBL Release 130 for a plant metabarcoding project. I have been following the tutorial on the OBITools webpage, but I've been running into several issues.

One issue is with the taxonomy assignment step using ecotag. Every time I run it, the progress bar only reaches about 74%, then the remaining time reads 00:00:00 and the cache size is printed. I've tried increasing the cache size by several orders of magnitude, but to no avail. Am I missing something that could be causing the program to hang at this point?

Code below:

ecotag -d ./GlobalRef/EMBL_Taxo/R130/embl_r130 -R ./GlobalRef/gh_database_r130.fasta --sort=count -m 0.98 -r --cache-size=1000000000 physeq_merged_count_sort.fasta > physeq_merged_count_sort_ecotagGlobal_r130.fasta

next-gen sequencing • 1.6k views

I have the same issue! It hangs no matter what cache size I set, and adding the --skip-on-error option doesn't help either. Have you solved this by now?

5.2 years ago

Yes! It turned out that some of my sequence records were empty (zero-length sequences), and that was causing the problem. You can remove them with obigrep by requiring a minimum sequence length of 1: obigrep --lmin=1
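
If you want to confirm that empty records are really the culprit before filtering, a quick check along these lines may help. This is only a minimal sketch in plain Python, not part of OBITools: the function names and the example records are made up for illustration, and it assumes an ordinary FASTA layout where each record starts with a ">" header line.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def empty_record_ids(lines: Iterable[str]) -> List[str]:
    """Return the identifiers of records whose sequence has length 0."""
    ids = []
    for header, seq in read_fasta(lines):
        if len(seq) == 0:
            parts = header.split()
            ids.append(parts[0] if parts else "")
    return ids


if __name__ == "__main__":
    # Hypothetical records; in practice pass an open file handle instead.
    example = [
        ">seq1 count=12;\n",
        "ACGTACGT\n",
        ">seq2 count=3;\n",
        ">seq3 count=1;\n",
        "\n",
        ">seq4 count=7;\n",
        "GGCC\n",
    ]
    print(empty_record_ids(example))  # ['seq2', 'seq3']
```

To run it on a real file, open the FASTA file and pass the file handle to empty_record_ids. If the list comes back non-empty, filtering with obigrep as described above is the fix.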

5.2 years ago

Hi Patrick, I am also trying to run ecotag following the tutorial, but it fails with the error shown below. I think it has to do with my database.

MacBook: taniavc$ ecotag -t ~/data/TAXA/nodes -R ~/data/DABA/Teleostei_mito.fasta S1_seeds_nonsingleton.fasta > S1.ecotag.fasta
Reading taxonomy dump file...
List all taxonomy rank...
Indexing taxonomy...
Indexing parent and rank...
Adding scientific name...
Adding taxid alias...
Adding deleted taxid...
Reading reference DB ...  : 304600
Traceback (most recent call last):
  File "/Users/taniavc/miniconda3/bin/ecotag", line 346, in <module>
    taxonlink[seqid]=int(seq['taxid'])
  File "src/obitools/_obitools.pyx", line 263, in obitools._obitools.BioSequence.__getitem__
  File "src/obitools/_obitools.pyx", line 217, in obitools._obitools.BioSequence.getKey
KeyError: 'taxid'

Info: Inside ~/data/TAXA/nodes I have: citations.dmp, division.dmp, merged.dmp, nodes.dmp, delnodes.dmp, gencode.dmp, names.dmp. I built my reference database from GenBank without using ecoPCR, and it's a FASTA file. My S1_seeds_nonsingleton.fasta is the output of the previous OBITools steps.

Any ideas about what I am doing wrong? Thanks!


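The KeyError: 'taxid' is raised while the reference database is being read, which means at least one record in ~/data/DABA/Teleostei_mito.fasta has no taxid attribute. When compiling the database, use obigrep -A taxid so that only reference sequences carrying a taxid attribute are kept.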
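
To see how many reference records are affected, something like the following could be used as a rough check. It is a sketch in plain Python, not an OBITools function, and it assumes the OBITools-style header annotation in which the taxid appears as "taxid=<number>;" on the ">" line. The function name and the example headers are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: taxid annotations look like "taxid=12345;" in the header.
# Requiring whitespace (or line start) before "taxid" avoids matching
# other keys that merely end in "taxid", such as "merged_taxid".
TAXID_PATTERN = re.compile(r"(?:^|\s)taxid=(\d+);")


def records_missing_taxid(lines: Iterable[str]) -> List[str]:
    """Return the identifiers of FASTA records without a taxid annotation."""
    missing = []
    for line in lines:
        if not line.startswith(">"):
            continue  # only header lines carry annotations
        header = line[1:].strip()
        if TAXID_PATTERN.search(header) is None:
            parts = header.split()
            missing.append(parts[0] if parts else "")
    return missing


if __name__ == "__main__":
    # Hypothetical headers for illustration only.
    example = [
        ">AB000001 taxid=7955; example annotated record\n",
        "ACGT\n",
        ">AB000002 example record without annotation\n",
        "ACGT\n",
        ">AB000003 count=1; taxid=8030;\n",
        "ACGT\n",
    ]
    print(records_missing_taxid(example))  # ['AB000002']
```

If every record shows up as missing, filtering alone will leave an empty database; in that case the taxids have to be added to the reference sequences first.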
