Question: Obiconvert produces empty EcoPCR database...why?
8 months ago, mingala wrote:

Hi all,

I'm new to the OBITools suite and am trying to use ecoPCR to develop bat-specific COI primers, which I will then modify into blocking primers for metabarcoding.

I have downloaded bat COI sequences, and the headers in the FASTA file look like this:

> MH234219 organism=Hypsugo dolichodon; taxid=1897726; Hypsugo dolichodon voucher CBC02156 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial

When I try to convert this file to ecoPCR format using the following command, it skips all the entries and produces an empty ecoPCR database. Without the --skip-on-error flag, it says the sequences do not have taxids (which they do, in the header). Does anyone know why this is happening? Thanks in advance.

> obiconvert --fasta --ecopcrdb-output=ECOPCROUTPUT  / newsequences.fasta > 'my_bat_COI_database' --skip-on-error

Doing that still results in the same output.

Reply written 8 months ago by mingala

The same thing is happening to me after applying obiaddtaxids using an NCBI taxdump. Did you ever resolve this?

Reply written 4 weeks ago by patrick_freeman

I had a similar issue myself - I was trying to convert a "homemade" FASTA file to ecoPCR format and kept getting a 'sequence has no taxid' error. My FASTA headers only have a sequence name followed by the taxid, with none of the other fields shown above, e.g. >Species name (sampleXYZ) taxid=12345

I tried several things: removing parentheses from sequence names, replacing spaces in sequence names with underscores, making sure the whitespace in my headers matched that of an old OBITools output FASTA file and, lastly, making sure I had a semicolon (;) after my taxid codes (i.e. >Species_name_sampleXYZ taxid=12345; ). Adding the semicolon was what finally got obiconvert working for me.
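For anyone who wants to apply the semicolon fix to a whole file rather than by hand, here is a minimal Python sketch. It is not part of OBITools: the names `ensure_taxid_semicolon` and `fix_fasta_lines` are made up for illustration, and it assumes the taxid is written in the header as `taxid=<digits>`.

```python
import re

# Assumption: taxids appear in headers as "taxid=<digits>", optionally
# already followed by whitespace and a semicolon.
TAXID_RE = re.compile(r"\btaxid=(\d+)(?:\s*;)?")


def ensure_taxid_semicolon(header: str) -> str:
    """Return the header with each taxid=<n> followed directly by one ';'."""
    return TAXID_RE.sub(r"taxid=\1;", header)


def fix_fasta_lines(lines):
    """Yield FASTA lines, rewriting header lines and leaving sequence lines alone."""
    for line in lines:
        if line.startswith(">"):
            yield ensure_taxid_semicolon(line.rstrip("\n")) + "\n"
        else:
            yield line
```

To use it, open the input FASTA, pass the file object to `fix_fasta_lines`, and write the yielded lines to a new file; headers that already end the taxid with a semicolon come out unchanged.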

I'm not sure why mingala's example file above isn't working, as there is already a semicolon after the taxid, but maybe editing the FASTA so that the 'taxid' field immediately follows the accession number (instead of the 'organism' field) would help? I suspect that obiconvert expects to see the taxid straight after the sequence name/accession, although I can't confirm it - I looked at the .py scripts referenced in my error messages to try to work out the formatting requirements, but my coding knowledge is pretty basic and I had trouble understanding them.
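If someone wants to test that guess, the reordering could be sketched roughly as below. This is only an illustration of the idea, not a confirmed requirement of obiconvert; `move_taxid_first` is a hypothetical helper, and it assumes the identifier is the first whitespace-delimited token of the header.

```python
import re

# Assumption: the taxid field looks like "taxid=<digits>" with an optional
# trailing semicolon; any whitespace after it is removed along with the field.
TAXID_FIELD = re.compile(r"\btaxid=(\d+)(?:\s*;)?\s*")


def move_taxid_first(header: str) -> str:
    """Place 'taxid=<n>;' directly after the sequence identifier.

    Also drops any whitespace between '>' and the identifier.
    Headers without a taxid field are returned unchanged.
    """
    seq_id, _, rest = header[1:].strip().partition(" ")
    match = TAXID_FIELD.search(rest)
    if match is None:
        return header
    # Remove the taxid field from where it was, then re-add it up front.
    remainder = TAXID_FIELD.sub("", rest, count=1).strip()
    return f">{seq_id} taxid={match.group(1)}; {remainder}".rstrip()
```

Applied to a header shaped like the one in the question, this keeps the `organism=` field and the free-text description but moves the taxid ahead of them.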

Reply written 22 days ago by klrdna

P.S. After I solved this issue, I got another error - 'KeyError: 12345'. It seems that this error is caused by an outdated taxonomy dump: it arises when your FASTA contains a recently created taxid that isn't present in your taxdump. Downloading the most recent taxdump files from NCBI fixed this for me.
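A quick way to see in advance which taxids would trigger this is to compare the FASTA headers against the taxdump. The sketch below is an assumption-laden illustration, not an OBITools feature: `load_known_taxids` and `missing_taxids` are hypothetical names, and it relies on each line of NCBI's `nodes.dmp` starting with the tax_id as its first `|`-delimited field.

```python
import re

# Assumption: taxids appear in FASTA headers as "taxid=<digits>".
TAXID_RE = re.compile(r"\btaxid=(\d+)")


def load_known_taxids(nodes_lines):
    """Collect tax_ids from nodes.dmp lines (first '|'-delimited field)."""
    return {
        int(line.split("|", 1)[0].strip())
        for line in nodes_lines
        if line.strip()
    }


def missing_taxids(fasta_lines, known):
    """Return the sorted taxids used in FASTA headers but absent from `known`."""
    missing = set()
    for line in fasta_lines:
        if line.startswith(">"):
            for taxid in TAXID_RE.findall(line):
                if int(taxid) not in known:
                    missing.add(int(taxid))
    return sorted(missing)
```

Both functions take any iterable of lines, so open file objects for `nodes.dmp` and the FASTA can be passed in directly. A non-empty result means those taxids are not in that copy of the taxdump, which matches the situation described above.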

Reply written 22 days ago by klrdna
8 months ago, h.mon wrote:
ECOPCROUTPUT  / newsequences.fasta

Remove the stray / from your command line:

ECOPCROUTPUT newsequences.fasta