Question: Obiconvert produces empty EcoPCR database...why?
mingala wrote (17 months ago):

Hi all,

I'm new to the OBITools suite and am trying to use ecoPCR to develop bat-specific COI primers that I can then modify into blocking primers for metabarcoding.

I have downloaded bat COI sequences and the fasta file looks like so:

> MH234219 organism=Hypsugo dolichodon; taxid=1897726; Hypsugo dolichodon voucher CBC02156 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
accctttatcttttatttggtgcttgagccggtatagtgggcaccgcattaagtctttta
attcgcgctgaattaggtcaaccaggagccctacttggagatgaccagatttataatgta
atcgtaactgctcatgcttttgtgataattttctttatagtcatacccattataattgga
ggcttcggaaattgacttgtcccattaataattggggctcctgatatagcattcccgcga
ataaataatataagcttttgacttcttcctccttctttcttactacttctggcatcatct
atagtagaagcgggcgcgggaacaggctgaacagtttatccccccttagcgggaaattta
gcccatgcaggagcctccgtggacttaacaattttttctctacacttagcaggtgtctca
tcaatcttaggagcaattaactttattactacaattattaatataaaacctcccgctctt
tcccaatatcaaacaccattatttgtatgatctgttctaatcacagctgtacttcttcta
ttatcccttcctgtattagctgctggtattacaatactattgacagaccgaaacctaaac
acgaccttttttgacccagctggcggaggagatcctattctataccaacatctattt

When I try to convert this file to ecoPCR format using the following command, it skips all the entries and produces an empty ecoPCR database. Without the --skip-on-error flag, it says the sequences do not have taxids (which they do, in the header). Does anyone know why this is happening? Thanks in advance.

> obiconvert --fasta --ecopcrdb-output=ECOPCROUTPUT  / newsequences.fasta > 'my_bat_COI_database' --skip-on-error

Doing that still results in the same output.

written 16 months ago by mingala

The same thing is happening to me after applying obiaddtaxids using an NCBI taxdump. Did you ever get a resolution on this?

written 9 months ago by patrick_freeman

I had a similar issue myself - I was trying to convert a "homemade" fasta file to ecopcr format and kept getting a 'sequence has no taxid' error. My fasta headers only have a sequence name followed by the taxid - they don't have any of the other variables shown above - e.g. >Species name (sampleXYZ) taxid=12345

I tried various different things - removing parentheses from sequence names, replacing spaces in sequence names with underscores, making sure my header whitespaces were the same format as an old obitools output fasta file and, lastly, making sure I had a semi-colon (;) after my taxid codes (i.e. >Species_name_sampleXYZ taxid=12345; ). It was putting in the semi-colon that finally got obiconvert working for me.
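
A minimal sketch of the semicolon fix described above, assuming the taxid appears as `taxid=<digits>` somewhere on the header line. The function name `add_taxid_semicolon` is a placeholder of mine, not something OBITools provides, and it only patches the header text:

```shell
# Append ';' to any taxid=<digits> in a FASTA header that is not already
# followed by one. Sequence lines (not starting with '>') pass through unchanged.
# Usage: add_taxid_semicolon input.fasta > output.fasta
add_taxid_semicolon() {
    sed -E '/^>/ s/(taxid=[0-9]+)([^0-9;]|$)/\1;\2/g' "$1"
}
```

Writing the result to a new file rather than editing in place keeps the original FASTA available for comparison.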

I'm not really sure why mingala's example file above isn't working, as there is already a semi-colon after the taxid, but maybe editing the fasta so that the 'taxid' field immediately follows the accession number (instead of the 'organism' field) would help? I suspect that obiconvert expects to see the taxid straight after the sequence name/accession, although I can't confirm it - I had a look at the .py scripts referenced in my error messages to try to figure out the formatting requirements, but my coding knowledge is pretty basic and I had trouble understanding them.

written 9 months ago by klrdna

P.S. After I solved this issue, I got another error - 'KeyError: 12345'. It seems that these errors are caused by an outdated taxonomy dump: they arise when your fasta contains a recently created taxid that isn't present in your tax dump. Downloading the most recent tax dump files from NCBI fixed this for me.
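
One way to spot such taxids before running obiconvert, sketched under the assumption that the dump is the standard NCBI taxdump, where the first tab-separated column of nodes.dmp is the tax_id. `missing_taxids` is a hypothetical helper, not an OBITools command:

```shell
# Print every taxid used in the FASTA headers that is absent from nodes.dmp.
# Usage: missing_taxids sequences.fasta TAXO/nodes.dmp
missing_taxids() {
    fasta=$1
    nodes=$2
    known=$(mktemp)
    # Column 1 of nodes.dmp holds the tax_id of each node.
    cut -f1 "$nodes" | sort -u > "$known"
    # Collect the distinct taxids from header lines and keep only those
    # that do not appear in the sorted list of known ids.
    grep '^>' "$fasta" \
        | grep -oE 'taxid=[0-9]+' \
        | cut -d= -f2 \
        | sort -u \
        | comm -23 - "$known"
    rm -f "$known"
}
```

An empty result means every taxid in the headers is present in that nodes.dmp; any ids printed are candidates for the KeyError described above.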

written 9 months ago by klrdna
steffie11 (Germany/Potsdam) wrote, 6 months ago:

Hi Mingala, you need to indicate the taxonomic database by using -d during obiconvert.

Get the taxonomy dump

mkdir TAXO
cd TAXO
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz

Format the taxonomy for OBITools

obitaxonomy -t TAXO -d TAXO

Attribute taxonomic ids to the sequences

obiaddtaxids -d ~/TAXO  ~/sequences.fasta > sequences.taxid.fasta

Formatting the database

obiconvert -d ./TAXO --fasta --ecopcrdb-output=sequencesdb sequences.taxid.fasta

I downloaded my green algae barcodes from BOLD; some of the BOLD sequences have not been published to NCBI, so their taxids were assigned at higher taxonomic levels.
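
Before the final obiconvert step, it can help to confirm that every record really carries a taxid. A rough check, assuming the `taxid=<digits>;` header form shown in the question; `headers_without_taxid` is a made-up name:

```shell
# Print the header line of every FASTA record that lacks a "taxid=<digits>;" field.
# Usage: headers_without_taxid sequences.taxid.fasta
headers_without_taxid() {
    awk '/^>/ && !/taxid=[0-9]+;/' "$1"
}
```

If this prints nothing, all headers contain a taxid in that form; any headers it does print are the ones likely to trigger the "no taxid" complaint.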

h.mon (Brazil) wrote, 17 months ago:
ECOPCROUTPUT  / newsequences.fasta

Remove the / from your command-line:

ECOPCROUTPUT newsequences.fasta
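
To see why the stray / matters: the shell hands it to the program as an argument of its own. A small illustration with a hypothetical `show_args` helper (how obiconvert then interprets that extra argument is not shown here; presumably it is taken as another input path):

```shell
# Print each argument on its own line, in brackets, to show how the
# shell splits a command line into words.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

show_args --fasta --ecopcrdb-output=ECOPCROUTPUT  / newsequences.fasta
# [--fasta]
# [--ecopcrdb-output=ECOPCROUTPUT]
# [/]
# [newsequences.fasta]
```

The same splitting applies to the original command, so the root directory / ends up listed alongside newsequences.fasta.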