Searching nucleotide databases for term with special character
7.3 years ago
lvogel ▴ 30

Hi, I am trying to search databases such as GenBank and EMBL-EBI for a term that includes a colon. The problem is that even when I follow the advice to escape the colon with a backslash (https://www.ebi.ac.uk/ebisearch/documentation.ebi#query_syntax), the search still drops the colon from the search string, so I get many results that I don't want. Does anyone know how to solve this?

database • 1.4k views

Which database/field are you searching? If it's a DNA sequence, it would be easier to scan the files from an FTP server...


I'm searching for a term that appears in the organism name. It has the format [PREFIX]:[number]. There isn't a special field for it, but some entries do have it somewhere in the name. Those are the entries that I want. I am looking for DNA sequences, but I'm not searching the actual sequences right now. I'm trying to create a reference database. Interesting idea, though...

7.3 years ago

searching for a term to be included in the organism name. It has the format [PREFIX]:[number]

$ wget -O taxdmp.zip ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip
$ unzip taxdmp.zip names.dmp

$ grep -E ':[0-9]+' names.dmp | grep "scientific name" | tail
1897030 |   Firmicutes bacterium CAG:176_59_8   |       |   scientific name |
1897031 |   Firmicutes bacterium CAG:176_63_11  |       |   scientific name |
1897032 |   Firmicutes bacterium CAG:24053_14   |       |   scientific name |
1897033 |   Firmicutes bacterium CAG:272_52_7   |       |   scientific name |
1897034 |   Firmicutes bacterium CAG:321_26_22  |       |   scientific name |
1897035 |   Firmicutes bacterium CAG:552_39_19  |       |   scientific name |
1897036 |   Firmicutes bacterium CAG:65_45_313  |       |   scientific name |
1903134 |   Enterobacter sp. ST121:950178628    |       |   scientific name |
1905348 |   Platerodrilus sp. MNCN/DNA:86739    |       |   scientific name |
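
To narrow the list to one specific prefix and keep only the taxid and name, something like the following could work. This is a rough sketch, not part of the original answer: `find_prefixed_names` is a hypothetical helper name, and it assumes names.dmp uses the usual tab, pipe, tab field delimiters of the NCBI taxonomy dump (taxid, name, unique name, name class).

```shell
# Hypothetical helper: print "taxid<TAB>name" for scientific names
# that contain PREFIX:<digit>.
# Usage: find_prefixed_names PREFIX [names.dmp]
find_prefixed_names() {
    prefix=$1
    file=${2:-names.dmp}
    # Assumed layout: fields separated by <TAB>|<TAB>; the last field
    # (name class) still carries a trailing <TAB>| terminator.
    awk -F '\t[|]\t' -v p="$prefix" '
        $4 ~ /^scientific name/ {
            # index() matches the prefix literally, so characters such as
            # "/" or "." in the prefix are not treated as regex syntax.
            # Only the first occurrence of "PREFIX:" in the name is checked.
            i = index($2, p ":")
            if (i > 0 && substr($2, i + length(p) + 1, 1) ~ /[0-9]/)
                print $1 "\t" $2
        }
    ' "$file"
}
```

For example, `find_prefixed_names CAG names.dmp` would list the CAG:number entries, and the first column could then be used as a list of taxids when building the reference database.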