Question: How to get annotation information from identifiers in the NCBI nt database?
Asked by O.rka, 12 months ago:

I have some OTUs that I suspect are contaminants from faulty primers. I BLASTed them against the nt database, and the results look like this (with -outfmt 6):

Otu000056       gi|1163074592|gb|CP020116.1|    99.52   209     1       0       4       212     4182852 4183060 4e-102  381
Otu000056       gi|1163034213|gb|CP020058.1|    99.52   209     1       0       4       212     2494582 2494790 4e-102  381
Otu000056       gi|1163027546|gb|CP020055.1|    99.52   209     1       0       4       212     2523673 2523465 4e-102  381
Otu000056       gi|1163005269|gb|CP020048.1|    99.52   209     1       0       4       212     2155734 2155942 4e-102  381
Otu000056       gi|1162933287|gb|CP020107.1|    99.52   209     1       0       4       212     559506  559298  4e-102  381
Otu000056       gi|1162922101|gb|CP020106.1|    99.52   209     1       0       4       212     5006373 5006581 4e-102  381
Otu000056       gi|1162894325|gb|CP020092.1|    99.52   209     1       0       4       212     4746448 4746656 4e-102  381

I have thousands of these hits. How can I go from an ID such as gi|1163074592|gb|CP020116.1| to its annotation, e.g. "Escherichia coli strain AR_0104, complete genome", alongside the actual hit? Ideally I'd like a command-line tool that I can just feed my BLAST tabular (outfmt 6) output into.
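To feed thousands of hits into a per-ID lookup, the subject GIs first have to be pulled out of column 2 of the BLAST table and deduplicated. A minimal sketch, assuming -outfmt 6 output (tab- or space-separated) whose subject IDs use the legacy gi|<GI>|gb|<accession>| form shown above; the function name blast6_subject_gis is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique GI numbers found in column 2 of
# BLAST tabular (-outfmt 6) output read from stdin.
# Assumes subject IDs like "gi|1163074592|gb|CP020116.1|"; lines whose
# subject ID has no "gi|" prefix are skipped.
blast6_subject_gis() {
  awk '{
    # Split the subject ID on "|": f[1]="gi", f[2]=<GI>, f[3]="gb", f[4]=<acc.ver>
    n = split($2, f, "|")
    if (n >= 2 && f[1] == "gi") print f[2]
  }' | sort -u
}
```

For example, `blast6_subject_gis < hits.blast6 > gis.txt` (file names hypothetical). Deduplicating with `sort -u` means each subject is looked up only once, even when many OTUs hit the same genome.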

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 12 months ago:
$ for G in 1163074592 1163034213 1163027546 1163005269; do echo -n "$G " && wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=${G}" | xmllint --xpath '//eSummaryResult/DocSum/Item[@Name="Title"]/text()' - && echo ; done

1163074592 Escherichia coli strain AR_0104, complete genome
1163034213 Escherichia coli strain AR_0061, complete genome
1163027546 Escherichia coli strain AR_0069, complete genome
1163005269 Escherichia coli strain AR_0118, complete genome
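For thousands of IDs, the loop above sends one request per GI with no pause between requests. NCBI asks E-utilities clients to stay under 3 requests per second without an API key (10 per second with one), and esummary also accepts a comma-separated id list. A minimal sketch of both points, assuming bash, wget and xmllint as in the answer; the function names and the NCBI_API_KEY environment variable are this sketch's own choices, not part of the answer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an esummary URL for one or more nucleotide IDs
# (joined with commas), appending api_key if NCBI_API_KEY is set.
esummary_url() {
  local ids
  ids=$(IFS=,; printf '%s' "$*")
  local url="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=${ids}"
  if [ -n "${NCBI_API_KEY:-}" ]; then
    url="${url}&api_key=${NCBI_API_KEY}"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: read one GI per line on stdin and print "GI<TAB>Title",
# one request per GI, sleeping between requests to respect the rate limit.
gi_titles() {
  local gi
  while read -r gi; do
    [ -z "$gi" ] && continue
    printf '%s\t' "$gi"
    wget -q -O - "$(esummary_url "$gi")" \
      | xmllint --xpath '//eSummaryResult/DocSum/Item[@Name="Title"]/text()' -
    echo
    sleep 0.4
  done
}
```

For example, `gi_titles < gis.txt > titles.tsv`, where gis.txt is a hypothetical one-GI-per-line file. `esummary_url` can build a multi-ID URL, but a response holding several DocSum records needs more parsing than the single XPath used here, so `gi_titles` still queries one GI at a time.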
Answer by genomax (United States), 12 months ago:

Using Entrez Direct (EDirect):

$ for G in 1163074592 1163034213 1163027546 1163005269; do esearch -db nuccore -query $G | efetch -format docsum | xtract -pattern DocumentSummary -element Caption,Title; sleep 3; done
CP020116    Escherichia coli strain AR_0104, complete genome
CP020058    Escherichia coli strain AR_0061, complete genome
CP020055    Escherichia coli strain AR_0069, complete genome
CP020048    Escherichia coli strain AR_0118, complete genome
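To get back to the annotated BLAST table the question asks for, the Caption/Title pairs can be saved to a file (e.g. by redirecting the loop's output to titles.tsv, a hypothetical name) and joined onto the hits with awk. A minimal sketch, assuming genuinely tab-separated -outfmt 6 output; annotate_blast6 is a hypothetical helper. Caption is the accession without its version, so the ".1" suffix is stripped from the BLAST subject ID before matching:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a title column to BLAST -outfmt 6 rows.
#   $1: lookup TSV of "accession<TAB>title" (e.g. Caption,Title from xtract)
#   $2: tab-separated BLAST output with subject IDs like gi|<GI>|gb|<acc.ver>|
# Subjects missing from the lookup get "NA".
# Note: the NR == FNR idiom assumes the lookup file is non-empty.
annotate_blast6() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { title[$1] = $2; next }   # first file: build accession -> title map
    {
      n = split($2, f, "|")
      # Use the 4th "|" field for gi-style IDs, else the whole subject ID
      acc = (n >= 4 && f[1] == "gi") ? f[4] : $2
      sub(/\.[0-9]+$/, "", acc)          # drop the ".version" suffix
      print $0, ((acc in title) ? title[acc] : "NA")
    }
  ' "$1" "$2"
}
```

For example, `annotate_blast6 titles.tsv hits.blast6 > hits.annotated.tsv` (file names hypothetical) adds the genome title as a 13th column to each hit.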