Question: How to get annotation information from identifiers in NCBI nt database?
17 days ago, O.rka wrote:

I have some OTUs that I suspect are contaminants from faulty primers. I BLASTed them against the nt database, and the results look like this (with -outfmt 6):

Otu000056       gi|1163074592|gb|CP020116.1|    99.52   209     1       0       4       212     4182852 4183060 4e-102  381
Otu000056       gi|1163034213|gb|CP020058.1|    99.52   209     1       0       4       212     2494582 2494790 4e-102  381
Otu000056       gi|1163027546|gb|CP020055.1|    99.52   209     1       0       4       212     2523673 2523465 4e-102  381
Otu000056       gi|1163005269|gb|CP020048.1|    99.52   209     1       0       4       212     2155734 2155942 4e-102  381
Otu000056       gi|1162933287|gb|CP020107.1|    99.52   209     1       0       4       212     559506  559298  4e-102  381
Otu000056       gi|1162922101|gb|CP020106.1|    99.52   209     1       0       4       212     5006373 5006581 4e-102  381
Otu000056       gi|1162894325|gb|CP020092.1|    99.52   209     1       0       4       212     4746448 4746656 4e-102  381

I have thousands of these hits. How can I go from gi|1163074592|gb|CP020116.1| to, for example, "Escherichia coli strain AR_0104, complete genome", i.e. the actual annotation of the hit? Preferably with a command-line tool that I can feed my blast6 output into.

17 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
$ for G in 1163074592 1163034213 1163027546 1163005269; do echo -n "$G " && wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=${G}" | xmllint --xpath '//eSummaryResult/DocSum/Item[@Name="Title"]/text()' - && echo ; done

1163074592 Escherichia coli strain AR_0104, complete genome
1163034213 Escherichia coli strain AR_0061, complete genome
1163027546 Escherichia coli strain AR_0069, complete genome
1163005269 Escherichia coli strain AR_0118, complete genome
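
With thousands of hits, the GI list for the loop above would probably come from the BLAST table itself rather than being typed by hand. A minimal sketch, assuming the subject IDs in column 2 always look like gi|<number>|gb|<accession>| as in the question; the function name blast6_gis and the file name hits.blast6 are made up for illustration:

```shell
# Print the unique GI numbers found in column 2 of BLAST outfmt 6 input.
# Reads the files given as arguments, or stdin when none are given.
# Assumption: subject IDs have the form gi|<number>|gb|<accession>|.
blast6_gis() {
    awk '{
        split($2, id, "|")                 # id[1]="gi", id[2]=GI number
        if (id[1] == "gi" && id[2] != "")
            print id[2]
    }' "$@" | sort -u                      # one lookup per distinct GI
}

# Hypothetical usage with the loop above:
#   for G in $(blast6_gis hits.blast6); do ...; done
```

Deduplicating first matters here because the same subject sequence usually appears in many rows, and each distinct GI then costs only one request to NCBI.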
17 days ago, genomax (United States) wrote:

Using Entrez Direct (EDirect):

$ for G in 1163074592 1163034213 1163027546 1163005269; do esearch -db nuccore -query $G | efetch -format docsum | xtract -pattern DocumentSummary -element Caption,Title; sleep 3; done
CP020116    Escherichia coli strain AR_0104, complete genome
CP020058    Escherichia coli strain AR_0061, complete genome
CP020055    Escherichia coli strain AR_0069, complete genome
CP020048    Escherichia coli strain AR_0118, complete genome
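
To get the titles back onto the BLAST rows, one option is a small awk join. This is only a sketch under a few assumptions: the accession/title pairs above have been saved as a tab-separated file (xtract separates elements with tabs by default, as far as I know), the BLAST table is tab-delimited outfmt 6, and the accession sits in the fourth |-separated field of column 2. Note that Caption has no version suffix (CP020116) while the BLAST subject ID does (CP020116.1), so the suffix is stripped before the lookup. The function name annotate_blast6 and the file names are hypothetical:

```shell
# annotate_blast6 TITLES BLAST6
#   TITLES: tab-separated accession (no version) and title, one pair per line
#   BLAST6: tab-delimited BLAST outfmt 6 table, subject IDs like
#           gi|1163074592|gb|CP020116.1|
# Prints every BLAST row with the title appended as an extra column
# ("NA" when the accession is not in TITLES).
annotate_blast6() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR { title[$1] = $2; next }   # first file: load the lookup
        {
            split($2, id, "|")               # id[4] = versioned accession
            acc = id[4]
            sub(/\.[0-9]+$/, "", acc)        # drop ".1" to match Caption
            print $0, (acc in title ? title[acc] : "NA")
        }
    ' "$1" "$2"
}

# Hypothetical usage:
#   annotate_blast6 titles.tsv hits.blast6 > hits.annotated.tsv
```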