How to retrieve the DEFINITION field from NCBI using accession numbers in a file
8.4 years ago
Ivan Romero ▴ 20

Hi everyone, I have a blastx output file whose sseqid column contains only accession numbers, with no sequence names. I would like to use these accessions to retrieve the DEFINITION field from NCBI and add it as a 13th column in my file. Could somebody help me with this task?

blast • 2.1k views

Search this site for "ncbi eutils".

8.4 years ago
5heikki 11k

You need Entrez Direct for this:

cat gi-list 
807531832
195954015

for next in $(cat gi-list); do title=$(epost -db protein -id "$next" | efetch -format docsum | xtract -element Title); echo -e "$next\t$title"; done
807531832    ATP6 (mitochondrion) [Campethera nivosa]
195954015    atp6 (mitochondrion) [Ochrogaster lunifer]

Then you can use a tool like join to add the titles to your BLAST table.
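A minimal sketch of that last step, assuming the loop output was saved as a two-column tab-separated file (ID, title) and that the BLAST table is tab-separated with sseqid in column 2. It uses an awk lookup instead of join, so neither file needs sorting and the original column order is kept. The function name append_titles and the file names are hypothetical.

```shell
# Hypothetical helper: append_titles TITLES BLAST
#   TITLES: tab-separated, column 1 = ID, column 2 = title
#   BLAST:  tab-separated BLAST table, subject ID (sseqid) in column 2
append_titles() {
  awk -F'\t' -v OFS='\t' '
    # First file: remember the title for each ID.
    NR == FNR { title[$1] = $2; next }
    # Second file: print the row plus its title, or NA if none was found.
    { print $0, (($2 in title) ? title[$2] : "NA") }
  ' "$1" "$2"
}
```

It could be called as append_titles titles.tsv blastx.tsv > blastx_with_titles.tsv. Rows whose ID has no title get NA in the extra column rather than being dropped, which is an arbitrary choice here.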


Works great, thanks!

8.4 years ago
DCGenomics ▴ 330

EDirect 3.30 is now out on the NCBI FTP site. efetch -format docsum (and elink) can now accept a single sequence accession number in the -id argument. For scripts that loop through one accession at a time, this eliminates the epost step.


Here is some simplified code for doing that:

for next in $(cat gi-list)
do
  efetch -db protein -id "$next" -format docsum |
  xtract -pattern DocumentSummary -lbl "$next" -element Title
done
