Question: Extracting multiple features using NCBI's e-utilities?
11 months ago, ThePresident wrote:

I have a list of protein accession identifiers such as "CBE06962.1". I would like to automatically extract several features for each one, such as the locus_tag, the start and stop positions of the corresponding gene, UniProt tags, etc. Is it possible to do this by combining esearch and efetch from e-utilities, something like:

esearch -db protein -query "CBE06962.1" | efetch ???

If not, I am thinking of downloading all the GenBank files and then parsing them for the info I want.

Any suggestions? Thanks in advance,

TP

11 months ago, genomax wrote:

What about this report:

efetch -db protein -id "CBE06962.1" -format ipg

Id        Source  Nucleotide Accession  Start    Stop     Strand  Protein     Protein Name    Organism                         Strain  Assembly
18688109  INSDC   FN545816.1            3730243  3731418  -       CBE06962.1  sensor protein  Clostridioides difficile R20291  R20291  GCA_000027105.1
18688109  INSDC   FN538970.1            3649468  3650643  -       CBA66243.1  sensor protein  Clostridioides difficile CD196   CD196   GCA_000085225.1
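
If only the coordinates are needed, the ipg report can be trimmed to a few columns with awk. This is a minimal sketch, not a tested pipeline: the function name `ipg_coords` is made up for illustration, and it assumes the report is tab-delimited with the column order shown in the report above (protein accession in column 7, nucleotide accession, start, stop and strand in columns 3 to 6).

```shell
# ipg_coords (hypothetical helper): read an ipg report on stdin and print
# protein accession, nucleotide accession, start, stop and strand.
# Assumes tab-delimited input with the column order shown in the report above.
ipg_coords() {
  awk -F'\t' -v OFS='\t' '
    # Skip the header row, which starts with the column name "Id".
    $1 != "Id" { print $7, $3, $4, $5, $6 }
  '
}

# Possible usage (requires EDirect and network access):
#   efetch -db protein -id "CBE06962.1" -format ipg | ipg_coords
```

Note that an identical protein group can contain several proteins, so one query accession may yield more than one row.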

It's a good start: I can at least get the start/stop positions, the strand, and the nucleotide accession. One thing that would be really useful is the locus_tag.

Thanks, this is still pretty good though.

TP

Reply written 11 months ago by ThePresident

A single clean solution may be possible, but at least this will get you started:

efetch -db protein -id "CBE06962.1" -format gp | grep locus
                         /locus_tag="CDR20291_3124"

Since this returns a record in GenPept format, you can grep for several other pieces of information:

efetch -db protein -id "CBE06962.1" -format gp | grep -e "locus" -e "coded"
                     /locus_tag="CDR20291_3124"
                     /coded_by="complement(FN545816.1:3730243..3731418)"
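
For a whole list of accessions, the same idea can be wrapped in a small function that turns each GenPept record into one tab-separated row. This is only a sketch under some assumptions: `gp_fields` and the file name `ids.txt` are hypothetical, the accession is taken from the VERSION line, and each qualifier is assumed to fit on a single line (a long coded_by value that wraps onto the next line would be truncated).

```shell
# gp_fields (hypothetical helper): read one or more GenPept records on stdin
# and print accession, locus_tag and coded_by as tab-separated columns.
gp_fields() {
  awk '
    # Return the text after the first double quote, up to the next one.
    function quoted(s) { sub(/^[^"]*"/, "", s); sub(/".*$/, "", s); return s }
    $1 == "VERSION"   { acc = $2 }
    /\/locus_tag="/   { locus = quoted($0) }
    /\/coded_by="/    { coded = quoted($0) }
    # "//" ends a record: emit one row and reset for the next record.
    $1 == "//"        { print acc "\t" locus "\t" coded; acc = locus = coded = "" }
  '
}

# Possible usage (requires EDirect and network access), with one accession
# per line in a hypothetical file ids.txt:
#   while read -r acc; do
#     efetch -db protein -id "$acc" -format gp
#   done < ids.txt | gp_fields
```

The start, stop and strand can then be read from the coded_by column, or taken from the ipg report instead.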
Reply written 11 months ago by genomax

Many thanks, this looks pretty good.

Reply written 11 months ago by ThePresident