Question: Extracting multiple features using NCBI's e-utilities?
19 months ago by
ThePresident140
ThePresident140 wrote:

I have a list of protein accession identifiers such as "CBE06962.1". I would like to automatically extract several features for each one, such as the locus_tag, the start and stop positions of the corresponding gene, UniProt tags, etc. Is it possible to do this by combining esearch and efetch from e-utilities, with something like:

esearch -db protein -query "CBE06962.1" | efetch ???

If not, I am thinking of downloading all the GenBank files and then parsing them for the information I want.

Any suggestions? Thanks in advance,

TP

e-utilities perl • 481 views
19 months ago by
genomax74k
United States
genomax74k wrote:

What about this report:

efetch -db protein -id "CBE06962.1" -format ipg

Id        Source  Nucleotide Accession  Start    Stop     Strand  Protein     Protein Name    Organism                         Strain  Assembly
18688109  INSDC   FN545816.1            3730243  3731418  -       CBE06962.1  sensor protein  Clostridioides difficile R20291  R20291  GCA_000027105.1
18688109  INSDC   FN538970.1            3649468  3650643  -       CBA66243.1  sensor protein  Clostridioides difficile CD196   CD196   GCA_000085225.1
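
Note that the report also lists identical proteins from other assemblies (CBA66243.1 here), so for a list of accessions you may want to keep only the row for the accession you asked about. A minimal sketch, assuming the ipg report is tab-separated with the column order shown above; `ipg_row` and `accessions.txt` are hypothetical names, not part of EDirect:

```shell
#!/bin/sh
# Hypothetical helper: read an ipg report on stdin and print
# Protein, Nucleotide Accession, Start, Stop, Strand for one accession.
# Assumes tab-separated columns in the order shown in the report above.
ipg_row() {
    # $1: protein accession to keep, e.g. CBE06962.1
    awk -F '\t' -v OFS='\t' -v acc="$1" '
        NR > 1 && $7 == acc { print $7, $3, $4, $5, $6 }   # NR > 1 skips the header row
    '
}

# Possible usage (needs EDirect and network access; accessions.txt is a
# hypothetical file with one accession per line):
#
#   while read -r acc; do
#       # </dev/null keeps efetch from reading the loop input
#       efetch -db protein -id "$acc" -format ipg </dev/null | ipg_row "$acc"
#   done < accessions.txt
```

Selecting columns by position is fragile if the report layout changes, so check the header row before relying on it.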

It's a good start: I can at least get the start/stop positions, the strand and the nucleotide accession. One more thing that would be really useful is the locus_tag.

Thanks, this is still pretty good though.

TP

written 19 months ago by ThePresident140

A single clean solution may be possible, but at least this will get you started:

efetch -db protein -id "CBE06962.1" -format gp | grep locus
                         /locus_tag="CDR20291_3124"

Since this returns a GenPept-format record, you can grep for several other pieces of information:

efetch -db protein -id "CBE06962.1" -format gp | grep -e "locus" -e "coded"
                     /locus_tag="CDR20291_3124"
                     /coded_by="complement(FN545816.1:3730243..3731418)"
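
If you want the bare values rather than the whole qualifier lines, something along these lines might work. It is only a sketch: `gp_qualifier` is a hypothetical helper name, and it assumes each qualifier value sits on a single line, as in the output above (values wrapped over several lines would be missed).

```shell
#!/bin/sh
# Hypothetical helper: read a GenPept record on stdin and print the value of
# one qualifier, without the leading slash, the name, or the quotes.
# Assumes the value is on a single line of the form /name="value".
gp_qualifier() {
    # $1: qualifier name, e.g. locus_tag or coded_by
    sed -n 's|^[[:space:]]*/'"$1"'="\(.*\)"[[:space:]]*$|\1|p'
}

# Possible usage (needs EDirect and network access): fetch the record once,
# then extract as many qualifiers as needed from the saved text.
#
#   record=$(efetch -db protein -id "CBE06962.1" -format gp)
#   printf '%s\n' "$record" | gp_qualifier locus_tag
#   printf '%s\n' "$record" | gp_qualifier coded_by
```

Saving the record in a variable avoids a second request per qualifier, which matters when looping over many accessions.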
modified 19 months ago • written 19 months ago by genomax74k

Many thanks, this looks pretty good.

written 19 months ago by ThePresident140