Question: How to correctly use E-fetch from E-utilities?

Hi,

I am looking to download metadata from the SRA.

I downloaded E-utilities following these instructions

https://www.ncbi.nlm.nih.gov/books/NBK179288/

However, I cannot get E-fetch to work.

I have a feeling I am using a different version of E-fetch from everyone else. Here is the 'help' page from my E-fetch:

 EFETCH - retrieve entries from sequence databases.

  Synopsis: efetch -options [database:]<query>

  Databases:  SWissprot/SP, PIR, WOrmpep/WP, EMbl, GEnbank/GB, ProDom, ProSite

  Options:
    -a            Search with Accession number
    -f            Fasta format output
    -q            Sequence only output (one line)
    -s <#>        Start at position #
    -e <#>        Stop at position #
    -o            More options and info...

    -D <dir>      Specify database directory
    -H            Display index header data
    -p            Display entrynames in search path
    -r            Print sequence in 'raw' format
    -m            Fetch from mixed mini database
    -M            Mini format output
    -b            Do NOT reverse the order of bytes
                              (SunOS, IRIX do reverse, Alpha not)
    -d <dbfile>   Specify database file (avoid this)
    -i <idxfile>  Specify index file (avoid this)
    -l <divfile>  Specify division lookup table (avoid this)
    -B <database> Specify database (archaic)
    -A            Only return entryname for accession number
    -n <name>     Give the sequence this name
    -x            Don't require query to match entry's name exactly (avoid)
    -w            For Wormpep: also fetch cross-referenced SwissProt entry
    -h            shows this help text


  Environment:
   SWDIR      = SwissProt  directory - database and EMBL index files
   PIRDIR     = PIR        -- " --
   WORMDIR    = Wormpep    -- " --
   EMBLDIR    = EMBL       -- " --
   GBDIR      = Genbank    -- " --
   PRODOMDIR  = ProDom     -- " --
   PROSITEDIR = ProSite    -- " --
   DBDIR      = User's own -- " -- (fasta format)

   SEQDB    database file (default SwissProt)
   SEQDBIDX index file
   DIVTABL  division lookup table

  Ex. setenv DBDIR /pubseq/seqlibs/embl/

  Note that Prodom family consensus seqs can be fetched by PD:_#

  by Erik Sonnhammer (esr@sanger.ac.uk)   Version 2.1,

There is no mention of the -format option, which appears in commands online. For example, these do not work for me:

esearch -db pubmed -query "lycopene cyclase" | efetch -format abstract

esearch -db sra -query SRR5070677 | efetch -format runinfo

The efetch step fails, but esearch works fine.

Could anyone help me out?

Tags: linux, e-utilities, sra, ncbi
written 2.2 years ago by jaynaythan

I am not sure if you have installed the eutils properly; both commands mentioned above work for me.

written 2.2 years ago by Sej Modha
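
One way to check the installation: the help text quoted in the question is signed by Erik Sonnhammer and lists no -format option, which suggests that a different program that happens to be called efetch is being found on PATH ahead of the EDirect one. The sketch below is only a diagnostic aid under that assumption. The helper names list_on_path and classify_efetch_help are made up for illustration and are not part of EDirect.

```shell
# list_on_path NAME: print every executable called NAME found on PATH,
# in the order the shell would search for it (hypothetical helper).
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    if [ -x "${dir:-.}/$name" ]; then
      printf '%s\n' "${dir:-.}/$name"
    fi
  done
  IFS=$old_ifs
}

# classify_efetch_help: read help text on stdin and guess which program
# produced it (hypothetical helper; the matching rules are heuristics).
classify_efetch_help() {
  help_text=$(cat)
  case $help_text in
    *"Erik Sonnhammer"*) echo "legacy efetch (not EDirect)" ;;
    *"-format"*)         echo "looks like EDirect efetch" ;;
    *)                   echo "unknown" ;;
  esac
}

# The first path printed is the one that runs when you type "efetch".
# Prints nothing if no efetch is installed.
list_on_path efetch
```

On the setup described in the question, `efetch -h 2>&1 | classify_efetch_help` should report the legacy program, since -h is the help flag listed in that output. If list_on_path prints more than one path, putting the EDirect directory earlier in PATH (or calling its efetch by full path) should make the -format commands work.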

Oh, yes, you are right. I'm not sure what went wrong.

written 2.2 years ago by jaynaythan

One nice way of installing eutils is to use Homebrew (http://brew.sh) or Linuxbrew (http://linuxbrew.sh). Then you can run brew install homebrew/science/edirect

written 2.2 years ago by cmdcolin

Great, thanks! There's just really poor documentation of this. I would never have known!

written 2.2 years ago by jaynaythan

Hi Sej, I wonder if you can help me out.

I am trying to download this data:

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRR5070677&go=go

But some fields are missing from the esearch results, namely location and host. Do you know if it's possible to include this info?

written 2.2 years ago by jaynaythan

Hi, the RunInfo file downloaded from this link would not contain the host information either. You can try fetching the data in XML format instead and parsing the host-related information from there.

modified 2.2 years ago • written 2.2 years ago by Sej Modha

You can extract that information with a combination of EDirect tools:

esearch -query SRR5070677 -db sra | efetch -format xml > output.xml

This produces an XML file that you can process with:

cat output.xml | xtract -pattern SAMPLE_ATTRIBUTE -element TAG -element VALUE

That, in turn, will produce the output:

strain  J159
collected_by    missing
collection_date 2014
geo_loc_name    USA: MN
host    Homo sapiens
host_disease    pertussis
isolation_source    missing
lat_lon missing
BioSampleModel  Pathogen.cl
written 2.2 years ago by Istvan Albert
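
Since the original question only asked about location and host, the tag/value listing above can be narrowed down further. The following is a minimal sketch, assuming the xtract output is tab-separated with the tag in the first column. The function name pick_attributes is hypothetical, and the sample rows are copied from the output shown above.

```shell
# pick_attributes TAGS: keep only lines whose first tab-separated field
# is one of the comma-separated TAGS (hypothetical helper).
pick_attributes() {
  awk -F '\t' -v wanted="$1" '
    BEGIN {
      # Build a lookup table of the requested tag names.
      n = split(wanted, tags, ",")
      for (i = 1; i <= n; i++) keep[tags[i]] = 1
    }
    ($1 in keep) { print }
  '
}

# Offline demonstration using rows from the output above.
printf 'strain\tJ159\ngeo_loc_name\tUSA: MN\nhost\tHomo sapiens\nlat_lon\tmissing\n' |
  pick_attributes "host,geo_loc_name"
# prints, in input order:
# geo_loc_name	USA: MN
# host	Homo sapiens
```

In practice the filter would go at the end of the pipeline, for example `xtract -pattern SAMPLE_ATTRIBUTE -element TAG -element VALUE < output.xml | pick_attributes "host,geo_loc_name"`. Tag names differ between submissions, so it is worth looking at the full listing once before choosing which ones to keep.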