Trying to use an API key with the EDirect tool (NCBI)
12 months ago
marie.harmel ▴ 10

Hi, I'm new to bioinformatics, so I apologize if my question seems a bit basic. I want to use the EDirect tool to retrieve information about a list of samples that I have generated. I work on a cluster, and my current (working) script looks something like this:

....
while IFS=, read -r srr rest_of_line; do  # pick an SRA number from my accession file
  result=$(
    /media/.../apps/edirect/efetch -db sra -id "$srr" -format runinfo |
      awk -F ',' 'NR == 2 {
        if ($13 != "") {
          print $11 "\t" $12 "\t" $19 "\t" $13 "\t" $15 "\t" $14 "\t" $16
        }
      }'
  )
  echo -e "$result" >> "$output_filename"
done < "$file"
...

The final output looks like this:

Experiment      Library Platform        Strategy        Source          Selection       Layout
ERX2313740      P3      ILLUMINA        WGS             METAGENOMIC     RANDOM          SINGLE
ERX2313743      P2      ILLUMINA        WGS             METAGENOMIC     RANDOM          SINGLE
ERX2313744      B4      ILLUMINA        WGS             METAGENOMIC     RANDOM          SINGLE

The problem is that without an API key, requests are limited to 3 per second, and my accession list contains 5 million entries! Needless to say, that would take far too long. I registered with NCBI to obtain a key and added it to my .bash_profile (export NCBI_API_KEY=.....).
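A rough back-of-the-envelope check makes the scale concrete. The sketch below assumes one request per accession (as in the loop above) and uses NCBI's published limits of 3 requests per second without a key and 10 with one; `hours_needed` is a hypothetical helper name, and the figures ignore network latency, so real runs would be slower.

```shell
# Lower-bound estimate of wall-clock hours for N requests at a given rate.
# Assumption: exactly one request per accession, no latency, integer division.
hours_needed() {
  # $1 = number of requests, $2 = allowed requests per second
  echo $(( $1 / $2 / 3600 ))
}

hours_needed 5000000 3    # without an API key
hours_needed 5000000 10   # with an API key
```

Even at the higher rate, the job takes days rather than hours, so the key alone does not make a list of this size practical.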

How am I supposed to edit my script to incorporate my key and multiply my speed? Thank you in advance

edirect NCBI API cluster • 1.1k views
12 months ago
GenoMax 147k

> How am I supposed to edit my script to incorporate my key and multiply my speed?

By setting the key as a shell variable, either globally or locally: NCBI_API_KEY=key.

> my accession list contains 5 million entries!

Please don't use a public API for this volume of queries. NCBI publishes SRA metadata reports here: https://ftp.ncbi.nih.gov/sra/reports/Metadata/ Consider downloading them and parsing the XML files to get what you need.
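For a much smaller accession list, where querying the API is still reasonable, the number of requests can be reduced by sending several accessions per call instead of one. The following is a minimal sketch, not a tested recipe: it assumes `efetch` accepts a comma-separated `-id` list for the `sra` database and that `efetch` is on `PATH`. The helper names `batch_ids` and `fetch_runinfo` and the batch size of 100 are illustrative assumptions, not NCBI recommendations.

```shell
# Print the first CSV column of file "$1" as comma-joined batches of "$2" IDs,
# one batch per output line (default batch size 100, an arbitrary choice).
batch_ids() {
  awk -F',' -v n="${2:-100}" '
    $1 != "" {
      printf "%s%s", (c % n ? "," : ""), $1
      c++
      if (c % n == 0) printf "\n"
    }
    END { if (c % n) printf "\n" }
  ' "$1"
}

# Fetch runinfo for one batch and keep the same seven columns as the script
# in the question. Header lines (first field "Run") and rows with an empty
# 13th field are skipped.
fetch_runinfo() {
  # stdin is redirected so efetch cannot consume the caller's batch list
  efetch -db sra -id "$1" -format runinfo < /dev/null |
    awk -F',' '$1 != "Run" && $13 != "" {
      print $11 "\t" $12 "\t" $19 "\t" $13 "\t" $15 "\t" $14 "\t" $16
    }'
}

# Usage (not run here; needs network access and EDirect):
#   batch_ids "$file" 100 | while read -r ids; do
#     fetch_runinfo "$ids"
#   done >> "$output_filename"
```

Like the original script, this splits the CSV naively on commas, so it would misparse any quoted field that itself contains a comma.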


Thanks a lot for the fast reply. I will try the website, thanks :)


Just to be sure (I'm only working with a small dataset, don't worry :) ): is it enough to write NCBI_API_KEY=my_key, or do I also have to add the key somewhere in my efetch command?


Run export NCBI_API_KEY=key in the terminal you are working in. Check that the variable is set with echo $NCBI_API_KEY. You do not need to add it to Entrez Direct commands. If you were using the web API instead, you would add the key at the end of each query, like this:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=nuccore&api_key=ABCD123

To make the key available throughout your account, add that line to your shell initialization file (`.bashrc`, `.bash_profile`, etc.).
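The two cases described above can be sketched as follows. `key_is_set` and `with_api_key` are hypothetical helper names introduced for illustration, and `ABCD123` is the placeholder key from the example URL, not a real key. One thing worth checking on a cluster: `.bash_profile` is read by login shells only, so a batch job may not see the variable unless the job script exports it or sources that file.

```shell
# Succeeds only if NCBI_API_KEY is visible to the current shell.
key_is_set() {
  [ -n "${NCBI_API_KEY:-}" ]
}

# Append the key to an E-utilities URL. This is needed for the web API only;
# EDirect commands such as efetch pick up the environment variable themselves.
with_api_key() {
  case "$1" in
    *\?*) printf '%s&api_key=%s\n' "$1" "$NCBI_API_KEY" ;;  # URL already has a query string
    *)    printf '%s?api_key=%s\n' "$1" "$NCBI_API_KEY" ;;  # first query parameter
  esac
}

# Usage:
#   export NCBI_API_KEY=ABCD123   # placeholder value
#   key_is_set || echo "NCBI_API_KEY is not set in this shell" >&2
#   with_api_key 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=nuccore'
```

Running `key_is_set` at the top of a job script is a cheap way to fail early rather than silently falling back to the lower rate limit.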
