Hi, I'm new to bioinformatics, so I apologize if my question seems basic. I want to use EDirect to retrieve information about a list of samples that I have generated. I work on a cluster, and my current (working) script looks something like this:
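```bash
# ...
# Read the accession file line by line; the first comma-separated field is the SRA run accession
while IFS=, read -r srr rest_of_line; do
  result=$(
    /media/.../apps/edirect/efetch -db sra -id "$srr" -format runinfo |
      awk -F ',' 'NR == 2 {
        # Line 1 of the runinfo CSV is the header, line 2 is the data row.
        # Keep the row only if LibraryStrategy ($13) is filled in, then print
        # Experiment, LibraryName, Platform, Strategy, Source, Selection, Layout.
        if ($13 != "") {
          print $11 "\t" $12 "\t" $19 "\t" $13 "\t" $15 "\t" $14 "\t" $16
        }
      }'
  )
  # Append only non-empty results, so skipped accessions do not leave blank lines
  if [ -n "$result" ]; then
    printf '%s\n' "$result" >> "$output_filename"
  fi
done < "$file"
# ...
```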
The final output looks like this:
```
Experiment Library Platform Strategy Source Selection Layout
ERX2313740 P3 ILLUMINA WGS METAGENOMIC RANDOM SINGLE
ERX2313743 P2 ILLUMINA WGS METAGENOMIC RANDOM SINGLE
ERX2313744 B4 ILLUMINA WGS METAGENOMIC RANDOM SINGLE
```
The problem is that without an API key, NCBI allows at most 3 requests per second, and my accession list contains 5 million entries! Needless to say, this would take far too long. I registered with NCBI to obtain a key and added it to my `.bash_profile` (`export NCBI_API_KEY=.....`).
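For a sense of scale, here is a small, hypothetical shell helper (`estimate_days` is not part of EDirect) that converts a number of requests and a request rate into days. It assumes one request per accession and ignores network latency and process start-up, so real runs would be slower. The rates used are NCBI's documented E-utilities limits: 3 requests per second without an API key and 10 per second with one.

```bash
# Hypothetical helper: days needed for N sequential requests at R requests/second.
# Assumes exactly one request per accession and no other overhead.
estimate_days() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 86400 }'
}

estimate_days 5000000 3    # without an API key (3 requests/second)
estimate_days 5000000 10   # with an API key (10 requests/second)
```

That works out to about 19.3 days without a key and about 5.8 days with one, so the key on its own gives at most a ~3.3x speed-up while the script still makes one request per accession.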
How should I edit my script to use my key and speed it up? Thank you in advance!
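Whatever the rate limit, the loop above makes one efetch call per accession. One way to reduce the number of calls is to send many accessions per call. The sketch below is written under stated assumptions, not verified here: efetch is on your `PATH` (otherwise substitute your own EDirect path), and its `-id` option accepts a comma-separated list of SRA run accessions and returns one runinfo row per run. `make_batches`, `extract_columns`, `fetch_batched` and `BATCH_SIZE=100` are hypothetical names and values chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: batch several accessions into one efetch call.
# ASSUMPTIONS: efetch is on PATH and -id accepts a comma-separated list of
# SRA run accessions. BATCH_SIZE and all function names are hypothetical.
BATCH_SIZE=100

# Join the first CSV column of the accession file into comma-separated batches.
make_batches() {
  local file=$1 size=$2
  awk -F, -v size="$size" '
    $1 == "" { next }                      # skip empty lines
    {
      ids = (count == 0) ? $1 : ids "," $1
      count++
      if (count == size) { print ids; count = 0 }
    }
    END { if (count > 0) print ids }
  ' "$file"
}

# Print the same seven runinfo columns as the per-accession script, skipping
# header rows (first field "Run") and blank lines, which may repeat per batch.
extract_columns() {
  awk -F, '$1 != "Run" && NF > 1 && $13 != "" {
    print $11 "\t" $12 "\t" $19 "\t" $13 "\t" $15 "\t" $14 "\t" $16
  }'
}

# Fetch runinfo for every batch and append the extracted rows to the output file.
fetch_batched() {
  local file=$1 output_filename=$2
  make_batches "$file" "$BATCH_SIZE" | while read -r ids; do
    # </dev/null keeps efetch from reading the batch list on standard input
    efetch -db sra -id "$ids" -format runinfo < /dev/null | extract_columns >> "$output_filename"
  done
}

# Usage: fetch_batched accessions.csv runinfo_table.tsv
```

Because several runs come back per call, output rows no longer map one-to-one onto input lines; adding `$1` (the Run column) to the `print` statement would let you match rows back to accessions. It is worth trying this on two or three accessions first and comparing the result with the original loop's output.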
Thanks a lot for the fast reply. I will try the website, thanks :)
Just to be sure (I'm only working with a small dataset, don't worry :) ): is it enough to write `NCBI_API_KEY=my_key`, or do I also have to pass the key to efetch somewhere?
Run `export NCBI_API_KEY=key` in the terminal you are working in, and check that it is set with `echo $NCBI_API_KEY`. You do not need to add it to EDirect (Entrez Direct) commands; they pick it up from the environment. If you were calling the E-utilities web API directly, you would instead add the key to the end of each query URL, like this: `&api_key=key`. To make the key available throughout your account, add the `export` line to your shell initialization file (`.bashrc`, `.bash_profile`, etc.).
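On the follow-up question of whether `NCBI_API_KEY=my_key` on its own is enough: efetch runs as a child process, so it only sees the key if the variable is exported, and `echo $NCBI_API_KEY` also prints a variable that is set but not exported. A small, hypothetical helper (`check_api_key` is not part of EDirect) can use `printenv`, which only reports exported variables:

```bash
# Hypothetical helper: succeed only if NCBI_API_KEY is exported, i.e. present in the
# environment that child processes such as efetch inherit. `echo $NCBI_API_KEY`
# would also show an unexported shell variable, which efetch cannot see.
check_api_key() {
  if [ -z "$(printenv NCBI_API_KEY)" ]; then
    echo "NCBI_API_KEY is not exported; run: export NCBI_API_KEY=..." >&2
    return 1
  fi
  echo "NCBI_API_KEY is exported"
}

# Usage, e.g. near the top of the script, before the efetch loop:
#   check_api_key || exit 1
```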