Question: Unable to retrieve Fasta of certain NCBI entries given their accession number
0
gravatar for erans995
2.4 years ago by
erans9950
erans9950 wrote:

Hello everyone

I have the following perl code that prints an entry's FASTA sequence to a file given its accession number:

LWP::Simple;

#append [accn] field to each accession
for ($i=0; $i < @ARGV; $i++) {
   $ARGV[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@ARGV);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);

my $filename = 'dna.txt';

open(FH, '>', $filename) or die $!;

print FH $fasta;

close(FH);

This is a modified version of application 2 from the "Sample Applications of the E-utilities" page of NCBI, here's the original version:

use LWP::Simple;
$acc_list = 'NM_009417,NM_000547,NM_001003009,NM_019353';
@acc_array = split(/,/, $acc_list);

#append [accn] field to each accession
for ($i=0; $i < @acc_array; $i++) {
   $acc_array[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@acc_array);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);
print "$fasta";

If I run the code with the accession number NM_009417 the code works fine and its FASTA sequence is being written to a file, however if I try running it with CAA30263.1, the following is written to the file: https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd"> <eFetchResult> <ERROR>Empty result - nothing to do</ERROR> </eFetchResult> I also tried running the code with CAA30263(removed the version number) but it didn't work either. I'll note that I got this accession number by using the following code(which writes the accession number that matches the GI you give it to a file) with the GI 672:

use LWP::Simple;
#$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$ARGV[0]&rettype=acc";

#post the URL
$output = get($url);
my $filename = 'acc_num.txt';

open(FH, '>', $filename) or die $!;

print FH $output; 

close(FH);

This code is a modified version of application 1 from the "Sample Applications of the E-utilities" page of NCBI, here's the original version:

use LWP::Simple;
$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$gi_list&rettype=acc";

#post the URL
$output = get($url);
print "$output";

Your help will be appreciated, thank you very much for your time!!

fasta accession number ncbi perl • 1.0k views
ADD COMMENTlink modified 18 months ago by josev.die20 • written 2.4 years ago by erans9950
1
gravatar for genomax
2.4 years ago by
genomax91k
United States
genomax91k wrote:

CAA30263.1 is a protein sequence and you are searching in a nucleotide database.

ADD COMMENTlink written 2.4 years ago by genomax91k
0
gravatar for josev.die
18 months ago by
josev.die20
josev.die20 wrote:

You can also use the following function written in R

save_AAfasta <- function(xpsIds, nameFile) {

 for(i in seq(length(xpsIds))) {
   protein <- rentrez::entrez_summary(db = "protein", id = xpsIds[i])
   protein_fasta <- rentrez::entrez_fetch(db="protein", id=protein$uid, rettype="fasta")

   # save amino acid sequences into a FASTA file ("nameFile"")
   write(protein_fasta, file= paste(nameFile, ".fasta", sep = ""), append = TRUE)
 }
 }

Then, just call the function with your id and it'll save a fasta file with your sequence:

save_AAfasta('CAA30263', "Downloads/my_proteins")
ADD COMMENTlink modified 18 months ago • written 18 months ago by josev.die20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1389 users visited in the last hour