Question: Unable to retrieve Fasta of certain NCBI entries given their accession number
erans995 wrote (11 months ago):

Hello everyone

I have the following Perl code, which writes an entry's FASTA sequence to a file given its accession number:

use LWP::Simple;

#append [accn] field to each accession
for ($i=0; $i < @ARGV; $i++) {
   $ARGV[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@ARGV);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);

my $filename = 'dna.txt';

open(FH, '>', $filename) or die $!;

print FH $fasta;

close(FH);

This is a modified version of application 2 from NCBI's "Sample Applications of the E-utilities" page. Here is the original version:

use LWP::Simple;
$acc_list = 'NM_009417,NM_000547,NM_001003009,NM_019353';
@acc_array = split(/,/, $acc_list);

#append [accn] field to each accession
for ($i=0; $i < @acc_array; $i++) {
   $acc_array[$i] .= "[accn]";
}

#join the accessions with OR
$query = join('+OR+',@acc_array);

#assemble the esearch URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=nuccore&term=$query&usehistory=y";

#post the esearch URL
$output = get($url);

#parse WebEnv and QueryKey
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);

#assemble the efetch URL
$url = $base . "efetch.fcgi?db=nuccore&query_key=$key&WebEnv=$web";
$url .= "&rettype=fasta&retmode=text";

#post the efetch URL
$fasta = get($url);
print "$fasta";

If I run the code with the accession number NM_009417, it works fine and the FASTA sequence is written to the file. However, if I run it with CAA30263.1, the following is written to the file instead:

https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd"> <eFetchResult> <ERROR>Empty result - nothing to do</ERROR> </eFetchResult>

I also tried CAA30263 (without the version number), but that did not work either. Note that I got this accession number by running the following code, which writes the accession number matching a given GI to a file, with the GI 672:

use LWP::Simple;
#$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$ARGV[0]&rettype=acc";

#post the URL
$output = get($url);
my $filename = 'acc_num.txt';

open(FH, '>', $filename) or die $!;

print FH $output; 

close(FH);

This code is a modified version of application 1 from NCBI's "Sample Applications of the E-utilities" page. Here is the original version:

use LWP::Simple;
$gi_list = '24475906,224465210,50978625,9507198';

#assemble the URL
$base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "efetch.fcgi?db=nucleotide&id=$gi_list&rettype=acc";

#post the URL
$output = get($url);
print "$output";

Your help will be appreciated, thank you very much for your time!!

genomax wrote (11 months ago):

CAA30263.1 is a protein sequence and you are searching in a nucleotide database.
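
A minimal sketch of that fix, written in Python rather than the Perl of the question: the same esearch and efetch URLs are assembled, but with the database passed as a parameter so that a protein accession such as CAA30263.1 is looked up in db=protein instead of db=nuccore. The function names `esearch_url` and `efetch_url` are hypothetical helpers made up for this illustration; only the URL layout and parameters are taken from the scripts above. The sketch builds URLs only and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL copied from the scripts in the question.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(accessions, db="protein"):
    """Build the esearch URL for a list of accessions (hypothetical helper)."""
    # Tag each accession with [accn] and join with OR, as the Perl script does.
    term = " OR ".join(f"{acc}[accn]" for acc in accessions)
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    params = {"db": db, "term": term, "usehistory": "y"}
    return BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(web_env, query_key, db="protein"):
    """Build the efetch URL from the WebEnv and QueryKey parsed out of esearch."""
    params = {
        "db": db,  # must be the same database that esearch was run against
        "query_key": query_key,
        "WebEnv": web_env,
        "rettype": "fasta",
        "retmode": "text",
    }
    return BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url(["CAA30263.1"]))
    print(esearch_url(["NM_009417"], db="nuccore"))
```

In the Perl script the equivalent change is to replace db=nuccore with db=protein in both the esearch and the efetch URL when the accession refers to a protein record.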

josev.die wrote (10 days ago):

You can also use the following function, written in R with the rentrez package:

save_AAfasta <- function(xpsIds, nameFile) {
  # look up each protein ID in turn and fetch its FASTA record
  for (i in seq_along(xpsIds)) {
    protein <- rentrez::entrez_summary(db = "protein", id = xpsIds[i])
    protein_fasta <- rentrez::entrez_fetch(db = "protein", id = protein$uid, rettype = "fasta")

    # append the amino acid sequence to the FASTA file "<nameFile>.fasta"
    write(protein_fasta, file = paste(nameFile, ".fasta", sep = ""), append = TRUE)
  }
}

Then call the function with your ID, and it will save a FASTA file containing your sequence:

save_AAfasta('CAA30263', "Downloads/my_proteins")
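
Whichever client is used, it may help to check the response before appending it to the output file. In the question, the "Empty result - nothing to do" message came back in the response body and ended up in dna.txt. The following Python sketch shows one possible guard. `looks_like_fasta` and `append_fasta` are hypothetical names, and the check (the text starts with ">" and contains no <ERROR> element) is a simple heuristic assumed for this example, not an official E-utilities validation rule.

```python
def looks_like_fasta(text):
    """Heuristic: True if the body looks like FASTA rather than an error document."""
    if not text:
        return False  # covers None and the empty string
    stripped = text.lstrip()
    # FASTA records start with a '>' header line; efetch error bodies carry <ERROR>.
    return stripped.startswith(">") and "<ERROR>" not in stripped


def append_fasta(text, path):
    """Append a fetched record to `path`, refusing anything that is not FASTA."""
    if not looks_like_fasta(text):
        raise ValueError("response does not look like FASTA; nothing written")
    with open(path, "a") as handle:
        # Make sure each record ends with a newline so records do not run together.
        handle.write(text if text.endswith("\n") else text + "\n")


if __name__ == "__main__":
    error_body = "<eFetchResult> <ERROR>Empty result - nothing to do</ERROR> </eFetchResult>"
    print(looks_like_fasta(error_body))
    print(looks_like_fasta(">example header\nMKV\n"))
```

Raising an error instead of silently writing the body makes a wrong database or a mistyped accession visible at once, rather than leaving an XML fragment in a file that is expected to hold sequences.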