7.5 years ago
guedes.aureliano
Hello, I have a list with the genome IDs, the genomic region of each gene, and the ID of each gene's protein sequence:
INSDC CP002471.1 675883 676737 + AEF25085.1
INSDC CP000408.1 785817 786671 + ABP91966.1
INSDC AP010655.1 1095217 1096218 - BAH88071.1
RefSeq NC_008532.1 1397428 1398495 - WP_011681477.1
I downloaded the proteins with this routine:
sub get_fasta{
# Download protein records corresponding to a list of accession numbers.
my $db = 'protein';
my ($ids) = @_; # IDs separated by ",", e.g. AEF25085.1,ABP91966.1,BAH88071.1,WP_011681477.1
# Assemble the efetch URL
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url = $base . "efetch.fcgi?db=$db&id=$ids&rettype=fasta&retmode=text"; #"epost.fcgi?db=$db&id=$ids";
# Fetch the efetch URL (get() is assumed to come from LWP::Simple)
my $data = get($url);
print "$data";
}
That worked fine.
Now I need the nucleotide sequences. I tried:
sub get_fasta{
my $db = 'nucleotide';
my ($ids, $sstart, $sstop) = @_;
#ids = CP002471.1,CP000408.1,AP010655.1,NC_008532.1
#sstart = 675883,785817,1095217,1397428
#sstop = 676737,786671,1096218,1398495
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url = $base . "efetch.fcgi?db=$db&id=$ids&seq_start=$sstart&seq_stop=$sstop&rettype=fasta&retmode=text"; #"epost.fcgi?db=$db&id=$ids";
my $furl = Furl->new(timeout => 200,);
my $res = $furl->get($url);
return $res->content;
}
But this doesn't work and returns:
Error: CEFetchPApplication::proxy_stream():
In the same table I also have the gene locus; I don't know if I can use it to download the coding sequence.
I could try downloading the sequences one by one, but that could take a long time for a large list.
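For reference, here is a minimal sketch of the one-by-one approach. It assumes that efetch's seq_start and seq_stop each take a single coordinate per request, which may be why the comma-separated lists above fail. The helper name region_url is hypothetical, and the strand parameter (1 for plus, 2 for minus) is added here as an assumption about what is wanted for genes on the minus strand. The sketch only builds the URLs and does not contact NCBI.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one table row into a single-record efetch URL.
# Expected columns: source, accession, start, stop, strand, protein ID.
sub region_url {
    my ($row) = @_;
    my (undef, $acc, $start, $stop, $strand) = split ' ', $row;
    # efetch encodes the strand as 1 (plus) or 2 (minus).
    my $strand_code = $strand eq '-' ? 2 : 1;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
    return $base . "efetch.fcgi?db=nucleotide&id=$acc"
         . "&seq_start=$start&seq_stop=$stop&strand=$strand_code"
         . "&rettype=fasta&retmode=text";
}

# Two rows copied from the table above.
my @rows = (
    'INSDC CP002471.1 675883 676737 + AEF25085.1',
    'INSDC AP010655.1 1095217 1096218 - BAH88071.1',
);
print region_url($_), "\n" for @rows;
```

Each URL could then be passed to the same $furl->get call as in the routine above. For a large list, a short pause between requests would probably be needed, since NCBI limits clients without an API key to about three requests per second.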
Thanks,