Closed: Retrieve a large set of genomic sequences, each with a custom range
7.5 years ago

Hello, I have a list with genome IDs, the genomic region of each gene, and the ID of each gene's protein sequence.

INSDC   CP002471.1      675883  676737  +       AEF25085.1      
INSDC   CP000408.1      785817  786671  +       ABP91966.1      
INSDC   AP010655.1      1095217 1096218 -       BAH88071.1 
RefSeq  NC_008532.1     1397428 1398495 -       WP_011681477.1

I downloaded the proteins with this routine:

use LWP::Simple;  # provides get()

sub get_fasta {
  # Download protein records corresponding to a list of IDs.
  my $db = 'protein';
  my ($ids) = @_;  # IDs separated by ",", e.g. AEF25085.1,ABP91966.1,BAH88071.1,WP_011681477.1
  # assemble the efetch URL
  my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
  my $url = $base . "efetch.fcgi?db=$db&id=$ids&rettype=fasta&retmode=text";
  # send the efetch request
  my $data = get($url);
  defined $data or die "efetch request failed\n";
  print "$data";
}

That worked fine.

Now I need the nucleotide sequences. I tried:

use Furl;

sub get_fasta {
  my $db = 'nucleotide';
  my ($ids, $sstart, $sstop) = @_;
  # ids    = CP002471.1,CP000408.1,AP010655.1,NC_008532.1
  # sstart = 675883,785817,1095217,1397428
  # sstop  = 676737,786671,1096218,1398495
  my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
  my $url = $base . "efetch.fcgi?db=$db&id=$ids&seq_start=$sstart&seq_stop=$sstop&rettype=fasta&retmode=text";
  my $furl = Furl->new(timeout => 200);
  my $res = $furl->get($url);
  return $res->content;
}

But this doesn't work and returns:

Error: CEFetchPApplication::proxy_stream():

In the same table I also have the gene locus; I don't know whether I can use it to download the coding sequence.

I could download the sequences one by one, but that could take a long time for a large list.
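For reference, the one-by-one approach could be sketched roughly as below. This is only a sketch under assumptions: as far as I understand, efetch applies a single seq_start/seq_stop pair per request rather than one pair per ID, and it accepts strand=1 (plus) or strand=2 (minus). The helper names parse_row, build_efetch_url and fetch_regions are made up for illustration, the 0.4 s pause is meant to stay under NCBI's limit of 3 requests per second without an API key, and fetching over https with LWP::Simple assumes LWP::Protocol::https is installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # fractional-second sleep

# Hypothetical helper: split one whitespace-separated table row into
# accession, start, stop and strand (source and protein ID are ignored).
sub parse_row {
    my ($line) = @_;
    my ($source, $acc, $start, $stop, $strand, $protein) = split ' ', $line;
    return ($acc, $start, $stop, $strand);
}

# Hypothetical helper: build the efetch URL for ONE region.
# Assumption: strand=1 is the plus strand, strand=2 the minus strand.
sub build_efetch_url {
    my ($acc, $start, $stop, $strand) = @_;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
    my $strand_code = $strand eq '-' ? 2 : 1;
    return $base
         . "efetch.fcgi?db=nucleotide&id=$acc"
         . "&seq_start=$start&seq_stop=$stop&strand=$strand_code"
         . "&rettype=fasta&retmode=text";
}

# Hypothetical driver: one request per table row, with a pause in between.
sub fetch_regions {
    my (@rows) = @_;
    require LWP::Simple;     # loaded only when a download is actually made
    my @fasta;
    for my $row (@rows) {
        my ($acc, $start, $stop, $strand) = parse_row($row);
        my $data = LWP::Simple::get(build_efetch_url($acc, $start, $stop, $strand));
        if (defined $data) {
            push @fasta, $data;
        } else {
            warn "no data returned for $acc:$start-$stop\n";
        }
        sleep 0.4;           # stay below 3 requests per second
    }
    return @fasta;
}
```

With the table lines read into an array (for example @rows), something like `print for fetch_regions(@rows);` would print one FASTA record per gene. At about 0.4 s per request, a list of 10,000 genes would take a little over an hour plus download time, so this is slow but workable.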

Thanks,

sequence genome gene efetch • 343 views