Closed: Retrieve a large set of genomic sequences, each with a custom range
7.5 years ago

Hello, I have a list that gives, for each gene, the genome ID, the genomic region, and the ID of its protein sequence:

Source  Genome       Start    Stop     Strand  Protein
INSDC   CP002471.1   675883   676737   +       AEF25085.1
INSDC   CP000408.1   785817   786671   +       ABP91966.1
INSDC   AP010655.1   1095217  1096218  -       BAH88071.1
RefSeq  NC_008532.1  1397428  1398495  -       WP_011681477.1
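A minimal sketch for reading a list like this into Perl records, assuming the columns are whitespace-separated in the order source, genome accession, start, stop, strand, protein ID. The sub name `parse_regions` and the hash keys are hypothetical, chosen for illustration:

```perl
use strict;
use warnings;

# Parse whitespace-separated lines of the form
#   INSDC   CP002471.1   675883   676737   +   AEF25085.1
# into hashrefs. The column order (source, accession, start, stop,
# strand, protein) is assumed from the example list.
sub parse_regions {
    my (@lines) = @_;
    my @regions;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace and newline
        next if $line eq '';        # skip blank lines
        my ($source, $acc, $start, $stop, $strand, $protein) = split /\s+/, $line;
        # skip a header row or any line whose start column is not a number
        next unless defined $start && $start =~ /^\d+$/;
        die "Malformed line: $line\n" unless defined $protein;
        push @regions, {
            source    => $source,
            accession => $acc,
            start     => $start,
            stop      => $stop,
            strand    => $strand,
            protein   => $protein,
        };
    }
    return @regions;
}
```

For a file, `open my $fh, '<', 'regions.txt' or die $!;` followed by `my @regions = parse_regions(<$fh>);` reads every line at once; `regions.txt` is a placeholder name.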

I downloaded the proteins with this routine:

use strict;
use warnings;
use LWP::Simple qw(get);

sub get_fasta {
  # Download protein records corresponding to a list of accession numbers.
  my $db = 'protein';
  my ($ids) = @_;    # comma-separated, e.g. AEF25085.1,ABP91966.1,BAH88071.1,WP_011681477.1
  # assemble the efetch URL
  my $base = '';
  my $url = $base . "efetch.fcgi?db=$db&id=$ids&rettype=fasta&retmode=text";    # or: "epost.fcgi?db=$db&id=$ids"
  # fetch the records with an HTTP GET
  my $data = get($url);
  defined $data or die "efetch request failed: $url\n";
  print $data;
}

That worked fine.

Now I need the nucleotide sequence of each gene. I tried:

use strict;
use warnings;
use Furl;

sub get_fasta {
  my $db = 'nucleotide';
  my ($ids, $sstart, $sstop) = @_;
  # ids    = CP002471.1,CP000408.1,AP010655.1,NC_008532.1
  # sstart = 675883,785817,1095217,1397428
  # sstop  = 676737,786671,1096218,1398495
  my $base = '';
  my $url = $base . "efetch.fcgi?db=$db&id=$ids&seq_start=$sstart&seq_stop=$sstop&rettype=fasta&retmode=text";    # or: "epost.fcgi?db=$db&id=$ids"
  my $furl = Furl->new(timeout => 200);
  my $res = $furl->get($url);
  return $res->content;
}

But this doesn't work; it returns:

Error: CEFetchPApplication::proxy_stream():

The same table also contains the locus tag of each gene, but I don't know whether I can use it to download the coding sequence.

I could download the sequences one at a time, but that would take a long time for a large list.
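The `proxy_stream()` error is consistent with `seq_start` and `seq_stop` receiving comma-separated lists: in efetch each of these parameters takes a single coordinate, so one request cannot carry a different range per ID. A minimal sketch of the one-request-per-region approach follows. It assumes the standard NCBI E-utilities base URL `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`, sets `strand=2` for minus-strand genes so those regions come back reverse-complemented, and pauses between requests because NCBI allows about 3 requests per second without an API key. The sub names `region_url` and `fetch_regions` are hypothetical, and the core module HTTP::Tiny is swapped in for Furl only so the sketch has no extra dependencies; the Furl calls from the post (`$furl->get($url)`, `$res->content`) would work the same way.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::HiRes qw(sleep);

# Standard NCBI E-utilities base URL (assumption: adjust if yours differs).
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Build one efetch URL for a single region. seq_start/seq_stop take one
# coordinate each, so every region needs its own request.
sub region_url {
    my ($acc, $start, $stop, $strand) = @_;
    my $strand_code = $strand eq '-' ? 2 : 1;    # efetch: 1 = plus, 2 = minus
    return $EUTILS . "efetch.fcgi?db=nucleotide&id=$acc"
        . "&seq_start=$start&seq_stop=$stop&strand=$strand_code"
        . "&rettype=fasta&retmode=text";
}

# Fetch every region in turn and return the concatenated FASTA text.
# Each region is a hashref with accession, start, stop and strand keys.
sub fetch_regions {
    my (@regions) = @_;
    my $http  = HTTP::Tiny->new(timeout => 200);
    my $fasta = '';
    for my $r (@regions) {
        my $url = region_url(@{$r}{qw(accession start stop strand)});
        my $res = $http->get($url);
        die "efetch failed for $r->{accession}: $res->{status} $res->{reason}\n"
            unless $res->{success};
        $fasta .= $res->content // $res->{content};
        sleep(0.34);    # stay under ~3 requests/second without an API key
    }
    return $fasta;
}
```

Calling `fetch_regions` with the four rows of the table as hashrefs issues four small requests. For thousands of regions, registering an NCBI API key and appending `&api_key=...` to each URL raises the limit to 10 requests per second.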

