Retrieve a large dataset of biological sequences with Perl
7.2 years ago
Helder Gomes ▴ 10

Hi

I am trying a Perl script from NCBI's E-utilities to retrieve a large dataset of sequences from the microbiome of Xestospongia testudinaria.

use strict;
use warnings;
use LWP::Simple;

# Download FASTA records linked to Xestospongia testudinaria on Nucleotide.

my $db1      = 'pubmed';
my $db2      = 'nuccore';
my $linkname = 'pubmed_nuccore';
my $query    = 'xestospongia+testudinaria';    # space encoded as '+' for the URL

# assemble the esearch URL
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
my $url  = $base . "esearch.fcgi?db=$db1&term=$query&usehistory=y";

# post the esearch URL
my $output = get($url);
die "esearch request failed\n" unless defined $output;

# parse WebEnv and QueryKey
my ($web1) = $output =~ /<WebEnv>(\S+)<\/WebEnv>/;
my ($key1) = $output =~ /<QueryKey>(\d+)<\/QueryKey>/;

# assemble the elink URL
$url  = $base . "elink.fcgi?dbfrom=$db1&db=$db2";
$url .= "&query_key=$key1&WebEnv=$web1";
$url .= "&linkname=$linkname&cmd=neighbor_history";
print "$url\n";

# post the elink URL
$output = get($url);
die "elink request failed\n" unless defined $output;
print "$output\n";

# parse WebEnv and QueryKey
my ($web2) = $output =~ /<WebEnv>(\S+)<\/WebEnv>/;
my ($key2) = $output =~ /<QueryKey>(\d+)<\/QueryKey>/;

# assemble the efetch URL
$url  = $base . "efetch.fcgi?db=$db2&query_key=$key2&WebEnv=$web2";
$url .= "&rettype=fasta&retmode=text";

# post the efetch URL
my $data = get($url);
die "efetch request failed\n" unless defined $data;
print "$data";

The script is ALMOST working except for one thing: it prints all the output (including the sequences) to my command line instead of writing it to a file. Can you help me?

perl ncbi sequence • 1.9k views

That is because:

1. You told the script to print to your "command line": print "$data"; writes to stdout.

2. You did not specify an output file.

A simple fix is to redirect stdout to a file: perl your_script.pl > myfile
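If you would rather have the script write the file itself instead of relying on shell redirection, a minimal sketch could look like the following. The helper name save_to_file, the file name sequences.fasta, and the placeholder record are assumptions made for illustration; in your script, $data would be the result of the efetch call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): write a string
# to a file, overwriting the file if it already exists.
sub save_to_file {
    my ($path, $content) = @_;
    open(my $fh, '>', $path) or die "Cannot open $path for writing: $!\n";
    print {$fh} $content;
    close($fh) or die "Cannot close $path: $!\n";
    return;
}

# Placeholder standing in for the $data returned by get($url) in the script.
my $data = ">example_record\nACGTACGT\n";

# Assumed output file name; use whatever name suits you.
save_to_file('sequences.fasta', $data);
```

One difference from the redirect: the script also prints the elink URL and the elink XML to stdout, so with > myfile those end up in the same file as the sequences. Writing only $data to a file handle keeps the output file limited to the FASTA records.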


Thanks mxs. It works great!
