8.4 years ago
Naresh
In the Entrez Programming Utilities Help, Application 3 explains how to retrieve large datasets. The example given there retrieves chimpanzee mRNA sequences, but I want to retrieve protein sequences for my analysis. I tried the same Perl script, replacing mRNA with protein and the output file with a .faa.gz file.
But I cannot get any output. Please guide me.
Thanks, Naresh
If you have the full list of GIs for your proteins of interest, you can use the E-utilities EFetch function to retrieve any number of sequences.
Tell us more precisely what you've tried. My guess is that you didn't write the URL correctly. To get a protein sequence for a given GI, the URL is http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=...
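A minimal Perl sketch of building that URL, assuming you already have the GIs in an array. The subroutine name `protein_efetch_url` and the example IDs are hypothetical placeholders; the parameters (`db=protein`, `rettype=fasta`, comma-separated `id`) are the ones from the URL above.

```perl
use strict;
use warnings;

# E-utilities base URL as used in this thread.
my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Build an EFetch URL that returns protein records in FASTA format.
# All GIs are joined with commas so they go out in a single request.
sub protein_efetch_url {
    my @gis = @_;
    return $base . 'efetch.fcgi?db=protein&rettype=fasta&id=' . join(',', @gis);
}

# Placeholder IDs (hypothetical): replace them with your own GIs.
my $url = protein_efetch_url(11111, 22222);
print "$url\n";
```

The resulting URL could then be fetched with `get($url)` from LWP::Simple. Note that NCBI now serves E-utilities over https, so the `http://` base may need to be changed to `https://`.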
The opening and closing tags in the regexes don't match, e.g. you have <webenv> and <\/WebEnv>. Make sure that they match what's actually returned in $output; otherwise, you won't get anything.
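To illustrate the point about matching tags, here is a sketch of pulling the History values out of an ESearch response. It assumes `$output` holds the raw XML returned by esearch.fcgi; the subroutine name `parse_history` and the sample string are made up for illustration. Because Perl regexes are case-sensitive, the tag names are written exactly as `<WebEnv>`, `<QueryKey>` and `<Count>`.

```perl
use strict;
use warnings;

# Extract WebEnv, QueryKey and Count from an ESearch XML response.
# Each capture is undef if its tag is not found. The first <Count>
# in the response is the total hit count.
sub parse_history {
    my ($output) = @_;
    my ($web)   = $output =~ m{<WebEnv>(\S+?)</WebEnv>};
    my ($key)   = $output =~ m{<QueryKey>(\d+)</QueryKey>};
    my ($count) = $output =~ m{<Count>(\d+)</Count>};
    return ($web, $key, $count);
}

# Hypothetical, shortened response used only to exercise the regexes.
my $sample = '<eSearchResult><Count>3</Count><RetMax>0</RetMax>'
           . '<QueryKey>1</QueryKey><WebEnv>MCID_abc123</WebEnv></eSearchResult>';

my ($web, $key, $count) = parse_history($sample);
print "WebEnv=$web QueryKey=$key Count=$count\n";   # WebEnv=MCID_abc123 QueryKey=1 Count=3
```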
EDIT: For proteins, you need
$query = 'Schizosaccharomyces pombe[orgn]';
and $url = $base . "esearch.fcgi?db=protein&...
and $efetch_url = $base . "efetch.fcgi?db=protein&WebEnv=$web";
This is the output.
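Putting the protein changes together, here is a rough sketch of the URL-building part, assuming the same History-based approach (search with `usehistory=y`, then fetch in batches with `retstart`/`retmax`). The subroutine names, the `retmax` of 500 and the WebEnv value in the example are illustrative assumptions, not copied from the NCBI page. The search term is percent-encoded by hand so that the space and brackets in `Schizosaccharomyces pombe[orgn]` survive in the URL.

```perl
use strict;
use warnings;

my $base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Percent-encode every character outside the unreserved URL set.
sub encode_term {
    my ($term) = @_;
    $term =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $term;
}

# ESearch on the protein database, keeping the result set on the History server.
sub protein_esearch_url {
    my ($query) = @_;
    return $base . 'esearch.fcgi?db=protein&term=' . encode_term($query) . '&usehistory=y';
}

# One EFetch URL per batch of $retmax records, reusing WebEnv and query_key.
sub protein_efetch_batches {
    my ($web, $key, $count, $retmax) = @_;
    my @urls;
    for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
        push @urls, $base . "efetch.fcgi?db=protein&WebEnv=$web&query_key=$key"
                  . "&retstart=$retstart&retmax=$retmax&rettype=fasta&retmode=text";
    }
    return @urls;
}

print protein_esearch_url('Schizosaccharomyces pombe[orgn]'), "\n";
# Hypothetical WebEnv/count: 1200 records in batches of 500 gives 3 URLs.
print "$_\n" for protein_efetch_batches('MCID_abc123', 1, 1200, 500);
```

Each batch URL could then be fetched in turn and its FASTA text appended to the output file.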
Don't just copy/paste code from a web page; it may not be properly formatted. For example, lines like 'assemble the esearch URL' are comments, not code, so they should be written as proper Perl comments. Also, you still haven't corrected the regexes: the difference between upper and lower case is meaningful. If you don't know how to program in Perl, I suggest you at least have a quick look at a tutorial. And when posting code, please try to format it for readability.
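A tiny Perl illustration of both points: a note such as 'assemble the esearch URL' only works as a comment when it starts with `#`, and `/<webenv>/` will not match `<WebEnv>` unless the `/i` (case-insensitive) modifier is added. The variable names are arbitrary.

```perl
use strict;
use warnings;

# assemble the esearch URL   <- a note like this must start with '#'
my $tag = '<WebEnv>';

# Regex matching is case-sensitive by default...
my $exact  = ($tag =~ /<WebEnv>/)  ? 1 : 0;   # 1: case matches
my $wrong  = ($tag =~ /<webenv>/)  ? 1 : 0;   # 0: case differs
# ...unless the /i modifier is used.
my $nocase = ($tag =~ /<webenv>/i) ? 1 : 0;   # 1: case ignored

print "exact=$exact wrong=$wrong nocase=$nocase\n";   # exact=1 wrong=0 nocase=1
```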
I have never used Perl before; I will learn it now. Sorry for not formatting the code for readability.