Dear all,
Sorry, this is still a mystery to me. It has been discussed a few times, but why does it have to be so complicated? Why do I have to use XML or whatever, rather than just a simple script like the one below?
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://www.example.com/');
# my url has to be like:
# http://www.ncbi.nlm.nih.gov/protein/WP_005451061.1?report=fasta&log$=seqview&format=text
unless ($response->is_success) {
    die $response->status_line;
}
# decoded_content() gives a character string when a charset was decoded
my $content = $response->decoded_content();
if (utf8::is_utf8($content)) {
    binmode STDOUT, ':utf8';
} else {
    binmode STDOUT, ':raw';
}
print $content;
ref: http://www.microhowto.info/howto/fetch_the_content_of_a_given_url_in_perl_using_lwp_useragent.html
I have a lot of NCBI IDs like WP_005451061.1, many thousands of them.
Will I have to find their respective UniProt IDs?
http://www.ncbi.nlm.nih.gov/protein/WP_005451061.1?report=fasta&log$=seqview&format=text
Is it correct that there is no way for a script to fetch the FASTA sequence served at the URL above, and that I can only get it manually? Thank you very much for your advice!
Sincerely yours,
Natalia
So easy?! Many thanks, I hadn't imagined such a clear solution! May I ask you a couple of other questions?
Is rettype=fasta critical for the format, or can the result be a FASTA file with a *.txt extension? I'm afraid that's not allowed... And is it possible to somehow use files containing these IDs? I have too many of them to list with commas...
Thanks again!
Sincerely yours,
Natasha
Yeah, it is easy. This NCBI page describes several ways to download bulk data from NCBI.
You are right. This table lists the valid values for rettype and retmode for EFetch.
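To make the rettype point concrete, here is a minimal sketch, not an official NCBI example. As far as I know, rettype and retmode decide what the server sends back, while the name and extension of the file you save it to are entirely up to you. The helper names efetch_url and save_fasta and the script name are made up for illustration; the URL and parameters are the EFetch ones used in this thread, with https assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# EFetch base URL as used in this thread (https is an assumption here).
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an EFetch URL for a single accession.
# rettype/retmode select the format the server returns.
sub efetch_url {
    my ($db, $id, $rettype, $retmode) = @_;
    return "$EFETCH?db=$db&id=$id&rettype=$rettype&retmode=$retmode";
}

# Hypothetical helper: download one protein record as FASTA text and
# save it under whatever file name the caller chooses.
sub save_fasta {
    my ($id, $file) = @_;
    require LWP::UserAgent;    # loaded only when a download is requested
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->get(efetch_url('protein', $id, 'fasta', 'text'));
    die $response->status_line unless $response->is_success;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} $response->decoded_content;
    close $out or die "Cannot close $file: $!";
}

# Usage: perl fetch_one.pl WP_005451061.1 my_protein.txt
save_fasta(@ARGV) if @ARGV == 2;
```

Run without arguments it does nothing; with an accession and a file name it writes the FASTA text to that file, whether you call it .txt or .fasta.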
You can use Batch Entrez described in the page I linked above.
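If you would rather script it than upload the file in the browser, something along these lines might work. It is only a sketch under a few assumptions: the IDs sit one per line in a text file, 200 IDs per request is a batch size I picked to keep the URLs short, and the one-second pause is a conservative guess at staying within NCBI's request limits. The helper names read_ids, batches and efetch_url are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one ID per line, trimming whitespace and skipping blank lines.
sub read_ids {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my @ids;
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    close $in;
    return @ids;
}

# Split a list of IDs into array refs holding at most $size IDs each.
sub batches {
    my ($size, @ids) = @_;
    die "batch size must be positive" unless $size > 0;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Build one EFetch URL for a comma-separated batch of IDs (https assumed).
sub efetch_url {
    my ($db, @ids) = @_;
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
         . "?db=$db&id=" . join(',', @ids) . '&rettype=fasta&retmode=text';
}

# Usage: perl fetch_batch.pl ids.txt > proteins.fasta
if (@ARGV) {
    require LWP::UserAgent;    # loaded only when a download is requested
    my $ua = LWP::UserAgent->new;
    for my $batch (batches(200, read_ids($ARGV[0]))) {
        my $response = $ua->get(efetch_url('protein', @$batch));
        die $response->status_line unless $response->is_success;
        print $response->decoded_content;
        sleep 1;    # assumed pause between requests, to be polite to NCBI
    }
}
```

The commas are then added by the script, so the ID file itself stays a plain list.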
Perfect! It's not as terrible as it seemed...
A thousand thanks!
Sincerely yours,
Natasha
Sorry, I have to come back. If I need the nucleotide sequence of the same protein, would it be enough just to change 'protein' to 'nucleotide' in the URL, or do I have to do something else? I think the ID should be the same. Am I correct?
Thank you!
Natasha
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=WP_005451061.1&rettype=fasta&retmode=text
I see I was wrong. With the same ID, I get a protein sequence in the output even though I specified 'nucleotide'. What else should be changed? I haven't noticed any significant changes in eutils... What is my mistake?
Thanks in advance.