Can I reach NCBI protein sequences with their corresponding NCBI urls?
9.2 years ago
natasha.sernova ★ 4.0k

Dear all,

Sorry, this is still a mystery to me. It has been discussed a few times, but why should it be so complicated? Why do I have to use XML or whatever, rather than a simple script like the one below?

#!/usr/bin/perl

use strict;
use warnings;
use LWP::UserAgent;

# Create a user agent and request the page.
my $ua = LWP::UserAgent->new;
my $response = $ua->get('http://www.example.com/');


# my url has to be like:
# http://www.ncbi.nlm.nih.gov/protein/WP_005451061.1?report=fasta&log$=seqview&format=text

unless ($response->is_success) {
        die $response->status_line;
}

my $content = $response->decoded_content();
if (utf8::is_utf8($content)) {
        binmode STDOUT,':utf8';
} else {
        binmode STDOUT,':raw';
}

print $content;

ref: http://www.microhowto.info/howto/fetch_the_content_of_a_given_url_in_perl_using_lwp_useragent.html

I have many thousands of NCBI IDs like WP_005451061.1.

Will I have to find their respective UniProt IDs instead?

http://www.ncbi.nlm.nih.gov/protein/WP_005451061.1?report=fasta&log$=seqview&format=text

Is it true that a script cannot retrieve the FASTA sequence behind the URL above, and that I can only get it manually? Thank you very much for your advice!

Sincerely yours,
Natalia

9.2 years ago
Siva ★ 1.9k

You can use E-utilities to get data from NCBI.

To get the amino acid sequence in FASTA format for a given ID (e.g. WP_005451061.1), or for several comma-separated IDs, use an EFetch URL like this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=WP_005451061.1&rettype=fasta&retmode=text
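
The same request can be scripted. Below is a minimal Python sketch (Python rather than the Perl used in the question) that builds the EFetch URL shown above and downloads the result. The function names `build_efetch_url` and `fetch_fasta` are made up for illustration, and the sketch assumes the HTTPS form of the endpoint, which NCBI now serves.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# EFetch endpoint (HTTPS form of the URL given above).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(ids, db="protein", rettype="fasta", retmode="text"):
    """Return an EFetch URL for one accession or a list of accessions.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if isinstance(ids, str):
        ids = [ids]
    params = {
        "db": db,
        "id": ",".join(ids),  # several IDs are sent comma-separated
        "rettype": rettype,
        "retmode": retmode,
    }
    # safe="," keeps the commas readable instead of encoding them as %2C.
    return EFETCH + "?" + urlencode(params, safe=",")


def fetch_fasta(ids, timeout=30):
    """Download the FASTA text for the given accession(s). Needs network access."""
    with urlopen(build_efetch_url(ids), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a real request to NCBI):
# print(fetch_fasta("WP_005451061.1"))
```

Building the URL in a separate function makes it easy to print and check it in a browser before sending thousands of requests.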

So easy?! Many thanks, I had not imagined such a clear solution! May I ask you a couple of other questions?

Is rettype=fasta critical for the format, or could the result be a FASTA file with a *.txt extension? I'm afraid that's prohibited...

And is it possible to supply the IDs from a file somehow? I have too many of them for a comma-separated list...

Thanks again!

Sincerely yours,
Natasha


Yeah, it is easy. This NCBI page describes several ways to download bulk data from NCBI.

"Is rettype=fasta critical for the format, or could the result be a FASTA file with a *.txt extension? I'm afraid that's prohibited..."

You are right: rettype has to be one of the supported values. This table lists the valid values for rettype and retmode for EFetch.

"And is it possible to supply the IDs from a file somehow? I have too many of them for a comma-separated list..."

You can use Batch Entrez, described in the page I linked above.
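
Batch Entrez itself is a web form. If a scripted route is preferred, the ID file can also be fed to EFetch in batches. The sketch below is one way to do that in Python, not the Batch Entrez method: it reads one accession per line, splits the list into chunks, and sends each chunk as an HTTP POST, which NCBI recommends once a request carries more than about 200 IDs. The helper names, the batch size, and the pause length are assumptions chosen for illustration.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_ids(path):
    """Read one accession per line, ignoring blank lines and stray whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_batches(ids, batch_size=200, pause=0.4):
    """Yield the FASTA text of each batch. Needs network access.

    batch_size and pause are illustrative defaults, not official values.
    """
    for batch in chunked(ids, batch_size):
        data = urlencode({
            "db": "protein",
            "id": ",".join(batch),
            "rettype": "fasta",
            "retmode": "text",
        }).encode("ascii")
        # Passing `data` makes urlopen send a POST instead of a GET.
        with urlopen(EFETCH, data=data, timeout=60) as response:
            yield response.read().decode("utf-8")
        # Stay below NCBI's limit of 3 requests per second without an API key.
        time.sleep(pause)
```

With a hypothetical input file `ids.txt`, the output could be collected with something like `open("proteins.fasta", "w").writelines(fetch_batches(read_ids("ids.txt")))`.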


Perfect! It's not as terrible as it seemed...

A thousand thanks!

Sincerely yours,
Natasha


Sorry, I have to come back. If I need the nucleotide sequence of the same protein, would it be enough just to change 'protein' to 'nucleotide' in the URL, or do I have to do something else? I think the ID should stay the same. Am I correct?

Thank you!

Natasha

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=WP_005451061.1&rettype=fasta&retmode=text


I see that I was wrong. With the same ID, I still get a protein sequence in the output even though I specified 'nucleotide'. What else should be changed? I haven't noticed any other relevant options in eutils... What is my mistake?

Thanks in advance.
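
The thread does not record an answer to this, so what follows is only a possible route, not a confirmed one. WP_005451061.1 is a protein accession, so changing db alone does not turn it into a nucleotide record. One approach that may work is to request the Identical Protein Groups report (assumed here to be available as rettype=ipg on db=protein), read the nucleotide accession and coordinates from it, and then fetch that region from nuccore using EFetch's seq_start, seq_stop and strand parameters. The Python sketch below only builds the two URLs. The function names are invented, and the nucleotide accession and coordinates in the usage note are placeholders, not real data.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def ipg_report_url(protein_acc):
    """URL for the Identical Protein Groups report of a protein accession.

    Assumption: rettype=ipg is accepted for db=protein.
    """
    params = {"db": "protein", "id": protein_acc,
              "rettype": "ipg", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def cds_region_url(nuc_acc, start, stop, minus_strand=False):
    """URL for a sub-region of a nucleotide record in FASTA format.

    seq_start and seq_stop are 1-based coordinates; strand=2 requests
    the minus strand, strand=1 the plus strand.
    """
    params = {"db": "nuccore", "id": nuc_acc,
              "rettype": "fasta", "retmode": "text",
              "seq_start": start, "seq_stop": stop,
              "strand": 2 if minus_strand else 1}
    return EFETCH + "?" + urlencode(params)
```

For example, `ipg_report_url("WP_005451061.1")` gives the report URL, and `cds_region_url("NZ_EXAMPLE.1", 100, 400, minus_strand=True)` shows the shape of the follow-up request, with `NZ_EXAMPLE.1` and the coordinates standing in for values taken from the report.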
