Utility For Downloading Sequences From A List Of Mixed Database Keys
12.9 years ago
Will 4.6k

I have a large list of sequence IDs for which I need to download the AA sequences. It's a mixed bag of UniProt, RefSeq, and NCBI Protein accessions, and the IDs belong to a collection of organisms (mostly human, but some HIV, HCV, rat, etc.).

Does anyone know of a web tool that can handle a mixed list of IDs and convert them to a single database? I know I can use BioMart (or some other tool) to convert everything to a single ID system, but I was wondering if there are any tools that can do that step for me.
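As a rough first pass before any conversion, a mixed list like this can be sorted by accession format alone. The sketch below is only an illustration: the function names `classify_id` and `group_ids` are made up for this example, the UniProt pattern follows the accession format published in the UniProt help pages, and the RefSeq and GenBank patterns cover only the common protein prefixes, so some valid IDs may land in "unknown".

```python
import re

# UniProtKB accession format (6 or 10 characters), per the UniProt help pages.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# RefSeq protein accessions: two-letter prefix, underscore, digits, optional version.
# Assumption: only the common protein prefixes are listed here.
REFSEQ_PROTEIN_RE = re.compile(r"^(?:AP|NP|XP|YP|WP|ZP)_\d+(?:\.\d+)?$")
# GenBank/EMBL/DDBJ protein accessions: 3 letters + 5 or 7 digits, optional version.
GENBANK_PROTEIN_RE = re.compile(r"^[A-Z]{3}(?:\d{5}|\d{7})(?:\.\d+)?$")


def classify_id(identifier):
    """Guess the source database of one accession from its format alone."""
    token = identifier.strip()
    if REFSEQ_PROTEIN_RE.match(token):
        return "refseq"
    if UNIPROT_RE.match(token):
        return "uniprot"
    if GENBANK_PROTEIN_RE.match(token):
        return "genbank_protein"
    return "unknown"


def group_ids(identifiers):
    """Group accessions by guessed database, preserving input order."""
    groups = {}
    for identifier in identifiers:
        groups.setdefault(classify_id(identifier), []).append(identifier.strip())
    return groups
```

Format matching says nothing about whether an accession actually exists, so anything grouped this way still needs to be checked against the real database.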

Thanks,

-Will

sequence • 2.6k views
12.9 years ago
Will 4.6k

Well, in case anyone comes across this post later: I've landed on EBI's PICR tool, which has a nice REST interface and can deal with seemingly any accession you throw at it.

So I use PICR to convert all of the IDs to SWISSPROT and then download the sequences from there.
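The download half of that workflow might look roughly like the sketch below. It is not the script used here: the PICR mapping step is left out entirely, the helper names (`fasta_url`, `parse_fasta`, `fetch_sequence`) are hypothetical, and it assumes the accessions have already been mapped to UniProtKB and that the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/<accession>.fasta` is the download source.

```python
from urllib.request import urlopen

# Assumption: UniProt's REST endpoint for a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the FASTA download URL for one UniProtKB accession."""
    return UNIPROT_FASTA_URL.format(accession=accession.strip())


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_sequence(accession, timeout=30):
    """Download one entry over the network and return its parsed records."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return parse_fasta(response.read().decode("utf-8"))


# Example (needs network access):
# records = fetch_sequence("P24941")
```

For a large list it would be kinder to the server to pause between requests or to use a batch download instead of one request per accession.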

12.9 years ago
Andrew Su 4.9k

I suspect you're looking for a full website to do your conversion, but if you're willing to do some simple parsing, you could use our ID-resolution web service at http://mygene.info. For example, these URLs all resolve to the same gene (CDK2, Entrez Gene 1017):

From there, you can get everything you'd want to know by querying by Entrez Gene ID.
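A lookup by Entrez Gene ID could be sketched as below. This is an illustration only: it assumes the `v3` gene endpoint (`https://mygene.info/v3/gene/<id>`) and its `fields` parameter, which may differ from what the service offered when this answer was written, and the helper names `gene_url` and `fetch_gene` are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the v3 gene annotation endpoint.
MYGENE_GENE_URL = "https://mygene.info/v3/gene/"


def gene_url(entrez_id, fields=None):
    """Build a gene lookup URL, optionally restricted to some fields."""
    url = MYGENE_GENE_URL + str(entrez_id)
    if fields:
        # Keep commas unescaped so the field list stays readable.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


def fetch_gene(entrez_id, fields=None, timeout=30):
    """Fetch the annotation for one gene as a dict (needs network access)."""
    with urlopen(gene_url(entrez_id, fields), timeout=timeout) as response:
        return json.load(response)


# Example (needs network access):
# annotation = fetch_gene(1017, fields=["symbol", "name"])
```

Restricting `fields` keeps responses small, which matters when resolving a long ID list.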


That's a great service. If only it supported all of the organisms I needed.


Since you've made the source code public, I'll just run my own instance of the service ... that way I don't have to blast your website with millions of queries.


My guess is that you've already figured it out, but let me know if you need any help adding new organisms to your instance.
