Retrieving Sequences using NCBI Gene database IDs
9.8 years ago

I'm trying to automatically retrieve sequences for genes defined by NCBI Gene identifiers. Example Gene ID: 114787. The page on the Gene database site for it is: http://www.ncbi.nlm.nih.gov/gene/?term=114787%5Buid%5D

There are links on that page to the nucleotide database, where I can get sequences for this gene in FASTA format, which is what I want. But I can't query the nucleotide database with Biopython through the EFetch service because the IDs are different. I've tried using the ELink service to map from Gene ID to nucleotide ID, but I just get a massive list of IDs back, which can't be right.

How should I be doing this for a large number of Entrez Gene IDs? Preferably with Biopython.

Sequence Biopython Gene • 6.7k views
9.8 years ago

The output of http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=gene shows that you can restrict the ELink results to the RefSeq RNA sequences with linkname=gene_nuccore_refseqrna.

The query for NOTCH2 would be: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=nucleotide&id=4853&linkname=gene_nuccore_refseqrna
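With Biopython, the same restricted link query could be sketched roughly as below. This is only a minimal sketch, not a tested pipeline: the function names `linked_ids` and `fetch_refseq_rna_fasta` are made up for illustration, the email address is whatever contact address you give NCBI, and it assumes that `Entrez.read` returns the usual list of link sets with `LinkSetDb` / `Link` / `Id` entries.

```python
def linked_ids(elink_record):
    """Collect the linked UIDs from a parsed ELink result (a list of link sets)."""
    ids = []
    for linkset in elink_record:
        # LinkSetDb is empty when the Gene ID has no links of the requested type
        for linksetdb in linkset.get("LinkSetDb", []):
            ids.extend(link["Id"] for link in linksetdb["Link"])
    return ids


def fetch_refseq_rna_fasta(gene_id, email):
    """Return FASTA text for the RefSeq RNAs linked to one Gene ID (hypothetical helper)."""
    # Imported here so that linked_ids() can be used without Biopython installed
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    # Same query as the URL above, restricted by linkname
    handle = Entrez.elink(dbfrom="gene", db="nucleotide", id=str(gene_id),
                          linkname="gene_nuccore_refseqrna")
    record = Entrez.read(handle)
    handle.close()

    ids = linked_ids(record)
    if not ids:
        return ""
    # Fetch all linked nucleotide records in one request
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids),
                           rettype="fasta", retmode="text")
    fasta = handle.read()
    handle.close()
    return fasta
```

For a large list of Gene IDs, calling `fetch_refseq_rna_fasta(gene_id, "you@example.org")` once per ID keeps it clear which sequences belong to which gene, at the cost of more requests, so you would probably want to pause between calls to stay within NCBI's usage limits.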


Thanks. I was about to close this question after finding your answer to another one, "Get Fasta File With Protein Sequences Given Entrez Gene Ids", which turns out to be exactly what I wanted to do.


However, your script as written there fails to run. There's a problem with the tab delimiters (maybe they got reformatted when you pasted it in here?). I replaced them with $'\t', but the script just hangs.

From writing my own code, it looks like each Gene ID maps to multiple protein IDs. You say in the comments on the other post that I could select any of these protein IDs and it wouldn't matter. Do you mean that all the protein IDs that map to a single Gene ID will return the same sequence?
