Question: Retrieving Sequences using NCBI Gene database IDs
gavingray1729 (United Kingdom) wrote, 6.0 years ago:

I'm trying to automatically retrieve sequences for genes defined by NCBI Gene identifiers. Example gene ID: 114787. The page for this gene on the Gene database site is:


There are links on that page to the nucleotide database, where I can get sequences for this gene in FASTA format, which is what I want. But I can't query the nucleotide database with Biopython through the EFetch service, because the nucleotide IDs differ from the Gene IDs. I've tried using the ELink service to map from Gene ID to nucleotide ID, but I just get a massive list of IDs back, which can't be right.


How should I be doing this for a large number of Entrez Gene IDs? Preferably with Biopython.

Tags: biopython, sequence, gene
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 6.0 years ago:

You can restrict the output of ELink to the RefSeq sequences with `linkname=gene_nuccore_refseqrna`.


The query for NOTCH2 would be:



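Since the question asks for Biopython, here is a minimal sketch of the same idea: call ELink with `linkname=gene_nuccore_refseqrna`, collect the linked nuccore IDs, then pass them to EFetch as FASTA. The function names `extract_link_ids` and `fetch_refseq_rna_fasta` and the e-mail address are hypothetical, chosen for illustration. The sketch assumes Biopython is installed and handles one Gene ID per call; for many IDs you would loop over them while respecting NCBI's request rate limits.

```python
def extract_link_ids(records, linkname="gene_nuccore_refseqrna"):
    """Collect the linked IDs from a parsed ELink result.

    `records` is the list returned by Bio.Entrez.read() for an ELink
    response: one dict-like record per link set, each holding a
    "LinkSetDb" list whose entries carry "LinkName" and "Link".
    """
    ids = []
    for record in records:
        for linkset in record.get("LinkSetDb", []):
            # Keep only the link type we asked for.
            if linkset.get("LinkName") == linkname:
                ids.extend(link["Id"] for link in linkset["Link"])
    return ids


def fetch_refseq_rna_fasta(gene_id, email):
    """Return the RefSeq RNA sequences for one Gene ID as FASTA text.

    Hypothetical helper; needs Biopython and network access.
    """
    # Imported here so that extract_link_ids works without Biopython.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks callers to identify themselves

    # Step 1: map the Gene ID to nuccore IDs, RefSeq RNAs only.
    handle = Entrez.elink(
        dbfrom="gene",
        db="nuccore",
        id=str(gene_id),
        linkname="gene_nuccore_refseqrna",
    )
    records = Entrez.read(handle)
    handle.close()

    nuccore_ids = extract_link_ids(records)
    if not nuccore_ids:
        return ""

    # Step 2: fetch the linked records in FASTA format.
    handle = Entrez.efetch(
        db="nuccore",
        id=",".join(nuccore_ids),
        rettype="fasta",
        retmode="text",
    )
    fasta = handle.read()
    handle.close()
    return fasta
```

Calling `fetch_refseq_rna_fasta(114787, "you@example.org")` should return FASTA text for the example gene from the question, or an empty string if no RefSeq RNA is linked to it.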
written 6.0 years ago by Pierre Lindenbaum

Thanks. I was about to close this question after finding your answer to another one, "Get Fasta File With Protein Sequences Given Entrez Gene Ids", which turns out to be exactly what I wanted to do.

written 6.0 years ago by gavingray1729

However, your script as written there fails to run. There's a problem with the tab delimiters (maybe they were reformatted when you pasted it in here?). I replaced them with $'\t', but the script just hangs.

Writing my own code, I see that each Gene ID maps to multiple protein IDs. You say in the comments on the other post that I could select any of these protein IDs and it wouldn't matter. Do you mean that all the protein IDs that map to a single Gene ID will return the same sequence?

written 6.0 years ago by gavingray1729