Question: Using Biopython Entrez With Gene Name To Get Fastq
StudentOfScience (0) wrote 7.5 years ago:

This is what I want to do. I have a list of gene names, for example: [ITGB1, RELA, NFKBIA]. Looking through the Biopython help and the tutorial for the Entrez API, I came up with this:

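from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address

x = ['ITGB1', 'RELA', 'NFKBIA']
for item in x:
    # this is the call that fails: a gene symbol is passed where an id is expected
    handle = Entrez.efetch(db="nucleotide", id=item, rettype="gb")
    record = handle.read()  # assumed completion; the original line was cut off here
    out_handle = open('genes/' + item + '.xml', 'w')  # to create a file with gene name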

But this keeps erroring out. I have discovered that it works if the id is a numerical id (although you have to pass it as a string, e.g. '186972394'), so:

handle = Entrez.efetch(db="nucleotide", id='186972394', rettype="gb")

This gets me the info I want which includes the sequence.

So now to the Question:

How can I search by gene name (because I do not have id numbers), or easily convert my gene names to ids, to get the sequences for the gene list I have?

I have also tried:

x = 'RELA'

handle = Entrez.efetch(db="nucleotide", id=x ,rettype="gb")

errors out with HTTP Error 400: Bad Request, because it is expecting a numeric string for id.

handle = Entrez.esearch(db="nucleotide",term=x)

returns nothing; debugging shows it did not find anything.

handle =  Entrez.esearch(db="nucleotide",term="Homo[Orgn] AND RELA[Gene]")

returns a list of IDs, and the first one is what I want. But I am sure you will not like this approach, because it does not guarantee that the first ID in the list is the one I want every time I query the search engine.


python fastq entrez biopython • 4.1k views
modified 7.5 years ago by Istvan Albert ♦♦ 84k • written 7.5 years ago by StudentOfScience (0)

This is really an Entrez question rather than a Biopython one: you're trying to find an Entrez term that limits you to a particular record for each id. Check out these tips for getting only sequences in RefSeq, and use biomol_genomic[PROP] to get rid of mRNAs.

written 7.5 years ago by David W (4.7k)
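
The filtering idea in the comment above could be sketched roughly as follows. This is only an illustration, not a tested recipe: `build_term` and `fetch_genbank` are hypothetical helper names, `srcdb_refseq[PROP]` is assumed here as the RefSeq filter (the comment only names `biomol_genomic[PROP]`), and "Homo sapiens" is assumed as the organism. The search may still return several ids, so the hits should be inspected rather than trusting the first one.

```python
def build_term(symbol, organism="Homo sapiens", refseq_only=True, genomic_only=True):
    """Build an Entrez query that narrows a gene symbol to fewer records."""
    parts = [f"{organism}[Orgn]", f"{symbol}[Gene]"]
    if refseq_only:
        parts.append("srcdb_refseq[PROP]")    # assumed RefSeq-only filter
    if genomic_only:
        parts.append("biomol_genomic[PROP]")  # drops mRNA records, as suggested above
    return " AND ".join(parts)


def fetch_genbank(symbol, email, retmax=5):
    """Search the nucleotide database, then fetch the hits as GenBank text."""
    # Imported here so build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    search = Entrez.esearch(db="nucleotide", term=build_term(symbol), retmax=retmax)
    ids = Entrez.read(search)["IdList"]
    search.close()
    if not ids:
        return ids, ""
    # efetch needs the numeric ids from esearch, not the gene symbol itself.
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids),
                           rettype="gb", retmode="text")
    text = handle.read()
    handle.close()
    return ids, text
```

Calling `fetch_genbank("RELA", "you@example.com")` would return the matched ids together with the GenBank text, so you can check how many records the term really matches before saving anything.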

FASTQ? Are you sure you want to get a FASTQ?

written 7.5 years ago by Pierre Lindenbaum (128k)
Istvan Albert ♦♦ 84k (University Park, USA) wrote 7.5 years ago:

Since you seem to be looking for human gene names, I think your best bet will be to query HGNC and extract the gene ids from their output.

A tutorial can be found here

I think Biomart will also permit you to connect a human gene name to an id. (I don't have experience with it, but other Biomart-related posts on this site might be helpful.)

Almost forgot to mention: Using the biomart perl api for simple queries

modified 7.5 years ago • written 7.5 years ago by Istvan Albert ♦♦ 84k
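
For the HGNC route suggested in this answer, a minimal sketch might look like the one below. It assumes the HGNC REST service at rest.genenames.org, its `fetch/symbol/<symbol>` endpoint, and a JSON response whose `response.docs` entries carry an `entrez_id` field; none of that comes from the thread, so check it against the current HGNC documentation. `entrez_ids_from_hgnc` and `lookup_symbol` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed endpoint of the HGNC REST service.
HGNC_URL = "https://rest.genenames.org/fetch/symbol/{symbol}"


def entrez_ids_from_hgnc(payload):
    """Pull Entrez Gene IDs out of a decoded HGNC JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["entrez_id"] for doc in docs if "entrez_id" in doc]


def lookup_symbol(symbol):
    """Ask HGNC for one approved symbol and return its Entrez Gene IDs."""
    request = urllib.request.Request(
        HGNC_URL.format(symbol=symbol),
        headers={"Accept": "application/json"},  # request JSON rather than XML
    )
    with urllib.request.urlopen(request) as response:
        return entrez_ids_from_hgnc(json.load(response))
```

Note that these are Entrez Gene IDs, not nucleotide ids, so they cannot be passed straight to `efetch(db="nucleotide", ...)`; they would still have to be linked to sequence records.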