Using Biopython Entrez With Gene Name To Get Fastq
11.4 years ago

This is what I want to do. I have a list of gene names, for example: [ITGB1, RELA, NFKBIA]. After reading the Biopython help and the tutorial for the Entrez API, I came up with this:

from Bio import Entrez

x = ['ITGB1', 'RELA', 'NFKBIA']
for item in x:
    handle = Entrez.efetch(db="nucleotide", id=item, rettype="gb")
    record = handle.read()
    out_handle = open('genes/' + item + '.xml', 'w')  # create a file named after the gene
    out_handle.write(record)
    out_handle.close()  # parentheses were missing, so the file was never explicitly closed

But this keeps erroring out. I have discovered that it works if the id is a numerical id (although you have to pass it as a string, e.g. '186972394'), like so:

handle = Entrez.efetch(db="nucleotide", id='186972394' ,rettype="gb")

This gets me the info I want which includes the sequence.

So now to the Question:

How can I search by gene name (because I do not have id numbers), or easily convert my gene names to ids, so that I can get the sequences for the gene list I have?

I have also tried:

x = 'RELA'

handle = Entrez.efetch(db="nucleotide", id=x ,rettype="gb")

This errors out with HTTP Error 400: Bad Request, because it is expecting a numeric id passed as a string.

handle = Entrez.esearch(db="nucleotide",term=x)

This returns nothing; debugging shows that the search did not find anything.

handle =  Entrez.esearch(db="nucleotide",term="Homo[Orgn] AND RELA[Gene]")

This returns a list of IDs, and the first one is what I want. But I am sure you will not like this approach, because nothing guarantees that the first ID in the list is actually the record I want every time I query the search engine.

Thx

entrez biopython python fastq • 5.8k views

This is really an Entrez question rather than a Biopython one: you're trying to find an Entrez term that narrows the search to one particular record for each gene. Check out these tips for getting only RefSeq sequences, and use biomol_genomic[PROP] to get rid of mRNAs.
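
A minimal sketch of what such a term could look like when built in Python. The srcdb_refseq[PROP] filter is my assumption about what the RefSeq tips recommend, and the helper names build_refseq_term and search_ids are made up for illustration. Even with these filters a gene can still match more than one record, so check the length of the returned list rather than blindly taking the first ID.

```python
def build_refseq_term(gene, organism="Homo sapiens"):
    """Build an Entrez query limited to RefSeq genomic records for one gene.

    srcdb_refseq[PROP] is an assumed choice of RefSeq filter;
    biomol_genomic[PROP] is the filter suggested above to drop mRNAs.
    """
    return (
        f'"{organism}"[Orgn] AND {gene}[Gene] '
        "AND srcdb_refseq[PROP] AND biomol_genomic[PROP]"
    )


def search_ids(gene, email, retmax=20):
    """Run esearch with the term above and return the matching IDs (needs network)."""
    # Imported here so that build_refseq_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=build_refseq_term(gene), retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])
```

Each ID returned by search_ids("RELA", "you@example.com") could then be passed to Entrez.efetch as in the question.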


FASTQ? Are you sure you want to get a FASTQ?

11.4 years ago

Since you seem to be looking for human gene names, I think your best bet is to query HGNC and extract the gene ids from its output.

A tutorial can be found here

I think Biomart will also let you connect a human gene name to an id. (I don't have experience with it, but other Biomart-related posts on this site might be helpful.)

I almost forgot to mention: Using the biomart perl api for simple queries
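
To give an idea of the HGNC route, here is a rough sketch that looks up one symbol and pulls out the Entrez Gene id. The endpoint URL and the JSON layout (a response object holding a docs list whose entries carry an entrez_id field) are my assumptions about the HGNC REST service, so check them against the HGNC documentation before relying on this. The function names are made up for illustration.

```python
import json
import urllib.request

# Assumed HGNC REST endpoint for looking up a record by approved symbol.
HGNC_URL = "https://rest.genenames.org/fetch/symbol/{symbol}"


def extract_entrez_id(payload):
    """Return the Entrez Gene id from an HGNC-style JSON payload, or None.

    None is returned when there is no hit or more than one hit,
    so an ambiguous symbol is never silently resolved.
    """
    docs = payload.get("response", {}).get("docs", [])
    if len(docs) != 1:
        return None
    return docs[0].get("entrez_id")


def fetch_entrez_id(symbol):
    """Query HGNC for one gene symbol and return its Entrez Gene id (needs network)."""
    request = urllib.request.Request(
        HGNC_URL.format(symbol=symbol),
        headers={"Accept": "application/json"},  # ask for JSON output
    )
    with urllib.request.urlopen(request) as response:
        return extract_entrez_id(json.load(response))
```

Note that the id obtained this way is a Gene database id, not a nucleotide id, so it still has to be linked to a sequence record before calling efetch on the nucleotide database.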
