Question: Extracting a Genomic Region From Nucleotide Accession
Asked 2.9 years ago by 13dsc:

I am trying to get a list of the proteins encoded in the genomic region surrounding a protein of interest, which is identified by a protein accession number. I am using Biopython and its Bio.Entrez module. I start by getting a list of protein accessions from GenBank, and now I want to look at the genomic region surrounding each protein.

I am using Entrez.efetch(db="protein", id=rec, rettype="ipg", retmode="text") to get the nucleotide accession number and the start/stop coordinates of the protein's CDS.
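For reference, here is a rough sketch of pulling the accession and coordinates out of that IPG report. It assumes the text report is tab-delimited with a header row containing "Nucleotide Accession", "Start", "Stop" and "Strand" columns; check that against what efetch actually returns for you. The function name parse_ipg and the sample rows are made up for illustration.

```python
import csv
import io


def parse_ipg(report_text):
    """Return one dict per IPG row that has a nucleotide accession and coordinates.

    Assumes a tab-delimited report with a header row (column names are an
    assumption, verify them against your own efetch output).
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    hits = []
    for row in reader:
        acc = (row.get("Nucleotide Accession") or "").strip()
        start = (row.get("Start") or "").strip()
        stop = (row.get("Stop") or "").strip()
        # Skip rows with no genomic placement (empty or non-numeric fields).
        if not acc or not start.isdigit() or not stop.isdigit():
            continue
        hits.append(
            {
                "accession": acc,
                "start": int(start),
                "stop": int(stop),
                "strand": (row.get("Strand") or "").strip(),
            }
        )
    return hits


# Made-up sample report, only here to show the expected shape of the input.
SAMPLE = (
    "Id\tSource\tNucleotide Accession\tStart\tStop\tStrand\tProtein\n"
    "1\tRefSeq\tNC_000000.1\t1200\t2400\t+\tWP_000000000.1\n"
    "1\tINSDC\t\t\t\t\tXXX00000.1\n"
)

hits = parse_ipg(SAMPLE)
```

With the real report you would pass the text read from the efetch handle instead of SAMPLE. One protein can map to several nucleotide records, so expect more than one hit per query.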

My question is: how do I then download a GenBank file representing a 30 kb region around the CDS?

Can someone point me in the right direction?
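One possible direction, as a sketch only: efetch on the nuccore database accepts seq_start and seq_stop parameters, so a sub-region of a record can be requested instead of the whole thing. The helper names region_window and fetch_region, the 15 kb flank on each side, and the placeholder e-mail are all assumptions for illustration. The upper bound is not clamped to the record length here, so check how NCBI handles a window that runs past the end of your sequence.

```python
def region_window(start, stop, flank=15000):
    """Return 1-based (seq_start, seq_stop) covering the CDS plus `flank` bp each side."""
    lo, hi = min(start, stop), max(start, stop)
    # Clamp the lower bound to 1; the upper bound is left unclamped (assumption).
    return max(1, lo - flank), hi + flank


def fetch_region(accession, start, stop, flank=15000, email="you@example.org"):
    """Fetch a GenBank flat file for the window around a CDS (needs network access)."""
    # Imported here so region_window() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # placeholder address, replace with your own
    seq_start, seq_stop = region_window(start, stop, flank)
    handle = Entrez.efetch(
        db="nuccore",
        id=accession,
        rettype="gbwithparts",  # full flat file with sequence and features
        retmode="text",
        seq_start=seq_start,
        seq_stop=seq_stop,
    )
    try:
        return handle.read()
    finally:
        handle.close()


window = region_window(20000, 21000)
```

Note that a 15 kb flank on each side gives 30 kb plus the length of the CDS itself; shrink the flank if you want exactly 30 kb in total. The returned text can be written to a .gb file or parsed with Bio.SeqIO.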

Tags: biopython, python, gene

I don't think this is the best way to tackle this question.

You would be better off downloading a BED or GTF file for your organism of interest and getting the neighbouring genes from that, e.g. using bedtools.
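To illustrate the idea of reading neighbours off an annotation file, here is a minimal pure-Python stand-in. It does not call bedtools; it just scans tab-delimited BED lines (0-based, half-open intervals) for features overlapping a window around the gene of interest. The function name genes_in_window, the 15 kb flank, and the sample lines are made up for the example.

```python
def genes_in_window(bed_lines, chrom, start, end, flank=15000):
    """Return (name, start, end) for BED features overlapping the flanked interval."""
    lo, hi = max(0, start - flank), end + flank
    found = []
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != chrom:
            continue
        f_start, f_end = int(fields[1]), int(fields[2])
        # Half-open interval overlap test.
        if f_start < hi and f_end > lo:
            name = fields[3] if len(fields) > 3 else f"{chrom}:{f_start}-{f_end}"
            found.append((name, f_start, f_end))
    return sorted(found, key=lambda g: g[1])


# Made-up annotation lines, only to show the input format.
BED = [
    "chr1\t100\t900\tgeneA\n",
    "chr1\t20000\t21000\tgeneB\n",
    "chr1\t30000\t31000\tgeneC\n",
    "chr1\t50000\t51000\tgeneD\n",
    "chr2\t20000\t21000\tgeneE\n",
]

neighbours = genes_in_window(BED, "chr1", 20000, 21000)
```

The result includes the query gene itself, so filter it out by name if you only want the neighbours.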

Comment written 2.9 years ago by WouterDeCoster

I am interested in comparing biosynthetic gene clusters in bacteria that share common enzymes required for the synthesis of the backbone of a natural product. I would rather not download a larger file than I need for the sake of time and bandwidth.

Reply written 2.9 years ago by 13dsc