Extracting a Genomic Region From a Nucleotide Accession
6.9 years ago
13dsc • 0

I am trying to get a list of the proteins encoded in the genomic region surrounding a protein of interest, identified by its protein accession number. I am using Biopython and its Entrez module. I start by getting a list of protein accessions from GenBank, and now I want to get an idea of the genomic region surrounding each protein.

I am using Entrez.efetch(db="protein", id=rec, rettype="ipg", retmode="text") to get the nucleotide accession number and the start/stop coordinates of the protein's CDS.
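A minimal sketch of this step, assuming Biopython is installed and that the IPG report is tab-separated with a header row containing columns named "Nucleotide Accession", "Start", "Stop", "Strand" and "Protein" (check the header of your own output). The names `CdsLocation`, `parse_ipg_report` and `fetch_ipg_report` are hypothetical helpers, not part of Biopython. Because an identical protein can be annotated in several genomes, the parser keeps every row that has nucleotide coordinates.

```python
from __future__ import annotations

import csv
import io
from typing import NamedTuple


class CdsLocation(NamedTuple):
    """One IPG report row that points at a CDS on a nucleotide record."""
    nucleotide: str
    start: int
    stop: int
    strand: str
    protein: str


def parse_ipg_report(text: str) -> list[CdsLocation]:
    """Parse the report returned by efetch(db="protein", rettype="ipg").

    Assumes a tab-separated table whose header includes "Nucleotide Accession",
    "Start", "Stop", "Strand" and "Protein"; rows without a nucleotide
    accession or numeric coordinates are skipped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    locations = []
    for row in reader:
        nuc = (row.get("Nucleotide Accession") or "").strip()
        start = (row.get("Start") or "").strip()
        stop = (row.get("Stop") or "").strip()
        if not nuc or not start.isdigit() or not stop.isdigit():
            continue
        locations.append(CdsLocation(
            nucleotide=nuc,
            start=int(start),
            stop=int(stop),
            strand=(row.get("Strand") or "").strip(),
            protein=(row.get("Protein") or "").strip(),
        ))
    return locations


def fetch_ipg_report(protein_acc: str, email: str) -> str:
    """Download the IPG report for one protein accession (requires Biopython)."""
    from Bio import Entrez  # imported here so the parser works without Biopython

    Entrez.email = email  # NCBI asks E-utilities users for a contact address
    handle = Entrez.efetch(db="protein", id=protein_acc, rettype="ipg", retmode="text")
    try:
        data = handle.read()
    finally:
        handle.close()
    return data.decode() if isinstance(data, bytes) else data
```

Typical use would be `parse_ipg_report(fetch_ipg_report(rec, "you@example.org"))`, after which each `CdsLocation` gives the nucleotide accession and CDS coordinates needed for the next step.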

My question is: how do I then download a GenBank file covering a 30 kb region around the CDS?
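One possible approach, sketched under assumptions: the E-utilities efetch endpoint accepts `seq_start` and `seq_stop` for nucleotide records, and Biopython's `Entrez.efetch` passes these parameters through, so only the requested window has to be downloaded. Here "30 kb around the CDS" is assumed to mean 15 kb of flank on each side of the CDS; change `flank` if you prefer a different window. `region_bounds` and `fetch_region_proteins` are hypothetical helper names. Feature coordinates in the downloaded sub-record start at 1 for the first base of the window, so they are shifted back to coordinates on the full record.

```python
from __future__ import annotations


def region_bounds(start: int, stop: int, flank: int = 15_000,
                  seq_length: int | None = None) -> tuple[int, int]:
    """Return 1-based inclusive bounds extending `flank` bp past both CDS ends.

    Pass `seq_length` (if known) to keep the upper bound inside the record.
    """
    lo = max(1, min(start, stop) - flank)
    hi = max(start, stop) + flank
    if seq_length is not None:
        hi = min(hi, seq_length)
    return lo, hi


def fetch_region_proteins(nucleotide: str, lo: int, hi: int, email: str,
                          out_path: str | None = None) -> list[tuple]:
    """Download only positions lo..hi of a nuccore record and list its CDSs.

    Returns (start, end, strand, protein_id, product) tuples in coordinates of
    the full record. Optionally saves the windowed record as a GenBank file.
    """
    from Bio import Entrez, SeqIO  # requires Biopython; imported lazily

    Entrez.email = email  # NCBI asks E-utilities users for a contact address
    handle = Entrez.efetch(db="nuccore", id=nucleotide, rettype="gb",
                           retmode="text", seq_start=lo, seq_stop=hi)
    try:
        record = SeqIO.read(handle, "genbank")
    finally:
        handle.close()

    if out_path is not None:
        SeqIO.write(record, out_path, "genbank")

    proteins = []
    for feature in record.features:
        if feature.type != "CDS":
            continue
        loc = feature.location  # 0-based, end-exclusive within the window
        proteins.append((
            lo + int(loc.start),    # 1-based start on the full record
            lo + int(loc.end) - 1,  # 1-based inclusive end on the full record
            loc.strand,
            feature.qualifiers.get("protein_id", [""])[0],
            feature.qualifiers.get("product", [""])[0],
        ))
    return proteins
```

For example, `region_bounds(1000, 2500)` returns `(1, 17500)`: the lower bound is clipped at position 1 because the CDS sits near the start of the record. CDSs that run past either edge of the window come back with fuzzy (partial) locations, which `int()` still converts to a position.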

Can someone point me in the right direction?

gene python biopython • 1.8k views

I don't think this is the best way to tackle this problem.

You would be better off downloading a BED or GTF file for your organism of interest and extracting the neighbouring genes from that, e.g. using bedtools.
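With bedtools, a command along the lines of `bedtools window -a target.bed -b annotation.gff -w 15000` reports the annotation features lying within 15 kb of the target interval. For anyone who would rather stay in Python, the sketch below does the same windowed-overlap search over a GFF3 annotation. It assumes GFF3 layout (nine tab-separated columns, `key=value` attributes, with `protein_id` present on CDS lines as in NCBI annotations); GTF uses `key "value";` attributes, so the attribute parsing would need adjusting. `neighbours_from_gff` is a hypothetical helper, not a bedtools or Biopython function.

```python
from __future__ import annotations


def neighbours_from_gff(lines, seqid: str, start: int, stop: int,
                        flank: int = 15_000, feature_type: str = "CDS") -> list[tuple]:
    """Return (start, end, strand, id) for features within `flank` bp of a target.

    Assumes GFF3: nine tab-separated columns and key=value attributes.
    """
    lo = max(1, min(start, stop) - flank)
    hi = max(start, stop) + flank
    hits = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] != feature_type:
            continue
        f_start, f_end = int(cols[3]), int(cols[4])
        if f_end < lo or f_start > hi:  # feature does not overlap the window
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        label = attrs.get("protein_id") or attrs.get("ID", "")
        hits.append((f_start, f_end, cols[6], label))
    return sorted(hits)
```

An open file handle can be passed directly as `lines`, with `seqid` set to the nucleotide accession and `start`/`stop` to the CDS coordinates obtained from the IPG report.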


I am interested in comparing biosynthetic gene clusters in bacteria that share the enzymes required to synthesise the backbone of a natural product. To save time and bandwidth, I would rather not download more data than I need.
