Let's say I have a GenBank ID for a genome (BA000007.3) and a list with genome coordinates for blocks:
blocks = [(1443423, 1444553), ...]
I need to find out what genes (or parts of what genes) are covered by these blocks.
I've tried this:
from Bio import Entrez, SeqIO Entrez.email = "firstname.lastname@example.org" with Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id="BA000007.3") as handle: seq_record = SeqIO.read(handle, "gb") sub_record = seq_record[1443423:1444553] print(len(sub_record.features))
It returns 0, as no genes are fully covered by this block. However, if you open the NCBI record in a browser and change region shown you will see that there are 2 genes that are partially covered by this block.
I've also tried substracting with SeqFeature and FeatureLocation but it didn't work either (a new sequence doesn't have features at all).
So is there any way to do this without manually checking each block for each genome?