I have searched quite extensively for this, to no avail.
I have no problem extracting sequences by gene_id from a GenBank file when given a list of gene_ids to extract:
from Bio import SeqIO

for seq_record in SeqIO.parse(gbk_file, "genbank"):
    for feat in seq_record.features:
        if feat.type == "CDS":
            for gene in genes:
                # qualifier values are lists of strings
                if gene in feat.qualifiers['gene_id']:
                    seq = feat.qualifiers['translation']
                    print(gene)
                    print(seq)
Here, genes is a list holding each desired gene_id.
I would like to provide a start gene and an end gene as my range, so that I can extract every gene between them (a gene cluster) in FASTA format. Is this possible?
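To make the goal concrete, here is a minimal sketch of the kind of range capture being asked about. It assumes the CDS features appear in genomic order, that each CDS carries a gene_id qualifier, and that the start gene comes before the end gene in the record. The function name cds_in_range is hypothetical, not part of Biopython; it only relies on each feature having .type and .qualifiers, as the features in the loop above do.

```python
def cds_in_range(features, start_id, end_id):
    """Collect (gene_id, translation) pairs for CDS features
    from start_id through end_id, inclusive.

    Assumption: features are in genomic order and start_id
    comes before end_id. If end_id is never seen, everything
    from start_id to the end of the features is returned.
    """
    cluster = []
    inside = False
    for feat in features:
        if feat.type != "CDS":
            continue
        # Qualifier values are lists of strings; take the first entry
        # (None if the qualifier is absent).
        gene_id = feat.qualifiers.get("gene_id", [None])[0]
        if gene_id == start_id:
            inside = True
        if inside:
            translation = feat.qualifiers.get("translation", [""])[0]
            cluster.append((gene_id, translation))
            if gene_id == end_id:
                break
    return cluster
```

With something like this, calling cds_in_range(seq_record.features, start, end) for each start/end pair would give the genes of one cluster in order.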
I will be looping through the start and end gene_ids in a for loop as follows:
for x, y, z in zip(start_list, end_list, cluster_list):
Once the range of gene_ids is captured (each with its amino acid sequence), I will write them to a separate file per cluster using:
with open("%s" % (z), "w") as outfile:
    outfile.write(">%s\n%s" % (gene, sequence))
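One detail worth noting for the output step: opening a file in "w" mode once per gene overwrites whatever was written before, so a cluster file would end up holding only its last gene. A minimal sketch that opens the file once and writes every record, assuming the cluster is available as a list of (gene_id, sequence) pairs (the function name write_cluster_fasta is hypothetical):

```python
def write_cluster_fasta(path, cluster):
    """Write (gene_id, sequence) pairs to path as FASTA records."""
    # Open once so that all genes of the cluster land in the same file.
    with open(path, "w") as outfile:
        for gene_id, sequence in cluster:
            # Trailing newline keeps each record on its own lines.
            outfile.write(">%s\n%s\n" % (gene_id, sequence))
```

Inside the zip loop, the cluster file name z would be passed as path.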
Any help in capturing a range of gene_ids would be greatly appreciated.