Guys, I've wrote a script to extract sequences between the 23S rRNA and 16S rRNA gene in Python using Biopython. I can't figure out why, even if I've set explicit conditions, it keeps extracting randome sequences into the genome. Here's the code:
for f in glob.glob(cwd+'/genomes/*.gbk'): gbank = SeqIO.parse(open(f,'r'), "genbank") with open(output_file, "w") as nfh: for genome in gbank: for gene in genome.features: if gene.type=='rRNA': if gene.location.strand == -1: # wanted just the reverse strand if 'product' in gene.qualifiers: if '16S' in gene.qualifiers['product']: end = gene.location.nofuzzy_start elif '23S' in gene.qualifiers['product']: start = gene.location.nofuzzy_end if abs(start - end) < 50000: intergenic_sequence.append(">%s|ITS_full_sequence|%s\n%s" % (genome.description, gene.location.strand, genome.seq[start:end]))
rRNA and it extract a sequence into a random CDS, nothing to do with ribosomal RNA. This problem happens with several genomes tested (RefSeq records, well annotated).