Hi everyone, I recently started using Python with Biopython. As practice, I'm trying to get the translated ORF of this gene, taken from GenBank, as input: NM_100684.3
However, my output does not show the correct ORF: the amino acid sequence I get differs from the expected one in both composition and length.
What am I doing wrong?
This is the script I used:
>>> from Bio import SeqIO
>>> record = SeqIO.read("sequence.fasta", "fasta")
>>> table = 1
>>> min_pro_len = 100
>>> for strand, nuc in [(+1, record.seq), (-1, record.seq.reverse_complement())]:
...     for frame in range(3):
...         length = 3 * ((len(record)-frame) // 3)  # Multiple of three
...         for pro in nuc[frame:frame+length].translate(table).split("*"):
...             if len(pro) >= min_pro_len:
...                 print("%s...%s - length %i, strand %i, frame %i" \
...                       % (pro[:30], pro[-3:], len(pro), strand, frame))
...
YSDIDQINLNQISNLQRNLKYFITMGDSTG...NNV - length 554, strand 1, frame 2
SSPGDKGHNCKGGSASSLCPHREEHHSHNG...ILT - length 162, strand -1, frame 1
IEHQDSHDDVQPTGYKEGDPPGREGCGTAA...HNW - length 216, strand -1, frame 1
TKVTGNVQATIITPIHVSPCSVVKCEVEKK...SDA - length 122, strand -1, frame 2
The output above is not what I expected: none of the sequences starts with methionine. In GenBank, the correct protein has 530 aa and starts with "MGDSTGEPGSSMHGVTGREQ ..."
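
One thing I noticed while comparing: the first hit (strand 1, frame 2) contains "MGDSTG" starting 24 residues in, and 554 - 24 = 530, so the expected protein may simply be sitting inside that fragment. The loop only splits at stop codons ("*"), so each fragment begins right after the previous stop rather than at a start codon. Below is a minimal sketch of trimming a fragment to its first "M". It assumes the first methionine in the fragment is the real start, which I have not verified for other sequences, and `trim_to_first_met` is just a name I made up, not a Biopython function.

```python
def trim_to_first_met(protein):
    """Return the fragment from its first 'M' onward, or '' if it has no 'M'."""
    start = protein.find("M")
    return protein[start:] if start != -1 else ""


# First 30 residues of the strand 1, frame 2 fragment from my output.
fragment_head = "YSDIDQINLNQISNLQRNLKYFITMGDSTG"

print(fragment_head.find("M"))           # 24
print(trim_to_first_met(fragment_head))  # MGDSTG
```

In the loop this could be applied as `trim_to_first_met(str(pro))` before the length check, though I am not sure whether this is the recommended way to do it.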