I am new to bioinformatics and trying to use python and biopython specifically SeqIO to find all the ORFs in several FASTA files. My code is simple and I can easily find all the ORFs in an example string and I can use SeqIO to go through all the FASTA files but when I put the two together I get a problem. I get some seemingly random sequence(s) in the beginning before accessing and printing the sequences I want. Then, I only get some of the FASTA files and what files I access and what ORFs seems to be dependent on what print statements I use beforehand, either printing the sequence, printing the sequence ID, printing nothing etc.
It does seem to work (I think) if I only consider one reading frame and therefore don't add an extra for loop.
This doesn't make sense to me at all and I can't after much struggle find a solution for this. Is what I am trying to do just not possible with SeqIO or is there something I am not understanding?
Any help would be much appreciated.
#function to return ORF (open reading frame) given a reading frame and a DNA string, sequence #first find the start codon ATG and then continue from there to find a stop codon all in a given reading frame #(i.e start at a given index and increment +3 #if both the stop and start codon are not found nothing is returned or printed def findORF(readingFrame, sequence): results= i=readingFrame while i<len(sequence)-2: if sequence[i:i+3] =="ATG": j=i while j<len(sequence)-2: if sequence[j:j+3]=="TAG" or sequence[j:j+3]=="TAA" or sequence[j:j+3]=="TGA": #could also make a list of stop codons and ask is sequence in list results.append(sequence[i:j+3]) break j=j+3 i=i+3 return results def revComp(sequence): result="" sequence=sequence.upper() i=len(sequence)-1 while i>=0: if sequence[i]=="T": result=result+"A" elif sequence[i]=="A": result=result+"T" elif sequence[i]=="G": result=result+"C" elif sequence[i]=="C": result=result+"G" else: result=result+"N" i=i-1 return result from Bio import SeqIO #parse all fasta sequences for file in SeqIO.parse("dna.example.fasta", "fasta"): sequence=file.seq sequences=[sequence,revComp(sequence)] for seq in sequences: #include forward and reverse strand for i in range(3): #print out all reading frames print("Sequence is") #print (seq.id) print(seq) print("Reading frame is %d"% i ) result=findORF(i,seq) if len(result)>0: print ("number of ORFs: %d" %len(result)) #print(*result, sep="\n \n") for j in range(len(result)): print ( "%d: %s" %(j+1,result[j])) else: print("no ORFS found \n")