Question

How to feed the sequence dictionary created with multifasta file into python function

0

Entering edit mode

4.0 years ago

kousi31 ▴ 100

I am learning python through tutorials. I created a dictionary for multi-fasta file using the following code as explained in a lecture. I couldn't find how to use the dictionary in a python function I wrote for finding ORFs.

The code used to create the dictionary

f=open('dna.example.fa') 
seqs={} 

for line in f:
    line=line.rstrip()
    if line[0]=='>': 
        words=line.split() 
        name=words[0][1:] 
        seqs[name]=''

    else:
        seqs[name]=seqs[name]+line

Code for parsing the file,

for name,seq in seqs.items():
     dna=(name,seq)

Code I wrote for finding ORFs

    def orf_finder1(dna, frame=1):
        start_codon=['ATG']
        stop_codon=['TAA', 'TAG', 'TGA']
        atg_pos=[]
        stp_pos=[]

        for i in range(frame, len(dna),3):
            codon=dna[i:i+3]
            if codon in start_codon:
                atg_pos.append(i)
                x=i
                for k in range(i+3, len(dna),3):
                    codons=dna[k:k+3]
                    if codons in stop_codon:
                        stp_pos=[]
                        s=k+3
                        l=len(dna[x:s])
                        d=dna[x:s]         
                        print(l)
                        print(d)
                        break
orf_finder1('AATGTTGACTAGCTAGCATGCAAGCTAGCTAA')
output:
15
ATGTTGACTAGCTAG

I haven't started learning biopython yet. Kindly someone please help me to feed the multifasta file into this function. Thank you.

python multi-fasta function • 2.4k views

ADD COMMENT • link updated 4.0 years ago by massa.kassa.sc3na ▴ 600 • written 4.0 years ago by kousi31 ▴ 100

score 0 · Answer 1 · 2020-04-22

0

Entering edit mode

4.0 years ago

massa.kassa.sc3na ▴ 600

Hello,

Your code is working, provided that your multifasta is nice. (doesn't have blank lines). I would add only seqs[name]=seqs[name]+line.strip()

so you don't have line breaks (\n) in your fasta dictionary.

To your actual question. Your function expects string as first input. So you need to provide it.

e.g: (note that we only pass seq not the dna tuple)

for name, seq in seqs.items():
    dna = (name, seq)
    print(orf_finder1(seq))

ADD COMMENT • link 4.0 years ago by massa.kassa.sc3na ▴ 600

0

Entering edit mode

Thank you. But I want to use the dictionary so as to identify the ORFs corresponding to the fasta header. i.e. to print the ORFs, present in seqs.values for each seqs.key. I tried replacing the dna in the code with seqs.values but it throws error. May be I have to identify ORFs under each fasta header and loop over. But I am not getting a clue on how to do it.

ADD REPLY • link 4.0 years ago by kousi31 ▴ 100

1

Entering edit mode

Maybe I've misunderstood what you want.

The loop will go through every sequence in your dna.example.fa- (it would be great to have this file for any testing - this way I have to mock up my own example file) - and it will apply the orf_finder1 to the sequence. You can check that it traverses whole dict by adding print(name) to the for loop.

If I read your code correctly, your orf_finder1 will print only the first ORF it finds.

Do you want to return all ORFs in the sequence? If so, you need to rewrite the orf_finder1 function. This has nothing to do with your input file being multifasta or your data being stored in the dict.

If I may have an recommendation - Do try to learn BioPython, It's not panacea, but it will make your life simpler, especially for file reading writing.

Also, do check http://biopython.org/DIST/docs/tutorial/Tutorial.html especially section "20.1.13 Identifying open reading frames".

ADD REPLY • link 4.0 years ago by massa.kassa.sc3na ▴ 600