Question: How to feed the sequence dictionary created with multifasta file into python function
0
gravatar for kousi31
7 months ago by
kousi3130
kousi3130 wrote:

I am learning python through tutorials. I created a dictionary for multi-fasta file using the following code as explained in a lecture. I couldn't find how to use the dictionary in a python function I wrote for finding ORFs.

The code used to create the dictionary

f=open('dna.example.fa') 
seqs={} 

for line in f:
    line=line.rstrip()
    if line[0]=='>': 
        words=line.split() 
        name=words[0][1:] 
        seqs[name]=''

    else:
        seqs[name]=seqs[name]+line

Code for parsing the file,

for name,seq in seqs.items():
     dna=(name,seq)

Code I wrote for finding ORFs

    def orf_finder1(dna, frame=1):
        start_codon=['ATG']
        stop_codon=['TAA', 'TAG', 'TGA']
        atg_pos=[]
        stp_pos=[]

        for i in range(frame, len(dna),3):
            codon=dna[i:i+3]
            if codon in start_codon:
                atg_pos.append(i)
                x=i
                for k in range(i+3, len(dna),3):
                    codons=dna[k:k+3]
                    if codons in stop_codon:
                        stp_pos=[]
                        s=k+3
                        l=len(dna[x:s])
                        d=dna[x:s]         
                        print(l)
                        print(d)
                        break
orf_finder1('AATGTTGACTAGCTAGCATGCAAGCTAGCTAA')
output:
15
ATGTTGACTAGCTAG

I haven't started learning biopython yet. Kindly someone please help me to feed the multifasta file into this function. Thank you.

function multi-fasta python • 359 views
ADD COMMENTlink modified 7 months ago by massa.kassa.sc3na340 • written 7 months ago by kousi3130
0
gravatar for massa.kassa.sc3na
7 months ago by
massa.kassa.sc3na340 wrote:

Hello,

Your code is working, provided that your multifasta is nice. (doesn't have blank lines). I would add only seqs[name]=seqs[name]+line.strip()

so you don't have line breaks (\n) in your fasta dictionary.

To your actual question. Your function expects string as first input. So you need to provide it.

e.g: (note that we only pass seq not the dna tuple)

for name, seq in seqs.items():
    dna = (name, seq)
    print(orf_finder1(seq))
ADD COMMENTlink written 7 months ago by massa.kassa.sc3na340

Thank you. But I want to use the dictionary so as to identify the ORFs corresponding to the fasta header. i.e. to print the ORFs, present in seqs.values for each seqs.key. I tried replacing the dna in the code with seqs.values but it throws error. May be I have to identify ORFs under each fasta header and loop over. But I am not getting a clue on how to do it.

ADD REPLYlink modified 7 months ago • written 7 months ago by kousi3130
1

Maybe I've misunderstood what you want.

The loop will go through every sequence in your dna.example.fa- (it would be great to have this file for any testing - this way I have to mock up my own example file) - and it will apply the orf_finder1 to the sequence. You can check that it traverses whole dict by adding print(name) to the for loop.

If I read your code correctly, your orf_finder1 will print only the first ORF it finds.

Do you want to return all ORFs in the sequence? If so, you need to rewrite the orf_finder1 function. This has nothing to do with your input file being multifasta or your data being stored in the dict.

If I may have an recommendation - Do try to learn BioPython, It's not panacea, but it will make your life simpler, especially for file reading writing.

Also, do check http://biopython.org/DIST/docs/tutorial/Tutorial.html especially section "20.1.13 Identifying open reading frames".

ADD REPLYlink modified 7 months ago • written 7 months ago by massa.kassa.sc3na340
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2013 users visited in the last hour