Question: Can't make program in Python to read FASTA
10 months ago, hdinis09 wrote:

My professor wants us to write a program that analyses triplets in sequences. The first step is to open and read the actual sequence, and he provided the following code.

def read_FASTA(fname):
begin = True
prots = {}
fil = open(fname, "rt")
lins = fil.readlines()
fil.close()
for lin in lins:
    slin = lin.strip()
    if slin[0] == '>':
        if begin == False:
            prots[pname] = seq
        seq = ""
        pname = slin[1:].strip()
        begin = False
    else:
        seq = seq + slin
prots[pname] = seq
return prots

---SECOND CODE---

    import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta’)
for prot in prots:
    print(prot)

After downloading the FASTA file from UniProt (one or multiple), these are my results:

def read_FASTA("proteinas.fasta"):
    begin = True
    prots = {}
    fil = open("proteinas.fasta", "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

--Second result--

import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta')
for prot in prots:
    print(prot)
Tags: sequence, python, fasta
written 10 months ago by hdinis09 • modified 10 months ago by Dattatray Mongad

I suspect this is just the tip of the iceberg, but at least part of your problem is that you are passing your file path with a mixture of quotes and backticks:

" = double quote
' = single quote
` = backtick

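Do not mix these up. On Windows, also make the path a raw string (note the r prefix) so that backslash sequences such as \t are not read as escape characters. It should look something like:

function(r'C:\Path\to\file.fasta')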

——

Just a pointer about using the forum: please don't screenshot code or output. Instead, copy and paste the text and format it appropriately, as I have done above.

written 10 months ago by Joe • modified 10 months ago

But I did put in single quotes!

written 10 months ago by hdinis09

Hmm, perhaps it was just the screenshot making it look odd, then (another reason to copy the raw text).

And what was the error you were getting again?

written 10 months ago by Joe • modified 10 months ago

Hello hdinis09,

you forgot to tell us what your question/problem is :)

fin swimmer

written 10 months ago by finswimmer

10 months ago, Dattatray Mongad (National Centre for Cell Science, Pune) wrote:

Use Biopython:

from Bio import SeqIO

# SeqIO.parse (lowercase p) yields one record per FASTA entry
for record in SeqIO.parse("fastaFileName", "fasta"):
    print(record.id)
    print(record.seq)
written 10 months ago by Dattatray Mongad • modified 10 months ago by h.mon

This is the better suggestion, but as the task is an assignment with code given specifically, I'm guessing this isn't an option.

written 10 months ago by Joe

10 months ago, Joe (United Kingdom) wrote:

You haven't shown us the project structure, and rfasta isn't an existing package, so I'm guessing the script itself is called rfasta and you're trying to import it locally?

The code works for me (after a little re-indentation compared to your post which was probably lost in translation).

Given the following:

Input file

>mutant
GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>gorrila
GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC
>chimpanze
GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>human
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>olive
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC

The 'reader' code in a file called rfasta.py in the current working directory:

def read_FASTA(fname):
    begin = True
    prots = {}
    fil = open(fname, "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

(This isn't particularly elegant Python IMO, but it works and is fine for an exercise.)
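
For comparison, a slightly more defensive variant could look like the sketch below. The name read_fasta, the blank-line handling and the use of a with block are assumptions made for illustration, not part of the assignment code. As in the original, the function is defined with a parameter name (fname), and the actual file path is only supplied when the function is called.

```python
def read_fasta(fname):
    """Return {header: sequence} for a FASTA file.

    Hypothetical rewrite of read_FASTA, shown only as a sketch.
    """
    parts = {}   # header -> list of sequence lines
    name = None  # header of the record currently being read
    with open(fname) as handle:  # the file is closed automatically
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of indexing into ''
            if line.startswith('>'):
                name = line[1:].strip()
                parts[name] = []
            elif name is not None:
                parts[name].append(line)
    # Join each record's lines once at the end
    return {header: ''.join(chunks) for header, chunks in parts.items()}
```

It would be called the same way as the original, for example read_fasta('./seqs.fa').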

Running the code in a local python interpreter:

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> file = './seqs.fa'
>>> import rfasta
>>> p = rfasta.read_FASTA(file)
>>> print(p)
{'olive': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC', 'gorrila': 'GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'chimpanze': 'GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'mutant': 'GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'human': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC'}

As you can see, the FASTA file is parsed into a dictionary without any problem.
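
Once the sequences are in a dictionary, the triplet step mentioned in the question could start from something like the sketch below. It assumes "triplets" means non-overlapping groups of three characters read from the first position, which the thread does not actually specify. The function name count_triplets and the demo dictionary are made up for illustration.

```python
from collections import Counter

def count_triplets(seq):
    """Count non-overlapping triplets, reading from the first character.

    Assumption: a trailing partial triplet (1 or 2 characters) is ignored.
    """
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

# Stand-in for the dictionary returned by read_FASTA
prots = {'demo': 'GTTGGGAGGGTT'}
for name, seq in prots.items():
    print(name, dict(count_triplets(seq)))
```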

I was running this on a Linux box, so the file path syntax will differ because you're on Windows, but the principle is the same.
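
On the Windows side, one likely stumbling block (an assumption, since the actual error message was never posted) is the backslashes in the path. In an ordinary Python 3 string, 'C:\Users...' fails with a unicode escape SyntaxError because \U starts an escape sequence. The sketch below shows three spellings that avoid this, using the path from the question purely as an example.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes are kept literally
raw = r'C:\Users\hdini\Desktop\aasd\proteinas.fasta'
# Ordinary string with every backslash doubled
escaped = 'C:\\Users\\hdini\\Desktop\\aasd\\proteinas.fasta'
# Forward slashes are also accepted by Windows
forward = 'C:/Users/hdini/Desktop/aasd/proteinas.fasta'

assert raw == escaped
# PureWindowsPath treats both separators as equivalent
assert PureWindowsPath(raw) == PureWindowsPath(forward)

# Without the r prefix, \t becomes a tab and \f a form feed,
# so this string no longer names the intended file
broken = 'C:\temp\file.fasta'
assert '\t' in broken and '\f' in broken
```

Any of the first three strings can be passed to rfasta.read_FASTA(...), provided the quotes are plain ' or " characters.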

written 10 months ago by Joe