Question: Can't make program in Python to read FASTA
hdinis09 wrote, 3 months ago:

My professor wants us to write a program that analyses triplets in sequences. The first step is to open and read the sequence file, and he provided the following two pieces of code:

def read_FASTA(fname):
    begin = True
    prots = {}
    fil = open(fname, "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

---SECOND CODE---

import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta’)
for prot in prots:
    print(prot)

After downloading a FASTA file (with one or multiple sequences) from UniProt, this is what I ended up with:

def read_FASTA("proteinas.fasta"):
    begin = True
    prots = {}
    fil = open("proteinas.fasta", "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

--Second result--

import rfasta
prots = rfasta.read_FASTA(‘C:\Users\hdini\Desktop\aasd\\proteinas.fasta')
for prot in prots:
    print(prot)

I suspect this is just the tip of the iceberg, but at least part of your problem is that you are passing your file path with a mixture of quotes and backticks:

" = double quote
' = single quote
` = backtick

Do not mix these up. It should look something like:

function(r'C:\Path\to\file.fasta')
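A related pitfall with Windows paths: in an ordinary Python string literal the backslash starts an escape sequence. \t becomes a tab, \n a newline, \a (as in \aasd from the question's path) a bell character, and in Python 3 \U starts a Unicode escape, so a plain 'C:\Users\...' literal is a SyntaxError before the program even runs. A raw string, doubled backslashes, or forward slashes all avoid this. A small sketch illustrating each case (the compiles helper is a hypothetical name used only for this demonstration):

```python
from pathlib import PureWindowsPath

# In a normal literal, '\n' is a newline and '\t' a tab, so this string
# silently stops being the path it looks like.
mangled = 'C:\new\table.fasta'
print('\t' in mangled)  # the "\t" has become a real tab character


def compiles(source):
    """Return True if the given Python source text compiles without a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# In Python 3, '\U' starts a \UXXXXXXXX escape, so a plain 'C:\Users\...'
# literal fails to compile; the raw-string version is fine.
print(compiles("p = 'C:\\Users\\hdini'"))   # plain literal
print(compiles("p = r'C:\\Users\\hdini'"))  # raw literal

# Three spellings that refer to the same file:
raw = r'C:\Users\hdini\Desktop\aasd\proteinas.fasta'          # raw string
doubled = 'C:\\Users\\hdini\\Desktop\\aasd\\proteinas.fasta'  # escaped backslashes
forward = 'C:/Users/hdini/Desktop/aasd/proteinas.fasta'       # Windows also accepts "/"

print(raw == doubled)                                    # identical strings
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # same Windows path
```

One caveat: a raw string cannot end in a single backslash, so r'C:\folder\' does not compile.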

——

Just a pointer about using the forum: please don't screenshot code or output. Instead, copy and paste the text and format it appropriately, as I have done above.

(reply by jrj.healey, 3 months ago)

But I did use single quotes!

(reply by hdinis09, 3 months ago)

Hmm, perhaps it was just the screenshot making it look odd, then (another reason to copy the raw text).

And what was the error you were getting again?

(reply by jrj.healey, 3 months ago)

Hello hdinis09,

you forgot to tell us what your question/problem is :)

fin swimmer

(reply by finswimmer, 3 months ago)
Dattatray Mongad (National Centre for Cell Science, Pune) wrote, 3 months ago:

Use Biopython:

from Bio import SeqIO

# "fastaFileName" is a placeholder for the path to your FASTA file
for record in SeqIO.parse("fastaFileName", "fasta"):
    print(record.id)
    print(record.seq)

This is the better suggestion, but as the task is an assignment with code given specifically, I'm guessing this isn't an option.

(reply by jrj.healey, 3 months ago)
jrj.healey (United Kingdom) wrote, 3 months ago:

You haven't shown us the project structure, and rfasta isn't an existing package, so I'm guessing the reader script is saved as rfasta.py and you're trying to import it locally?

The code works for me (after a little re-indentation compared to your post, which was probably lost in translation).

Given the following input file (saved as seqs.fa, the name used in the session below):

>mutant
GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>gorrila
GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC
>chimpanze
GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>human
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>olive
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC

The 'reader' code in a file called rfasta.py in the current working directory:

def read_FASTA(fname):
    begin = True
    prots = {}
    fil = open(fname, "rt")
    lins = fil.readlines()
    fil.close()
    for lin in lins:
        slin = lin.strip()
        if slin[0] == '>':
            if begin == False:
                prots[pname] = seq
            seq = ""
            pname = slin[1:].strip()
            begin = False
        else:
            seq = seq + slin
    prots[pname] = seq
    return prots

(This isn't particularly elegant Python IMO, but it works and is fine for an exercise.)
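For comparison, here is one possible more idiomatic sketch. It is not the code the assignment requires, and read_fasta is a hypothetical name chosen so it does not clash with the exercise's read_FASTA. It uses a with-block so the file is always closed, skips blank lines (in the original, an empty line makes slin[0] raise an IndexError), and shows that the file name is supplied when calling the function rather than written into the def line. The test file is a tiny one built from fragments of the sequences above:

```python
import os
import tempfile


def read_fasta(fname):
    """Parse a FASTA file into a {header: sequence} dict (insertion-ordered in Python 3.7+)."""
    records = {}
    name = None
    chunks = []
    with open(fname, "rt") as handle:  # file is closed automatically
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of crashing on line[0]
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)  # store the previous record
                name = line[1:].strip()
                chunks = []
            else:
                chunks.append(line)  # collect sequence lines, join once at the end
    if name is not None:
        records[name] = "".join(chunks)  # store the last record
    return records


# Usage: write a small test file, then pass its name in the call.
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">human\nGTTGGGAGG\nCTATGTG\n\n>olive\nGTTGGGAGGCTA\n")
    path = tmp.name

prots = read_fasta(path)
os.remove(path)

for name, seq in prots.items():
    print(name, seq)
```

Unlike the exercise version, an empty file simply gives an empty dict instead of a NameError on pname.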

Running the code in a local Python interpreter:

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> file = './seqs.fa'
>>> import rfasta
>>> p = rfasta.read_FASTA(file)
>>> print(p)
{'olive': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC', 'gorrila': 'GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'chimpanze': 'GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'mutant': 'GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'human': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC'}

As you can see, the FASTA file is parsed into a dictionary without any problem.

I was running this on a Linux box, so the file path syntax etc. will be different since you're on Windows, but the principle is the same.
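If you want one script that runs on both Linux and Windows, one option is to build the path with pathlib instead of typing separators by hand. This needs Python 3 (pathlib is not in the Python 2.7 standard library used in the session above). A minimal sketch, assuming the file sits in Desktop/aasd under the current user's home directory, as the path in the question suggests; in Python 3.6+ the open() call inside read_FASTA also accepts a Path object:

```python
from pathlib import Path

# Assumed location, mirroring C:\Users\hdini\Desktop\aasd\proteinas.fasta
# from the question. Path.home() is typically C:\Users\<name> on Windows
# and /home/<name> on Linux.
fasta_path = Path.home() / "Desktop" / "aasd" / "proteinas.fasta"

print(fasta_path)           # printed with the current OS's separators
print(fasta_path.exists())  # check the file is really there before parsing

# With rfasta.py importable, the Path can be passed straight in, e.g.:
# prots = rfasta.read_FASTA(fasta_path)
```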
