Question: Can't make program in Python to read FASTA
3 months ago by
hdinis090

My professor wants us to make a program that analyses triplets in sequences. The first step is to open and read the actual sequence and he provides these codes.

    def read_FASTA(fname):
        begin = True
        prots = {}
        fil = open(fname, "rt")
        lins = fil.readlines()
        fil.close()
        for lin in lins:
            slin = lin.strip()
            if slin[0] == '>':
                if begin == False:
                    prots[pname] = seq
                seq = ""
                pname = slin[1:].strip()
                begin = False
            else:
                seq = seq + slin
        prots[pname] = seq
        return prots


---SECOND CODE---

    import rfasta
    for prot in prots:
        print(prot)


    def read_FASTA("proteinas.fasta"):
        begin = True
        prots = {}
        fil = open("proteinas.fasta", "rt")
        lins = fil.readlines()
        fil.close()
        for lin in lins:
            slin = lin.strip()
            if slin[0] == '>':
                if begin == False:
                    prots[pname] = seq
                seq = ""
                pname = slin[1:].strip()
                begin = False
            else:
                seq = seq + slin
        prots[pname] = seq
        return prots


--Second result--

    import rfasta
    for prot in prots:
        print(prot)

sequence python fasta • 339 views
modified 3 months ago by Dattatray Mongad260 • written 3 months ago by hdinis090

I suspect this is just the tip of the iceberg, but at least part of your problem is that you are passing your file path with a mixture of quotes and backticks:

" = double quote
' = single quote
` = backtick


Do not mix these up. It should look something like:

function(r'C:\Path\to\file.fasta')  # raw string, so \t and \f are not read as escapes
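One extra caveat with a Windows path like the one above: in a normal Python string, backslash sequences such as `\t` and `\f` are escape codes, so the path can silently change. A small sketch of the difference, using a made-up path (`C:\temp\file.fasta` is only an illustration, not the asker's real file):

```python
from pathlib import PureWindowsPath

plain = 'C:\temp\file.fasta'    # "\t" becomes a tab, "\f" a form feed
raw = r'C:\temp\file.fasta'     # raw string keeps every backslash as typed
forward = 'C:/temp/file.fasta'  # forward slashes are also understood on Windows

# The plain string is shorter because two escape pairs collapsed to one character each.
print(len(plain), len(raw))
print('\t' in plain)

# pathlib treats both separator styles as the same Windows path.
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```

Either the raw string or the forward-slash form should be safe to pass to `open()`.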

——

Just a pointer about using the forum: please don’t screenshot code or output. Instead, copy and paste the text and format it appropriately, as I have done above.

But I did put in single quotes!

Hummm.. perhaps it was just that screenshot making it look odd then (another reason to copy the raw text).

And what was the error you were getting again?

Hello hdinis09,

You forgot to tell us what your question/problem is :)

fin swimmer

3 months ago by
National Centre for Cell Science, Pune

Use Biopython:

    from Bio import SeqIO
    for record in SeqIO.parse("fastaFileName", "fasta"):
        print(record.id)
        print(record.seq)


This is the better suggestion, but as the task is an assignment with code given specifically, I'm guessing this isn't an option.

3 months ago by
jrj.healey11k
United Kingdom
jrj.healey11k wrote:

You haven't shown us the project structure, and rfasta isn't an existing package, so I'm guessing the script itself is called rfasta and you're trying to import it locally?
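If that guess is right, the usual pattern is to leave the `def read_FASTA(fname):` line alone and pass the file name when calling the function from the imported module. A minimal sketch of that, assuming a throwaway folder and a stand-in `rfasta.py` (both are made up here so the example runs on its own; the real module would contain the professor's code):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a temporary directory stands in for the project folder.
workdir = Path(tempfile.mkdtemp())

# Stand-in module; the real rfasta.py would hold the full read_FASTA parser.
(workdir / "rfasta.py").write_text(
    "def read_FASTA(fname):\n"
    "    return 'would parse ' + fname\n"
)

# Python only finds rfasta if its folder is on the import search path
# (the current working directory normally is, when running a script from it).
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()
rfasta = importlib.import_module("rfasta")  # same effect as "import rfasta"

# The file name goes in the call, not in the def line.
result = rfasta.read_FASTA("proteinas.fasta")
print(result)
```

In a real project the second script would simply contain `import rfasta` followed by `prots = rfasta.read_FASTA("proteinas.fasta")`.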

The code works for me after a little re-indentation (the indentation in your post was probably lost when it was pasted).

Given the following:

#### Input file

>mutant
GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>gorrila
GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC
>chimpanze
GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>human
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC
>olive
GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC


#### The 'reader' code in a file called rfasta.py in the current working directory:

    def read_FASTA(fname):
        begin = True
        prots = {}
        fil = open(fname, "rt")
        lins = fil.readlines()
        fil.close()
        for lin in lins:
            slin = lin.strip()
            if slin[0] == '>':
                if begin == False:
                    prots[pname] = seq
                seq = ""
                pname = slin[1:].strip()
                begin = False
            else:
                seq = seq + slin
        prots[pname] = seq
        return prots


(This isn't particularly elegant Python IMO, but it works and is fine for an exercise.)
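For comparison, here is one way the same reader could be written more idiomatically. This is only a sketch, not the version required for the assignment: the name `read_fasta`, the blank-line handling, and the toy input file are my own additions.

```python
import os
import tempfile


def read_fasta(fname):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    prots = {}
    pname = None
    # 'with' closes the file automatically, even if an error occurs.
    with open(fname, "rt") as fil:
        for lin in fil:
            slin = lin.strip()
            if not slin:
                continue  # skip blank lines instead of failing on slin[0]
            if slin.startswith(">"):
                pname = slin[1:].strip()
                prots[pname] = []
            elif pname is not None:
                prots[pname].append(slin)
    # Join the collected pieces once per record.
    return {name: "".join(parts) for name, parts in prots.items()}


# Usage on a small made-up file (not the data from this thread).
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">seq1\nACGT\nTTGA\n\n>seq2\nGGCC\n")
seqs = read_fasta(tmp.name)
os.remove(tmp.name)
print(seqs)
```

The main differences are that the file is closed by the `with` block, blank lines no longer raise an `IndexError`, and the `begin` flag is not needed.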

#### Running the code in a local python interpreter:

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> file = './seqs.fa'
>>> import rfasta
>>> p = rfasta.read_FASTA(file)
>>> print(p)
{'olive': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAAAGAATC', 'gorrila': 'GTTGGGAGGCTATGTGTGACTGGAAGGACATCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'chimpanze': 'GTTGGGAGGCTGTGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'mutant': 'GTTGGGAGGCTATGTGTTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC', 'human': 'GTTGGGAGGCTATGTGTGACTGGAAGGACGTCCTGTCGGGTGGCGAGAAGCAGAGAATC'}

As you can see, the FASTA file is parsed into a dictionary without any problem.
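Once the sequences are in a dictionary, the triplet analysis mentioned in the question could start from something like the sketch below. I'm assuming "triplets" means non-overlapping three-letter windows read from the first position (`step=3`); if overlapping windows are wanted, `step=1` would do that. The function name `count_triplets` and the toy sequence are made up for illustration.

```python
from collections import Counter


def count_triplets(seq, step=3):
    """Count 3-letter windows in seq; step=3 is codon-like, step=1 is overlapping."""
    # Stopping at len(seq) - 2 drops any incomplete window at the end.
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))


prots = {"toy": "GTTGGGGTT"}  # made-up stand-in for the parsed dictionary
for name, seq in prots.items():
    counts = count_triplets(seq)
    print(name, dict(counts))
```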

I was running this on a Linux box, so the file path syntax will differ on Windows, but the principle is the same.