Question: Parsing FASTA file using class in Python
mrth wrote:

Hello, I am new to the world of Biopython and Python in general. I am trying to parse a FASTA file using a class. I have the following code so far:

from itertools import groupby

class OpenFastaFile:
    def __init__(self, path):
        self.path = path
        self._map = {}
        __fasta_sequences = self.__fasta_iter()

    def __str__(self):
        return self._map.__str__()

    def __fasta_iter(self):
        fh = open(self.path)
        faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in faiter:
            header = header.__next__()[1:].strip()
            seq = "".join(s.strip() for s in faiter.__next__())
            self._map[header] = seq

of = OpenFastaFile("sample.fa")
print(of)

However, I receive this output:

Traceback (most recent call last):
  File "UniProtFile.py", line 25, in <module>
    of = OpenFastaFile("sample.fa")
  File "UniProtFile.py", line 12, in __init__
    __fasta_sequences = self.__fasta_iter()
  File "UniProtFile.py", line 21, in __fasta_iter
    header = header.__next__()[1:].strip()
AttributeError: 'itertools._grouper' object has no attribute '__next__'

Process finished with exit code 1

My expected output was a dictionary along these lines: {'name': 'ACCAGT', 'name1': 'ACGGCTA', ...}

Can someone please show me the error of my ways?

modified 2.6 years ago by Matt Shirley • written 2.6 years ago by mrth

I would guess that faiter in the following line should be header:

seq = "".join(s.strip() for s in faiter.__next__())
written 2.6 years ago by WouterDeCoster

Thanks for replying! Unfortunately, it still gives the same error message.

written 2.6 years ago by mrth

Is there any reason that you're trying to implement this as a class?

written 2.6 years ago by Joe
mrth wrote:

For those wondering, I have solved the problem. I used this code:

from itertools import groupby

class FastaFile:
    def __init__(self, path):
        self.path = path
        self._map = {}
        self.__fasta_iter()

    def __str__(self):
        return str(self._map)

    def __fasta_iter(self):
        # groupby yields alternating groups: header lines, then sequence lines
        with open(self.path) as fh:
            faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
            for header in faiter:
                # the built-in next() works on both Python 2 and Python 3 iterators
                header = next(header)[1:].strip()
                seq = "".join(s.strip() for s in next(faiter))
                self._map[header] = seq

ff = FastaFile("sample.fa")
print(ff)
written 2.6 years ago by mrth
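
For anyone hitting the same traceback: the missing `__next__` attribute is consistent with the script running under Python 2, where iterators expose `.next()` instead. The built-in `next()` works on both versions. Below is a minimal sketch of the same `groupby` approach as a standalone function. The name `read_fasta` and the `with` block are my own choices, not part of the original code, and the sketch assumes a well-formed file that starts with a header and has at least one sequence line after every header.

```python
from itertools import groupby


def read_fasta(path):
    """Return {header: sequence} for a FASTA file (hypothetical helper)."""
    records = {}
    with open(path) as fh:  # the with block closes the file when done
        # groupby yields alternating runs: header lines, then sequence lines
        groups = (
            group for _, group in groupby(fh, lambda line: line.startswith(">"))
        )
        for header_group in groups:
            # built-in next() works on Python 2 and Python 3 iterators alike
            header = next(header_group)[1:].strip()
            # the run that follows a header holds that record's sequence lines
            seq = "".join(line.strip() for line in next(groups))
            records[header] = seq
    return records
```

Calling `read_fasta("sample.fa")` should give a plain dictionary of the kind described in the question.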
Matt Shirley wrote:

If you want a fasta file to act like a sequence dictionary, just use pyfaidx:

import pyfaidx
fa = pyfaidx.Fasta("sample.fa")
for key in fa:
  print(key) # sequence name
  print(fa[key]) # sequence object

You'll be using an efficient method that doesn't read all of your sequences into memory unless you access them.
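
To illustrate the idea of reading sequences only when they are accessed, here is a minimal standard-library sketch. It is not how pyfaidx is implemented; the class name `LazyFasta` and its byte-offset index are assumptions made purely for illustration.

```python
class LazyFasta:
    """Hypothetical illustration: index header offsets, read sequences on demand."""

    def __init__(self, path):
        self.path = path
        self._offsets = {}  # header -> byte offset where its sequence starts
        with open(path, "rb") as fh:
            # readline() (rather than iterating) keeps tell() straightforward
            for line in iter(fh.readline, b""):
                if line.startswith(b">"):
                    name = line[1:].strip().decode("ascii")
                    self._offsets[name] = fh.tell()

    def __iter__(self):
        # iterate over sequence names, like a dictionary
        return iter(self._offsets)

    def __getitem__(self, name):
        chunks = []
        with open(self.path, "rb") as fh:
            fh.seek(self._offsets[name])  # jump straight to this record
            for line in iter(fh.readline, b""):
                if line.startswith(b">"):
                    break  # reached the next record
                chunks.append(line.strip())
        return b"".join(chunks).decode("ascii")
```

Only the header offsets are held in memory here; each sequence is read from disk when it is looked up.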

written 2.6 years ago by Matt Shirley

Thanks for this information. I never knew pyfaidx existed.

written 2.6 years ago by mrth
Pallab Bhowmick wrote:

Hi, you can try the following code to get your result:

from Bio import SeqIO

seqdic = {}
with open('sample.fa', 'r') as input_fasta_file:
    for seq_record in SeqIO.parse(input_fasta_file, 'fasta'):
        # seq_record.id is the header text up to the first whitespace
        header = seq_record.id
        seqs = str(seq_record.seq)
        seqdic[header] = seqs
  
modified 2.6 years ago • written 2.6 years ago by Pallab Bhowmick

As a comment: SeqIO.parse also accepts a filename, not only a file handle, so you could write:

for seq_record in SeqIO.parse('sample.fa', 'fasta'):

You could also simplify your code with a dict comprehension, which is more concise and often a little faster:

seqdic={seq_record.id: str(seq_record.seq) for seq_record in SeqIO.parse('sample.fa', 'fasta')}
written 2.6 years ago by WouterDeCoster

I was going to post code similar to this, but the OP's question looked like an assignment because the approach is overly complicated.

written 2.6 years ago by st.ph.n

You were correct. It is for an assignment. I usually use with open to parse my files but need to try something new this time around - classes.

written 2.6 years ago by mrth

Hi there, I really appreciate the response! However, I'm trying to use a class with magic methods to parse the file, as I need to include one somewhere in my code (for an assessment). I feel as though I am really close and yet so far away!

written 2.6 years ago by mrth
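
Since the assignment calls for a class with magic methods, one possible shape is sketched below. It is an illustrative assumption rather than the OP's final code: the class name `FastaDict` and the choice of `__getitem__`, `__len__`, `__iter__`, `__contains__`, and `__str__` are mine, and it uses a plain line loop instead of `groupby`.

```python
class FastaDict:
    """Hypothetical dict-like FASTA container built on magic methods."""

    def __init__(self, path):
        chunks_by_header = {}
        header = None
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                if line.startswith(">"):
                    header = line[1:].strip()
                    chunks_by_header[header] = []
                elif header is not None:
                    chunks_by_header[header].append(line)
        # join the collected chunks into one string per record
        self._map = {
            name: "".join(chunks) for name, chunks in chunks_by_header.items()
        }

    def __getitem__(self, name):  # ff["name"]
        return self._map[name]

    def __len__(self):  # len(ff)
        return len(self._map)

    def __iter__(self):  # for name in ff
        return iter(self._map)

    def __contains__(self, name):  # "name" in ff
        return name in self._map

    def __str__(self):  # print(ff)
        return str(self._map)
```

With these methods in place, an instance can be indexed, iterated, measured with `len()`, and printed much like the dictionary the question asks for.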