Question: Parsing a multi-FASTA file in Python
20 months ago, erick_rc93 wrote:

I'm trying to parse a multi-FASTA file in Python with the following script:

import re

def loadFasta(filename):
    if (filename.endswith(".gz")):
        fp = gzip.open(filename, 'rb')
    else:
        fp = open(filename, 'rb')
    # split at headers
    data = fp.read().split(">")
    fp.close()
    # ignore whatever appears before the 1st header
    data.pop(0)     
    headers = []
    sequences = []
    for sequence in data:
        lines = sequence.split('\n')
        headers.append(lines.pop(0))
        # add an extra "+" to make string "1-referenced"
        sequences.append('+' + ''.join(lines))
    return (headers, sequences)

header, seq = loadFasta("/path/to/fasta/all_chromosomes.fasta")

for i in xrange(len(header)):
    print (header[i])
    print (len(seq[i])-1, "bases", seq[i][:30], "...", seq[i][-30:])
    print

genome = seq[0]

But when I run the script above, I get the following error:

Traceback (most recent call last):
  File "parser_fasta.py", line 23, in <module>
    header, seq = loadFasta("/path/to/fasta/all_chromosomes.fasta")
  File "parser_fasta.py", line 10, in loadFasta
    data = fp.read().split(">")
TypeError: a bytes-like object is required, not 'str'

Any reason you didn't try Biopython's SeqIO (https://biopython.org/wiki/SeqIO)?

(comment written 20 months ago by Kevin)

20 months ago, finswimmer (Germany) wrote:

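You open your file in binary mode with 'rb', so fp.read() returns bytes, and splitting bytes on the str ">" raises the TypeError. Binary mode isn't necessary here: just use 'r' or omit the mode argument for open(), because text mode is its default. For gzip.open() the default is binary, so pass 'rt' there.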

  • Alternatively, keep binary mode and decode the bytes before splitting: data = fp.read().decode().split(">")
  • When working with files, it is good practice to use the with statement, because it takes care of closing your file correctly.
  • As Kevin said before, I would highly recommend using an existing module for handling FASTA files, such as Biopython.
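
Putting the first two points together, here is a minimal sketch of how the loader could look with text mode and a with statement. The name load_fasta and the empty-record check are my own additions, not part of the original script, and it keeps the original behaviour of reading the whole file into memory.

```python
import gzip


def load_fasta(filename):
    """Return (headers, sequences) from a plain or gzipped multi-FASTA file."""
    # gzip.open defaults to binary mode, so text mode ('rt') is requested explicitly.
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt") as fp:  # the with statement closes the file
        # Split at headers and ignore whatever appears before the first '>'.
        records = fp.read().split(">")[1:]

    headers = []
    sequences = []
    for record in records:
        if not record.strip():
            continue  # skip empty records, e.g. a stray '>' (assumed handling)
        lines = record.splitlines()
        headers.append(lines[0])
        # The leading '+' keeps the "1-referenced" convention of the original script.
        sequences.append("+" + "".join(lines[1:]))
    return headers, sequences
```

It is called the same way as before, e.g. headers, sequences = load_fasta("/path/to/fasta/all_chromosomes.fasta"). Note that the rest of the original script uses xrange, which no longer exists in Python 3; range does the same job there.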

fin swimmer
