Read a DNA sequence from FASTA and raise a custom exception subclass?
3.5 years ago
Gonçalo • 0

Hello everyone,

I am supposed to write a function that takes the name of a FASTA file as an argument. Given the file name, the function should read the file, discard the header and return the sequence as a string. I am also asked to raise a predefined exception subclass (defined before my code) if the sequence part of the file contains any characters other than A, C, T, G and U. In addition, every U nucleotide should be replaced by T in the returned string. I think I am on the right track, but I have no idea how to incorporate this subclass into my code for the case where a letter is not A, C, T, G or U. I am working with a small file before turning this into a function, and this is what I have so far:
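For reference, here is a minimal sketch of a function meeting this specification. It assumes the file holds a single record and that the sequence is uppercase (lowercase letters would be rejected); the names `read_fasta_sequence` and `VALID_BASES` are hypothetical and not part of the assignment.

```python
class BadSequenceException(Exception):
    """Raised when a sequence contains characters other than A, C, G, T, U."""
    pass

# Hypothetical constant: the allowed characters (uppercase only).
VALID_BASES = set("ACGTU")

def read_fasta_sequence(filename):
    """Return the sequence of a single-record FASTA file as one string.

    Header lines (starting with '>') are discarded, U is replaced by T,
    and BadSequenceException is raised on any other character.
    """
    parts = []
    with open(filename) as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # discard the header line
            parts.append(line.strip())
    sequence = "".join(parts)
    # Set difference leaves exactly the characters that are not allowed.
    invalid = set(sequence) - VALID_BASES
    if invalid:
        raise BadSequenceException(
            "invalid character(s): " + ", ".join(sorted(invalid))
        )
    return sequence.replace("U", "T")
```

Usage would be something like `print(read_fasta_sequence("sequence1.fasta"))`. Validating before the U to T replacement keeps the error message in terms of the characters actually present in the file.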

This is defined before my code:

```python
# Run this cell to define the exception
class BadSequenceException(Exception):
    pass
```

```python
# my code:
file = open("sequence1.fasta")
all_lines = file.readlines()

sequences = []

with open('sequence1.fasta', 'r') as seq:
    sequence = ''

    for line in seq:
        if line.startswith('>'):
            sequences.append(sequence)
            sequence = ''
        else:
            sequence += line.strip()

def check(sequence, code="ATGCU"):
    for x in sequence:
        if x not in code:
            return False

    return sequence.replace("U","T")

check(sequence)
```

I presume the exception must be raised where the return False is?

Also, BadSequenceException is a subclass of the class Exception and inherits all of its functionality, right? Any guidance on this would be very much appreciated. Thank you so much.
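A quick way to confirm the inheritance in the interpreter (a small sketch; the messages are only examples). Because the class body is just `pass`, everything, including message storage in `args` and the `str()` conversion, comes from `Exception`:

```python
class BadSequenceException(Exception):
    pass

# Every BadSequenceException is also an Exception.
print(issubclass(BadSequenceException, Exception))  # True

try:
    raise BadSequenceException("X is not a valid nucleobase")
except BadSequenceException as err:
    # args and the str() conversion are inherited from Exception.
    print(err.args)   # ('X is not a valid nucleobase',)
    print(str(err))   # X is not a valid nucleobase

# A generic "except Exception" handler catches the subclass too.
try:
    raise BadSequenceException("bad input")
except Exception as err:
    print(type(err).__name__)  # BadSequenceException
```

Catching the specific subclass lets calling code handle bad sequences separately from unrelated errors such as a missing file.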


Hi! Is this the script that you use? If so, the def check part should be moved to the top.

Also, since you're running check only after reading the entire file, I think you should instead run it on each line as you read it (before sequence += line.strip()).
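A minimal sketch of that per-line approach, assuming a single-record file. The function name `read_sequence_checked` is hypothetical; it accepts an open file object or any iterable of lines, and checking line by line lets the error report where the bad character is:

```python
class BadSequenceException(Exception):
    pass

VALID_BASES = "ACGTU"

def read_sequence_checked(lines):
    """Build the sequence from FASTA lines, validating each line as it is read.

    `lines` can be an open file object or any iterable of strings.
    """
    sequence = ""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # header line: discard it
        line = line.strip()
        # Check the line before appending it, so the error can say where it is.
        for base in line:
            if base not in VALID_BASES:
                raise BadSequenceException(
                    f"{base!r} on line {line_number} is not a valid nucleobase"
                )
        sequence += line
    return sequence.replace("U", "T")
```

With a real file this would be called as `with open("sequence1.fasta") as seq: print(read_sequence_checked(seq))`.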

https://stackoverflow.com/questions/23657545/classes-with-exception


Thanks so much for your help. I will look at it carefully once I get back home after work :)


That was very helpful, thank you so much.


Indeed, instead of the return False you'd raise BadSequenceException(x + " is not a valid nucleobase") (or something like that).
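Applied to the check function from the question, that suggestion might look like this (a sketch; only the return False line changes, and the message text is just an example):

```python
class BadSequenceException(Exception):
    pass

def check(sequence, code="ATGCU"):
    """Raise BadSequenceException on the first invalid character,
    otherwise return the sequence with U replaced by T."""
    for x in sequence:
        if x not in code:
            raise BadSequenceException(x + " is not a valid nucleobase")
    return sequence.replace("U", "T")

print(check("ACGU"))  # ACGT
try:
    check("ACGN")
except BadSequenceException as err:
    print("Rejected:", err)  # Rejected: N is not a valid nucleobase
```

Raising, rather than returning False, means a caller cannot accidentally treat False as a sequence string.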

In addition, do you really want to add the empty sequence (created when the first header line > is encountered) to the list of sequences?
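One way to avoid that, sketched under the assumption that only the sequences (not the headers) are needed: store a record only when the next header or the end of the input is reached. This also keeps the last record, which the loop in the question never appends. The name `read_fasta_sequences` is hypothetical:

```python
def read_fasta_sequences(lines):
    """Return the list of sequences in FASTA-formatted lines.

    A record is stored only when the next '>' header (or the end of input)
    is reached, so no empty string is added for the first header and the
    final record is not lost.
    """
    sequences = []
    sequence = None  # None means "no header seen yet"
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if sequence is not None:  # first header: nothing to store yet
                sequences.append(sequence)
            sequence = ""
        elif sequence is not None:
            sequence += line
    if sequence is not None:  # store the final record
        sequences.append(sequence)
    return sequences
```

Lines appearing before the first header are ignored in this sketch.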


That makes sense and helped me a lot, thank you very much :)
