Question: Removing all stop codons from Sequence Record using Biopython
ckan91 (United States) wrote, 13 months ago:

Hello Everyone,

I have sequences that occasionally contain an erroneous stop codon. Is there a way to strip all stop codons from a Biopython Sequence Record?

Edit: The sequence is in frame and I would like to remove the whole codon for all sequences in the SeqRec. Apologies for the lack of clarity.

Thank you so much! Chris

Bastien Hervé (Limoges, CBRS, France) wrote, 13 months ago:

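More information is needed here, but assuming the reading frame does not matter, try something like this. It does not edit the sequences; it sorts the records into those that contain a stop triplet anywhere and those that do not: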

from Bio import SeqIO

codon_stop_array = ["TAG", "TGA", "TAA"]
record_without_stop = []
record_with_stop = []

for record in SeqIO.parse("your_fasta_file.fasta", "fasta"):
    # Substring test: matches a stop triplet at any position, in any frame
    if any(codon in record.seq for codon in codon_stop_array):
        record_with_stop.append(record)
    else:
        record_without_stop.append(record)
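
Since the edited question says the sequences are in frame, here is a minimal sketch of the same sorting idea restricted to frame 0. The helper names `has_inframe_stop` and `partition_fasta`, and the `ignore_terminal` option, are made up for illustration and are not part of Biopython. It assumes the reading frame starts at the first base and ignores any incomplete codon at the end.

```python
STOP_CODONS = {"TAA", "TAG", "TGA", "UAA", "UAG", "UGA"}


def has_inframe_stop(sequence, ignore_terminal=True):
    """Return True if a stop codon occurs in frame 0 of `sequence`.

    `sequence` may be a plain string or a Bio.Seq.Seq; it is converted
    with str(). A trailing incomplete codon is ignored.
    """
    seq = str(sequence).upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Optionally skip a stop codon sitting at the very end of the frame
    if ignore_terminal and codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return any(codon in STOP_CODONS for codon in codons)


def partition_fasta(path):
    """Split FASTA records by presence of an in-frame stop (needs Biopython)."""
    from Bio import SeqIO  # imported here so the helper above works on its own

    with_stop, without_stop = [], []
    for record in SeqIO.parse(path, "fasta"):
        if has_inframe_stop(record.seq):
            with_stop.append(record)
        else:
            without_stop.append(record)
    return with_stop, without_stop
```

Unlike the substring test, this does not flag a sequence such as ATAGCC, where TAG only appears out of frame.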

ckan91 replied, 13 months ago:

Thank you for your help!

Selenocysteine (Dublin, Ireland) wrote, 13 months ago:

Bastien is right: several points in your question are unclear (is the sequence already in frame? Do you want to remove the whole codon or just one nucleotide?). Assuming your sequence is already in frame, you can do this:

from Bio import SeqIO
from Bio.Seq import Seq

# Stop codons in both the DNA and RNA alphabets
codon_stop_array = {"TAG", "TGA", "TAA", "UGA", "UAA", "UAG"}

for record in SeqIO.parse("my_fasta_file.fasta", "fasta"):
    sequence = str(record.seq)
    # Split into in-frame codons and keep only those that are not stops.
    # Collecting the kept codons avoids deleting from a list while indexing
    # it by the original positions, which would shift every later deletion.
    kept_codons = [
        sequence[index:index + 3]
        for index in range(0, len(sequence), 3)
        if sequence[index:index + 3].upper() not in codon_stop_array
    ]
    record.seq = Seq("".join(kept_codons))

Note that this will also remove the final (terminal) stop codon.
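
If the terminal stop codon should stay, a variant along these lines might help. This is only a sketch: `strip_internal_stops`, `clean_fasta` and the `keep_terminal` flag are hypothetical names, not Biopython functions, and the file paths are whatever you pass in. It assumes frame 0 and leaves an incomplete trailing codon untouched.

```python
STOP_CODONS = {"TAA", "TAG", "TGA", "UAA", "UAG", "UGA"}


def strip_internal_stops(sequence, keep_terminal=True):
    """Remove in-frame stop codons, optionally keeping one in the last position."""
    seq = str(sequence)
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    last = len(codons) - 1
    kept = [
        codon
        for pos, codon in enumerate(codons)
        # Keep non-stop codons, plus a stop only if it is the final codon
        if codon.upper() not in STOP_CODONS or (keep_terminal and pos == last)
    ]
    return "".join(kept)


def clean_fasta(in_path, out_path):
    """Rewrite a FASTA file with internal stops removed (needs Biopython)."""
    from Bio import SeqIO
    from Bio.Seq import Seq

    records = []
    for record in SeqIO.parse(in_path, "fasta"):
        record.seq = Seq(strip_internal_stops(record.seq))
        records.append(record)
    # SeqIO.write returns the number of records written
    return SeqIO.write(records, out_path, "fasta")
```

Keeping the string logic in its own function makes it easy to check on a few short test sequences before running it over a whole file.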


ckan91 replied, 13 months ago:

Thank you for your help!