Translating many DNA sequences using Biopython (in a loop)
9.2 years ago
mlank001 • 0

I am new to Biopython and am trying to translate a set of DNA sequences from a file and write the translated results to an output file. I tried using Biopython, and I am sure it's possible, but it's giving me errors. Here is my code:

Sequence_downstream_60_column = 6
Sequence_upstream_60_column = 7

outfile = open("C:\\Users\\mlank\\Desktop\\Python\\out.txt", 'rU')

outfile.readline()
sequence_m_list = []
sequence_p_list = []

for line in outfile:
    fields = line.split("\t")
    sequence_m = fields[Sequence_downstream_60_column]
    sequence_p = fields[Sequence_upstream_60_column]

    if not sequence_m in sequence_m_list:
        sequence_m_list.append(sequence_m)
    if not sequence_p in sequence_p_list:
        sequence_p_list.append(sequence_p)

output = open("output.txt", "w+")
output.write("Sequence_downstream" + "\t" + "Amino_acid_seq_downstream" + "\t" + "Sequence_upstream" + "\t" + "Amino_acid_seq_upstream")

from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
for i in range(len(sequence_m_list)):
    seq_m = sequence_m_list[i]
    seq_p = sequence_p_list[i]

    translated_seq_m = seq_m.translate(table=1)
    translated_seq_p = seq_p.translate(table=1)

    output.write("\t" + sequence_m_list[i])
    output.write("\t" + translated_seq_m)
    output.write("\t" + sequence_p_list[i])
    output.write("\t" + translated_seq_p)

The translated_seq_m line is giving me errors. When I debug, it says translated_seq_m is not defined. Where am I going wrong? Any help is appreciated!

Thanks!

sequence python Biopython • 4.1k views

Just one readline? That will only read the header (if any) or an anomalous line that might not actually contain a sequence in the columns you're fetching.


The idea of outfile.readline() was to skip the header (the top line in the file, which consists of column titles). There is no problem with this.
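The header-skipping step can also be written with the standard csv module. The following is only a minimal sketch: the in-memory sample data, the column indices, and the name read_columns are made up for illustration and are not from the original file. One possible advantage over line.split("\t") is that csv strips the line ending, so the last column does not carry a trailing newline.

```python
import csv
import io

# Hypothetical stand-in for the tab-delimited input file (header + 2 rows).
sample = io.StringIO(
    "id\tdownstream\tupstream\n"
    "r1\tATGGCC\tTTTAAA\n"
    "r2\tGGGCCC\tATGTAA\n"
)


def read_columns(handle, col_a, col_b):
    """Return (col_a, col_b) value pairs from a tab-delimited handle."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row, like outfile.readline()
    return [(row[col_a], row[col_b]) for row in reader]


pairs = read_columns(sample, 1, 2)
```

With a real file, the handle would come from open(path, newline="") instead of io.StringIO.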

However, I made some changes to the script and it now runs successfully:

# Only Seq is needed here; the Alphabet/IUPAC imports were unused
# (Bio.Alphabet was removed in Biopython 1.78).
from Bio.Seq import Seq
for i in range(len(sequence_m_list)):
    seq_m = Seq(sequence_m_list[i])
    seq_p = Seq(sequence_p_list[i])

    translated_seq_m = seq_m.translate()
    translated_seq_p = seq_p.translate()

    output.write("\n" + sequence_m_list[i])
    output.write("\t" + str(translated_seq_m))
    output.write("\t" + sequence_p_list[i])
    output.write("\t" + str(translated_seq_p))

Thanks!
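For anyone landing here with the same error: a likely cause is that the values read from the file are plain Python strings. str.translate is a character-mapping method and rejects the table= keyword, so the call raises an exception and translated_seq_m is never assigned. Wrapping each string in Seq, as in the working version above, gives access to the biological translate(). A minimal sketch, assuming Biopython is installed; translate_dna is a hypothetical helper name and the sequences are made-up examples:

```python
from Bio.Seq import Seq

# A plain str has a translate() method, but it maps characters and
# does not accept a table= keyword, so this call raises TypeError.
try:
    "ATGGCC".translate(table=1)
    str_call_failed = False
except TypeError:
    str_call_failed = True


def translate_dna(dna):
    """Translate a DNA string with the standard genetic code (table 1)."""
    # Wrap the string in Seq first, then convert the result back to str
    # so it can be concatenated and written to a text file.
    return str(Seq(dna).translate(table=1))


protein = translate_dna("ATGGCCTAA")
```

The str() conversion matters for the output.write() calls, which concatenate the result with "\t".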


Can you show the contents of your file? A few lines or an example?


The outfile was a tab-delimited file with defined columns. I was trying to loop over specific columns from that file (which hold thousands of nucleotide sequences) and write the translated amino acid sequences to the output text file.

I made some changes to the script, and it works! Thanks for the attempt to answer, though!
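One thing to watch in the script: the downstream and upstream sequences are deduplicated into two separate lists and then indexed with the same i. If one column has more repeats than the other, the lists end up with different lengths and the rows no longer line up. A possible way around this, sketched here with made-up data and a hypothetical helper name unique_pairs, is to deduplicate the (downstream, upstream) pairs together while keeping their original order:

```python
def unique_pairs(pairs):
    """Drop repeated (downstream, upstream) pairs, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(pairs))


# Hypothetical example rows: the first and third are identical.
rows = [("ATGGCC", "TTTAAA"), ("ATGGCC", "GGGCCC"), ("ATGGCC", "TTTAAA")]
deduplicated = unique_pairs(rows)
```

Each remaining pair can then be translated and written on one output line, so the two columns stay matched.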
