Question

Translating many DNA sequences using Biopython (in a loop)

Asked 9.2 years ago by mlank001

I am new to Biopython and am trying to translate a set of DNA sequences in a file and write the translated results to an output file. I am sure this is possible with Biopython, but my code is giving me errors. Here is my code:

Sequence_downstream_60_column = 6
Sequence_upstream_60_column = 7

outfile = open("C:\\Users\\mlank\\Desktop\\Python\\out.txt", 'rU')

outfile.readline()
sequence_m_list = []
sequence_p_list = []

for line in outfile:
    fields = line.split("\t")
    sequence_m = fields[Sequence_downstream_60_column]
    sequence_p = fields[Sequence_upstream_60_column]

    if not sequence_m in sequence_m_list:
        sequence_m_list.append(sequence_m)
    if not sequence_p in sequence_p_list:
        sequence_p_list.append(sequence_p)

output = open("output.txt", "w+")
output.write("Sequence_downstream" + "\t" + "Amino_acid_seq_downstream" + "\t" + "Sequence_upstream" + "\t" + "Amino_acid_seq_upstream")

from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
for i in range(len(sequence_m_list)):
    seq_m = sequence_m_list[i]
    seq_p = sequence_p_list[i]

    translated_seq_m = seq_m.translate(table=1)
    translated_seq_p = seq_p.translate(table=1)

    output.write("\t" + sequence_m_list[i])
    output.write("\t" + translated_seq_m)
    output.write("\t" + sequence_p_list[i])
    output.write("\t" + translated_seq_p)

translated_seq_m is giving me errors: when debugging, it says translated_seq_m is not defined. Where am I going wrong? Any help is appreciated!
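The most likely reason for the error: seq_m comes from line.split("\t"), so it is a plain Python str, and seq_m.translate resolves to the built-in str.translate. That method remaps characters through a lookup table and has nothing to do with codons. In Python 3 it accepts a single positional argument only, so table=1 raises TypeError before translated_seq_m is ever assigned, which would explain a debugger reporting the name as not defined. A minimal check using only the standard library (the helper name str_translate_error is illustrative):

```python
def str_translate_error(seq):
    """Call str.translate the way the question does and return the error text."""
    try:
        seq.translate(table=1)  # str.translate takes no keyword arguments
    except TypeError as err:
        return str(err)
    return None


# What str.translate is actually for: character-by-character remapping.
rna = "ATGGCC".translate({ord("T"): "U"})
print(rna)  # AUGGCC
print(str_translate_error("ATGGCC"))
```

Wrapping the string in Bio.Seq.Seq, as in the working version later in the thread, makes .translate refer to Biopython's codon-aware method instead.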

Thanks!

Tags: sequence, python, Biopython

Written 9.2 years ago by mlank001; last updated 2.0 years ago by Ram.

Just one readline? That will only read the header (if any) or an anomalous line that might not actually contain a sequence in the columns you're fetching.

Reply by Ram, 2.0 years ago.

The idea of outfile.readline() was to skip the header (the top line of the file, which holds the column titles); the for loop then reads the remaining lines, so there is no problem with this.
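Whether one readline() is enough can be checked quickly: it consumes only the first line, and iterating the same file handle afterwards resumes at the second line. A minimal sketch, with io.StringIO standing in for the real tab-delimited file (the helper name data_lines is made up for illustration):

```python
import io


def data_lines(handle):
    """Skip the header line, then return every remaining line of the handle."""
    handle.readline()  # consumes only the first line (the column titles)
    # Iterating the same handle resumes at the second line.
    return [line for line in handle]


sample = io.StringIO("col_a\tcol_b\nATG\tTGG\nGCC\tTAA\n")
print(data_lines(sample))  # ['ATG\tTGG\n', 'GCC\tTAA\n']
```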

I have since made some changes to the script, and it now runs successfully:

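# Bio.Alphabet (and IUPAC) was removed in Biopython 1.78; only Seq is needed.
from Bio.Seq import Seq

# zip stops at the shorter list instead of raising IndexError; the two lists
# were deduplicated separately, so they can differ in length.
for seq_m_str, seq_p_str in zip(sequence_m_list, sequence_p_list):
    # Wrap the strings in Seq objects: str.translate is not DNA translation.
    # strip() drops a trailing newline in case column 7 is the last field.
    seq_m = Seq(seq_m_str.strip())
    seq_p = Seq(seq_p_str.strip())

    # With no arguments, translate() uses the standard code (NCBI table 1).
    translated_seq_m = seq_m.translate()
    translated_seq_p = seq_p.translate()

    # Start each row on a new line (the header was written without "\n").
    output.write("\n" + str(seq_m))
    output.write("\t" + str(translated_seq_m))
    output.write("\t" + str(seq_p))
    output.write("\t" + str(translated_seq_p))

output.close()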
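For reference, Seq.translate() with no arguments uses the standard genetic code (NCBI translation table 1), the same table that table=1 in the question was meant to select. The sketch below shows what that translation does, codon by codon, in plain Python. It is a simplified illustration, not Biopython's implementation: it has no ambiguity-code handling and none of the options Seq.translate offers, such as to_stop and cds. The names CODON_TABLE and translate_standard are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1): the 64 codons in T, C, A, G order,
# third position varying fastest, matched to this amino-acid string.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_standard(dna):
    """Translate DNA codon by codon; '*' marks stop codons.

    Simplifications: a trailing partial codon is dropped, and any codon not
    made of A/C/G/T becomes 'X'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate_standard("ATGGCCTAA"))  # MA*
```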

Thanks!

Written 9.2 years ago by mlank001; last updated 2.0 years ago by Ram.

Can you show the contents of your file? A few lines or an example?

Written 9.2 years ago by GouthamAtla; last updated 2.0 years ago by Ram.

outfile is a tab-delimited file with defined columns. I was trying to loop over specific columns of that file (thousands of nucleotide sequences) and write the translated amino acid sequences to the output text file.
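One thing to watch in the question's script: sequence_m_list and sequence_p_list are deduplicated independently, so once a downstream sequence repeats with a different upstream partner, the two lists fall out of step and index i no longer refers to the same row in both. A sketch that keeps each row's pair together, using the standard csv module; the function name and the sample data are made up for illustration, and the column indices are copied from the question's script:

```python
import csv
import io

# Same 0-based column indices as the question's script.
DOWNSTREAM_COL = 6
UPSTREAM_COL = 7


def unique_sequence_pairs(handle):
    """Return (downstream, upstream) pairs in file order, first occurrence only."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row of column titles
    pairs = [
        (row[DOWNSTREAM_COL].strip(), row[UPSTREAM_COL].strip())
        for row in reader
        if len(row) > UPSTREAM_COL  # ignore blank or short lines
    ]
    # dict.fromkeys drops repeated pairs while preserving insertion order.
    return list(dict.fromkeys(pairs))


header = "\t".join(f"col{i}" for i in range(8))
filler = "\t".join(["x"] * 6)
sample = io.StringIO(
    header + "\n"
    + filler + "\tATGAAA\tATGCCC\n"
    + filler + "\tATGAAA\tATGCCC\n"  # exact duplicate row
    + filler + "\tATGAAA\tATGGGG\n"  # same downstream, new upstream
)
print(unique_sequence_pairs(sample))
# [('ATGAAA', 'ATGCCC'), ('ATGAAA', 'ATGGGG')]
```

On the same sample, separate deduplication would give one downstream sequence but two upstream ones, so a range(len(...)) loop over the first list would silently skip ATGGGG.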

I made some changes to the script, and it works! Thanks for the attempt to answer, though!

Written 9.2 years ago by mlank001; last updated 2.0 years ago by Ram.