Question: Error when converting/translating .tsv ORFs to .faa
anabaena wrote (8 weeks ago):

Hey all, I am working on a script that uses Biopython to translate a large number of .tsv files containing predicted ORFs. I keep running into a "'module' object is not callable" error even though the modules are loaded correctly. Everything works until the translation phase of the script, where the error is raised. Below is the full traceback:

>       File "./convert_tsv_fasta.py", line 34, in <module>
>         SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')
>       File "/Users/zacharyhenning/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 556, in write
>         for record in sequences:
>       File "./convert_tsv_fasta.py", line 32, in <genexpr>
>         for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
>       File "./convert_tsv_fasta.py", line 16, in make_protein_record
>         id="translated_" + nuc_record.id,
>     TypeError: 'module' object is not callable

Here is my code:

from Bio import SeqIO
import pandas as pd
import numpy as np
import sys
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio import SeqIO
from Bio import SeqRecord

inputfilepath = sys.argv[1]
outputfilepath = inputfilepath + "_nucleotide.fasta"
#function being used in script
def make_protein_record(nuc_record):
    return SeqRecord(
        seq=nuc_record.seq.translate(to_stop=True, table='Bacterial'),
        id="translated_" + nuc_record.id,
        )
#initialize DF skipping first few rows

df = pd.DataFrame(pd.read_csv(inputfilepath, skiprows=5, delimiter='\t'))
df = df.drop(columns=['ContigCoord'])

#create dictionary to write to file

dict_y = df['Sequence'].to_dict()
for key, value in dict_y.items():
    with open(outputfilepath, 'a+') as handle:
        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n" + "\n")
    handle.close()

#generator expression to translate files

proteins = (
    make_protein_record(nuc_record)
    for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
)
SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')

Could you please share a few lines of the TSV file that you are using?

(comment by zorbax, 8 weeks ago)
###########################################################################################
sample ID: XXXX corresponds to Pangaea ID: XXXX
# Included ENA Sample ID(s): XXX
# Included ENA Run ID(s): XXXX
###########################################################################################
GeneID  ContigCoord     Sequence
scaffold2_1_gene1     scaffold2_1 strand:+ start:3 stop:1310    CGAGAATTGGAGCGTTACCCAAGTCAGGAAGATCCTAATCTTCCGTCTCACGTACTTGATAGACTTGCGACAATATTTGGAGAAGCTATCTCTGCAACTGAAGCGGTATGGCGAATTATTGCAGATATCCGACTCCATGGGGACCAAGCAATTCTTGACTACACTAAGCGTATTGATGGAACTGAGCTAGTTAATCTAGACATACCGAGTGACACATGGGCTAGCGCTTCAGTTAATTTGAATGAAACTTTACGCGAGGCCTTATCGCTTTCATCGTCACGGATTTTTGATTTCCACCAAGCATGCCTGCCAAAAGATTGGTTTGATGGAAATTTGGGAGTAGGGCTCAAGCACGTTCCCATCGAAAGAGTTGGGATTTATATTCCTGGCGGCACTGCAACGTACCCATCTACAGTCCTGATGAGTGCAATACCGGCACGCGTAGCAGGGGTTAAAGAAATCGTTCTATGTACCCCAAACCCGACTGATTCGGTTTTAACTGCTGCACTTAATGCAAAAGTAGATCAGGTATTTCAAATTGGAGGAGCGCAAGCTATCGCTGCAATGGCATTTGGCAGTGAAAGTATCCCACGGGTGGACAAGATTGTCGGGCCAGGGAATATTTTTGTCTCGATTGCAAAGCGGCTTGTATATGGGAGTGTAGACATTGATGGTCTCTATGGTCCTACAGAAACATTGATAATTGCCGATGATTCAGTGAATCCACAAATAATTGCTTCCGATTTACTAGCACAGGCAGAGCATGACGATCTTGCTACTCCAATACTTATAACTTTTTCACGCGCTATAGCTAACCTCGTTAATGATCACATTGAAGAACAGTCAGCAAATATGCCTAGGGAATCGATTATCAAAAATTCTTTGGCCAACCAAGGCGCTATTCATCTTGTTTCTGATGTATCTGAGGCTATTAAGCTATCCAATGTATTCGCGCCAGAGCACCTCAGTTTACTTATACGAGATGCAGAGAAATACATTCCGCAAATCGAAAATGCAGGTGGAATCTTCGTAGGGGAAAACAGCCCTGAAGTTTTAGGGGATTATGTGATTGGACCCAGTCATGTGATGCCAACTGGTGGTACTGCAAGATTTGCCTCTAACTTGGGAATTAATTCCTTTCTCAAGCAGATTCCTATAATGAATTTGTCATCTTCAACAATGCTGCAACTTGCACCCGCTGCTGTCGAAATAGCTGGCATAGAAGGTCTAAGTGCACATTCTGCCTCTGCTGCAATACGGATTGAAGGGATTGAAAGTACTTCCGACGAAAAGGGGCAATCGAGCTGA
(reply by anabaena, 8 weeks ago)

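Instead of "from Bio import SeqRecord", use "from Bio.SeqRecord import SeqRecord". The first form imports the SeqRecord module rather than the SeqRecord class, so calling SeqRecord(...) raises "'module' object is not callable". Have you considered six-frame translations? You can use "from Bio.SeqUtils import six_frame_translations".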

(comment by zorbax, 8 weeks ago)
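
The module-versus-class mix-up described in the comment above can be reproduced without Biopython. As a rough illustration only, the sketch below uses the standard library's datetime, which, like Bio.SeqRecord, is a module that contains a class of the same name. The helper try_call and the alias names are made up for this example.

```python
import datetime as datetime_module            # binds the module, like "from Bio import SeqRecord"
from datetime import datetime as datetime_class  # binds the class, like "from Bio.SeqRecord import SeqRecord"


def try_call(obj):
    """Call obj with date arguments; return 'ok' or the TypeError message."""
    try:
        obj(2020, 1, 1)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Calling the module fails with the same kind of error as in the traceback.
module_result = try_call(datetime_module)
# Calling the class works.
class_result = try_call(datetime_class)

print("module:", module_result)
print("class: ", class_result)
```

With Biopython the equivalent working call would be Bio.SeqRecord.SeqRecord(...) or, after the corrected import, simply SeqRecord(...).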
zorbax wrote (8 weeks ago):
If you want to save the sequences in FASTA format along with their translations, you can use the translate() method of the Seq class:

#!/usr/bin/env python3

import os
import sys

import pandas as pd
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

inputfilepath = sys.argv[1]
# splitext() drops only the final extension, so paths such as ./data/orfs.tsv also work
basename = os.path.splitext(inputfilepath)[0]
outputfilepath = basename + "_nucleotide.fasta"

# Skip the five comment lines that precede the header row
df = pd.read_csv(inputfilepath, skiprows=5, delimiter='\t')
dict_y = dict(zip(df['GeneID'], df['Sequence']))

results = []
# Open the nucleotide FASTA once in write mode so that reruns do not append duplicates
with open(outputfilepath, 'w') as handle:
    for key, value in dict_y.items():
        # Translate with the bacterial table, stopping at the first stop codon
        seq_value = Seq(value)
        record = SeqRecord(seq=seq_value.translate(to_stop=True, table='Bacterial'),
                           id="translated_" + key, description="")
        results.append(record)

        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n")

SeqIO.write(results, basename + '_translation.faa', 'fasta')
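
The nucleotide-FASTA step in the answer above does not strictly need pandas. As a minimal sketch, assuming the layout from the sample shared earlier (five comment lines, then a tab-separated header with GeneID and Sequence columns), the same records can be written with the standard library's csv module. The function name tsv_to_fasta and the miniature input are made up for illustration, and translation is left out because it still requires Biopython.

```python
import csv
import io


def tsv_to_fasta(tsv_handle, fasta_handle, skiprows=5):
    """Write one FASTA record per TSV row; return the number of records written."""
    # Discard the comment lines that precede the header row.
    for _ in range(skiprows):
        next(tsv_handle)
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    count = 0
    for row in reader:
        # Use the GeneID column as the FASTA header.
        fasta_handle.write(">" + row["GeneID"] + "\n")
        fasta_handle.write(row["Sequence"] + "\n")
        count += 1
    return count


# Made-up miniature input in the same shape as the shared sample.
sample_tsv = (
    "#####\n"
    "# sample ID: XXXX\n"
    "# Included ENA Sample ID(s): XXX\n"
    "# Included ENA Run ID(s): XXXX\n"
    "#####\n"
    "GeneID\tContigCoord\tSequence\n"
    "gene1\tcontig1 strand:+ start:3 stop:11\tATGAAATAA\n"
    "gene2\tcontig1 strand:- start:20 stop:28\tATGCCCTGA\n"
)

fasta_out = io.StringIO()
n_records = tsv_to_fasta(io.StringIO(sample_tsv), fasta_out)
fasta_text = fasta_out.getvalue()
print(fasta_text)
```

For real files, pass handles from open(path) and open(path, "w") in place of the StringIO objects.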

Awesome, thank you!

(reply by anabaena, 8 weeks ago)