Error when converting/translating .tsv ORFs to .faa
4.0 years ago
anabaena ▴ 10

Hey all, I am working on a script that uses Biopython to translate a large number of .tsv files containing predicted ORFs. I keep getting the error 'module' object is not callable, even though the modules load correctly. Everything works until the script reaches the translation step. Below is the full error:

>       File "./convert_tsv_fasta.py", line 34, in <module>
>         SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')
>       File "/Users/zacharyhenning/miniconda3/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 556, in write
>         for record in sequences:
>       File "./convert_tsv_fasta.py", line 32, in <genexpr>
>         for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
>       File "./convert_tsv_fasta.py", line 16, in make_protein_record
>         id="translated_" + nuc_record.id,
>     TypeError: 'module' object is not callable

Here is my code:

from Bio import SeqIO
import pandas as pd
import numpy as np
import sys
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from Bio import SeqIO
from Bio import SeqRecord

inputfilepath = sys.argv[1]
outputfilepath = inputfilepath + "_nucleotide.fasta"
#function being used in script
def make_protein_record(nuc_record):
    return SeqRecord(
        seq=nuc_record.seq.translate(to_stop=True, table='Bacterial'),
        id="translated_" + nuc_record.id,
        )
#initialize DF skipping first few rows

df = pd.DataFrame(pd.read_csv(inputfilepath, skiprows=5, delimiter='\t'))
df = df.drop(columns=['ContigCoord'])

#create dictionary to write to file

dict_y = df['Sequence'].to_dict()
for key, value in dict_y.items():
    with open(outputfilepath, 'a+') as handle:
        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n" + "\n")
    handle.close()

#generator expression to translate files

proteins = (
    make_protein_record(nuc_record)
    for nuc_record in SeqIO.parse(outputfilepath, 'fasta')
)
SeqIO.write(proteins, inputfilepath + 'translation.faa', 'fasta')
biopython translation orfs pandas • 1.2k views

Could you please share a few lines of the .tsv file that you are using?

###########################################################################################
sample ID: XXXX corresponds to Pangaea ID: XXXX
# Included ENA Sample ID(s): XXX
# Included ENA Run ID(s): XXXX
###########################################################################################
GeneID  ContigCoord     Sequence
scaffold2_1_gene1     scaffold2_1 strand:+ start:3 stop:1310    CGAGAATTGGAGCGTTACCCAAGTCAGGAAGATCCTAATCTTCCGTCTCACGTACTTGATAGACTTGCGACAATATTTGGAGAAGCTATCTCTGCAACTGAAGCGGTATGGCGAATTATTGCAGATATCCGACTCCATGGGGACCAAGCAATTCTTGACTACACTAAGCGTATTGATGGAACTGAGCTAGTTAATCTAGACATACCGAGTGACACATGGGCTAGCGCTTCAGTTAATTTGAATGAAACTTTACGCGAGGCCTTATCGCTTTCATCGTCACGGATTTTTGATTTCCACCAAGCATGCCTGCCAAAAGATTGGTTTGATGGAAATTTGGGAGTAGGGCTCAAGCACGTTCCCATCGAAAGAGTTGGGATTTATATTCCTGGCGGCACTGCAACGTACCCATCTACAGTCCTGATGAGTGCAATACCGGCACGCGTAGCAGGGGTTAAAGAAATCGTTCTATGTACCCCAAACCCGACTGATTCGGTTTTAACTGCTGCACTTAATGCAAAAGTAGATCAGGTATTTCAAATTGGAGGAGCGCAAGCTATCGCTGCAATGGCATTTGGCAGTGAAAGTATCCCACGGGTGGACAAGATTGTCGGGCCAGGGAATATTTTTGTCTCGATTGCAAAGCGGCTTGTATATGGGAGTGTAGACATTGATGGTCTCTATGGTCCTACAGAAACATTGATAATTGCCGATGATTCAGTGAATCCACAAATAATTGCTTCCGATTTACTAGCACAGGCAGAGCATGACGATCTTGCTACTCCAATACTTATAACTTTTTCACGCGCTATAGCTAACCTCGTTAATGATCACATTGAAGAACAGTCAGCAAATATGCCTAGGGAATCGATTATCAAAAATTCTTTGGCCAACCAAGGCGCTATTCATCTTGTTTCTGATGTATCTGAGGCTATTAAGCTATCCAATGTATTCGCGCCAGAGCACCTCAGTTTACTTATACGAGATGCAGAGAAATACATTCCGCAAATCGAAAATGCAGGTGGAATCTTCGTAGGGGAAAACAGCCCTGAAGTTTTAGGGGATTATGTGATTGGACCCAGTCATGTGATGCCAACTGGTGGTACTGCAAGATTTGCCTCTAACTTGGGAATTAATTCCTTTCTCAAGCAGATTCCTATAATGAATTTGTCATCTTCAACAATGCTGCAACTTGCACCCGCTGCTGTCGAAATAGCTGGCATAGAAGGTCTAAGTGCACATTCTGCCTCTGCTGCAATACGGATTGAAGGGATTGAAAGTACTTCCGACGAAAAGGGGCAATCGAGCTGA
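
For anyone who wants to check how a file with this layout turns into FASTA records without pandas or Biopython, here is a minimal sketch using only the standard library. It assumes five header lines before the column names (taken from the skiprows=5 in the question) and tab-separated columns. The miniature file contents, the shortened sequence, and the helper name read_orfs are made up for illustration.

```python
import csv
import io

# Hypothetical miniature of the file: 5 header lines, column names, one ORF row.
# The sequence is shortened for illustration; the real file is assumed tab-separated.
sample = (
    "#####\n"
    "# sample ID: XXXX corresponds to Pangaea ID: XXXX\n"
    "# Included ENA Sample ID(s): XXX\n"
    "# Included ENA Run ID(s): XXXX\n"
    "#####\n"
    "GeneID\tContigCoord\tSequence\n"
    "scaffold2_1_gene1\tscaffold2_1 strand:+ start:3 stop:1310\tCGAGAATTGGAG\n"
)


def read_orfs(handle, header_lines=5):
    """Return {GeneID: Sequence} from a tab-separated handle (hypothetical helper)."""
    # Skip the free-text header so the next line holds the column names.
    for _ in range(header_lines):
        next(handle)
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["GeneID"]: row["Sequence"] for row in reader}


orfs = read_orfs(io.StringIO(sample))

# Build nucleotide FASTA text, one record per ORF, keyed by GeneID.
fasta = "".join(f">{gene_id}\n{seq}\n" for gene_id, seq in orfs.items())
print(fasta)
```

Keying the records by GeneID rather than by row number keeps the FASTA headers traceable back to the original table.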

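The problem is the line from Bio import SeqRecord: it imports the module Bio.SeqRecord, not the SeqRecord class inside it, so calling SeqRecord(...) raises the TypeError. Use from Bio.SeqRecord import SeqRecord instead. Have you also considered six-frame translations? You can use from Bio.SeqUtils import six_frame_translations.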
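
The difference between importing a module and importing the class of the same name can be reproduced without Biopython. The sketch below uses xml.etree.ElementTree from the standard library as a stand-in, because it has the same shape as Bio.SeqRecord: a module that contains a class with the module's own name. The aliases et_module and et_class and the helper try_call are made up for illustration.

```python
import types

# Analogous to `from Bio import SeqRecord`: this binds the *module*.
from xml.etree import ElementTree as et_module

# Analogous to `from Bio.SeqRecord import SeqRecord`: this binds the *class*.
from xml.etree.ElementTree import ElementTree as et_class


def try_call(obj):
    """Call obj() and return the TypeError message, or None if the call works."""
    try:
        obj()
    except TypeError as exc:
        return str(exc)
    return None


module_error = try_call(et_module)  # calling a module fails
class_error = try_call(et_class)    # calling the class succeeds

print(isinstance(et_module, types.ModuleType))
print(module_error)
print(class_error)
```

The message captured for the module call contains the same text as the traceback in the question, while the class call completes without an error.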

4.0 years ago
zorbax ▴ 610

If you want to save both the nucleotide sequences in FASTA format and their translations, you can use the translate() method of the Seq class:

#!/usr/bin/env python3

import os
import sys
import pandas as pd
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

inputfilepath = sys.argv[1]
# splitext only strips the extension, so paths like ./data/orfs.tsv stay intact
basepath = os.path.splitext(inputfilepath)[0]
outputfilepath = basepath + "_nucleotide.fasta"

# Skip the 5 header lines; the next row holds the column names
df = pd.read_csv(inputfilepath, skiprows=5, delimiter='\t')
dict_y = dict(zip(df['GeneID'], df['Sequence']))

results = []
# Open the nucleotide FASTA once; 'w' avoids duplicated records on re-runs
with open(outputfilepath, 'w') as handle:
    for key, value in dict_y.items():
        # Translate with the bacterial table, stopping at the first stop codon
        record = SeqRecord(seq=Seq(value).translate(to_stop=True, table='Bacterial'),
                           id="translated_" + str(key), description="")
        results.append(record)

        handle.write(">" + str(key) + "\n")
        handle.write(value + "\n")

SeqIO.write(results, basepath + '_translation.faa', 'fasta')
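
To see what to_stop=True changes in the translate() call, here is a small sketch on a made-up 12-base sequence (it is not taken from the poster's data). It assumes Biopython is installed; the variable names are for illustration only.

```python
from Bio.Seq import Seq

# Made-up sequence: ATG AAA TAG GGG, i.e. two codons, a stop codon, one more codon.
nuc = Seq("ATGAAATAGGGG")

# Without to_stop, the stop codon is shown as '*' and translation continues.
full = str(nuc.translate(table="Bacterial"))

# With to_stop=True, translation ends before the first in-frame stop codon.
with_stop = str(nuc.translate(table="Bacterial", to_stop=True))

print(full)
print(with_stop)
```

With to_stop=True, an ORF that contains an internal stop codon is silently truncated, so it can be worth comparing the two outputs for a few records.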

Awesome! Thank you!
