Question: How to convert a multiFASTA file with DNA (nucleotide) sequences into a multiFASTA with protein (amino acid) sequences?
2.2 years ago by
Alec Watanabe60 wrote:

Hello guys,

I have a multiFASTA file with DNA (nucleotide) sequences and I want to convert these sequences to protein (AA) sequences. I need to do this conversion because MEDpipe (a webtool) apparently only accepts FASTA files with AA sequences (I presume), because I keep getting errors when I submit the file. Since this multiFASTA file has over 7000 sequences, it would be really time-consuming to translate one sequence at a time (most webtools I've found only accept one sequence at a time). Is there any way to do this? Here is an example of the file I have:

Modified 2.2 years ago by genomax (80k) • written 2.2 years ago by Alec Watanabe (60)

For future reference: highlight the text you want to format as code and then click the 101 button in the editor window to format it correctly.

Reply written 2.2 years ago by genomax (80k)

Thank you for the tip, I'm new to the community so I'm still learning, but I'll apply this knowledge now in my future posts!

Reply written 2.2 years ago by Alec Watanabe (60)
2.2 years ago by
United States
genomax80k wrote:

You can use EMBOSS transeq, either via its web interface or by downloading and installing EMBOSS locally.

I got the following using your sequences via web interface in one pass:

Modified 2.2 years ago • written 2.2 years ago by genomax (80k)

Dear genomax,

Thank you for your response. Do you know if downloading/installing EMBOSS will let me use any file size? The complete multiFASTA file I intend to use is around 8 MB. Also, do these '*' symbols mean that the nucleotide sequence could not be converted? Some webtools use the '*' symbol to indicate that the sequence has a region with a frameshift.

Reply written 2.2 years ago by Alec Watanabe (60)

Yes, a local copy of EMBOSS should allow you to use large files. Alternatively, you could break your file up for multiple web submissions.

The result above was a simple single-frame translation. The * characters should be stop codons.
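
For anyone who wants to see what a single-frame translation does without installing anything, here is a minimal sketch in plain Python. It is not transeq and is not guaranteed to match its output: the function names (`read_fasta`, `translate`, `translate_fasta`) are made up for this example, it assumes the standard genetic code (NCBI translation table 1), and writing X for any codon that contains a non-ACGT base is a choice made here.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2); '*' marks a stop codon.

    Codons containing anything other than A/C/G/T become 'X' (a choice
    made for this sketch). A trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def translate_fasta(lines, frame=0):
    """Yield protein FASTA records, one per input nucleotide record."""
    for header, dna in read_fasta(lines):
        yield f">{header}\n{translate(dna, frame)}"
```

With a file on disk (the file name `genes.fna` is hypothetical), something like `for record in translate_fasta(open("genes.fna")): print(record)` would print the whole multiFASTA translated in the first frame.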

Reply written 2.2 years ago by genomax (80k)

Hi again! This is just an update.

I've tested EMBOSS transeq and I've concluded that it doesn't work efficiently for this specific purpose. When we translate the sequence, the output contains too many - (dash) symbols. I don't know if there is any way to correctly translate a multiFASTA file, but apparently it's best to use an EMBL file and convert it to multiFASTA (exporting sequences as AA) using any software that allows visualization as previously discussed in another post of mine.

Reply written 2.2 years ago by Alec Watanabe (60)

I think you should not be getting any - in your output. See the transeq help page here. Something is not right.

To reiterate my comment from your other thread: if these are standard reference genomes from the Human Microbiome Project, then why are you not getting the protein FASTA files from NCBI? Those sequences will be full protein sequences without these stop codons.

Modified 2.2 years ago • written 2.2 years ago by genomax (80k)

Indeed, you should not.

Are there any IUPAC ambiguity codes in your sequences? Or perhaps some 'hidden' characters (did you open or manipulate the sequences in a DOS/Windows environment)?
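
A quick way to check both possibilities is to count every character that is not a plain A, C, G or T. The sketch below is only an illustration: `audit_sequence` and `audit_fasta` are hypothetical helper names, and the split between "ambiguous" (IUPAC codes such as N or R) and "unexpected" (dashes, carriage returns, anything else) is an assumption made for this example.

```python
from collections import Counter

PLAIN_BASES = set("ACGT")
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")  # IUPAC nucleotide ambiguity codes


def audit_sequence(seq):
    """Return (ambiguous, unexpected) counters for one sequence string."""
    ambiguous, unexpected = Counter(), Counter()
    for ch in seq:
        base = ch.upper()
        if base in PLAIN_BASES:
            continue
        if base in IUPAC_AMBIGUITY:
            ambiguous[base] += 1
        else:
            # repr() makes invisible characters such as '\r' readable
            unexpected[repr(ch)] += 1
    return ambiguous, unexpected


def audit_fasta(lines):
    """Map each record header to its counters; clean records are left out."""
    report = {}
    header = None
    for line in lines:
        line = line.rstrip("\n")  # deliberately keep any '\r' so it is reported
        if line.startswith(">"):
            header = line[1:].strip()
            continue
        if header is None or not line:
            continue
        ambiguous, unexpected = audit_sequence(line)
        if ambiguous or unexpected:
            total_amb, total_unexp = report.setdefault(header, (Counter(), Counter()))
            total_amb.update(ambiguous)
            total_unexp.update(unexpected)
    return report
```

When reading from disk, open the file with `open(path, newline="")` so that Python does not silently convert Windows line endings before the check sees them.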

Reply written 2.2 years ago by lieven.sterck (7.3k)

I just want to add that, for a meaningful conversion of this DNA to protein, it is probably key to first identify the correct translation frame. It does not really make sense to translate DNA into protein without taking the frame into account.
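
To illustrate the point about frames, the sketch below translates a sequence in all six frames and keeps the one with the longest stretch free of stop codons. This is a crude heuristic chosen for this example, not a gene predictor, and it can pick the wrong frame on short or unusual sequences; annotated CDS or protein files remain the safer source. The names `six_frames` and `best_frame` are hypothetical, and the standard genetic code (NCBI table 1) is assumed.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Yield (frame_label, protein) for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", dna), ("-", reverse_complement)):
        for offset in range(3):
            yield f"{strand}{offset + 1}", translate(seq[offset:])


def longest_open_stretch(protein):
    """Length of the longest run of residues between stop codons."""
    return max(len(part) for part in protein.split("*"))


def best_frame(dna):
    """Return (frame_label, protein) with the longest stop-free stretch.

    Ties go to the first frame in the order +1, +2, +3, -1, -2, -3.
    """
    return max(six_frames(dna), key=lambda item: longest_open_stretch(item[1]))
```

For example, `best_frame("ATGACTTACTTAGCTAACTGGTAA")` picks frame +1, because every other frame of that toy sequence is interrupted by a stop codon earlier.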

MEDpipe does indeed require a protein file as input.

Reply written 2.2 years ago by lieven.sterck (7.3k)

Indeed. I was thinking about changing from MEDpipe to inmembrane. From what I understand, inmembrane uses the same SurfG+ software and other dependencies; it also has a gram-negative protocol, and apparently it works with a DNA multiFASTA file. Both tools generate a .csv file with similar output.

Reply written 2.2 years ago by Alec Watanabe (60)