convert fasta dna files to protein
6.1 years ago
genya35 ▴ 40

Hello,

I have 200 human FASTA DNA files from a region of chr6. Each sequence is 5,500 bp. I've combined these FASTA files and uploaded them into Clustal Omega to generate a multiple sequence alignment and a phylogenetic tree.

It worked well; however, I would like to convert these sequences into protein and highlight the epitopes present in the sequences. What is the best tool to use for this purpose? What format do I need to select for the output?

alignment • 11k views

I am not sure how many sequences you have, but you can achieve this with MEGA.


MEGA is cool, thanks


Could you please explain what MEGA is and how to find it? Are there other alternatives? Thanks


Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

MEGA is a phylogenetic data analysis package. You can find it here. I am assuming this is the MEGA that @sridhar56 was referring to.

Doing alignments at the nucleotide level can be very different from doing them at the protein level. I hope you were planning to translate the sequences and then redo the MSA? You could use one of the EMBOSS tools for doing the translation, if you need a web-based option.


@genomax2 Thanks for the reply. You are referring to EMBOSS Transeq, correct? Should I combine all fasta files into one and then translate them, or do I have to do this one-by-one?

Could you also suggest whether I should use one frame or multiple frames for the HLA-DPB1 gene? Should I select 'Standard Code'?


If you know the frame you are interested in (and if all the files are in the same frame) then this may be easier to do. Transeq appears to accept multiple sequences from the web interface so you should be able to use a single multi-fasta format file (keeping the frame consideration in mind). Standard code should be fine.

Are the 200 files for the same region/gene or are there multiple locations present?
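
For anyone who prefers to do this locally, the idea of translating a single multi-FASTA file in one known frame can be sketched in plain Python. This is only a minimal illustration, not how Transeq is implemented: it assumes the standard genetic code, forward-strand frames only, and made-up record names (`allele_1`, `allele_2`). `read_fasta` and `translate` are hypothetical helper names chosen for this example.

```python
from itertools import product

# Standard genetic code, built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def read_fasta(text):
    """Yield (header, sequence) pairs from multi-FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def translate(dna, frame=1):
    """Translate in forward frame 1, 2 or 3; unrecognised codons become 'X'."""
    start = frame - 1
    # Drop trailing bases that do not form a complete codon.
    usable = len(dna) - (len(dna) - start) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(start, usable, 3))


# Made-up example input: two records, one of them wrapped over two lines.
records = ">allele_1\nATGGCC\nTAA\n>allele_2\nATGAAATGA\n"
for header, seq in read_fasta(records):
    print(f">{header}")
    print(translate(seq, frame=1))
```

The output is itself FASTA (one header line, one protein line per record), so it can be pasted into an alignment tool. Stop codons are written as `*`.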


ExPASy has a simple Web interface translation tool with support for multiple tables if that is what you need.

I'm guessing you want something command line though?


Yes, these files are for the same region. Thanks


Then give transeq a try.


Unfortunately, EMBL is down today. :( Hopefully it will come back soon.


So far, I've tried EMBOSS Transeq, and the run aborted before generating anything for some reason. I had combined multiple fasta files (the size was under 1 MB). It worked when I tried it with a really small fasta file, so I'm not sure what the problem is. I ran cat *.fas > output.txt to combine the fasta files and uploaded this file into Transeq. The message said it was processing data, but a few minutes later I received an email about a failure. I will try again tomorrow.

I've also tried the ExPASy tool and it generated output on the screen, but it's not clear to me how to download the result. Also, my final goal is to import the result into Clustal Omega to do the alignment and generate a tree, so the format of the output has to be compatible. Should I stick with ExPASy?

As far as I understand, the translation has to happen first, followed by the alignment in Clustal Omega, correct?
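
The thread does not establish why the Transeq run failed. One thing that may be worth ruling out when concatenating files is a file that lacks a final newline, since a plain concatenation would then glue the next header onto the previous sequence line. The sketch below is a hedged Python alternative that writes each file with exactly one trailing newline and counts the `>` headers. `combine_fasta` and the output name `combined.fasta` are hypothetical, chosen for this example.

```python
from pathlib import Path


def combine_fasta(input_paths, output_path):
    """Concatenate FASTA files so that every file ends with exactly one newline.

    Returns the number of '>' header lines written, which should match the
    number of sequences you expect (200 in the case described above).
    """
    n_records = 0
    with open(output_path, "w", newline="\n") as out:
        for path in input_paths:
            # Text mode normalises Windows/Mac line endings on read.
            text = Path(path).read_text().strip()
            if not text:
                continue  # skip empty files
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text + "\n")
    return n_records
```

A possible call, assuming the inputs sit in the current directory with a `.fas` extension as in the post: `combine_fasta(sorted(Path(".").glob("*.fas")), "combined.fasta")`. If the returned count differs from the number of input sequences, the combined file deserves a closer look before uploading.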


EBI Web sites appear to be undergoing maintenance at the moment. Use the ExPASy result.

Option 1: Highlight and copy/paste the result data into a separate text file (I assume result is already in fasta format). Be sure to save the file in text format.

Option 2: Choose "file" --> "Save Page as" from your browser window. Be sure to select format as "text file" for the file being saved.

First option may be cleaner. You can then open the file in MEGA or upload to Clustal Omega.


The ExPASy output contains multiple frames (3) and provides 5'3' and 3'5' translations for each. Do I need to select a particular frame and 5'3'/3'5' sequence to upload into Clustal Omega? All sequences should be in the same frame since they represent different alleles of the same gene. Perhaps I'm wrong? Also, I chose 'compact' output. Thanks


I hesitate to give you a blanket answer without being able to see the data you are using.

You should choose the frame that actually encodes the protein you are interested in. You can determine the frame you need by doing a blastp search with the translated proteins.

If you don't get this right (choose the correct protein) then you could end up doing a lot of work for nothing.
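
To make the six candidate translations concrete, here is a minimal Python sketch that produces the three 5'3' and three 3'5' frames and reports the longest stop-free stretch in each. The stop-free length is only a rough screening heuristic introduced for this example; it is not a substitute for the blastp check suggested above. It assumes the standard genetic code, and the input sequence and the helper names (`six_frames`, `longest_open_stretch`) are made up.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset):
    """Translate complete codons starting at a 0-based offset (0, 1 or 2)."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3))


def six_frames(dna):
    """Return {frame label: protein} for 3 forward and 3 reverse-strand frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"5'3' frame {offset + 1}"] = translate(dna, offset)
        frames[f"3'5' frame {offset + 1}"] = translate(rc, offset)
    return frames


def longest_open_stretch(protein):
    """Length of the longest run of residues without a stop ('*')."""
    return max(len(part) for part in protein.split("*"))


# Made-up toy sequence, only to show the shape of the output.
for label, protein in six_frames("ATGGCCAAATTTGGGTAA").items():
    print(label, protein, longest_open_stretch(protein))
```

A frame with many early stops is unlikely to be the coding one, but the frame that looks best here should still be confirmed against known protein sequences.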


Hello yelekley7!

This is typically not recommended as it runs the risk of annoying people in both communities.


Sorry, I'm new to this. However, this forum is far superior since no one bothered replying to my question in the other forum.


The number of responses does not imply superiority or inferiority of one forum versus another. Your question is hard to answer, and it's not clear to me that any of the responses in this thread will resolve it. But once you have resolved it, it would be helpful to post the resolution in all forums in which you have posted the question.


Many participate in both forums and that is precisely the reason why cross-posting leads to duplication of effort. Since I answered your question here I did not do so on SeqAnswers.


And for those only active in one forum, it's also duplication of effort since it doesn't make sense that two persons spend time answering the question each on their favourite forum.

6.0 years ago
Joe 20k

Assuming your fasta files correspond exactly to the CDS of the protein sequences you want, meaning that the first bases of each nucleotide sequence are the start codon (I assume this because you've been aligning genes), there are loads of example scripts online which you can use to translate them all.
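
Whether the sequences really are plain CDS can be checked before translating. The following is a small Python sketch of such a sanity check, under the assumptions that the CDS starts with ATG, includes its terminal stop codon, and uses the standard stop codons. `cds_problems` is a hypothetical helper name, and the test sequences are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def cds_problems(dna):
    """Return a list of reasons why dna does not look like a complete CDS."""
    dna = dna.upper()
    problems = []
    if not dna.startswith("ATG"):
        problems.append("does not start with ATG")
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Complete codons in frame 1 only.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon in frame 1")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    return problems


# Made-up examples: a clean toy CDS and one with a premature stop.
print(cds_problems("ATGGCCTAA"))
print(cds_problems("ATGTAAGCCTAA"))
```

An empty list means the sequence passes these basic checks. If most of the 200 sequences fail them, they are probably not plain CDS and a straight frame-1 translation would not give the intended protein.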

As for highlighting domains etc., that's really a separate task/query, as you can't really incorporate that data into an alignment.

Some options for programs which will show you domain structure however include:

BlastP
Pfam (I think)
ESPript (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi), which will take alignments and predict secondary structure etc.


I'm trying out the ESPript tool. I've run Clustal Omega and saved the alignment output in Notepad with an .aln extension. I've uploaded the file into the "Aligned Sequence" tab and it looks like it's running, but there is no indication whether it's working or when it will finish. How long does it usually take? Thanks


I'm afraid I honestly have no idea. It's been some time since I used it last, and when I did I was only aligning a dozen or so genes.


I was able to get ESPript to output a result after I truncated the aligned protein sequences. However, it did not correctly predict the location of the six epitopes within the 6 aligned sequences. Perhaps if I upload more sequences the prediction will be more accurate? I'm a little disappointed but will keep looking for a way to improve ESPript calls, or perhaps I will find another tool. Thanks for the advice.


What are you calling 'epitopes'? ESPript can't, to my knowledge, detect those, just secondary structure etc. For annotating sequences by hand (mostly for lab purposes, but it also works for generating figures) there is SnapGene. You can manually add your annotations etc. and depict the epitopes however you wish. This doesn't really answer your question though, as it isn't really for dealing with alignments etc. Perhaps if you add some example input to your original question, along with a quick mock-up of the ideal kind of output you want, we might be more help.


Do you use SnapGene Viewer or snapgene_2.0, which charges fees? Is there free software that would allow me to import aligned protein sequences, align them to a reference and mark them up? Thanks


You got a recommendation to use MEGA above, but you seem not to have done anything with it. It is a proper phylogenetic data analysis program and will do most everything you have been asking about so far.


My lab has the full paid version, but the viewer is fine for producing figures and such, if it works for you in this instance.