Question: convert fasta dna files to protein
0
2.1 years ago by
genya3510
genya3510 wrote:

Hello,

I have 200 human DNA fasta files from a region of chr6, each containing one 5,500 bp sequence. I combined these fasta files and uploaded them to Clustal Omega to generate a multiple sequence alignment and a phylogenetic tree.

It worked well; however, I would like to convert these sequences into protein and highlight the epitopes present in the sequences. What is the best tool to use for this purpose? What format do I need to select for the output?

Thank you so much for your advice.

alignment • 3.9k views
ADD COMMENTlink modified 2.0 years ago • written 2.1 years ago by genya3510
1

I am not sure how many sequences you have but you can achieve this with MEGA

ADD REPLYlink written 2.1 years ago by sridhar56100

MEGA is cool, thanks

ADD REPLYlink written 2.0 years ago by genya3510

Could you please explain what MEGA is and how to find it. Are there other alternatives? Thanks

ADD REPLYlink written 2.1 years ago by genya3510

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized.

MEGA is a phylogenetic data analysis package. You can find it here. I am assuming this is the MEGA that @sridhar56 was referring to.

Doing alignments at the nucleotide level can be very different from doing them at the protein level. I hope you were planning to translate the sequences and then redo the MSA? You could use one of the EMBOSS tools for doing the translation, if you need a web-based option.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax64k

@genomax2 Thanks for the reply. You are referring to EMBOSS Transeq, correct? Should I combine all fasta files into one and then translate them, or do I have to do this one-by-one?

Please suggest whether I should use 1 frame for the HLA-DPB1 gene or multiple frames. Should I select 'Standard Code'?

I really appreciate your advice,

ADD REPLYlink modified 2.0 years ago • written 2.1 years ago by genya3510

If you know the frame you are interested in (and if all the files are in the same frame) then this may be easier to do. Transeq appears to accept multiple sequences from the web interface so you should be able to use a single multi-fasta format file (keeping the frame consideration in mind). Standard code should be fine.
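
For anyone who would rather do this step locally than through the web form, here is a minimal Python sketch of the same idea: read a multi-fasta file, translate every record in one chosen forward frame with the standard code, and write protein fasta. It is not Transeq itself, and the function names (`read_fasta`, `translate`, `translate_fasta`) are made up for illustration. It assumes all records share the same frame, as discussed above.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def read_fasta(text):
    """Yield (header, sequence) pairs from multi-fasta text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def translate(dna, frame=1):
    """Translate one forward frame (1, 2 or 3); unrecognised codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_fasta(text, frame=1):
    """Return protein multi-fasta text for every record in the input."""
    return "".join(f">{h}\n{translate(s, frame)}\n" for h, s in read_fasta(text))
```

The string returned by `translate_fasta` can be written to a text file and uploaded to Clustal Omega like any other protein fasta file. Stop codons are written as `*`, so you may want to trim them before aligning.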

Are the 200 files for the same region/gene or are there multiple locations present?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax64k

ExPASy has a simple Web interface translation tool with support for multiple tables if that is what you need.

I'm guessing you want something command line though?

ADD REPLYlink written 2.1 years ago by jrj.healey11k

yes, these files are for the same region. Thanks

ADD REPLYlink written 2.1 years ago by genya3510

Then give transeq a try.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax64k

unfortunately, EMBL is down today.:( Hopefully it will come back soon.

ADD REPLYlink written 2.1 years ago by genya3510

Try the translate tool at ExPASy.

ADD REPLYlink written 2.1 years ago by genomax64k

So far, I've tried EMBOSS Transeq and the run aborted before generating anything for some reason. I combined multiple fasta files (the total size was under 1 MB). It worked when I tried it with a really small fasta file, so I'm not sure what the problem is. I ran cat *.fas > output.txt to combine the fasta files and uploaded this file into Transeq. The message said it was processing data but a few minutes later I received an email about a failure. I will try again tomorrow.
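
The thread never establishes why the run failed, so the following is only something worth ruling out, not a diagnosis. If any of the input files lacks a final newline, plain concatenation glues the next header onto the end of the previous sequence, and some tools reject the result. A small Python sketch (the function name `combine_fasta` and the file names in the note below are hypothetical) that concatenates the files defensively and reports how many records it wrote:

```python
from pathlib import Path


def combine_fasta(paths, out_path):
    """Concatenate fasta files, one line per output line, and count the records.

    Every line is rewritten with a Unix newline, so a missing final newline
    or Windows line endings in an input file cannot merge two records.
    """
    n_records = 0
    with open(out_path, "w", newline="\n") as out:
        for path in paths:
            for line in Path(path).read_text().splitlines():
                line = line.rstrip()
                if not line:
                    continue  # drop blank lines
                if line.startswith(">"):
                    n_records += 1
                out.write(line + "\n")
    return n_records
```

Calling `combine_fasta(sorted(Path(".").glob("*.fas")), "combined.fasta")` should return 200 if each input file holds exactly one sequence; any other number points at a malformed input file.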

I've also tried the ExPASy tool and it generated output on the screen, but it's not clear to me how to download the result. Also, my final goal is to import the result into Clustal Omega to do the alignment and generate a tree, so the format of the output has to be compatible. Should I stick with ExPASy?

As far as I understand, the translation has to happen first, followed by the alignment in Clustal Omega, correct?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genya3510
1

EBI Web sites appear to be undergoing maintenance at the moment. Use the ExPASy result.

Option 1: Highlight and copy/paste the result data into a separate text file (I assume result is already in fasta format). Be sure to save the file in text format.

Option 2: Choose "file" --> "Save Page as" from your browser window. Be sure to select format as "text file" for the file being saved.

First option may be cleaner. You can then open the file in MEGA or upload to Clustal Omega.

ADD REPLYlink written 2.1 years ago by genomax64k

The ExPASy output contains multiple frames (3), each provided in both the 5'3' and 3'5' directions. Do I need to select a particular frame and direction (5'3' or 3'5') to upload into Clustal Omega? All sequences should be in the same frame since they represent different alleles of the same gene. Perhaps I'm wrong? Also, I chose the 'compact' output. Thanks

ADD REPLYlink written 2.1 years ago by genya3510

I hesitate to give you a blanket answer without being able to see the data you are using.

You should choose the frame that actually encodes the protein you are interested in. You can determine the frame you need by doing a blastp search with the translated proteins.

If you don't get this right (choose the correct protein) then you could end up doing a lot of work for nothing.
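
To narrow down which translations are worth sending to blastp, a rough first pass is to translate all six frames and look at the longest stretch without a stop codon in each. The Python sketch below does that with the standard code; the function names are hypothetical and the ranking is only a heuristic, not a replacement for the blastp check. Note also that if the sequences are genomic and contain introns, no single frame will read through the whole protein.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unrecognised codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return {label: protein} for the three 5'3' and three 3'5' frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"5'3' frame {offset + 1}"] = translate(dna[offset:])
        frames[f"3'5' frame {offset + 1}"] = translate(rc[offset:])
    return frames


def longest_open_stretch(protein):
    """Longest run of residues between stop codons ('*')."""
    return max(protein.split("*"), key=len)


def rank_frames(dna):
    """(label, longest stop-free stretch) pairs, longest stretch first."""
    stretches = ((label, longest_open_stretch(p)) for label, p in six_frames(dna).items())
    return sorted(stretches, key=lambda item: len(item[1]), reverse=True)
```

The top one or two stretches from `rank_frames` are the natural candidates to paste into blastp. A long open stretch on the wrong strand can occur by chance, which is why the search is still needed.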

ADD REPLYlink written 2.1 years ago by genomax64k

Hello yelekley7!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?p=204732

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 2.1 years ago by WouterDeCoster37k

Sorry, I'm new to this. However, this forum is far superior since no one bothered replying to my question in the other forum.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genya3510
1

The number of responses does not imply superiority or inferiority of one forum versus another. Your question is hard to answer, and it's not clear to me that any of the responses in this thread will resolve it. But once you have resolved it, it would be helpful to post the resolution in all forums in which you have posted the question.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by Brian Bushnell16k

Many participate in both forums and that is precisely the reason why cross-posting leads to duplication of effort. Since I answered your question here I did not do so on SeqAnswers.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genomax64k

And for those only active in one forum, it's also duplication of effort since it doesn't make sense that two persons spend time answering the question each on their favourite forum.

ADD REPLYlink written 2.1 years ago by WouterDeCoster37k
2
2.1 years ago by
jrj.healey11k
United Kingdom
jrj.healey11k wrote:

Assuming your fasta files correspond exactly to the CDS of the protein sequences you want, meaning that the first bases of each nucleotide sequence are the start codon (I assume this because you've been aligning genes), there are loads of example scripts online which you can use to translate them all.

E.g. http://stackoverflow.com/questions/19521905/translation-dna-to-protein
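
Before running any of those scripts, it may be worth testing the assumption itself. The Python sketch below (the function name `cds_problems` is made up for illustration, and the stop codons are those of the standard code) lists the reasons a sequence does not look like a complete CDS:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def cds_problems(dna):
    """Return a list of reasons the sequence does not look like a complete CDS."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not dna.startswith("ATG"):
        problems.append("does not start with ATG")
    # Codons read from the first base; a trailing partial codon is ignored.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems
```

An empty list means the sequence passes all four checks. Anything else suggests the file is not a plain CDS (for example a genomic region with introns or flanking sequence), in which case translating from the first base will not give the protein you expect.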

As for highlighting domains etc., that's really a separate task/query as you can't really incorporate that data into an alignment.

Some options for programs which will show you domain structure however include:

BlastP
Pfam (I think)
ESPript (http://espript.ibcp.fr/ESPript/cgi-bin/ESPript.cgi) will take alignments and predict secondary structure etc.

ADD COMMENTlink written 2.1 years ago by jrj.healey11k

I'm trying out the ESPript tool. I ran Clustal Omega and saved the alignment output in Notepad with an .aln extension. I uploaded the file in the "Aligned Sequence" tab and it looks like it's running, but there is no indication whether it's working or when it will finish. How long does it usually take? Thanks

ADD REPLYlink written 2.1 years ago by genya3510

I'm afraid I honestly have no idea. It's been some time since I used it last, and when I did I was only aligning a dozen or so genes.

ADD REPLYlink written 2.1 years ago by jrj.healey11k

I was able to get ESPript to output a result after I truncated the aligned protein sequences. However, it did not correctly predict the location of the six epitopes within the 6 aligned sequences. Perhaps, if I upload more sequences, the prediction will be more accurate? I'm a little disappointed but will keep looking for a way to improve the ESPript calls, or perhaps I will find another tool. Thanks for the advice.

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by genya3510

What are you calling 'epitopes'? ESPript can't, to my knowledge, detect those, just secondary structure etc. For annotating sequences by hand (mostly for lab purposes, but it also works for generating figures) there is SnapGene. You can manually add your annotations etc. and depict the epitopes however you wish. This doesn't really answer your question though, as it isn't really for dealing with alignments etc. Perhaps if you add to your original question some example input, and a quick mock-up of the ideal kind of output you want, we might be more help.
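
If the epitope positions are already known on a reference protein, marking them up in an alignment is mostly bookkeeping: map each ungapped reference position to its alignment column, then highlight those columns in every sequence. A minimal Python sketch follows, assuming the alignment is held as a dict of name to aligned sequence and the epitopes as 1-based inclusive (start, end) pairs on the ungapped reference. All names and coordinates here are hypothetical; the thread never gives the real ones.

```python
def column_of_residue(aligned_ref):
    """Map 1-based ungapped reference positions to 0-based alignment columns."""
    mapping = {}
    position = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            position += 1
            mapping[position] = column
    return mapping


def epitope_columns(aligned_ref, epitopes):
    """Alignment columns covered by (start, end) epitopes given on the reference."""
    mapping = column_of_residue(aligned_ref)
    columns = set()
    for start, end in epitopes:
        columns.update(range(mapping[start], mapping[end] + 1))
    return columns


def highlight(alignment, ref_name, epitopes):
    """Upper-case the epitope columns, lower-case the rest, and add a '^' ruler."""
    columns = epitope_columns(alignment[ref_name], epitopes)
    width = max(len(name) for name in alignment)
    lines = []
    for name, seq in alignment.items():
        marked = "".join(c.upper() if i in columns else c.lower() for i, c in enumerate(seq))
        lines.append(f"{name.ljust(width)}  {marked}")
    ruler = "".join("^" if i in columns else " " for i in range(len(alignment[ref_name])))
    lines.append(f"{' ' * width}  {ruler}".rstrip())
    return "\n".join(lines)
```

For example, `highlight({"ref": "MA-KGL", "s2": "MATKGV"}, "ref", [(3, 4)])` upper-cases the K and G columns in both sequences and puts carets under them. This only displays positions you supply; it does not predict epitopes.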

ADD REPLYlink written 2.1 years ago by jrj.healey11k

Do you use SnapGene Viewer or snapgene_2.0, which charges fees? Is there free software that would allow me to import aligned protein sequences, align them to a reference and mark them up? Thanks

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genya3510

You got a recommendation to use MEGA above but you seem not to have done anything with it. It is a proper phylogenetic data analysis program and will do most everything you have been asking about so far.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax64k

My lab has the full paid version, but the viewer is fine for producing figures and such, if it works for you in this instance.

ADD REPLYlink written 2.0 years ago by jrj.healey11k