Question: CDS sequence from mRNA and protein sequence
Macspider (Vienna - BOKU), 7 months ago:

Hello guys,

Here's the (probably) dumb question of the day: I can't seem to find a fitting solution in a reasonable amount of time, so I might be missing something here.

I have an mRNA multifasta and a protein multifasta (the IDs correspond between the two). What I'm after is extracting the CDS sequences from the mRNA file, using the proteins as a model. Any other way to get the CDS sequences from those mRNAs would also work. I do not have genome coordinates or any kind of GFF file.

Any easy way that you might know?
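Since the mRNAs are intronless and the IDs match, one protein-guided approach is to translate each mRNA in all six frames and look for the exact protein string. A minimal sketch in Python, assuming the standard genetic code and exact sequence agreement; `parse_fasta` and `cds_from_protein` are hypothetical helper names, not functions from any existing tool.

```python
# Sketch: recover the CDS of an intronless mRNA from its known protein.
# Assumes the standard genetic code and that mRNA/protein FASTA IDs match.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_fasta(lines):
    """Map the first word of each '>' header to its joined sequence."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def translate(nt):
    """Translate an uppercase DNA string codon by codon ('*' = stop, 'X' = unknown)."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def revcomp(nt):
    """Reverse complement of an uppercase DNA string."""
    return nt.translate(COMPLEMENT)[::-1]


def cds_from_protein(mrna, protein):
    """Return the mRNA slice that translates exactly to `protein`, or None."""
    protein = protein.upper().rstrip("*")
    if not protein:
        return None
    seq = mrna.upper().replace("U", "T")
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            pos = translate(strand[frame:]).find(protein)
            if pos == -1:
                continue
            start = frame + 3 * pos
            end = start + 3 * len(protein)
            if CODON_TABLE.get(strand[end:end + 3]) == "*":
                end += 3  # include the stop codon that follows the protein
            return strand[start:end]
    return None
```

Usage would be loading both files with `parse_fasta(open(...))` and calling `cds_from_protein` for every shared ID. A `None` result flags entries where the protein is not an exact in-frame translation (e.g. a non-standard genetic code or sequence differences), which would need an alignment-based route instead.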

Tags: mrna, code, protein, cds, fasta

You can try blastx (with the protein FASTA as the database). I assume your mRNA sequences are without introns?

(reply by b.nota, 7 months ago)
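A minimal sketch of that blastx route, driving the BLAST+ command-line tools from Python. The file names, database name and e-value cutoff are placeholders, and the tabular column list is just one reasonable choice: `qstart`/`qend` give the aligned region on the mRNA and `qframe` its reading frame.

```python
import subprocess

# Tabular (outfmt 6) columns to request; qstart/qend are mRNA coordinates.
FIELDS = "qseqid sseqid pident length qstart qend sstart send evalue qframe"


def blastx_commands(mrna_fasta, protein_fasta, db_name="proteins_db",
                    out_tsv="hits.tsv", evalue=1e-10):
    """Build the makeblastdb and blastx command lines (placeholder names)."""
    makedb = ["makeblastdb", "-in", protein_fasta, "-dbtype", "prot", "-out", db_name]
    blastx = ["blastx", "-query", mrna_fasta, "-db", db_name,
              "-outfmt", f"6 {FIELDS}", "-evalue", str(evalue), "-out", out_tsv]
    return makedb, blastx


def run_blastx(mrna_fasta, protein_fasta, **kwargs):
    """Run both steps; raises CalledProcessError if either tool fails."""
    for cmd in blastx_commands(mrna_fasta, protein_fasta, **kwargs):
        subprocess.run(cmd, check=True)
```

Because the IDs are shared between the two files, hits where `qseqid` differs from `sseqid` can simply be discarded afterwards.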

Yes, they are intronless!

(reply by Macspider, 7 months ago)

BLASTX should do the trick, but I'm not sure you would be able to extract the exact coding sequences with it directly. However, you can certainly use the alignment coordinates on the mRNA.

(reply by Sej Modha, 7 months ago)
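Using the coordinates could look roughly like this Python sketch. It assumes the hits were written as tab-separated BLAST+ output with `-outfmt "6 qseqid sseqid pident length qstart qend sstart send evalue qframe"` (an assumed column order), and that minus-strand hits are reported with `qstart > qend`, which is how BLAST+ tabular output presents them as far as I know. `best_hits` and `aligned_region` are hypothetical names.

```python
# Assumed column order of the tabular blastx output (outfmt 6).
FIELDS = "qseqid sseqid pident length qstart qend sstart send evalue qframe".split()
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_hits(lines):
    """Keep, per mRNA, the lowest-evalue hit against the protein with the same ID."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if hit["qseqid"] != hit["sseqid"]:
            continue  # IDs are shared between the files, so skip cross-hits
        prev = best.get(hit["qseqid"])
        if prev is None or float(hit["evalue"]) < float(prev["evalue"]):
            best[hit["qseqid"]] = hit
    return best


def aligned_region(mrna, hit):
    """Slice the aligned mRNA region (BLAST coordinates are 1-based, inclusive)."""
    qstart, qend = int(hit["qstart"]), int(hit["qend"])
    if qstart <= qend:
        return mrna[qstart - 1:qend].upper()
    # Minus-strand frames come with qstart > qend: slice, then reverse-complement.
    return revcomp(mrna[qend - 1:qstart])
```

This is also where the caveat above bites: the aligned region normally stops at the last aligned residue, so it excludes the stop codon and can be shorter than the full CDS if the alignment is partial. Comparing `length` with the protein length is a cheap sanity check.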
lieven.sterck (VIB, Ghent, Belgium), 7 months ago:

Simply run some software on the mRNAs to get the longest ORF from each one (there are plenty of tools around for that); in theory, these should translate into your proteins. Additionally, double-check them against your protein set (e.g. calculate an MD5 checksum of each translated ORF and of its protein, and compare the two).

Potential software to use:

and many others around, I guess.
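The MD5 double-check can be sketched in Python as below, assuming the standard genetic code and that both sets are already loaded as dicts of ID to sequence; `check_cds` is a hypothetical helper name. Trailing stop symbols are stripped and case is ignored before hashing, so `MKP*` and `mkp` give the same checksum.

```python
import hashlib

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a CDS with the standard code ('*' = stop, 'X' = unknown codon)."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def md5(protein):
    """MD5 hex digest of a protein, ignoring case and a trailing stop '*'."""
    return hashlib.md5(protein.upper().rstrip("*").encode()).hexdigest()


def check_cds(cds_by_id, protein_by_id):
    """Return sorted IDs whose CDS is missing or does not translate to the protein."""
    return sorted(
        pid for pid, prot in protein_by_id.items()
        if pid not in cds_by_id or md5(translate(cds_by_id[pid])) != md5(prot)
    )
```

Within one script, comparing hashes is equivalent to comparing the normalized strings directly; the checksum is mostly handy when the comparison happens across files or tools.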


I'd be interested in which software would be the way to go!

(reply by Macspider, 7 months ago)

OK, I'll add some to my 'answer'.

My general advice would be to go for one that does nothing more than look for the longest ORF: preferably not one that uses coding potential or the like, but one that simply (and bluntly) takes the longest stretch from a potential start codon to a stop codon (even limiting the search to the forward strand).

(reply by lieven.sterck, 7 months ago)
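That blunt "longest start-to-stop" search could be sketched in Python like this, assuming ATG as the only start codon, TAA/TAG/TGA as stops, and the forward strand only; `longest_orf` is a hypothetical name. In each of the three frames it opens an ORF at the first ATG after the previous stop and closes it at the next in-frame stop, which yields the longest ORF in that frame.

```python
# Forward-strand longest ORF: first ATG after a stop, through the next in-frame stop.
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(mrna):
    """Return the longest ATG...stop ORF (stop included) on the forward strand, or ''."""
    seq = mrna.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this stretch
            elif start is not None and codon in STOPS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ATG after this stop
    return best
```

ORFs that run off the 3' end without a stop codon are ignored here; whether to keep those depends on how complete the mRNAs are.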