Question: EST Sequence Database: Which Frame To Choose?
Woa (United States) wrote, 9.8 years ago:

Hi all,

I have an EST DNA sequence database that I want to search with tandem mass spectrometry data for protein identification. I have reason to believe that the protein is not encoded on the reverse strand, so in a six-frame translation I could safely ignore the three reverse-strand reading frames.

My question is how to build a database from the remaining three forward-strand reading frames. Common software generally translates all six frames, but here I need only three. Should I keep only the translation of the longest open reading frame (ORF) found across the three frames? Should I keep ALL ORFs generated from the three frames? Or should I keep the full translation of the reading frame that contains the longest ORF? In the last case, however, the sequence will contain many translated stop codons (marked as *).
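To make the three options concrete, here is a minimal Python sketch of what each one would put in the database for a single EST. It is an illustration under stated assumptions, not any search engine's method: the function names and the `min_len` cutoff are hypothetical, the standard genetic code is assumed, and an "ORF" is taken to be any stop-to-stop segment without requiring a start codon, since ESTs are usually partial transcripts.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, offset=0):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def forward_frames(dna):
    """Return the three forward-strand translations, stops marked as '*'."""
    return [translate(dna, offset) for offset in range(3)]


def stop_to_stop_segments(protein, min_len=6):
    """Split a frame translation at '*' and keep segments of at least min_len.

    min_len is a hypothetical cutoff; pick it to suit your peptide lengths.
    """
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


def database_options(dna, min_len=6):
    """Build the three candidate database entries described in the question."""
    frames = forward_frames(dna)
    segments = [s for f in frames for s in stop_to_stop_segments(f, min_len)]
    longest = max(segments, key=len, default="")
    frame_with_longest = next(
        (f for f in frames if longest and longest in f.split("*")), ""
    )
    return {
        "longest_orf": longest,                    # option 1
        "all_orfs": segments,                      # option 2
        "frame_with_longest": frame_with_longest,  # option 3, still contains '*'
    }
```

Option 2 keeps every segment a peptide could have come from, while option 1 discards everything except one segment per EST; with frameshifted or chimeric reads that difference can matter.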

proteomics orf est • 2.9k views
Michael Dondrup (Bergen, Norway) wrote, 9.8 years ago:

AFAIK, with a classical EST approach you can't ignore the reverse strand, because of the way cDNA clone libraries for EST sequencing are constructed. The single-stranded cDNA is converted into double-stranded DNA and inserted into the library vector. The double-stranded clones are then sequenced, so you do not know which strand was the original. If you used a strand-specific next-generation sequencing protocol, your assumption might hold, though.

However, if your library is not too big, I would still run the full six-frame translation and filter afterwards. That will allow you to verify your plus-strand assumption. I wouldn't be surprised if you got a lot of unexpected reverse-strand hits even though your reasoning suggests otherwise.
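The "translate all six frames, filter afterwards" idea can be sketched roughly as follows. This is only an illustration: the frame labels (`+1` to `-3`), the function names, and the exact-substring matching are assumptions standing in for whatever your search engine reports, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(offset, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Translate all six frames, keyed by strand-labelled frame ('+1' .. '-3')."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq, offset)
    return frames


def strand_hit_counts(peptides, ests):
    """Count identified peptides found in forward vs. reverse translations.

    Exact substring matching is a simplification of real peptide-to-protein
    mapping (it ignores, for example, I/L ambiguity).
    """
    counts = Counter()
    for est in ests:
        for label, protein in six_frame(est).items():
            for pep in peptides:
                if pep in protein:
                    counts[label[0]] += 1  # label[0] is '+' or '-'
    return counts
```

If the plus-strand assumption holds, the "-" count should be close to zero; a substantial "-" count suggests the library is not orientation-specific.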

Larry_Parnell (Boston, MA, USA) wrote, 9.8 years ago:

Exactly as Michael writes: EST libraries are not cloned into the sequencing vector in an orientation-specific manner. Some clones are, but this is not a reliable feature, especially with more modern methods that allow cloning of the middle to 5' regions of the gene/transcript. The best diagnostic you will have for orientation, and thus for reducing your search space to three reading frames on one strand, is the polyA (vs. polyT) run at the extreme 3' end of the clone. However, those clones, when short, may not contain any protein-coding sequence...
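A rough sketch of the polyA/polyT check, for illustration only: it assumes vector and adapter sequence has already been trimmed, and the function name and the `min_run` threshold of 12 bases are hypothetical choices, not established cutoffs.

```python
import re


def guess_orientation(est, min_run=12):
    """Guess the strand of an EST read from its terminal homopolymer run.

    Returns '+' if the read ends in a polyA run (sense strand, 3' end),
    '-' if it starts with a polyT run (reverse complement of the transcript),
    and None if neither or both are present.
    """
    seq = est.upper()
    has_poly_a_tail = re.search(r"A{%d,}$" % min_run, seq) is not None
    has_poly_t_head = re.match(r"T{%d,}" % min_run, seq) is not None
    if has_poly_a_tail and not has_poly_t_head:
        return "+"
    if has_poly_t_head and not has_poly_a_tail:
        return "-"
    return None  # no usable signal: fall back to all six frames
```

Reads that return None, which will include most 5' and internal reads, carry no orientation signal and would still need all six frames.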

In addition, weird s#!t can happen during cloning. For example, I have seen many hybrid (chimeric) ESTs in which parts of two different genes were joined in vitro to produce a single clone. Also, sloppy library prep will put genomic sequence into the EST reads.

Just compare to all six frames, then examine and filter your results afterward. You should also compare your peptides to RefSeq sequences from the source organism of the peptides, or from as close an evolutionary relative as you can get. That gives you a full-length sequence against which to judge the quality of your matches to the ESTs.
