Question: How to correctly build reading frames for novel splice isoforms?
4.8 years ago by
United States
jacobsen.jeremy40 wrote:

I'm interested in identifying potential proteins that could map to novel splice isoforms.

I have run Cufflinks and have a list of high-confidence isoforms that might be novel. Now I want to determine whether any of them could code for proteins. I have written code that outputs a polypeptide sequence from the exons that Cufflinks assigned to each transcript. I'm stuck at this point because I don't have a clear understanding of how to construct the reading frames. Below I explain where I am so far, so that someone can tell me where I've made incorrect assumptions. Thanks.

Here's what the code does:

1-> It gets a list of potentially novel isoforms from the Cuffcompare .tmap file:

SLMO2-ATP5E    NR_037929    j    CUFF.72292    CUFF.72292.1    100    637.607724    628.443874    646.771574    23542.49981
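
A minimal sketch of step 1, assuming the column layout implied by the row above (class code in the third column, Cufflinks transcript ID in the fifth). The function name `novel_isoform_ids` is hypothetical, and filtering on class code "j" is only one possible definition of "potentially novel":

```python
def novel_isoform_ids(tmap_lines, class_codes=("j",)):
    """Return Cufflinks transcript IDs whose class code is in class_codes.

    Assumes whitespace-separated columns with the class code in column 3
    and the Cufflinks transcript ID in column 5, as in the row shown above.
    """
    ids = []
    for line in tmap_lines:
        fields = line.split()
        # Skip blank or short lines and a header row, if one is present.
        if len(fields) < 5 or fields[0] == "ref_gene_id":
            continue
        if fields[2] in class_codes:
            ids.append(fields[4])
    return ids


example_row = (
    "SLMO2-ATP5E    NR_037929    j    CUFF.72292    CUFF.72292.1    100    "
    "637.607724    628.443874    646.771574    23542.49981"
)
print(novel_isoform_ids([example_row]))
```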

2-> It gets all exons for CUFF.72292.1 from the Cuffcompare combined file:

chr20    Cufflinks    exon    57601521    57601524    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "1"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57603862    57603896    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "2"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57605358    57605484    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "3"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
chr20    Cufflinks    exon    57607275    57607422    .    -    .    gene_id "XLOC_046344"; transcript_id "TCONS_00063383"; exon_number "4"; gene_name "SLMO2-ATP5E"; oId "CUFF.72292.1"; nearest_ref "NR_037929"; class_code "j"; tss_id "TSS52060";
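
A minimal sketch of step 2, assuming GTF-style lines like those above with the original Cufflinks ID stored in the `oId` attribute. The function name `exons_for` is hypothetical, and the attribute strings in the example are shortened from the real records:

```python
import re

# Matches GTF attributes of the form: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')


def exons_for(gtf_lines, oid):
    """Collect (chrom, start, end, strand) for exons whose oId matches.

    Splits on any whitespace so it works on tab-separated files as well as
    on the space-separated rendering shown in this post.
    """
    exons = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split(None, 8)
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(ATTRIBUTE.findall(cols[8]))
        if attrs.get("oId") == oid:
            exons.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    # Ascending genomic order; for a minus-strand transcript the first exon
    # of the mRNA is the LAST one in this list.
    return sorted(exons, key=lambda exon: exon[1])


example_gtf = [
    'chr20 Cufflinks exon 57601521 57601524 . - . transcript_id "TCONS_00063383"; exon_number "1"; oId "CUFF.72292.1";',
    'chr20 Cufflinks exon 57603862 57603896 . - . transcript_id "TCONS_00063383"; exon_number "2"; oId "CUFF.72292.1";',
    'chr20 Cufflinks exon 57605358 57605484 . - . transcript_id "TCONS_00063383"; exon_number "3"; oId "CUFF.72292.1";',
    'chr20 Cufflinks exon 57607275 57607422 . - . transcript_id "TCONS_00063383"; exon_number "4"; oId "CUFF.72292.1";',
]
exons = exons_for(example_gtf, "CUFF.72292.1")
# GTF coordinates are 1-based and inclusive, hence the +1.
total_length = sum(end - start + 1 for _, start, end, _ in exons)
print(len(exons), total_length)
```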


Here's where I'm confused:

3-> Based on the strand, it grabs each exon DNA sequence from the chromosome FASTA file, combines them, and constructs three peptides (one for each frame):

(-)Frame_0:                                        
[CGAEKAKTPD*KDADLAGRLGCNGRRTAKPGCSRRKRCRTTG*PLSDLSRCRL*GSRHVFVTLYVTSVLSFVYDSSEDRRCIFNTFISSLLDGTDFELYDVKVP]                                        
(-)Frame_1:                                        
[AGRRRRRHQTRRTPTWRADSAVTAAEPLSRAARGESDVVPPDDLCPT*VDVGYEGLDTFSSLST*LLS*VSFTTLLKTVVAFLTLSFLPY*MGLISNFTM*RF]                                        
(-)Frame_2:                                        
[RGGEGEDTRLEGRRLGGPTRL*RPQNR*AGLLEAKAMSYHRMTSVRPESM*AMRV*TRFRHSLRDFCLKFRLRLF*RPSLHF*HFHFFLIRWD*FRTLRCKGS]    
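
A minimal sketch of step 3, assuming the exon sequences have already been pulled from the chromosome FASTA on the forward strand and are listed in ascending genomic order. All function names and the toy exon sequences are hypothetical, the standard genetic code is assumed, and frames are indexed here by offset 0, 1, 2:

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def spliced_sequence(exon_seqs, strand):
    """Join forward-strand exon sequences (ascending genomic order) into the
    transcript sequence; minus-strand transcripts are reverse-complemented
    after joining, which also puts the exons in transcript order."""
    joined = "".join(exon_seqs).upper()
    return reverse_complement(joined) if strand == "-" else joined


def translate(seq, offset=0):
    """Translate complete codons starting at offset; unknown codons become X."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq):
    return [translate(seq, offset) for offset in range(3)]


# Toy minus-strand transcript made of two tiny exons (not real data).
toy_exons = ["TCAG", "GCCAT"]
mrna = spliced_sequence(toy_exons, "-")
print(mrna, three_frames(mrna))
```

In the toy example the codon GCC spans the exon junction, so it is only read correctly when the exons are joined before translation.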


Reasons for confusion:

1) I am unsure whether it was correct to build the reading frames from the entire sequence (exons joined head to tail), as opposed to from each exon individually (before concatenation).

2) I am unsure whether a transcript can change reading frame from exon to exon during splicing, as this would complicate things considerably.

3) I'm not certain whether a reading frame is always contained entirely within the AG-GU boundaries. In other words, is it possible for the G on either side to be included in the frame?

4) For protein inference, can a methionine occur in addition to the one at the start site, or is this invalid?
   For instance:  MKPGCSRRKRCRTTG* (valid?), MKPGCSRMKRCRTTG* (invalid?)

Thanks!

-Jeremy
                                    

rna-seq assembly • 1.6k views

Thanks Devon! 

written 4.8 years ago by jacobsen.jeremy40
4.8 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

There's no such thing as frame 0; you mean frame 1 there.

  1. Concatenate first, then translate (I'm guessing you don't have a biology background).
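  2. See above: the reading frame runs continuously through the concatenated exons, so it doesn't change from exon to exon.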
  3. The acceptor and donor sites are part of the intron, so they wouldn't normally be included (I'm sure someone has found an exception...biology is messy like that).
  4. A protein can, and typically will, have more than one methionine.
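
To illustrate point 4, here is a minimal sketch that pulls candidate open reading frames out of one translated frame. It assumes, as a simplification, that translation starts at the first methionine after the previous stop codon and ends at the next stop; the function name `open_reading_frames` is hypothetical. Methionines after the start are kept as ordinary residues:

```python
def open_reading_frames(peptide, min_len=1):
    """Return candidate ORFs from a translated frame where '*' marks a stop.

    Each ORF runs from the first M in a stop-terminated segment to the end
    of that segment. Later M residues are ordinary internal methionines.
    """
    orfs = []
    # Drop the final piece: it is not followed by a stop codon.
    for segment in peptide.split("*")[:-1]:
        start = segment.find("M")
        if start != -1 and len(segment) - start >= min_len:
            orfs.append(segment[start:])
    return orfs


# Both examples from the question are treated as valid ORFs.
print(open_reading_frames("MKPGCSRRKRCRTTG*"))
print(open_reading_frames("MKPGCSRMKRCRTTG*"))
```

Raising `min_len` is a simple way to discard very short ORFs, which are common by chance in any frame.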

You might find a local biologist to help you out with things.
