I have used de novo methods (Cufflinks, isoSCM) to assemble new transcripts for some poorly annotated genes.
I now have a large gene family (mouse olfactory receptors) with fully annotated transcripts, which often have exons and introns in both the CDS and the UTRs. The CDSs have already been studied quite a bit, but the UTRs have never been looked into.
- I would like to align them and try to do some serious phylogeny work. How do I take into account the large, naturally occurring gaps due to introns? Should I only consider spliced transcripts and assume that introns are much less conserved?
- Is it relevant to focus on statistics such as the length of the UTRs and the number of introns? Should I consider the variability of the CDS vs that of the UTRs?
- If I resort to the MEME suite to look for patterns (such as RNA-binding motifs), should I focus only on the UTRs or also on the CDS?
- Can I study SNPs? I have too few RNA-seq samples of my own, so I would have to look into databases. I can expect a rather high level of variation...
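
To make the second point concrete, here is a minimal sketch of the per-transcript statistics I have in mind (UTR lengths and intron counts per region). The function name `utr_stats` and the input layout are assumptions for illustration only: exons as 1-based inclusive intervals as in a GTF file, plus the leftmost and rightmost genomic coordinates of the CDS. It is not the output format of Cufflinks or isoSCM, and it assumes the CDS bounds fall inside exons.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (GTF-style); an assumption


def utr_stats(exons: List[Interval], cds_start: int, cds_end: int,
              strand: str) -> Dict[str, int]:
    """Spliced UTR lengths and intron counts for one transcript.

    cds_start / cds_end are the leftmost and rightmost genomic
    coordinates of the CDS, regardless of strand.
    """
    exons = sorted(exons)

    # Exonic bases to the left of the CDS (genomic orientation).
    left_len = sum(min(e, cds_start - 1) - s + 1
                   for s, e in exons if s < cds_start)
    # Exonic bases to the right of the CDS.
    right_len = sum(e - max(s, cds_end + 1) + 1
                    for s, e in exons if e > cds_end)

    # Introns are the gaps between consecutive exons.
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(exons, exons[1:])]
    left_introns = sum(1 for _, e in introns if e < cds_start)
    right_introns = sum(1 for s, _ in introns if s > cds_end)
    cds_introns = len(introns) - left_introns - right_introns

    # On the minus strand the 5' UTR lies to the genomic right.
    if strand == "-":
        left_len, right_len = right_len, left_len
        left_introns, right_introns = right_introns, left_introns

    return {
        "utr5_len": left_len,
        "utr3_len": right_len,
        "utr5_introns": left_introns,
        "cds_introns": cds_introns,
        "utr3_introns": right_introns,
    }


# Made-up coordinates, not a real olfactory receptor transcript.
example = utr_stats([(100, 200), (300, 400), (500, 600)], 350, 550, "+")
print(example)
```

Running this over every transcript in the family would give one row per transcript, which could then be compared between the CDS and the UTRs.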