Hi, everybody!
I'm trying to map start codons using ribosome profiling data.
For anybody unfamiliar with ribosome profiling: in basic terms, it's a sequencing technique that produces collections of reads from the mRNA fragments protected by ribosomes.
The basic mapping strategy is to accept the furthest upstream viable codon at which the proportion of reads, relative to all reads within the given gene's coordinate boundaries, exceeds a predefined threshold. This works reasonably well in some cases but not in others. Specifically, it fails when two genes' coordinates overlap: it seems to me impossible to determine which gene produced the reads aligned to the shared region of the genome.
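To make the strategy concrete, here is a rough sketch of how I read it, not the exact code in use. The function name `pick_start_codon`, the `counts` dictionary (position to read count), the default threshold, and the choice to sum reads over the three bases of each candidate codon are all assumptions made for illustration. Coordinates are treated as plus-strand for simplicity.

```python
def pick_start_codon(candidates, counts, gene_start, gene_end, threshold=0.05):
    """Return the furthest upstream candidate codon whose share of the
    gene's reads exceeds `threshold`, or None if no candidate qualifies.

    candidates : positions of the first base of each viable start codon
    counts     : dict mapping genomic position -> number of reads (hypothetical input)
    """
    # All reads inside the gene's coordinate boundaries (inclusive).
    total = sum(n for pos, n in counts.items() if gene_start <= pos <= gene_end)
    if total == 0:
        return None

    # Walk candidates from upstream to downstream (plus strand assumed).
    for pos in sorted(candidates):
        # Assumption: reads on the three bases of the codon count towards it.
        codon_reads = sum(counts.get(pos + i, 0) for i in range(3))
        if codon_reads / total > threshold:
            return pos
    return None


# Toy data: 100 reads in total, most of them at the downstream candidates.
counts = {100: 1, 130: 20, 131: 5, 160: 74}
start = pick_start_codon([100, 130, 160], counts, gene_start=100, gene_end=200)
```

Note that `total` counts every read inside the gene's boundaries, including reads that may really belong to an overlapping gene, which is exactly where the approach breaks down.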
Does anybody have any suggestions for how this issue could be overcome, or for how a more effective strategy could be implemented? I imagine a similar obstacle is encountered when calculating RPKM values for overlapping genes in RNA-seq experiments.
Thanks!
I would think that some sort of expectation-maximization (EM) method would work. That's normally done for the actual counts, but I imagine it could be applied to the alignments as well.
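As a minimal sketch of the idea, assuming reads have already been grouped by the set of genes they are compatible with: the E-step splits each ambiguous group among its genes in proportion to the current abundance estimate per base, and the M-step re-estimates abundances from the resulting expected counts. The function name `em_allocate`, the `read_classes` input format, and the simple length normalization are assumptions for illustration, not the method of any particular tool.

```python
def em_allocate(read_classes, lengths, n_iter=200, tol=1e-10):
    """Split ambiguous reads among overlapping genes with a simple EM loop.

    read_classes : list of (set_of_compatible_genes, read_count) pairs (hypothetical format)
    lengths      : dict mapping gene -> length in bases
    Returns a dict mapping gene -> expected read count.
    """
    genes = sorted(lengths)
    total = sum(count for _, count in read_classes)
    expected = {g: 0.0 for g in genes}
    if total == 0:
        return expected

    # Start from equal abundances.
    theta = {g: 1.0 / len(genes) for g in genes}

    for _ in range(n_iter):
        # E-step: share each class among its genes by abundance per base.
        expected = {g: 0.0 for g in genes}
        for compatible, count in read_classes:
            weights = {g: theta[g] / lengths[g] for g in compatible}
            norm = sum(weights.values())
            for g, w in weights.items():
                # Fall back to an even split if all weights are zero.
                share = w / norm if norm > 0 else 1.0 / len(compatible)
                expected[g] += count * share

        # M-step: abundances are the expected fractions of all reads.
        new_theta = {g: expected[g] / total for g in genes}
        converged = max(abs(new_theta[g] - theta[g]) for g in genes) < tol
        theta = new_theta
        if converged:
            break
    return expected


# Toy example: 90 reads unique to A, 10 unique to B, 100 in the overlap.
classes = [({"A"}, 90), ({"B"}, 10), ({"A", "B"}, 100)]
alloc = em_allocate(classes, lengths={"A": 1000, "B": 1000})
```

In this toy case the overlap reads end up split roughly 9:1 in favour of A, mirroring the ratio seen in the uniquely assignable regions. The allocated counts could then feed into the start-codon threshold in place of the raw per-gene totals.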