Dear All,
I have an RNA-seq experiment which I would like to summarize and display using e.g. heatmaps or metabolic pathways. The reads were aligned to the rice reference transcripts, which have identifiers Os01t12345.1, Os01t12345.2, ... where the .1 and .2 indicate transcript isoforms. Some genes come with 2 or more isoforms, but I would like to ignore that information entirely for now and just put one set of values for each gene into heatmaps, tables, etc.
What would be an appropriate way to collapse those isoforms into one? I can imagine averaging over isoforms, or just taking the most highly expressed isoform. How do other people process their data?
Cheers,
Stefan
The answer to this will depend on how your expression metrics were counted to begin with. Of most importance is how multimappers were dealt with. If they were dealt with in a good way then you can collapse transcript->gene metrics by simply adding things together. If multimappers were counted multiple times then there's likely no legitimate summarization method and you'd need to reprocess things (N.B., you could probably do that with Salmon, which should be quite fast).
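To illustrate the "adding things together" case, here is a minimal Python sketch. It assumes you already have per-transcript values (counts, with multimappers not counted more than once) and that the gene identifier is simply the transcript identifier with the trailing .1, .2, ... removed. The function names and the example numbers are made up for illustration and do not come from any particular tool.

```python
def gene_id(transcript_id):
    """Strip a trailing numeric isoform suffix, e.g. 'Os01t12345.2' -> 'Os01t12345'."""
    stem, sep, suffix = transcript_id.rpartition(".")
    # Only strip when there really is a '.<digits>' suffix.
    return stem if sep and suffix.isdigit() else transcript_id


def collapse_to_genes(transcript_values):
    """Sum the per-sample values of all isoforms that belong to the same gene.

    transcript_values maps a transcript ID to a list with one value per sample.
    """
    gene_values = {}
    for tid, values in transcript_values.items():
        gid = gene_id(tid)
        if gid not in gene_values:
            gene_values[gid] = list(values)
        else:
            gene_values[gid] = [a + b for a, b in zip(gene_values[gid], values)]
    return gene_values


# Hypothetical counts for two samples (not real data).
example = {
    "Os01t12345.1": [10, 20],
    "Os01t12345.2": [5, 0],
    "Os02t00001.1": [7, 3],
}

print(collapse_to_genes(example))
```

Summing is only meaningful if each read contributed to the totals once; if multimappers were counted once per alignment, the gene-level sums would be inflated.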
Reads were mapped using bowtie1 with --all --best --strata. I do not know how the quantification was done, since I did not do it myself; I will find that out. Thank you for your insights all the same. It is a starting point.