Hello fellow bioinformaticians,
This may well be an easy and already solved problem, but I couldn't find a standard solution for it. I'm also extremely new to the field, so please excuse me :)
I have expression data for different transcripts from 386 proteins in 25 different tissues (from GTEx - yes, the one that was getting all the bad rep recently...). I'm trying to find out whether any of these proteins have transcripts that are differentially expressed across tissues. I know that the transcripts themselves will be expressed at very different levels, but I want to find out which transcripts have a different expression pattern.
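To illustrate what I mean by "level" versus "pattern", here is a small Python sketch. The numbers are made up (not GTEx data), and the names tx_A, tx_B, tx_C and shape_profile are hypothetical. Dividing each transcript by its own mean across tissues is just one simple way to remove the overall level; I'm not claiming it is the standard normalisation.

```python
import math

# Made-up RPKM values for one gene: one value per transcript per tissue.
tissues = ["brain", "liver", "muscle"]
rpkm = {
    "tx_A": [100.0, 50.0, 10.0],  # high level
    "tx_B": [10.0, 5.0, 1.0],     # 10x lower level, same pattern as tx_A
    "tx_C": [1.0, 5.0, 10.0],     # similar level to tx_B, opposite pattern
}

def shape_profile(values):
    """Divide by the transcript's mean so only the across-tissue pattern remains."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def same_pattern(p, q, tol=1e-9):
    """True if two profiles agree in every tissue up to a relative tolerance."""
    return all(math.isclose(a, b, rel_tol=tol) for a, b in zip(p, q))

profiles = {name: shape_profile(vals) for name, vals in rpkm.items()}

for name, profile in profiles.items():
    print(name, [round(x, 3) for x in profile])
```

In this toy case tx_A and tx_B end up with identical profiles even though their levels differ tenfold, while tx_C is the kind of transcript I'd like to detect.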
What I'm doing right now is:
For each protein:
- Get the RPKM values for each transcript in each tissue
- Sort the transcripts based on total RPKM across all tissues (so that the "reference" transcript is the one that's expressed the most)
- Fit a linear model in R: rpkm ~ tissue * transcript
- At this point I wasn't sure what exactly to do to figure out the important ones. I tried just performing ANOVA, but that seems to report ALL proteins as significant. I also tried looking at the summary of the model for each protein and picking out the coefficients with a low p-value for a tissue-transcript interaction, but that didn't seem to give correct results either.
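The steps above can be sketched roughly as follows. This is Python rather than my actual R code, and it rests on several assumptions: the numbers are made up, all names are hypothetical, there is exactly one summarised value per transcript per tissue, and values are log2(RPKM + 1) transformed. Under those assumptions, the tissue-by-transcript interaction in a balanced two-way layout is simply what remains after subtracting each transcript's mean and each tissue's mean (and adding back the grand mean).

```python
import math

# Made-up RPKM values for one gene: one value per transcript per tissue.
tissue_names = ["brain", "liver", "muscle"]
gene_rpkm = {
    "tx_B": [10.0, 5.0, 1.0],
    "tx_A": [100.0, 50.0, 10.0],
    "tx_C": [1.0, 5.0, 12.0],
}

def interaction_effects(rpkm, pseudocount=1.0):
    """Return transcripts sorted by total RPKM and their interaction effects."""
    # Sort so that the most highly expressed transcript comes first.
    order = sorted(rpkm, key=lambda name: sum(rpkm[name]), reverse=True)
    logged = [[math.log2(v + pseudocount) for v in rpkm[name]] for name in order]

    n_tx, n_tissue = len(logged), len(logged[0])
    row_means = [sum(row) / n_tissue for row in logged]
    col_means = [sum(row[j] for row in logged) / n_tx for j in range(n_tissue)]
    grand_mean = sum(row_means) / n_tx

    # Double centering: remove transcript and tissue main effects.
    effects = [
        [logged[i][j] - row_means[i] - col_means[j] + grand_mean
         for j in range(n_tissue)]
        for i in range(n_tx)
    ]
    return order, effects

order, effects = interaction_effects(gene_rpkm)

# Score each transcript by its largest absolute interaction effect.
scores = {name: max(abs(x) for x in row) for name, row in zip(order, effects)}
for name in order:
    print(name, round(scores[name], 3))
```

This only ranks transcripts by how far they deviate from the additive pattern; it does not give p-values, and with a single value per cell there are no replicates to estimate noise from.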
So in short, I'm just wondering if there's a standard tool or pipeline for determining whether different transcripts of the same gene have different expression patterns across tissues.