This is a newbie question :)
I'm building a linear model to identify significant predictors of mutation count and mutation type in TCGA tumours. I want to include the expression levels of a couple of genes as predictors, but I am quite new to RNA-Seq analysis and its best practices. TCGA provides gene-level RNA-Seq data in three formats: HTSeq-counts, FPKM and FPKM-UQ. From reading tutorials and previous questions here, and from asking around, I have concluded that I can use FPKM-UQ values to compare expression across samples without any further pre-processing. Is this true, or would you recommend pre-processing these values before comparing them?
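For context, here is a minimal sketch of how FPKM-UQ is usually described (assuming the GDC mRNA pipeline definition FPKM-UQ = RC_g × 10^9 / (RC_g75 × L), where RC_g is the gene's read count, RC_g75 is the sample's 75th-percentile read count and L is the gene length in bp), plus a log2(x + 1) transform as one common, hypothetical pre-processing step before a linear model. Computing the upper quartile over every gene in the sample is an assumption; the GDC pipeline may restrict the gene set. The function names `fpkm_uq` and `log_transform` are illustrative only.

```python
import numpy as np

def fpkm_uq(counts, lengths_bp):
    """FPKM-UQ for one sample: counts * 1e9 / (upper-quartile count * gene length)."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    # Assumption: the 75th percentile is taken over all genes in this sample.
    uq = np.percentile(counts, 75)
    return counts * 1e9 / (uq * lengths_bp)

def log_transform(values, pseudocount=1.0):
    """log2(x + pseudocount), often used to reduce right skew before linear modelling."""
    return np.log2(np.asarray(values, dtype=float) + pseudocount)

# Two hypothetical samples with the same relative expression but different depth.
lengths = [1000, 1000, 1000, 1000]
shallow = fpkm_uq([10, 20, 30, 40], lengths)
deep = fpkm_uq([20, 40, 60, 80], lengths)
print(np.allclose(shallow, deep))  # the per-sample UQ scaling cancels sequencing depth
print(log_transform(shallow))
```

Because each sample is divided by its own upper quartile, overall sequencing depth cancels out, which is why FPKM-UQ is often considered comparable across samples. Whether a further transform such as log2 is needed depends on the model's assumptions about the predictor's distribution.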
Thanks so much, Daniela