I would like to pick cancer cell lines that have "high", "medium", and "low" expression of certain genes, based on what counts as high, medium, and low expression of those genes in cancer patient populations. To do that, I need to know the distribution of expression in patients before deciding what my high, medium, and low cutoffs will be.
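To make the goal concrete, here is a minimal sketch of what I have in mind for a single gene. It assumes tertiles of the patient distribution as the cutoffs (just one possible choice) and that the patient and cell line values are already in the same units. All numbers and the names `LINE_A`, `LINE_B`, `LINE_C`, `tertile_cutoffs`, and `classify` are made up for illustration.

```python
from statistics import quantiles


def tertile_cutoffs(patient_values):
    """Return (low_cut, high_cut) splitting the patient values into thirds."""
    # quantiles() sorts the data itself; n=3 yields two cut points.
    low_cut, high_cut = quantiles(patient_values, n=3, method="inclusive")
    return low_cut, high_cut


def classify(value, low_cut, high_cut):
    """Label one expression value relative to the patient-derived cutoffs."""
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "medium"


# Hypothetical patient expression values for one gene (not real data).
patient_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
low_cut, high_cut = tertile_cutoffs(patient_expr)

# Hypothetical cell line values, ASSUMED to be in the same units as above.
cell_lines = {"LINE_A": 1.5, "LINE_B": 4.2, "LINE_C": 6.8}
labels = {name: classify(v, low_cut, high_cut) for name, v in cell_lines.items()}
print(labels)
```

This only works if both datasets are on a comparable scale, which is exactly the problem described below.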
I've gathered RNA-seq data on cell lines from DepMap.org, and I also have RNA-seq data on The Cancer Genome Atlas (TCGA) patient populations from cBioPortal. The problem is that DepMap.org's RNA-seq units are log2(TPM+1), while the TCGA RNA-seq units are RSEM. Is there a way to compare those numbers, either by converting one unit into the other or by using another source that reports both in the same units?
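For reference, the DepMap unit is a simple transform of TPM and can be undone, as in the sketch below (the function names are mine). This only handles the log scale. It does not by itself make the cBioPortal RSEM values comparable to TPM, which is the part I am unsure about.

```python
import math


def tpm_to_log2(tpm):
    """Forward transform used for the DepMap values: log2(TPM + 1)."""
    return math.log2(tpm + 1.0)


def log2_to_tpm(value):
    """Inverse transform: recover TPM from a log2(TPM + 1) value."""
    return 2.0 ** value - 1.0


# Round trip on an arbitrary illustrative value (not real data).
example_tpm = 3.0
logged = tpm_to_log2(example_tpm)
recovered = log2_to_tpm(logged)
print(logged, recovered)
```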
Description of one of my TCGA files (for adrenocortical carcinoma, ACC):
cancer_study_identifier: acc_tcga
genetic_alteration_type: MRNA_EXPRESSION
datatype: CONTINUOUS
data_filename: data_RNA_Seq_v2_expression_median.txt
stable_id: rna_seq_v2_mrna
show_profile_in_analysis_tab: false
profile_description: Expression levels for 20532 genes in 79 acc cases (RNA Seq V2 RSEM)
profile_name: mRNA expression (RNA Seq V2 RSEM)