Question

Filtering somatic point mutations and CNV alterations on the gene level for multi-omics data integration

3.4 years ago

svlachavas ▴ 790

Dear Biostars community,

I am working on an unsupervised multi-omics data integration approach to detect molecular subtypes in a specific cancer type. I have different omics layers for the same 360 patients: RNA-seq expression data, CNV data, and somatic point mutations.

All omics layers are at the gene level, with roughly 20k features each for gene expression and mutations. Before fitting the model, I would like to first reduce the number of features.

For the expression data I could apply a non-specific intensity and/or variance filter, but how should I handle the filtering of the mutational data? For example, the values in the CNV data range from -2 to 2 (GISTIC values), and the somatic point mutations are coded as 0 for silent mutations and 1 otherwise. One putative approach would be to filter the gene expression data first and then keep only the genes in the mutational data that overlap with the filtered expression set. Would this be reasonable, given that it retains mutated genes that are expressed in at least a minimal number of samples?
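To make the overlap idea concrete, here is a minimal sketch in Python/pandas. The function names, the thresholds (`min_expr`, `min_samples`, `top_var`) and the toy matrices are all placeholders made up for illustration, not values from my actual data; matrices are assumed to be genes x samples with gene symbols as the row index.

```python
import pandas as pd

def filter_expression(expr, min_expr=1.0, min_samples=2, top_var=2):
    """Non-specific filter on a genes x samples expression matrix.

    All thresholds are illustrative assumptions, not recommendations.
    """
    # Keep genes expressed above min_expr in at least min_samples samples.
    expressed = (expr > min_expr).sum(axis=1) >= min_samples
    kept = expr.loc[expressed]
    # Among those, keep the top_var genes ranked by variance across samples.
    top = kept.var(axis=1).nlargest(top_var).index
    return kept.loc[top]

def restrict_to_genes(matrix, genes):
    """Keep only the rows (genes) of matrix that also appear in genes."""
    return matrix.loc[matrix.index.intersection(genes)]

samples = ["s1", "s2", "s3", "s4"]

# Toy expression matrix (log scale assumed).
expr = pd.DataFrame(
    [[5.0, 1.0, 6.0, 2.0],   # A: expressed, variable
     [0.0, 0.0, 0.5, 0.0],   # B: not expressed
     [3.0, 3.1, 2.9, 3.0],   # C: expressed, nearly constant
     [2.0, 8.0, 2.0, 9.0]],  # D: expressed, variable
    index=["A", "B", "C", "D"], columns=samples)

# Toy binary mutation matrix and GISTIC-like CNV matrix.
mut = pd.DataFrame(
    [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]],
    index=["A", "B", "D", "E"], columns=samples)
cnv = pd.DataFrame(
    [[2, 0, -1, 0], [0, 0, 0, 1], [-2, -1, 0, 0], [1, 0, 0, 0]],
    index=["A", "C", "D", "F"], columns=samples)

expr_f = filter_expression(expr)
mut_f = restrict_to_genes(mut, expr_f.index)
cnv_f = restrict_to_genes(cnv, expr_f.index)
```

With this design the mutational layers are never filtered on their own values; they simply inherit the gene set that survives the expression filter.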

Alternatively, could a separate filtering approach be applied to the mutational data directly? My major concern, especially for the somatic point mutations, is that if I filter on the frequency of 0s (i.e. no mutation events), I might lose genes that are mutated in only a small number of samples but concentrated within a specific subtype.
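For illustration, the frequency-based alternative could be sketched as below (Python/pandas). It uses an absolute minimum number of altered samples rather than a fraction, so that a gene mutated in only a handful of patients can still pass. The function name, the `min_events` cutoff and the toy data are hypothetical; any nonzero entry is counted as an event, which for CNV means any GISTIC value other than 0.

```python
import pandas as pd

def filter_by_event_count(matrix, min_events=3):
    """Keep genes (rows) with a nonzero value in at least min_events samples.

    min_events is an illustrative assumption, not a recommended cutoff.
    """
    n_events = (matrix != 0).sum(axis=1)
    return matrix.loc[n_events >= min_events]

samples = [f"s{i}" for i in range(1, 7)]

# Toy binary mutation matrix (genes x samples).
mut = pd.DataFrame(
    [[1, 1, 1, 1, 0, 0],   # mutated in 4 samples
     [1, 1, 1, 0, 0, 0],   # mutated in 3 samples
     [1, 0, 0, 0, 0, 0],   # mutated in 1 sample
     [0, 0, 0, 0, 0, 0]],  # never mutated
    index=["TP", "RARE", "SINGLE", "NONE"], columns=samples)

# Toy GISTIC-like CNV matrix with values in -2..2.
cnv = pd.DataFrame(
    [[2, 2, 1, 0, 0, 0],    # altered in 3 samples
     [0, 0, 0, 0, -1, 0]],  # altered in 1 sample
    index=["G1", "G2"], columns=samples)

mut_f = filter_by_event_count(mut, min_events=3)
cnv_f = filter_by_event_count(cnv, min_events=3)
```

A low absolute cutoff only drops genes that are essentially never altered; whether it is low enough to preserve subtype-specific events is exactly the open question here.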

Thank you in advance,

Efstathios

feature reduction somatic mutations multiomics • 701 views
