For the input of a machine learning model I am designing, I use cell-line gene expression data and cell-line mutation data from CCLE. However, I first want to reduce the input dimension (the number of genes), because I believe that not all genes are related to cancer. How can I find a pre-trained model that reduces the number of genes for cancer cell lines, separately for the mutation data and the expression data? I have been implementing this with an autoencoder, but I suspect there is a study that specifically identified cancer-related genes in mutation or expression data, and it would be better to use that study's model.
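
To make the goal concrete, here is a minimal sketch of the kind of reduction I have in mind, assuming such a study provides a list of cancer-related gene symbols. The function name `restrict_to_genes`, the cell-line names, and the gene symbols are all hypothetical placeholders, not taken from CCLE or from any published gene list.

```python
def restrict_to_genes(profiles, genes_to_keep):
    """Keep only the listed genes for every cell line.

    profiles: dict mapping cell-line name -> dict of gene symbol -> value
              (expression level or mutation flag).
    genes_to_keep: iterable of gene symbols, e.g. from a curated list
                   (hypothetical here). Genes missing from the data
                   are simply skipped.
    """
    keep = set(genes_to_keep)
    return {
        cell_line: {g: v for g, v in profile.items() if g in keep}
        for cell_line, profile in profiles.items()
    }


# Toy data only: placeholder names, not real CCLE values.
toy_expr = {
    "CELL_A": {"GENE1": 5.0, "GENE2": 0.1, "GENE3": 2.0},
    "CELL_B": {"GENE1": 1.0, "GENE2": 0.1, "GENE3": 2.5},
}

# "GENE9" is in the list but not in the data, so it is ignored.
reduced = restrict_to_genes(toy_expr, ["GENE1", "GENE3", "GENE9"])
print(reduced)
```

The same function would apply to the mutation data if it is stored in the same per-cell-line layout; the open question is where the gene list (or a pre-trained model that replaces it) should come from.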