Suppose I have an RNA-seq experiment in which I have 5000 differentially expressed genes. Of these, around 120 are cancer-causing genes or genes that have been found to be related to cancer (using information from COSMIC, http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/). Would it be a valid approach to use the 5000 genes as a background and do GSEA on the 120 genes? If not, why?
It could be valid.
It might be better to first check the correlation of the 120 cancer-causing genes with the rest of the differentially expressed genes, and then perform the GSEA. If you don't want to depend on the differentially expressed genes or the cancer-causing gene list, go for GSEA with the "usage" package.
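
To make the correlation check concrete, here is a minimal sketch, assuming you have an expression matrix with genes as rows and samples as columns. The function name `cross_correlation`, the toy matrix, and the row indices are all made up for illustration; substitute your own normalized expression values and gene lists. It uses Pearson correlation, which is only one possible choice.

```python
import numpy as np

def cross_correlation(expr, set_a, set_b):
    """Pearson correlation between every gene in set_a and every gene in set_b.

    expr  : 2-D array, rows = genes, columns = samples (hypothetical layout).
    set_a : row indices of the first gene set (e.g. the cancer-related genes).
    set_b : row indices of the second gene set (e.g. the remaining DE genes).
    Returns an array of shape (len(set_a), len(set_b)).
    Note: a gene with constant expression has zero variance and yields NaN.
    """
    a = np.asarray(expr, dtype=float)[set_a]
    b = np.asarray(expr, dtype=float)[set_b]
    # Center each gene across samples.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    # Scale each gene to unit length so the dot product is Pearson's r.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy data standing in for real expression values (assumption, not real data).
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 12))   # 50 genes, 12 samples
cancer_rows = list(range(5))       # pretend these are the cancer-related genes
other_rows = list(range(5, 50))    # pretend these are the other DE genes

corr = cross_correlation(expr, cancer_rows, other_rows)
# Mean absolute correlation of each cancer-related gene with the other DE genes.
mean_abs_corr = np.abs(corr).mean(axis=1)
```

Each row of `corr` corresponds to one cancer-related gene and each column to one of the other differentially expressed genes, so you can look at the rows to see which cancer-related genes track the rest of the signal before deciding how to set up the GSEA.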