I used the cummeRbund function findSimilar() to find the 10 genes most similar to the differentially expressed genes I identified with Cuffdiff. It uses the Jensen-Shannon distance and produces a rank-ordered gene list, which I now want to test for GO enrichment. The file looks like this:
"XLOC_007917" 0
"XLOC_008881" 0.00417099861122699
"XLOC_017692" 0.0178758082512721
"XLOC_008901" 0.0180682577435933
"XLOC_014267" 0.0333227735282459
"XLOC_013408" 0.0400392521794019
"XLOC_013497" 0.0412541820119971
"XLOC_010554" 0.0453928603025379
"XLOC_000570" 0.0461264880687295
"XLOC_010786" 0.0469577467848723
I first searched manually for GO terms for each of the most similar genes, but I'd like to do a more robust analysis, so I am trying to run GSEA, the Java application from the Broad Institute.
I have created my ranked list file (*.rnk) and now have to choose a gene set database.
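For reference, here is a minimal Python sketch of how a listing like the one above could be turned into the two-column, tab-delimited text that an .rnk file contains. It assumes the input is a whitespace-separated sequence of quoted gene IDs alternating with distances. It also assumes the score is the negated distance, so that more similar genes rank higher. That sign choice is only my assumption, and the function names are made up for illustration.

```python
def parse_pairs(text):
    """Parse alternating quoted gene IDs and distances into (id, distance) tuples."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating gene ID / distance tokens")
    return [
        (tokens[i].strip('"'), float(tokens[i + 1]))
        for i in range(0, len(tokens), 2)
    ]


def to_rnk(pairs):
    """Build tab-delimited 'gene<TAB>score' lines, highest score first."""
    # Assumption: score = 0 - distance, so the most similar gene
    # (smallest Jensen-Shannon distance) gets the highest score.
    scored = [(gene, 0.0 - dist) for gene, dist in pairs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score!r}\n" for gene, score in scored)


# First three entries of the findSimilar() output shown above.
example = (
    '"XLOC_007917" 0 '
    '"XLOC_008881" 0.00417099861122699 '
    '"XLOC_017692" 0.0178758082512721'
)

rnk_text = to_rnk(parse_pairs(example))
print(rnk_text, end="")
```

Writing rnk_text to a file with the .rnk extension gives something GSEA can load as a ranked list. Whether a distance-derived score is a sensible ranking metric for the enrichment test is a separate question.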
I am working on a sponge species, so I can't use the gene set databases already provided.
How can I create my own gene set database? What should it look like?
Any tips or advice will be appreciated!