I've been using ToppFun for gene ontologies. It works well and is fast, but it's a black box as to how it gets its results. I'm looking for an open-source R solution and found biomaRt. My main qualm is that it isn't intuitive how to find the information I need. I have a list of genes to use as a query, and as output I would like the gene ontologies that contain these genes, the number of genes from the input list that are in each ontology, and a p-value. Below is how ToppGene looks and the information it gives, which is great. Having the p-value is key, since I can filter on significance. I'd also like to re-create the sparse matrix that shows which genes from the input are present in each ontology.
Currently I get a huge list in which each entry matches one gene to one ontology, so each gene has multiple entries, one for each ontology of which it is a member. Is there a way to collapse this output, or to write a better query? I would like to create a matrix with each gene as a column and each ontology as a row, with values of 0/1 indicating whether the gene is a member of that ontology, so I can do counts and cluster comparisons.
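To make the target concrete, here is a minimal sketch of the reshaping I have in mind, using base R only. The data frame `long` is a toy stand-in for the long gene-to-term table; its column names are hypothetical and the probe-to-term pairs are made up for illustration, not real annotations.

```r
# Toy stand-in for the long table: one row per (gene, ontology) pair.
# The pairings below are invented purely to show the reshaping.
long <- data.frame(
  gene  = c("ILMN_2651144", "ILMN_2651144", "ILMN_1251419",
            "ILMN_1214841", "ILMN_1214841"),
  go_id = c("GO:0005515", "GO:0005634", "GO:0005515",
            "GO:0005634", ""),
  stringsAsFactors = FALSE
)

# Drop genes with no annotation (empty go_id) and any duplicated pairs,
# so that every cell of the cross-tabulation is 0 or 1.
annotated <- unique(long[long$go_id != "", c("go_id", "gene")])

# Cross-tabulate: ontologies as rows, genes as columns.
membership <- unclass(table(annotated$go_id, annotated$gene))

# Number of input genes that fall in each ontology.
genes_per_term <- rowSums(membership)

print(membership)
print(genes_per_term)
```

If something like this is a reasonable approach, the same steps should apply to the real query output by swapping in its gene and GO ID columns.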
The query can also take a while. I'm sure it's possible, but is it easy to download a mart or Ensembl to use locally?
    library(biomaRt)

    mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                    dataset = "mmusculus_gene_ensembl")

    result <- getBM(attributes = c("illumina_mousewg_6_v2", "go_id", "name_1006"),
                    filters = "illumina_mousewg_6_v2",
                    values = c("ILMN_2651144", "ILMN_1251419", "ILMN_1214841",
                               "ILMN_1214071", "ILMN_2930552", "ILMN_1377919",
                               "ILMN_2618176", "ILMN_2526739", "ILMN_1253182"),
                    mart = mart)
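On the p-value: the query above only returns annotations, so the significance would have to be computed separately. One common choice for this kind of over-representation question is a one-sided hypergeometric test, which base R exposes through `phyper`. I don't know whether ToppFun computes its p-values this way, so treat this as a sketch under that assumption. All counts below are hypothetical placeholders, and in practice the background counts would need to come from a separate query over the whole gene universe.

```r
# Hypothetical counts for a single ontology term (placeholders, not real data).
universe_size    <- 20000  # all genes that could have been picked
term_size        <- 40     # genes in the universe annotated to this term
list_size        <- 9      # genes in my input list
overlap          <- 3      # input genes annotated to this term

# P(X >= overlap) when drawing list_size genes without replacement.
# phyper() gives P(X <= q), so use q = overlap - 1 with the upper tail.
p_value <- phyper(overlap - 1,
                  term_size,
                  universe_size - term_size,
                  list_size,
                  lower.tail = FALSE)

print(p_value)
```

With one such p-value per term, `p.adjust` could then be used for multiple-testing correction before filtering on significance.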