21 months ago
Floydian_slip ▴ 150

Hi, I have a set of ~150 genes and I want to obtain the list of phenotypes (diseases or conditions) each one is associated with. Is there a way to do this in OMIM in batch? E.g.:

Gene Condition

BRCA1 Breast Cancer

21 months ago
Corentin ▴ 470

If you are willing to use R and biomaRt:

gene_mart <- biomaRt::useEnsembl(biomart = "ENSEMBL_MART_ENSEMBL",
                                 host = "www.ensembl.org",
                                 dataset = "hsapiens_gene_ensembl")

biomaRt::getBM(mart = gene_mart,
               attributes = c("hgnc_symbol", "mim_morbid_accession", "mim_morbid_description"),
               filters = "hgnc_symbol",
               values = c("BRCA1"),
               uniqueRows = TRUE)


This will give the following output:

hgnc_symbol  mim_morbid_accession  mim_morbid_description
BRCA1        114480                BREAST CANCER;;BREAST CANCER, FAMILIALBREAST CANCER, FAMILIAL MALE, INCLUDED
BRCA1        604370                BREAST-OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1; BROVCA1BREAST CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1, INCLUDED;;OVARIAN CANCER, FAMILIAL, SUSCEPTIBILITY TO, 1, INCLUDED
BRCA1        614320                PANCREATIC CANCER, SUSCEPTIBILITY TO, 4; PNCA4
BRCA1        617883                FANCONI ANEMIA, COMPLEMENTATION GROUP S; FANCS


To query more than one gene, pass all of your HGNC symbols to the values parameter. For example, with two genes:

biomaRt::getBM(mart = gene_mart,
               attributes = c("hgnc_symbol", "mim_morbid_accession", "mim_morbid_description"),
               filters = "hgnc_symbol",
               values = c("BRCA1", "FOXP3"),
               uniqueRows = TRUE)


This script retrieves the OMIM IDs and descriptions, filtered by HGNC symbol. If you prefer to use Ensembl gene IDs, replace filters = "hgnc_symbol" with filters = "ensembl_gene_id". For more information, refer to the biomaRt user guide at https://www.bioconductor.org/packages/devel/bioc/vignettes/biomaRt/inst/doc/biomaRt.html
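For a list of ~150 genes you probably don't want to type the symbols by hand, and you may want one row per gene rather than one row per OMIM entry. Here is a rough sketch of how that could look. The file name genes.txt and the helper collapse_phenotypes() are made up for illustration and are not part of biomaRt. I am also assuming that genes without an OMIM morbid entry come back with an NA or empty description, so check that against your own result. The demo data frame is a shortened copy of the BRCA1 output above plus an invented empty FOXP3 row, so the block runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): collapse a getBM()-style result
# to one row per gene, keeping only the first title of each OMIM description.
collapse_phenotypes <- function(bm_result) {
  # Assumption: genes without an OMIM morbid entry have NA or "" here.
  desc <- bm_result$mim_morbid_description
  bm_result <- bm_result[!is.na(desc) & nzchar(desc), , drop = FALSE]
  # Keep the text before the first ";" (the main title of the entry).
  bm_result$condition <- trimws(sub(";.*$", "", bm_result$mim_morbid_description))
  # One row per gene, distinct conditions joined with " | ".
  # Note: aggregate() errors if no rows are left after filtering.
  aggregate(condition ~ hgnc_symbol, data = bm_result,
            FUN = function(x) paste(unique(x), collapse = " | "))
}

# Offline demo: shortened BRCA1 rows from the output above,
# plus an invented FOXP3 row with no description.
demo <- data.frame(
  hgnc_symbol = c("BRCA1", "BRCA1", "FOXP3"),
  mim_morbid_accession = c(614320, 617883, NA),
  mim_morbid_description = c("PANCREATIC CANCER, SUSCEPTIBILITY TO, 4; PNCA4",
                             "FANCONI ANEMIA, COMPLEMENTATION GROUP S; FANCS",
                             NA),
  stringsAsFactors = FALSE
)
print(collapse_phenotypes(demo))

# With a live connection, the same idea applied to a gene list on disk
# (genes.txt is a hypothetical file with one HGNC symbol per line):
# genes <- unique(trimws(readLines("genes.txt")))
# res <- biomaRt::getBM(mart = gene_mart,
#                       attributes = c("hgnc_symbol", "mim_morbid_accession",
#                                      "mim_morbid_description"),
#                       filters = "hgnc_symbol",
#                       values = genes,
#                       uniqueRows = TRUE)
# collapse_phenotypes(res)
```

Genes with no OMIM morbid entry are dropped from the collapsed table, so compare it against your input list if you need to know which genes had no hits.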

Thanks Corentin! It worked like a charm.

