Question: Fetch "Alleles And Phenotypes" Information From MGI
6.6 years ago, pchiang5 wrote:


I have a file of ~2600 genes with their Ensembl gene IDs, and would like to add a column of matched "alleles and phenotypes" information to the file. Although this information is available on the website and can be added manually, is there a way to fetch it automatically with R? I looked into the MGI FTP site and the MGI dataset in biomaRt, but neither seems to contain descriptive text such as "Homozygous null embryos do not survive and have mesodermal ....". Thanks

R biomart • 1.3k views
6.5 years ago, Emily_Ensembl wrote:

Hello. I assume that what you're trying to do is use the short variation database to get variants using genes as your filter, then get the alleles and phenotypes for each of the variants.

I'm afraid that's not actually possible at the moment. In Ensembl, mouse phenotypes are only associated with genes, not with variants, because of the data we have available. What you can do instead is run one query against the genes database to get the phenotype(s) associated with each gene, and a second query to get the variants and alleles associated with each gene, then merge the two together on the gene ID. This is not ideal, of course, as there is no way of knowing which variants actually cause the phenotype, but we don't know that either, so you're in the same position as us.
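As a rough illustration of the "two queries, then merge" idea, here is a minimal Python sketch. It assumes the two query results have already been downloaded as plain rows; the function name `merge_by_gene`, the row layouts, and all gene, variant, and phenotype values are made up for illustration and do not come from Ensembl or MGI.

```python
from collections import defaultdict


def merge_by_gene(phenotype_rows, variant_rows):
    """Combine two per-gene result sets into one record per gene.

    phenotype_rows: iterable of (gene_id, phenotype_description)
    variant_rows:   iterable of (gene_id, variant_id, alleles)
    Both layouts are assumptions about how the query results were exported.
    """
    merged = defaultdict(lambda: {"phenotypes": [], "variants": []})
    for gene_id, phenotype in phenotype_rows:
        # Skip repeated phenotype descriptions for the same gene.
        if phenotype not in merged[gene_id]["phenotypes"]:
            merged[gene_id]["phenotypes"].append(phenotype)
    for gene_id, variant_id, alleles in variant_rows:
        merged[gene_id]["variants"].append((variant_id, alleles))
    return dict(merged)


# Made-up example rows (not real identifiers).
phenotype_rows = [
    ("GENE_A", "embryonic lethality"),
    ("GENE_A", "abnormal mesoderm"),
    ("GENE_A", "embryonic lethality"),
    ("GENE_B", "decreased body weight"),
]
variant_rows = [
    ("GENE_A", "var1", "A/G"),
    ("GENE_A", "var2", "C/T"),
    ("GENE_C", "var3", "G/T"),
]

merged = merge_by_gene(phenotype_rows, variant_rows)
for gene_id in sorted(merged):
    record = merged[gene_id]
    print(gene_id, record["phenotypes"], record["variants"])
```

Note that the merge only lists phenotypes and variants side by side for each gene; it says nothing about which variant, if any, is responsible for a phenotype.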

One thing I might suggest, however, is that BioMart might not be the best way to do this query. Firstly, you would have to do two separate queries and then merge them together, which may be complicated. Secondly, BioMart can get a bit funny with lots of data. A query with ~2600 genes, each with, say, 3000 variants, is 7,800,000 variants. BioMart doesn't like that amount of data, and what it is likely to do is decide partway through your query that it can't manage it and just stop, giving you only part of your results without telling you that it has done so. Because of this, I would suggest that you use the Perl API instead. This will allow you to do a single query which prints the phenotype and a list of the variants for each gene, and it will be able to handle a query of this size. Let me know if you want to use this and need any help.
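If you do stay with a query service that can stop partway through without warning, one defensive pattern is to send the gene list in small batches and check afterwards that every gene came back. The Python sketch below shows only that pattern, under stated assumptions: `fetch_batch` is a hypothetical callable standing in for whatever actually performs the query, the batch size of 200 is arbitrary, and the gene IDs are made up. It is not BioMart or Ensembl API code.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(gene_ids, fetch_batch, batch_size=200):
    """Query genes batch by batch and report any gene with no entry at all.

    fetch_batch is a hypothetical callable: it takes a list of gene IDs and
    returns a dict mapping gene ID -> list of result rows. A gene that truly
    has no data should still appear with an empty list; otherwise it cannot
    be told apart from a gene lost to a truncated response.
    """
    results = {}
    for batch in chunked(gene_ids, batch_size):
        results.update(fetch_batch(batch))
    missing = [gene_id for gene_id in gene_ids if gene_id not in results]
    return results, missing


# Demonstration with a fake fetcher that silently drops one gene.
gene_ids = [f"GENE_{i}" for i in range(2600)]


def fake_fetch(batch):
    return {gene_id: [] for gene_id in batch if gene_id != "GENE_1234"}


results, missing = fetch_in_batches(gene_ids, fake_fetch)
print(len(results), "genes returned; missing:", missing)
```

Any gene IDs reported in `missing` can then be queried again on their own instead of being quietly absent from the final table.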
