Fetch "Alleles and Phenotypes" Information from MGI
10.3 years ago
pchiang5 ▴ 30

Hello,

I have a file of ~2600 genes with their Ensembl gene IDs, and would like to add a column of matching "alleles and phenotypes" information to the file. Although this information is available on the website (for example, http://www.informatics.jax.org/marker/MGI:104659) and can be added manually, is there a way to fetch it automatically with R? I looked at the MGI FTP site and the MGI dataset in biomaRt, but neither seems to contain descriptive information like "Homozygous null embryos do not survive and have mesodermal ....". Thanks

r biomart • 1.9k views
10.2 years ago
Emily 23k

Hello. I assume that what you're trying to do is use the short variation database to get variants using genes as your filter, then get the alleles and phenotypes for each of the variants.

I'm afraid that's not actually possible at the moment. In Ensembl, mouse phenotypes are only associated with genes, not with variants, because of the data we have available. What you can do instead is run one query against the genes database to get the phenotype(s) associated with each gene, run a second query to get the variants and alleles associated with each gene, then merge the two together. This is not ideal, of course, as there is no way of knowing which variants actually cause the phenotype, but then we also don't know that, so you're no better off than us.
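To illustrate the merge step, here is a minimal Python sketch, assuming the two query results have already been exported as rows keyed by gene ID. The function name `merge_by_gene`, the row layouts, and every gene ID, variant ID, and phenotype string are made-up placeholders, not real Ensembl or MGI output.

```python
from collections import defaultdict


def merge_by_gene(phenotype_rows, variant_rows):
    """Combine per-gene phenotype rows and per-gene variant rows.

    phenotype_rows: iterable of (gene_id, phenotype) tuples (assumed layout)
    variant_rows:   iterable of (gene_id, variant_id, alleles) tuples (assumed layout)
    """
    merged = defaultdict(lambda: {"phenotypes": [], "variants": []})
    for gene_id, phenotype in phenotype_rows:
        # Skip repeated phenotype descriptions for the same gene.
        if phenotype not in merged[gene_id]["phenotypes"]:
            merged[gene_id]["phenotypes"].append(phenotype)
    for gene_id, variant_id, alleles in variant_rows:
        merged[gene_id]["variants"].append((variant_id, alleles))
    return dict(merged)


# Placeholder rows standing in for the two exported query results.
phenotype_rows = [
    ("GENE_A", "placeholder phenotype 1"),
    ("GENE_A", "placeholder phenotype 2"),
    ("GENE_B", "placeholder phenotype 1"),
]
variant_rows = [
    ("GENE_A", "var1", "A/G"),
    ("GENE_A", "var2", "C/T"),
    ("GENE_C", "var3", "G/T"),
]

merged = merge_by_gene(phenotype_rows, variant_rows)
for gene_id in sorted(merged):
    print(gene_id, merged[gene_id])
```

Note that the result only places phenotypes and variants side by side for each gene. It says nothing about which variant, if any, is responsible for a phenotype.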

One thing I might suggest, however, is that BioMart might not be the best way to do this query. Firstly, you would have to do two separate queries and then merge them together, which may be complicated. Secondly, BioMart can get a bit funny with lots of data. A query with ~2600 genes, each with, say, 3000 variants, is 7,800,000 variants. BioMart doesn't like that amount of data, and it is likely to decide partway through your query that it can't manage it and just stop, giving you only part of your results without telling you that it has done so. Because of this, I would suggest that you try the Perl API instead. This will allow you to do a single query which will print the phenotype and a list of the variants for each gene, and it will be able to handle a query of this size. Let me know if you want to use this and need any help.
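If you do stay with BioMart, one way to guard against silently truncated results is to split the gene list into small batches and check afterwards which genes came back with nothing. The following is only a sketch of that idea in Python: `fetch_in_batches`, `chunked`, and the `fetch` callable are hypothetical names, and `fake_fetch` is a stand-in for whatever actually runs the query. The batch size is an arbitrary assumption, not a documented BioMart limit.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(gene_ids, fetch, batch_size=200):
    """Run `fetch` on small batches and report genes that returned no rows.

    `fetch` is a hypothetical callable taking a list of gene IDs and
    returning (gene_id, value) rows; it stands in for the real query.
    """
    results = {}
    for batch in chunked(gene_ids, batch_size):
        for gene_id, value in fetch(batch):
            results.setdefault(gene_id, []).append(value)
    # Genes with no rows either have no data or were dropped by the query.
    missing = [g for g in gene_ids if g not in results]
    return results, missing


def fake_fetch(batch):
    """Stand-in for a real query; deliberately returns nothing for GENE_3."""
    return [(g, "value for " + g) for g in batch if g != "GENE_3"]


gene_ids = ["GENE_" + str(i) for i in range(7)]
batch_sizes = [len(b) for b in chunked(gene_ids, 3)]
results, missing = fetch_in_batches(gene_ids, fake_fetch, batch_size=3)
print(batch_sizes)
print(missing)
```

A gene listed in `missing` is not necessarily an error, since it may simply have no variants or phenotypes, but the list gives you something concrete to re-query and check by hand.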
