6.5 years ago by
Hello. I assume that what you're trying to do is use the short variation database to get variants using genes as your filter, then get the alleles and phenotypes for each of the variants.
I'm afraid that's not actually possible at the moment. In Ensembl, mouse phenotypes are only associated with genes, not with variants, because of the data we have available. What you can do instead is run one query against the genes database to get the phenotype(s) associated with each gene, and a second query to get the variants and alleles associated with each gene, then merge the two on the gene ID. This is not ideal, of course, as there is no way of knowing which variants actually cause the phenotype, but we don't know that either, so you're no worse off than us.
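The merge step could be sketched roughly like this in Python, assuming both query results have been exported as tab-separated text without a header row and with the gene ID in the first column. The column layout, the function name `merge_by_gene` and the sample IDs are all made up for illustration and are not real BioMart output.

```python
import csv
import io
from collections import defaultdict


def merge_by_gene(phenotype_tsv, variant_tsv):
    """Join gene/phenotype rows with gene/variant rows on the gene ID.

    Both arguments are tab-separated text (an assumed export layout):
      phenotype_tsv: gene_id, phenotype
      variant_tsv:   gene_id, variant_id, alleles
    """
    phenotypes = defaultdict(set)
    for row in csv.reader(io.StringIO(phenotype_tsv), delimiter="\t"):
        if not row:  # skip blank lines
            continue
        gene_id, phenotype = row
        phenotypes[gene_id].add(phenotype)

    variants = defaultdict(list)
    for row in csv.reader(io.StringIO(variant_tsv), delimiter="\t"):
        if not row:
            continue
        gene_id, variant_id, alleles = row
        variants[gene_id].append((variant_id, alleles))

    # Keep every gene seen in either result, so gaps stay visible.
    merged = {}
    for gene_id in sorted(set(phenotypes) | set(variants)):
        merged[gene_id] = {
            "phenotypes": sorted(phenotypes.get(gene_id, ())),
            "variants": variants.get(gene_id, []),
        }
    return merged


# Hypothetical sample data, not real Ensembl identifiers.
phenotype_tsv = "GENE_A\tabnormal coat color\nGENE_B\tdecreased body weight\n"
variant_tsv = "GENE_A\tvar_1\tA/G\nGENE_A\tvar_2\tC/T\nGENE_C\tvar_3\tG/T\n"

merged = merge_by_gene(phenotype_tsv, variant_tsv)
for gene_id, record in merged.items():
    print(gene_id, record["phenotypes"], record["variants"])
```

Note that each gene ends up with all of its phenotypes next to all of its variants; nothing in the result links a particular variant to a particular phenotype.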
One thing I might suggest, however, is that BioMart might not be the best way to do this query. Firstly, you would have to do two separate queries then merge them together, which may be complicated. Secondly, BioMart can be unreliable with large result sets. A query with ~2600 genes, each with, say, 3000 variants, comes to about 7.8 million variants. BioMart doesn't cope well with that amount of data: it is likely to stop partway through your query, giving you only part of your results without telling you that they are incomplete. Because of this, I would suggest that you use the Perl API instead. This will allow you to do a single query which prints the phenotype and a list of the variants for each gene, and it will be able to handle a query of this size. Let me know if you want to use this and need any help.
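If you do stay with BioMart, one way to guard against silently incomplete output is to submit the genes in smaller batches and check that every gene you asked for appears in what came back. Here is a minimal Python sketch of that bookkeeping, assuming the results are rows with the gene ID in the first column. The helper names and the batch size are arbitrary choices for illustration, and the gene IDs are made up. A gene that genuinely has no variants will also show up as missing, so treat the list as something to re-check rather than proof of truncation.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def missing_genes(requested, returned_rows):
    """Return the requested gene IDs that never appear in column 0 of the rows."""
    returned = {row[0] for row in returned_rows if row}
    return [gene_id for gene_id in requested if gene_id not in returned]


# Hypothetical example: three genes requested, results came back for only two.
requested = ["GENE_A", "GENE_B", "GENE_C"]
returned_rows = [["GENE_A", "var_1"], ["GENE_C", "var_3"]]

for batch in batches(requested, 2):
    print("would submit:", batch)

print("genes to re-check:", missing_genes(requested, returned_rows))
```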