Hi everyone,
I am trying to extract all Y chromosome genes from the Mus musculus C57BL/6NJ genome dataset ("mc57bl6nj_gene_ensembl") using biomaRt in R. The goal is a gene list I can use later in my single-cell analysis pipeline to pull out Y chromosome gene counts.
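As a rough sketch of that later step, suppose the counts are in a genes-by-cells matrix whose row names are gene symbols. The helper `y_counts_per_cell()`, the toy matrix, and the gene vector are hypothetical placeholders, not part of any package. Under that assumption, the per-cell Y total is a row subset followed by `colSums()`:

```r
# Hypothetical helper: total counts per cell across a set of Y chromosome genes.
# Assumes `counts` is a genes x cells matrix with gene symbols as row names.
y_counts_per_cell <- function(counts, y_genes) {
  keep <- rownames(counts) %in% y_genes      # TRUE for rows that are Y genes
  colSums(counts[keep, , drop = FALSE])      # drop = FALSE keeps a matrix even if 0 or 1 rows match
}

# Toy example with made-up numbers
counts <- matrix(c(5, 0, 3,
                   1, 2, 0,
                   0, 4, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Ddx3y", "Actb", "Uty"),
                                 c("cell1", "cell2", "cell3")))
y_counts_per_cell(counts, c("Ddx3y", "Uty", "Kdm5d"))
```

Genes in the list that are absent from the matrix (here `Kdm5d`) are simply ignored. If no listed gene is present at all, every cell gets a total of 0.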
My code (below) successfully extracts X chromosome genes. However, I noticed that the only values available for the "chromosome_name" filter are chromosomes 1 to 19 and X. When I use the value "Y" instead, I get 0 entries.
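library("biomaRt")
# Connect to the mouse strains mart and select the C57BL/6NJ gene dataset
mc57bl6nj <- useMart("ENSEMBL_MART_MOUSE",
                     dataset = "mc57bl6nj_gene_ensembl")
# Gene names for every gene annotated on chromosome X
mc57bl6nj_x_genes <- getBM(attributes = "external_gene_name",
                           filters = "chromosome_name",
                           values = "X",
                           mart = mc57bl6nj)
head(mc57bl6nj_x_genes)  # first few gene names
dim(mc57bl6nj_x_genes)   # number of genes returned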
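One way to confirm what the strain dataset actually contains, rather than relying on the filter's list of values, is to fetch the chromosome of every gene and tabulate it. This is a sketch. `fetch_gene_chromosomes()` and `summarise_chromosomes()` are hypothetical helper names. The fetch step needs the biomaRt package and network access to Ensembl.

```r
# Hypothetical helper: fetch the chromosome of every gene in a BioMart dataset.
# getBM() with no filters returns all genes; needs network access when called.
fetch_gene_chromosomes <- function(mart) {
  biomaRt::getBM(attributes = c("ensembl_gene_id", "chromosome_name"),
                 mart = mart)
}

# Count genes per chromosome_name value, most gene-rich first.
summarise_chromosomes <- function(genes) {
  sort(table(genes$chromosome_name), decreasing = TRUE)
}
```

With the `mc57bl6nj` object from the code above, `chrom_counts <- summarise_chromosomes(fetch_gene_chromosomes(mc57bl6nj))` followed by `"Y" %in% names(chrom_counts)` returns `FALSE` when no genes in the dataset are annotated on Y.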
Do you know why the "Y" chromosome is missing? Is there another way I can extract all Y chromosome genes?
P.S. This is my first post, so apologies for any formatting issues. I'm also new to genome science in general.
Thanks!
From, Eva
Oh my goodness, how WEIRD. Thanks for your answer, Emily!
Out of curiosity, could you hazard a guess as to why? There should be 172 coding genes on Y (according to this page: https://uswest.ensembl.org/Mus_musculus/Location/Chromosome?db=core;g=ENSMUSG00000094658;r=Y:2830680-2841854).
Do you have advice on an alternative method? I only need the Y chromosome gene names, so I can compile them by hand if I have to.
It's not unusual for highly repetitive, otherwise difficult-to-sequence loci, such as the sex and organelle chromosomes, to be skipped by smaller sequencing projects like the mouse strain project that produced this genome.
The link you gave is not for that genome, though. It's for the mouse reference genome, which is produced by the Genome Reference Consortium (GRC) and is of high quality. You can get those genes with BioMart by using the standard mouse genes mart instead.
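A minimal sketch of that suggestion follows. It assumes the reference mouse gene dataset is `mmusculus_gene_ensembl` on the main Ensembl mart (`ENSEMBL_MART_ENSEMBL`) and that coding genes carry the `gene_biotype` value `protein_coding`. The helper names `fetch_y_genes()` and `coding_gene_names()` are hypothetical. Only the `biomaRt` calls are real, and `fetch_y_genes()` needs network access when called.

```r
# Hypothetical helper: Y chromosome genes from the reference mouse dataset
# (not the C57BL/6NJ strain dataset). Needs biomaRt and network access.
fetch_y_genes <- function() {
  mouse <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
                            dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
                 filters = "chromosome_name", values = "Y", mart = mouse)
}

# Keep protein-coding genes that have a non-empty symbol; return unique, sorted symbols.
coding_gene_names <- function(genes) {
  keep <- which(genes$gene_biotype == "protein_coding" &
                  !is.na(genes$external_gene_name) &
                  genes$external_gene_name != "")
  sort(unique(genes$external_gene_name[keep]))
}
```

Usage would be along the lines of `y_coding <- coding_gene_names(fetch_y_genes())`. The resulting count depends on the Ensembl release and on genes that share a symbol, so it may not match the 172 shown on the website exactly.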
Oh, that makes sense, thanks for clarifying :) I'll try that, and hopefully there aren't too many differences between the GRC reference and B6 when it comes to gene IDs on the Y chromosome!
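A quick way to gauge those differences is to compare the reference Y gene symbols with the gene names in your own count matrix. In this sketch, `compare_gene_sets()` is a hypothetical helper, `y_genes` stands for a symbol list (for example, one pulled from the reference mouse mart), and `dataset_genes` stands in for `rownames()` of your counts.

```r
# Hypothetical helper: split a reference gene list into symbols found / missing
# in the genes measured by your own pipeline.
compare_gene_sets <- function(y_genes, dataset_genes) {
  list(found   = intersect(y_genes, dataset_genes),  # in both, in y_genes order
       missing = setdiff(y_genes, dataset_genes))    # reference only
}

compare_gene_sets(c("Ddx3y", "Uty", "Kdm5d"), c("Actb", "Uty", "Ddx3y"))
```

Anything in `missing` is worth checking by hand, since it may be named differently or absent from your annotation.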
From Eva