Extract Y chromosome genes from the mc57bl6nj_gene_ensembl dataset? No Y filter?
4.5 years ago
eknichols ▴ 10

Hi everyone,

I am trying to extract all Y chromosome genes from the Mus musculus C57BL/6NJ genome dataset ("mc57bl6nj_gene_ensembl") using biomaRt in R. The purpose is to build a gene list so I can extract Y chromosome gene counts later in my single-cell analysis pipeline.

My code worked successfully for extracting X chromosome genes (below). However, I noticed that the only values for the filter "chromosome_name" are chromosomes 1-19 and chromosome X. When I tried a value "Y" instead, I got 0 entries.

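library("biomaRt")

# Connect to the Ensembl mouse strains mart, C57BL/6NJ gene dataset
mc57bl6nj <- useMart("ENSEMBL_MART_MOUSE",
                     dataset = "mc57bl6nj_gene_ensembl")

# Retrieve the names of all genes annotated on chromosome X
mc57bl6nj_x_genes <- getBM(attributes = "external_gene_name",
                           filters = "chromosome_name",
                           values = "X",
                           mart = mc57bl6nj)
head(mc57bl6nj_x_genes)
dim(mc57bl6nj_x_genes)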

Do you know why the "Y" chromosome is missing? Is there another way I can extract all Y chromosome genes?

PS, first post; apologies for any formatting issues. Also new to genome science in general.

Thanks!

From, Eva

biomaRt R genome sequence gene • 1.1k views
4.5 years ago
Emily 23k

There is no Y chromosome in that genome assembly, so unfortunately it's just not possible to get Y genes from that dataset.
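
One way to check this yourself is to list every chromosome name that has at least one gene in the dataset. The following is only a minimal sketch: it reuses the mart and dataset names from the question and assumes the Ensembl BioMart service is reachable from your R session.

```r
library(biomaRt)

# Same mart and dataset as in the question
mc57bl6nj <- useMart("ENSEMBL_MART_MOUSE",
                     dataset = "mc57bl6nj_gene_ensembl")

# Request only the chromosome_name attribute; getBM() returns
# unique rows by default, so this gives one row per sequence name
chroms <- getBM(attributes = "chromosome_name", mart = mc57bl6nj)$chromosome_name

# Inspect the names that are present and test for "Y"
sort(unique(as.character(chroms)))
"Y" %in% chroms
```

If "Y" is not among the returned names, filtering on it will always give 0 rows, which matches what you saw.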


Oh my goodness, how WEIRD. Thanks for your answer, Emily!

Out of curiosity, can you hazard a guess as to why? There should be 172 coding genes (according to here: https://uswest.ensembl.org/Mus_musculus/Location/Chromosome?db=core;g=ENSMUSG00000094658;r=Y:2830680-2841854).

Do you have advice on an alternative method? I just need Y chromosome gene names, so I can do it by hand if I have to.


It's not unusual for highly repetitive and otherwise difficult-to-sequence loci, such as sex or organelle chromosomes, to be skipped by smaller sequencing projects, such as the mouse strains project that produced this genome.

The link you give here is not for that genome, however; it's for the mouse reference, which is produced by the GRC and is a high-quality assembly. You can get those genes using BioMart; just use the mouse genes mart.
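
As a rough sketch of what that could look like in biomaRt: the dataset name "mmusculus_gene_ensembl" and the extra attributes below are my assumptions about what "the mouse genes mart" corresponds to, not something taken from your code, so adjust as needed.

```r
library(biomaRt)

# Main Ensembl genes mart, mouse reference dataset
# (assumed name: "mmusculus_gene_ensembl")
mouse_ref <- useMart("ENSEMBL_MART_ENSEMBL",
                     dataset = "mmusculus_gene_ensembl")

# Same query as for X in the question, but on the reference and for Y;
# the stable ID and biotype are included so coding genes can be picked out
y_genes <- getBM(
  attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
  filters    = "chromosome_name",
  values     = "Y",
  mart       = mouse_ref
)

head(y_genes)
table(y_genes$gene_biotype)
```

Keeping the biotype column makes it easy to subset to protein-coding genes afterwards, for example with y_genes[y_genes$gene_biotype == "protein_coding", ]. Whether these names match the ones in the strain annotation is a separate question that this sketch does not address.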


Oh, that makes sense, thanks for clarifying :) I will try that, and hopefully there aren't too many differences between the GRC reference and B6 when it comes to gene IDs on the Y chromosome!

From Eva
