Question

Extract Y chromosome genes from the mc57bl6nj_gene_ensembl dataset? No Y filter?

4.5 years ago

eknichols ▴ 10

Hi everyone,

I am trying to extract all Y chromosome genes from the Mus musculus C57BL/6NJ genome dataset ("mc57bl6nj_gene_ensembl") using biomaRt in R. The purpose is to create a gene list so I can extract Y chromosome gene counts later in my single-cell analysis pipeline.

My code worked for extracting X chromosome genes (below). However, I noticed that the only values available for the filter "chromosome_name" are chromosomes 1-19 and chromosome X. When I tried the value "Y" instead, I got 0 entries.

library("biomaRt")

# Connect to the Ensembl mouse strains mart, C57BL/6NJ dataset
mc57bl6nj <- useMart("ENSEMBL_MART_MOUSE",
                     dataset = "mc57bl6nj_gene_ensembl")

# Retrieve the names of all genes annotated on chromosome X
mc57bl6nj_x_genes <- getBM(attributes = "external_gene_name",
                           filters = "chromosome_name",
                           values = "X",
                           mart = mc57bl6nj)
head(mc57bl6nj_x_genes)
dim(mc57bl6nj_x_genes)

Do you know why the "Y" chromosome is missing? Is there another way I can extract all Y chromosome genes?

PS, first post; apologies for any formatting issues. Also new to genome science in general.

Thanks!

From, Eva

biomaRt R genome sequence gene • 1.1k views

ADD COMMENT • link updated 4.5 years ago by zx8754 11k • written 4.5 years ago by eknichols ▴ 10

score 2 · Answer 1 · 2019-12-05

4.5 years ago

Emily 23k

There is no Y chromosome in that genome assembly, so unfortunately it's not possible to retrieve Y chromosome genes from that dataset.

ADD COMMENT • link 4.5 years ago by Emily 23k
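
For anyone wanting to confirm this for themselves, here is a minimal sketch that lists the chromosome/scaffold names the dataset actually contains. It reuses the mart and dataset names from the question and requests only the "chromosome_name" attribute with no filter; it needs a live connection to Ensembl, so treat it as an illustration rather than a guaranteed recipe.

```r
library(biomaRt)

# Same mart and dataset as in the question
mart <- useMart("ENSEMBL_MART_MOUSE", dataset = "mc57bl6nj_gene_ensembl")

# Request only the chromosome_name attribute, with no filter applied;
# getBM() returns unique rows by default, so this gives one row per name
chroms <- getBM(attributes = "chromosome_name", mart = mart)

# Inspect the names that carry at least one annotated gene
sort(unique(chroms$chromosome_name))

# Check for a Y chromosome (per the answer above, this should come back FALSE)
"Y" %in% chroms$chromosome_name
```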

Oh my goodness, how WEIRD. Thanks for your answer, Emily!

Out of curiosity, can you hazard a guess as to why? There should be 172 coding genes (according to here: https://uswest.ensembl.org/Mus_musculus/Location/Chromosome?db=core;g=ENSMUSG00000094658;r=Y:2830680-2841854).

Do you have advice on an alternative method? I just need Y chromosome gene names, so I can do it by hand if I have to.

ADD REPLY • link 4.4 years ago by eknichols ▴ 10

It's not unusual for loci that are highly repetitive or otherwise difficult to sequence, such as sex or organelle chromosomes, to be skipped by smaller sequencing projects, such as the mouse strains project that produced this genome.

The link you give here is not for that genome, however; it's for the mouse reference, which is produced by the GRC and is a high-quality assembly. You can get those genes using BioMart: just use the mouse genes mart.

ADD REPLY • link 4.4 years ago by Emily 23k
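
A minimal sketch of what that could look like in biomaRt. The mart name "ENSEMBL_MART_ENSEMBL" and dataset name "mmusculus_gene_ensembl" are my assumption for "the mouse genes mart" (they are not spelled out in the reply), and the extra attributes are just a convenience; it needs a live connection to Ensembl.

```r
library(biomaRt)

# Assumed names: main Ensembl genes mart, mouse reference dataset
ref_mart <- useMart("ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")

# Retrieve IDs, names and biotypes for every gene annotated on chromosome Y
y_genes <- getBM(
  attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
  filters    = "chromosome_name",
  values     = "Y",
  mart       = ref_mart
)

head(y_genes)
nrow(y_genes)

# Optionally keep protein-coding genes only
y_coding <- y_genes[y_genes$gene_biotype == "protein_coding", ]
nrow(y_coding)
```

Keeping the Ensembl gene ID next to the gene name makes it easier to match against whichever identifier the downstream pipeline uses.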

OH that makes sense--thanks for clarifying :) I will try that, and hopefully there aren't too many differences between GRC and B6 when it comes to gene IDs on the Y chromosome!

From Eva

ADD REPLY • link 4.4 years ago by eknichols ▴ 10
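
One way to check how well a Y gene list matches a given count matrix is to intersect the gene names with the matrix row names before subsetting. The sketch below uses toy stand-ins: the four gene names and the 3 x 3 matrix are hypothetical placeholders for a real Y gene list and a real single-cell count matrix with gene names as row names.

```r
# Hypothetical stand-in for a Y chromosome gene list
y_gene_names <- c("Ddx3y", "Eif2s3y", "Kdm5d", "Uty")

# Hypothetical stand-in for a genes x cells count matrix
counts <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 0, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Ddx3y", "Xist", "Uty"), c("cell1", "cell2", "cell3"))
)

# Genes from the list that are present in the matrix, and those that are not
found   <- intersect(y_gene_names, rownames(counts))
missing <- setdiff(y_gene_names, rownames(counts))

# Subset to the Y genes that were found; drop = FALSE keeps a matrix
# even if only one gene matches
y_counts <- counts[found, , drop = FALSE]

# Total Y-linked counts per cell
y_total_per_cell <- colSums(y_counts)
```

A long `missing` vector would suggest the names or IDs differ between the annotation used for the list and the one used for quantification.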