Question

Unexpected genes in mouse DGE (Digital Gene Expression) (gene-cell expression matrix)


5.0 years ago

chilifan ▴ 120

Hi! I recently aligned our samples (using STAR and the Drop-seq pipeline) to the mouse genome downloaded from the pre-made metadata links found on page 4 of the Drop-seq cookbook. Link: https://github.com/broadinstitute/Drop-seq/blob/master/doc/Drop-seq_Alignment_Cookbook.pdf

When processing the digital gene expression matrix with SingleR, the run stopped because of duplicate gene names. I checked the gene names, and the cause of the duplicates was Rpl24, which existed in both capital and lower-case letters. Like this:

RPL24 <- human gene name
Rpl24 <- mouse gene name
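A check for names that collide once letter case is ignored could look roughly like the following. This is a minimal sketch, not part of the Drop-seq or SingleR tooling; the function name `case_insensitive_duplicates` and the toy gene list are made up for illustration.

```python
from collections import defaultdict


def case_insensitive_duplicates(gene_names):
    """Group gene names that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(set)
    for name in gene_names:
        # Use the upper-cased name as the grouping key.
        groups[name.upper()].add(name)
    # Keep only keys that have more than one distinct spelling.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Toy input for illustration only, not taken from a real DGE.
genes = ["Rpl24", "RPL24", "Actb", "Gapdh"]
print(case_insensitive_duplicates(genes))
```

In practice the list of names would come from the gene column of the DGE rather than from a hard-coded list.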

Checking the DGE for more of these tricky duplicates, I couldn't find any others. However, I found a few other strange genes that don't seem to fit in the mouse genome. For example: RP23-103I12.13, HOTAIRM1_5, KCDT12, TMEM185B

I double-checked to make sure I aligned to the mouse genome and not the mixed genome. If these were contaminants they should have been sorted out at the alignment step, right? Looking at the Ensembl numbers from the genome index in geneinfo.tab, the Ensembl numbers for these genes do not exist. In fact, these genes usually don't have an Ensembl number at all.
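The comparison described above, listing DGE genes that are absent from the reference annotation, can be sketched as follows. It assumes the DGE is a tab-delimited table with a header row and the gene name in the first column, and that the reference gene names have already been extracted into a list; both are assumptions, and the helper names and toy data are hypothetical.

```python
import csv
import io


def first_column(handle, skip_header=True):
    """Return the first field of every non-empty row of a tab-delimited table."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)  # drop the header row if there is one
    return [row[0] for row in reader if row]


def genes_missing_from_reference(dge_genes, reference_genes):
    """List DGE gene names that do not occur in the reference (case-sensitive)."""
    reference = set(reference_genes)
    return [gene for gene in dge_genes if gene not in reference]


# Toy data for illustration only; a real run would open the actual files.
dge_text = "GENE\tCELL1\tCELL2\nRpl24\t3\t0\nTMEM185B\t1\t0\n"
dge_genes = first_column(io.StringIO(dge_text))
reference_genes = ["Rpl24", "Actb"]
print(genes_missing_from_reference(dge_genes, reference_genes))
```

The match is deliberately case-sensitive, so that RPL24 and Rpl24 are treated as different names.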

Has anyone else noticed this problem? Any idea how it can happen? Can I safely remove these strange genes from my DGE and move on?

Digital Gene Expression genome index drop-seq • 1.1k views



Why do you have human gene symbols in there if you've aligned only to mouse? Is it possible that you've used the wrong GTF file?

ADD REPLY • link 5.0 years ago by Jean-Karim Heriche 27k


That's what is so strange about it! I don't think I aligned to the mixed GTF file; my log says it's the mouse, and in the geneinfo I can only see ENSMUS numbers. I'm very confused.

ADD REPLY • link 5.0 years ago by chilifan ▴ 120


If you're using files you downloaded, check that they correspond to what they should be. I would also check all files for occurrences of human gene symbols to try and find out at which step they appear in the pipeline.
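One way to do that search is sketched below. It is only a rough illustration: it assumes the files are uncompressed text (gzipped files would need `gzip.open` instead), and the function name `count_symbol_hits` and the toy GTF lines are made up for the example.

```python
import re
import tempfile
from pathlib import Path


def count_symbol_hits(paths, symbols):
    """Count case-sensitive, whole-token occurrences of each symbol per file."""
    if not symbols:
        raise ValueError("need at least one symbol to search for")
    # Do not match inside longer tokens such as RPL24P1 or RP23-103I12.13.
    alternatives = "|".join(re.escape(symbol) for symbol in symbols)
    pattern = re.compile(r"(?<![\w.-])(?:" + alternatives + r")(?![\w.-])")
    hits = {}
    for path in paths:
        counts = {}
        with open(path, "rt", errors="replace") as handle:
            for line in handle:
                for match in pattern.findall(line):
                    counts[match] = counts.get(match, 0) + 1
        hits[str(path)] = counts
    return hits


# Toy file for illustration only; point this at the real GTF, refFlat, DGE, etc.
with tempfile.TemporaryDirectory() as tmp:
    gtf = Path(tmp) / "toy.gtf"
    gtf.write_text(
        'chr1\tx\tgene\t1\t10\t.\t+\t.\tgene_name "Rpl24";\n'
        'chr1\tx\tgene\t20\t30\t.\t+\t.\tgene_name "RPL24";\n'
    )
    demo_hits = count_symbol_hits([gtf], ["RPL24", "TMEM185B"])
    print(demo_hits)
```

Running this over each intermediate file in pipeline order should show the first file in which the unexpected symbols appear.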

ADD REPLY • link 5.0 years ago by Jean-Karim Heriche 27k