Question: Unexpected genes in mouse DGE (Digital Gene Expression) (gene-cell expression matrix)
7 months ago, chilifan wrote:

Hi! I recently aligned our samples (using STAR and the Drop-seq pipeline) to the mouse genome downloaded from the pre-made metadata links found on page 4 of the Drop-seq cookbook. Link: https://github.com/broadinstitute/Drop-seq/blob/master/doc/Drop-seq_Alignment_Cookbook.pdf

When processing the digital gene expression matrix with SingleR, the run was stopped because of duplicate gene names. I checked the gene names, and the cause of the duplicate was Rpl24, which existed in both upper-case and mixed-case spelling. Like this:
RPL24 <- human gene name
Rpl24 <- mouse gene name
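A check for names that differ only by letter case can be sketched as follows. This is a minimal sketch, not part of the Drop-seq or SingleR tooling: the function name `case_insensitive_duplicates` is made up for illustration, and it assumes the gene names have already been read from the DGE (for example from its first column) into a Python list.

```python
from collections import defaultdict


def case_insensitive_duplicates(gene_names):
    """Group gene names that collide once letter case is ignored.

    Returns a dict mapping the upper-cased name to the sorted list of
    distinct spellings, keeping only names with more than one spelling.
    """
    groups = defaultdict(set)
    for name in gene_names:
        groups[name.upper()].add(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


# Hypothetical gene list standing in for the row names of the DGE.
genes = ["Rpl24", "RPL24", "Actb", "Gapdh"]
for key, spellings in case_insensitive_duplicates(genes).items():
    print(key, spellings)
```

Any name reported here appears in the matrix under more than one spelling, which is the pattern described for Rpl24.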

I checked the DGE for more of these tricky duplicates and couldn't find any others. However, I found a few other strange gene names that don't seem to belong in the mouse genome. For example:
RP23-103I12.13
HOTAIRM1_5
KCDT12
TMEM185B

I double-checked to make sure I aligned to the mouse genome and not the mixed genome. If these were contaminants, they should have been sorted out at the alignment step, right? Looking at the Ensembl IDs from the genome index in geneinfo.tab, the Ensembl IDs for these genes do not exist. Actually, these genes usually don't have an Ensembl ID at all.

Has anyone else noticed this problem? Any idea how it can happen? Can I safely remove these strange genes from my DGE and move on?


Why do you have human gene symbols in there if you've aligned only to mouse? Is it possible that you've used the wrong GTF file?

written 7 months ago by Jean-Karim Heriche

That's what is so strange about it! I don't think I aligned to the mixed GTF file; my log says it's the mouse one, and in the geneinfo I can only see ENSMUS numbers. I'm very confused.

written 7 months ago by chilifan

If you're using files you downloaded, check that they correspond to what they should be. I would also check all files for occurrences of human gene symbols to try and find out at which step they appear in the pipeline.
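One way to run such a check on the annotation is to scan the GTF for records whose `gene_id` does not carry the mouse Ensembl gene prefix. The sketch below is only an illustration under stated assumptions: the function name `non_mouse_genes` is hypothetical, the GTF is assumed to have the usual nine tab-separated columns with `gene_id "..."` and `gene_name "..."` attributes, and the example lines are invented rather than taken from the actual Drop-seq metadata.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def non_mouse_genes(gtf_lines, expected_prefix="ENSMUSG"):
    """Return sorted (gene_id, gene_name) pairs whose gene_id lacks the prefix."""
    hits = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        id_match = GENE_ID_RE.search(fields[8])
        if id_match is None:
            continue
        name_match = GENE_NAME_RE.search(fields[8])
        gene_id = id_match.group(1)
        gene_name = name_match.group(1) if name_match else ""
        if not gene_id.startswith(expected_prefix):
            hits.add((gene_id, gene_name))
    return sorted(hits)


# Invented example records; a real run would iterate over the lines of the GTF file.
example_lines = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "ENSMUSG00000000001"; gene_name "Gnai3";',
    'chr1\tsrc\tgene\t200\t300\t.\t+\t.\tgene_id "ENSG00000000000"; gene_name "RPL24";',
]
for gene_id, gene_name in non_mouse_genes(example_lines):
    print(gene_id, gene_name)
```

Running the same scan on each intermediate file that carries gene annotations would show at which step the unexpected symbols first appear.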

written 7 months ago by Jean-Karim Heriche