27 days ago
Jeyong
This is an example of a count matrix for the 'SPATA13' gene. I want to create a Seurat object with the CreateSeuratObject function, so I deleted the ENSEMBL ID column and moved the 'Symbol' column to the row names.
However, some gene symbols are duplicated (I think because one gene has multiple transcripts from alternative splicing), so I think I have to sum the counts of the duplicated rows in each column.
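A minimal sketch of that summing step, written in plain Python with toy data only to show the logic. The IDs id1 to id3, the symbol OTHERGENE, the helper name collapse_by_symbol and all counts are hypothetical. In R, base `rowsum(counts, group = symbols)` performs the same kind of row aggregation on a matrix.

```python
def collapse_by_symbol(rows):
    """Sum the count vectors of rows that share a gene symbol.

    rows: list of (ensembl_id, symbol, counts) tuples, where counts holds
    one integer per cell (column). Returns a dict (insertion-ordered)
    mapping each symbol to the element-wise sum of its rows.
    """
    collapsed = {}
    for _ensembl_id, symbol, counts in rows:
        if symbol not in collapsed:
            # First row for this symbol: copy so the input is not mutated.
            collapsed[symbol] = list(counts)
            continue
        acc = collapsed[symbol]
        if len(acc) != len(counts):
            raise ValueError(f"rows for {symbol} have different column counts")
        for j, value in enumerate(counts):
            acc[j] += value
    return collapsed


# Hypothetical toy matrix: two placeholder IDs share the symbol SPATA13.
rows = [
    ("id1", "SPATA13", [0, 2, 1]),
    ("id2", "SPATA13", [3, 0, 1]),
    ("id3", "OTHERGENE", [5, 5, 0]),
]
matrix = collapse_by_symbol(rows)
print(matrix)  # {'SPATA13': [3, 2, 2], 'OTHERGENE': [5, 5, 0]}
```

Note that once rows are summed, the information about which ID contributed which counts is gone.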
I wonder whether this approach is right or wrong. If it is wrong, how should I handle my count matrix?
I must use gene symbols (not Ensembl IDs).
Thanks.
This question comes up periodically.
Here are a few past threads:
How to deal with the case that one gene symbol matches multiple ensembl ids?
When I convert the Ensembl IDs to gene symbols, why lots of genes are duplicated?
Thanks. I checked the links you provided, but they did not resolve my doubt.
I think I must use gene symbols when I use CreateSeuratObject. Is that wrong? Can I use Ensembl IDs to create the Seurat object and still do downstream analyses such as PCA, clustering, DEG analysis and GSEA?
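Using Ensembl IDs does not have to mean losing readable names. Below is a minimal sketch, assuming a hypothetical helper make_feature_labels and placeholder IDs id1 and id2: keep one row per Ensembl ID, label rows whose symbol is unique with the symbol alone, append the ID only where a symbol is shared, and keep a lookup table to translate results back to symbols later (for example before GSEA). A dash is used as the separator because, as far as I know, Seurat replaces underscores in feature names with dashes.

```python
def make_feature_labels(ids, symbols, sep="-"):
    """Build unique, readable row names from Ensembl IDs and gene symbols.

    Symbols that occur once are used as-is; symbols shared by several IDs
    get the ID appended so every row name stays unique. Also returns a
    label -> (ensembl_id, symbol) lookup for translating results back.
    """
    if len(ids) != len(symbols):
        raise ValueError("ids and symbols must have the same length")
    # Count how many IDs map to each symbol.
    n_per_symbol = {}
    for sym in symbols:
        n_per_symbol[sym] = n_per_symbol.get(sym, 0) + 1
    labels, lookup = [], {}
    for eid, sym in zip(ids, symbols):
        label = sym if n_per_symbol[sym] == 1 else f"{sym}{sep}{eid}"
        labels.append(label)
        lookup[label] = (eid, sym)
    if len(lookup) != len(labels):
        raise ValueError("duplicate labels; are some Ensembl IDs repeated?")
    return labels, lookup


# Hypothetical placeholders standing in for real Ensembl IDs.
ids = ["id1", "id2", "id3"]
symbols = ["SPATA13", "SPATA13", "OTHERGENE"]
labels, lookup = make_feature_labels(ids, symbols)
print(labels)  # ['SPATA13-id1', 'SPATA13-id2', 'OTHERGENE']
```

Because no counts are summed, PCA, clustering and DE testing see each ID as a separate feature, and the lookup maps a result such as SPATA13-id2 back to its symbol afterwards.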
I'm also worried about how to interpret gene expression if Ensembl id1 increases and Ensembl id2 decreases in a cluster (id1 and id2 share one gene symbol). In that case, how should I interpret it?
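To make this concern concrete, here is a toy calculation with invented numbers for two hypothetical IDs, id1 and id2, that share one symbol. id1 is higher in cluster B and id2 is higher in cluster A, yet the summed row has the same mean (8) in both clusters. It only shows that summing can hide opposite changes; it does not tell you which ID is biologically meaningful.

```python
from statistics import mean

# Invented per-cell counts for two IDs sharing one symbol, in two clusters.
cluster_a = {"id1": [2, 3, 1, 2], "id2": [6, 5, 7, 6]}
cluster_b = {"id1": [6, 5, 7, 6], "id2": [2, 3, 1, 2]}


def summarize(cluster):
    """Return per-ID mean counts and the mean of the summed (collapsed) row."""
    per_id = {eid: mean(counts) for eid, counts in cluster.items()}
    # Element-wise sum across IDs per cell: what a collapsed symbol row holds.
    summed = [sum(cell) for cell in zip(*cluster.values())]
    return per_id, mean(summed)


for name, cluster in [("A", cluster_a), ("B", cluster_b)]:
    per_id, summed_mean = summarize(cluster)
    print(name, per_id, summed_mean)
```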
Thanks a lot for your answer.