I'm trying to perform differential gene expression analysis in R on single-cell RNA sequencing data, to determine which genes are differentially expressed between clusters (cell types) in an osteosarcoma tissue sample.
The public dataset can be found here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4952363. It has 3 data files: barcodes, features, and matrix.
I have combined them into one expression matrix, with the barcodes file as column names and the features file as row names.
I have done this using these commands:
    library(Matrix)
    mat <- Matrix::readMM("~/Downloads/GSM4952363_OS_1_matrix.mtx.gz")
    features <- read.delim("~/Downloads/GSM4952363_OS_1_features.tsv.gz", header=FALSE)
    barcodes <- read.delim("~/Downloads/GSM4952363_OS_1_barcodes.tsv.gz", header=FALSE)
    colnames(mat) <- barcodes[,1]
    rownames(mat) <- features[,2]
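A quick sanity check for this step, as a rough sketch: 10x-style output stores the matrix as genes by cells, so the row count should match the features file and the column count should match the barcodes file. The helper name `check_10x_dims` and the toy data are made up for illustration and are not part of the dataset.

```r
library(Matrix)

# Hypothetical helper: confirm the matrix dimensions agree with the label files
check_10x_dims <- function(mat, features, barcodes) {
  stopifnot(
    nrow(mat) == nrow(features),  # one row per gene
    ncol(mat) == nrow(barcodes)   # one column per cell barcode
  )
  invisible(TRUE)
}

# Toy stand-ins for the real files: 3 genes x 2 cells
toy_mat <- Matrix::sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 2), x = c(5, 1, 2), dims = c(3, 2)
)
toy_features <- data.frame(id = c("ENSG1", "ENSG2", "ENSG3"),
                           symbol = c("A", "B", "C"))
toy_barcodes <- data.frame(barcode = c("AAAC-1", "AAAG-1"))

check_10x_dims(toy_mat, toy_features, toy_barcodes)
```

On the real data the call would be `check_10x_dims(mat, features, barcodes)`, run before assigning the row and column names.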
I then followed this workflow https://satijalab.org/seurat/articles/pbmc3k_tutorial.html to perform clustering and differential gene expression in Seurat, using almost exactly the same commands as the workflow. However, the UMAP plot I made showing the different cell types doesn't match the one published in the authors' paper: I get 6 cell types, while the paper shows 9. The commands I used are below. If anyone can see where I've gone wrong, I would really appreciate it.
Commands for clustering in Seurat:
    library(dplyr)
    library(Seurat)
    library(patchwork)
    pbmc <- CreateSeuratObject(counts = mat, project = "pbmc3k", min.cells = 3, min.features = 200)
    pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
    VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

The rest of the commands are the same as in the workflow.
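For reference, the remaining workflow steps look roughly like the sketch below. It is written from the linked tutorial rather than copied from my session, so the values shown (the QC cutoffs, `dims = 1:10`, `resolution = 0.5`) are the tutorial defaults and are assumed unchanged. The wrapper function `run_tutorial_steps` is a hypothetical name used only to keep the steps together.

```r
# Sketch of the remaining tutorial steps, wrapped in a hypothetical helper.
# The Seurat package must be installed when the function is called.
run_tutorial_steps <- function(obj, dims = 1:10, resolution = 0.5) {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))

  # QC filter using the tutorial cutoffs (chosen for PBMCs; assumed unchanged here)
  obj <- subset(obj, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

  # Normalisation, variable feature selection and scaling
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
  obj <- Seurat::ScaleData(obj, features = rownames(obj))

  # PCA on the variable features, then graph-based clustering
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj))
  obj <- Seurat::FindNeighbors(obj, dims = dims)
  obj <- Seurat::FindClusters(obj, resolution = resolution)

  # UMAP embedding for plotting
  obj <- Seurat::RunUMAP(obj, dims = dims)
  obj
}
```

Usage would be `pbmc <- run_tutorial_steps(pbmc)` followed by `DimPlot(pbmc, reduction = "umap")`. As far as I understand, `resolution` and `dims` are the settings that most directly affect how many clusters come out, and the QC cutoffs were picked for PBMC data rather than tumour tissue.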
Would it be best to try a different workflow?