Question

scRNA seq differential gene expression analyses

2.3 years ago

bioinformatics ▴ 40

Hi,

I'm trying to perform differential gene expression analysis in R on single-cell RNA sequencing data, to determine which genes are differentially expressed between clusters (cell types) in an osteosarcoma tissue sample.

The public dataset can be found here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4952363. It has 3 data files: barcodes, features, and matrix.

I have combined them into one matrix (the expression data), with the features file as row names and the barcodes file as column names.

I have done this using these commands:

library(Matrix)
mat <- Matrix::readMM("~/Downloads/GSM4952363_OS_1_matrix.mtx.gz")
features <- read.delim("~/Downloads/GSM4952363_OS_1_features.tsv.gz",
                       header=FALSE)
barcodes <- read.delim("~/Downloads/GSM4952363_OS_1_barcodes.tsv.gz",
                       header=FALSE)

colnames(mat) <- barcodes[,1]
rownames(mat) <- features[,2]

I then followed this workflow, https://satijalab.org/seurat/articles/pbmc3k_tutorial.html, to perform clustering and differential gene expression in Seurat, using almost exactly the same commands. However, the UMAP plot I made of the different cell types doesn't match the one published in the authors' paper: I get 6 cell types, while the paper shows 9. The commands I used are below. If anyone can see where I've gone wrong, it would be much appreciated.

Commands for clustering in Seurat:

library(dplyr)
library(Seurat)
library(patchwork)
pbmc <- CreateSeuratObject(counts = mat, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

and the rest of the commands are the same as the workflow.

Would it be best to try a different workflow?

Thank you

differential single cell expression • 1.8k views

updated 2.3 years ago by Pratik ★ 1.0k • written 2.3 years ago by bioinformatics ▴ 40

Hello guys

First of all, for this type of data you have to put all three files into one directory, named barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz (the gzipped names Read10X() looks for), and pass that directory's path to the built-in Seurat function below:

#For triplet files from 10x protocol
Data <- Read10X(data.dir = "C:\\Users\\MiladAD\\Desktop\\Ready\\")

After that, create the Seurat object using the command below:

#Create seurat object
Seurat <- CreateSeuratObject(counts = Data, project = "ADBioinformatics", min.cells = 3, min.features = 200)

The rest of the pipeline follows the Seurat vignettes, which are very good.

Note that the number of clusters depends on the resolution you pass to FindClusters(); higher values generally give more clusters:

Seurat <- FindClusters(Seurat, resolution = 0.8)

hope it helps, Milad Eidi

2.3 years ago by milad eidi ▴ 20

Thank you kindly for your help.

I received an error message when I used this command:

data <- Read10X(data.dir = "plot")

Error in Read10X(data.dir = "plot") : Barcode file missing. Expecting barcodes.tsv.gz

In the folder plot I have the three files as: "GSM4952363_OS_1_features.tsv", "GSM4952363_OS_1_matrix.mtx", "barcodes.tsv".

Do you know where I've gone wrong?

2.3 years ago by bioinformatics ▴ 40

score 0 · Answer 1 · 2021-12-27

Hey,

The directory you're loading with Read10X() should contain the files you already have, but they have to be renamed and gzipped so that the directory holds exactly barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz.

To do this, simply rename GSM4952363_OS_1_features.tsv to features.tsv and GSM4952363_OS_1_matrix.mtx to matrix.mtx; barcodes.tsv is already fine. Then gzip each file individually: open a terminal, go to your directory using cd (change directory), and gzip the files like so:

gzip ./barcodes.tsv
gzip ./features.tsv
gzip ./matrix.mtx

You can then rerun Read10X() on that directory in R, with Seurat loaded, and it should work.

This was all done on Ubuntu. It probably works the same on a Mac, and on a computational cluster as well. If you are on a different operating system, you might have to adapt some of the commands above.
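If it helps, the rename-and-gzip steps above can be rolled into one small script. This is only a sketch: the function name prepare_10x_dir and the variable demo_dir are made up for illustration, and the demo runs on a throwaway directory of empty placeholder files that mimic the mixed naming from the question, not on the real GEO data.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: strip a filename prefix from the three 10x files
# in a directory (where present), then gzip each one individually.
prepare_10x_dir() {
  target_dir="$1"
  file_prefix="$2"
  for name in barcodes.tsv features.tsv matrix.mtx; do
    # Rename e.g. GSM4952363_OS_1_features.tsv -> features.tsv,
    # but only if the prefixed file exists and the plain name is free.
    if [ -f "$target_dir/$file_prefix$name" ] && [ ! -f "$target_dir/$name" ]; then
      mv "$target_dir/$file_prefix$name" "$target_dir/$name"
    fi
    # Compress the plain-named file, producing e.g. features.tsv.gz.
    if [ -f "$target_dir/$name" ]; then
      gzip "$target_dir/$name"
    fi
  done
}

# Demo on a temporary directory with empty placeholder files
# (assumption: same file names as reported in the thread).
demo_dir="$(mktemp -d)"
touch "$demo_dir/GSM4952363_OS_1_features.tsv" \
      "$demo_dir/GSM4952363_OS_1_matrix.mtx" \
      "$demo_dir/barcodes.tsv"

prepare_10x_dir "$demo_dir" "GSM4952363_OS_1_"
ls "$demo_dir"
```

To use it on real data, you would call prepare_10x_dir with your own directory (for example the plot folder from the question) and the prefix to strip, then point Read10X() at that directory.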