Question: Analyzing digital gene expression (DGE) data from the Drop-seq pipeline with Seurat
16 months ago by
chilifan70 wrote:

I am using Seurat to cluster data that has already been filtered, aligned and turned into a DGE matrix by the Drop-seq alignment pipeline from Drop-seq tools. This produced the file sample_DGE.txt.gz. I now want to cluster the data and run a QC analysis by calculating the percentage of mitochondrial genes per cell.

I am following the Seurat clustering tutorial. It uses the files barcodes.tsv, genes.tsv and matrix.mtx generated by 10x Genomics as raw data and reads them with Read10X(). I generated these three files from our DGE data, inspired by this Biostars answer: "A: Storing a gene expression matrix in a matrix.mtx".

This works fine, except that the row-name header "GENE" is treated as a column name and saved into barcodes.tsv. That becomes a problem later because Seurat treats "GENE" as one of the cell barcodes when calculating the percentage of mitochondrial genes per cell.

This, of course, makes it impossible to use VlnPlot, which fails with the error:

VlnPlot(object = pbmc, features.plot = c("nGene", "nUMI", "percent.mito"), nCol = 3)
Error in if(all(data[,feature] == data,feature)) { : missing value where TRUE/FALSE needed

Simply removing "GENE" manually from the barcodes.tsv file causes a dimension error at the Read10X step:

pbmc.data <- Read10X(data.dir = "dir/to/barcode_matrix_and_gene_files")
Error in dimnamesGets(x, value) : invalid dimnames given for "dgTMatrix" object
stop(gettextf("invalid dimnames given for %s object", dQuote(class(x))), domain = NA)
dimnamesGets(x, value)

So my question is: does anyone know a workaround for this problem? Or is there an equivalent to Read10X(), say ReadDGE() or ReadDropseq(), that can be used directly on my DGE file?

seurat rna-seq R • 2.6k views

Thank you @Igor that is the answer to a question I've been pondering for a long time. However, ?CreateSeuratObject uses this example:

pbmc_raw <- read.table(
  file = system.file('extdata', 'pbmc_raw.txt', package = 'Seurat'),
  as.is = TRUE
)
pbmc_small <- CreateSeuratObject(raw.data = pbmc_raw)

which in my case would be:

# Load the PBMC dataset
pbmc.data <- read.table(file = system.file("DGE.txt", package = 'Seurat'), as.is = TRUE)

but it yields the error:

Error in read.table(file = system.file("DGE.txt", package = "Seurat"), : no lines available in input
modified 16 months ago • written 16 months ago by chilifan70

If you are not sure what a function does, you can check by putting a ? in front of it. For example, ?system.file. That will tell you that system.file takes "character vectors, specifying subdirectory and file(s) within some package". In the example, they are using pbmc_raw.txt from the Seurat package. Your file is not stored in the Seurat package. You should specify the exact path where it is. Using system.file is not needed.
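The difference between system.file() and a plain path can be sketched in base R as follows. The file name, the toy contents and the temporary directory are placeholders for illustration, not the poster's actual data. When system.file() cannot find a file it returns an empty string, which is most likely why read.table() then reported that no lines were available.

```r
# system.file() only searches inside installed packages, so a file that
# lives elsewhere yields an empty string ("base" is just an example package).
missing_path <- system.file("DGE.txt", package = "base")
print(missing_path)

# Write a tiny stand-in DGE table to a temporary location (hypothetical data).
dge_path <- file.path(tempdir(), "DGE.txt")
writeLines(c("GENE\tCELL1\tCELL2",
             "ACTB\t5\t3",
             "MT-CO1\t1\t0"), dge_path)

# Read the file by giving its path directly; system.file() is not involved.
dge <- read.table(file = dge_path, header = TRUE, row.names = 1)
print(dim(dge))
```

In practice the path would simply be wherever the DGE file sits on disk, either absolute or relative to the working directory.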

modified 16 months ago • written 16 months ago by igor9.9k

Thank you @Igor, got it! :)

written 16 months ago by chilifan70
16 months ago by
United States
igor9.9k wrote:

If you don't have 10x data, then you don't need to use Read10X(). This is a function to make reading 10x data easier since it's not stored as a simple CSV/TSV table. There is no need to try to recreate that format. In the same tutorial, you can skip to the next step, which is CreateSeuratObject() and then give it your data matrix. When you create the matrix, make sure it has the appropriate columns and rows.
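As a rough illustration of what "appropriate columns and rows" could mean, here is a minimal base R sketch. The gene names, barcodes and counts are made up. The commented-out call assumes the Seurat 2.x argument name raw.data, which may differ in other versions.

```r
# Hypothetical toy counts: rows are genes, columns are cell barcodes.
counts <- matrix(
  c(5, 0, 2,
    1, 3, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ACTB", "MT-CO1"), c("AAAC", "AAAG", "AAAT"))
)

# Sanity checks before handing the matrix to Seurat.
stopifnot(!anyDuplicated(rownames(counts)),   # gene names are unique
          !anyDuplicated(colnames(counts)),   # barcodes are unique
          !("GENE" %in% colnames(counts)))    # header label is not a barcode

# With Seurat 2.x loaded, the next step would be (not run here):
# pbmc <- CreateSeuratObject(raw.data = counts)
```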

16 months ago by
chilifan70 wrote:

Because I like closure, I will answer my own question. You need to read in the DGE data before creating the Seurat object. You also need to specify the column and row names explicitly and set the data type for both the row-name column (character) and the remaining data columns (numeric):

# Read DGE file
pbmc.data <- read.table(file = "DGE.txt", header = TRUE, row.names = 1, colClasses = c("character", rep("numeric", 10000)))

I can hardcode 10000 since I chose to include 10000 cells already in the DGE step.
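If the number of cells is not known in advance, one possible alternative to hardcoding it is to read the header line first and derive the column count from it. The sketch below uses base R only. The file contents are invented, and the gzip-compressed stand-in merely mimics a file like sample_DGE.txt.gz.

```r
# Stand-in DGE file (hypothetical contents), gzip-compressed.
dge_path <- file.path(tempdir(), "sample_DGE.txt.gz")
con <- gzfile(dge_path, "w")
writeLines(c("GENE\tAAAC\tAAAG\tAAAT",
             "ACTB\t5\t0\t2",
             "MT-CO1\t1\t3\t0"), con)
close(con)

# Read only the header line to find out how many columns the file has.
header <- scan(dge_path, what = character(), nlines = 1, quiet = TRUE)
n_cells <- length(header) - 1  # the first field is the "GENE" label

# Build colClasses from the actual column count instead of a fixed number.
dge <- read.table(dge_path, header = TRUE, row.names = 1,
                  colClasses = c("character", rep("numeric", n_cells)))
print(dim(dge))
```

Both scan() and read.table() open the file by name here, so the gzip compression is handled without unpacking the file first.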

# Initialize the Seurat object with the raw (non-normalized data).
# Keep all genes expressed in >= 3 cells (~0.1% of the data). Keep all cells with at least 200 detected genes
pbmc <- CreateSeuratObject(raw.data = pbmc.data, min.cells = 3, min.genes = 200, project = "10X_PBMC")
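For the mitochondrial QC step mentioned in the question, the per-cell fraction can be sketched in base R roughly as follows. The counts are made up, and the "^MT-" prefix assumes human gene symbols (mouse symbols typically start with "mt-"), so the pattern may need adjusting.

```r
# Hypothetical toy counts: rows are genes, columns are cell barcodes.
counts <- matrix(
  c(6, 3,
    2, 1,
    2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ACTB", "GAPDH", "MT-CO1"), c("AAAC", "AAAG"))
)

# Pick mitochondrial genes by name prefix (assumption: human symbols).
mito_genes <- grep("^MT-", rownames(counts), value = TRUE)

# Fraction of each cell's counts that comes from mitochondrial genes.
percent_mito <- colSums(counts[mito_genes, , drop = FALSE]) / colSums(counts)
print(percent_mito)
```

A column whose total count is zero would give NaN here, so it is worth checking for missing values before plotting.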

Thanks again @igor for giving me the clue to figure this out! :)
