How can I extract data from an expression matrix in a .txt file?
2.5 years ago

I want to analyze this dataset (GSE60361) with the Seurat package and also extract the gene expression matrix (with cells in rows and genes in columns). Is that possible? What other tools are there to obtain the expression matrix for datasets like these (GSE60361, GSE75688)?

2.5 years ago
ATpoint 81k

There are files in the supplement you can use:

library(data.table)
counts <- data.table::fread("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE60nnn/GSE60361/suppl/GSE60361_C1-3005-Expression.txt.gz")

The gene names are in the first column, but when you try to set them as rownames R will complain about non-unique values such as "Mar-1". That means the authors used Excel (/facepalm) to create or manipulate this file, and Excel converted some gene symbols to dates. So you first have to fix the corrupted gene names (see e.g. the thread "These gene symbol are from where? What source?") and then move the first column to rownames.
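A minimal sketch of the "move the first column to rownames" step, using a small made-up table in place of the downloaded one (the column name `cell_id`, the cell names and all values are assumptions for illustration only). Note that `make.unique()` only disambiguates the duplicated names by adding a suffix; it does not restore the original gene symbols, which still has to be done separately.

```r
# Toy stand-in for the table returned by fread(): gene symbols in the first
# column, one column per cell. Names and values are invented.
counts <- data.frame(
  cell_id = c("Tspan12", "Mar-1", "Mar-1", "Actb"),
  cell_A  = c(0L, 3L, 1L, 120L),
  cell_B  = c(2L, 0L, 0L, 98L),
  stringsAsFactors = FALSE
)

# Disambiguate duplicated symbols by appending a suffix (.1, .2, ...).
# This does NOT recover the symbols that Excel turned into dates.
genes <- make.unique(counts[[1]])

# Drop the name column, convert to a matrix, and attach the gene names.
mat <- as.matrix(counts[, -1])
rownames(mat) <- genes

# mat is genes x cells; transpose to get cells in rows and genes in columns.
cells_by_genes <- t(mat)
```

The genes x cells orientation (`mat`) is the one Seurat expects as input; the transposed `cells_by_genes` is the layout asked for in the question.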

There is another file, this time an actual Excel file (/facepalm2). I have no clue what it is, but it seems to be some kind of per-cell annotation; maybe you can find out:

library(readxl)
# mode = "wb" keeps the binary .xlsx intact when downloading on Windows
download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE60nnn/GSE60361/suppl/GSE60361_spikes_annotation_and_abundance.xlsx",
              "GSE60361_spikes_annotation_and_abundance.xlsx",
              mode = "wb")
other <- readxl::read_xlsx("GSE60361_spikes_annotation_and_abundance.xlsx")

The Seurat documentation explains how to read a matrix into a Seurat object.
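As a rough sketch of that last step, assuming the Seurat package is installed: `CreateSeuratObject()` takes a genes x cells count matrix with gene names as rownames and cell names as colnames. The toy matrix below is invented for illustration and is not taken from GSE60361; with the real data you would pass the cleaned-up matrix instead.

```r
library(Seurat)

# Toy genes x cells count matrix; gene names, cell names and counts are invented.
mat <- matrix(
  c(0, 3, 120,
    2, 0, 98,
    5, 1, 77),
  nrow = 3,
  dimnames = list(c("Tspan12", "Gapdh", "Actb"),
                  c("cell_A", "cell_B", "cell_C"))
)

# Build the Seurat object; the project name is just a label.
seu <- CreateSeuratObject(counts = mat, project = "GSE60361")
seu
```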


Thank you so much. What can I do for this dataset, GSE65525?
