Question

What is a count matrix for input into Seurat supposed to look like?

0

3.7 years ago

Pratik ★ 1.0k

Hello,

I hope you are safe and well.

Could someone share what a count matrix for input into Seurat is supposed to look like?

I have count matrices; however, each cell's count matrix is in a separate file.

This is the data I want to analyze in Monocle 3: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2978831 The files I'm looking at are in the Supplementary section.

I was advised to aggregate these files into one count matrix file and then bring it into Seurat to normalize it. Then, from Seurat, transform the normalized data and use it as input to Monocle.

This is what I have used to aggregate the data:

> setwd ("~/Desktop/GSE110154_RAW/csv/")
> files <- list.files(path="~/Desktop/GSE110154_RAW/csv/")
> genes <- read.table(files[1], header=FALSE, sep=",")[,1]
> df    <- do.call(cbind,lapply(files,function(fn)read.table(fn,header=FALSE, sep=",")[,2]))
> df    <- cbind(genes,df)
> head (df)

which results in:

     genes                                                                   
[1,] "1/2-SBSRNA4" "0" "0"  "0" "0"   "0" "0"   "3" "0"   "77" "0"   "0"  "0"
[2,] "A1BG"        "0" "0"  "0" "58"  "0" "0"   "0" "0"   "0"  "0"   "0"  "0"
[3,] "A1BG-AS1"    "0" "38" "0" "0"   "0" "0"   "0" "0"   "0"  "0"   "0"  "0"
[4,] "A1CF"        "0" "8"  "0" "123" "8" "418" "0" "144" "0"  "108" "21" "0"
[5,] "A2LD1"       "0" "0"  "0" "0"   "0" "0"   "0" "0"   "0"  "0"   "12" "0"
[6,] "A2M"         "0" "0"  "0" "0"   "0" "0"   "0" "0"   "0"  "0"   "0"  "0"

and then to write the files I did:

> write.table(df, "~/Desktop/GSE110154_RAW/df4.csv", row.names = F, col.names=F, sep = ",")

which results in:

    "1/2-SBSRNA4","0","0","0","0","0","0","3","0","77","0","0","0","0","3","0","0","1","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","7","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","5","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","22","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","2","0","0","0","0","0","38","0","0","0","3","0","0",...
    "A1BG","0","0","0","58","0","0","0","0","0","0","0","0","0","10","0","0","0","0","0","0","0","0","0","0","0","23","0","0","0","0","0","0","0","0","0","3","0","0","0","0","0","0","0","0","0","0","10","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","2","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","14","0","0","2","35","0","0","0","0","0","40","0","0","0","0","0","0","0","26","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","6","0","0","0","0","0","0","0","11","0","0","0","0","0","38","0","0","0","0","0","0","42","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","30","0","0","0","0","0","0","0","0","0","0","0","0","0","13","0","0","0","0","0","0","18","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0",...
etc...

I'm just not quite sure what a Seurat count matrix is supposed to look like?

I also need to find a good tutorial on how to input this data into Seurat afterwards, normalize and transform it, to input into Monocle3.

I would greatly appreciate anyone's help!

Very Respectfully, Pratik

R RNA-Seq Monocle3 Seurat • 5.9k views

ADD COMMENT • link updated 23 months ago by Ram 43k • written 3.7 years ago by Pratik ★ 1.0k

1

Side note: Avoid setting working directories in R code. Create a directory for each task/project/whatever, and either create an R project there, or store the R script and read files using their full path. Allow for the fact that files can be moved - so create soft links to files in your working folder. That way, you'll at least have a record of where the file was (and hopefully the file is saved in a better location than on the Desktop or in the Downloads folder).
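As a rough illustration of the advice above, here is a minimal sketch that reads files by full path instead of calling setwd(). The directory under tempdir() and the two tiny CSV files are made up so the snippet runs on its own; in practice data_dir would point at the real data folder.

```r
# data_dir is a placeholder; point it at the folder that holds the per-cell CSVs.
data_dir <- file.path(tempdir(), "full_path_demo")
dir.create(data_dir, showWarnings = FALSE)

# Two tiny made-up files so the sketch runs end to end.
writeLines(c("A1BG,0", "A2M,3"), file.path(data_dir, "cell1.csv"))
writeLines(c("A1BG,5", "A2M,0"), file.path(data_dir, "cell2.csv"))

# full.names = TRUE returns paths that work from any working directory,
# so no setwd() call is needed.
files <- list.files(path = data_dir, pattern = "\\.csv$", full.names = TRUE)
first <- read.table(files[1], header = FALSE, sep = ",")
```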

ADD REPLY • link 3.7 years ago by Ram 43k

0

Thank you RamRS!

Any clue on what a count matrix for input into Seurat is supposed to look like?

Or a tutorial on how to input this data into Seurat, normalize and transform it, and then input it into Monocle3?

ADD REPLY • link 3.7 years ago by Pratik ★ 1.0k

1

I'm not a single cell RNAseq person, others should be able to help you with that. I had some suggestions on basic R programming/project organization practices, which I mentioned.

ADD REPLY • link 3.7 years ago by Ram 43k

0

Thank you RamRS, I really do appreciate your guidance! I eventually want to become proficient at R programming/project organization practices.

Very Respectfully, Pratik

ADD REPLY • link 3.7 years ago by Pratik ★ 1.0k

0

This is incredible wisdom I was thinking about today. Thank you for looking way ahead for me. Although I was kind-of rudely snappy/out-of-place like a turtle haha... This is valuable to me now as I try to be better organized. Thank you for looking out Ram : )

Also this StackOverflow question/answer helped supplement your guidance to explain it to me like I am 5 : )

ADD REPLY • link 23 months ago by Pratik ★ 1.0k

1

Glad it has been helpful, Pratik! I myself still stick to these rules to ensure easier context switching - loading an R project would bring all relevant scripts, files and plots into a sandbox that I can then play in.

ADD REPLY • link 23 months ago by Ram 43k

0

Hi, I have a similar problem to yours, Mr. Pratik Mehta. Did you find out how to handle it, and how to give a count matrix to Seurat as input?

ADD REPLY • link 2.4 years ago by poria.laghayee • 0

0

Hey yea, you just need to have your cells as columns and your genes as rows. The "meat" of the data frame should be your counts. This should be good to input into Seurat.
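For anyone landing here, a minimal sketch of that layout, starting from a character matrix shaped like the df printed in the question. The values and the cell1, cell2 column labels are made up for illustration; real column names would come from the sample or file names.

```r
# A character matrix shaped like the `df` printed in the question (values are made up).
df <- cbind(genes = c("A1BG", "A1CF", "A2M"),
            c("0", "8", "0"),
            c("58", "123", "0"))

# Drop the gene column, convert the rest to numbers, and keep the genes as rownames.
# The "cell1", "cell2" labels are placeholders for real cell identifiers.
counts <- matrix(as.numeric(df[, -1]),
                 nrow = nrow(df),
                 dimnames = list(df[, 1], paste0("cell", seq_len(ncol(df) - 1))))
```

The result is a numeric matrix with genes as rows and cells as columns, which is the shape described above.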

ADD REPLY • link 2.4 years ago by Pratik ★ 1.0k

0

2.4 years ago

fracarb8 ★ 1.6k

I can only see one issue with your aggregated csv file: the sample names are missing. As the name of each file should be (and hopefully is) the sample name, you need to add it to the columns so that you know which column belongs to which sample. I would personally use a for loop and make sure I process each sample individually, but in your case colnames(df) <- c("gene", files) should work (df is a matrix after cbind, so use colnames() rather than names()). You might want to check and clean the file names first (e.g. removing the .csv, ...). Apart from that, df4.csv looks fine. Move the genes column to the rownames and make sure the rest are numbers and not characters, and Seurat will read it just fine.
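A minimal sketch of that aggregation, assuming each file has two columns (gene, count) and no header, as in the question. The read_counts helper, the demo directory, and the tiny CSV files are made up so the snippet is self-contained; this is one way to do it, not the only one.

```r
# Made-up per-cell files stand in for the GEO supplementary CSVs.
data_dir <- file.path(tempdir(), "per_cell_csv_demo")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("A1BG,0", "A1CF,8", "A2M,0"), file.path(data_dir, "cellA.csv"))
writeLines(c("A1BG,58", "A1CF,123", "A2M,0"), file.path(data_dir, "cellB.csv"))

files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file into a named numeric vector (gene -> count).
read_counts <- function(fn) {
  tab <- read.table(fn, header = FALSE, sep = ",", stringsAsFactors = FALSE)
  setNames(as.numeric(tab[[2]]), tab[[1]])
}

per_cell <- lapply(files, read_counts)
genes <- names(per_cell[[1]])

# Refuse to cbind if any file lists its genes in a different order.
same_order <- vapply(per_cell, function(x) identical(names(x), genes), logical(1))
stopifnot(all(same_order))

# Numeric matrix: genes as rownames, cleaned file names as column names.
counts <- do.call(cbind, per_cell)
rownames(counts) <- genes
colnames(counts) <- tools::file_path_sans_ext(basename(files))
```

Checking the gene order before cbind matters because the original code takes the gene names from the first file only and assumes every other file matches it.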

For tutorials, the Seurat website covers everything you might need.

ADD COMMENT • link 2.4 years ago by fracarb8 ★ 1.6k

score 3 · Accepted Answer · 2020-08-24

The expected count matrix has genes as rows and samples (cells, for single-cell data) as columns; the gene names are not part of the matrix itself but must be set as its rownames. Nothing special here compared to "normal" RNA-seq. If you want to save memory, you can convert the matrix to one of the commonly used sparse matrix formats, such as dgCMatrix from the Matrix package, but this is optional. If you want to do it, Matrix::Matrix(your.normal.matrix, sparse = TRUE) should do the trick.
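A minimal sketch of the optional sparse conversion, using a made-up toy matrix. The Seurat call only runs if Seurat is installed, and the project name "GSE110154" is just a placeholder label.

```r
library(Matrix)

# Toy dense matrix: genes as rows, cells as columns, gene names as rownames.
counts <- matrix(
  c(0, 0, 3,
    58, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("A1BG", "A2M"), c("cell1", "cell2", "cell3"))
)

# sparse = TRUE forces a sparse result regardless of how many zeros there are.
sparse_counts <- Matrix(counts, sparse = TRUE)

# Only build the Seurat object when Seurat is available.
# "GSE110154" is a placeholder project name.
if (requireNamespace("Seurat", quietly = TRUE)) {
  seu <- Seurat::CreateSeuratObject(counts = sparse_counts, project = "GSE110154")
}
```

The dense matrix also works as the counts argument; the sparse form mainly helps with memory on larger datasets.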