Removing genes of Seurat object during integration pipeline
9 weeks ago
Stevens • 0

Hi, I am performing clustering analysis in Seurat on 6 samples of single cell RNA-seq data (3WT, 3KO) to identify the cell types, perform DE analysis and then do some further downstream analysis.

To do this I am integrating the 6 samples so that they can be compared easily. However, the Seurat integration pipeline doesn't remove genes that are expressed in only a few cells, and I am not sure whether gene removal should be performed (it is performed in the non-integration Seurat pipeline).

So, does anyone know whether you should perform some kind of gene filtering when performing integration in Seurat and, if so, how to do it?

genes filtering remove seurat integration • 282 views

What is a "few" cells? How many? You would need to manually calculate the fraction of cells expressing each gene. I'm not sure in which format Seurat stores the data, but if it is a sparse matrix it would probably come down to Matrix::rowSums on the raw counts thresholded at > 0 (so that you count cells rather than sum reads), divided by the total number of cells to get the fraction. Then apply a cutoff that you feel comfortable with, e.g. at least 5% of cells must express the gene for it to be kept. That obviously assumes you are OK with potentially discarding genes that mark cell types that are rare or captured at low numbers in your data, for which these genes could be very meaningful.
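A minimal sketch of that calculation, using only the Matrix package on a toy sparse matrix. The matrix, the gene and cell names, and the 25% cutoff are made up for illustration; with a real Seurat object you would first have to extract the raw counts matrix, which is not shown here.

```r
library(Matrix)

# Toy sparse count matrix: 5 genes (rows) x 8 cells (columns).
counts <- Matrix(
  c(0, 0, 0, 0, 0, 0, 0, 0,   # geneA: not detected anywhere
    3, 0, 0, 0, 0, 0, 0, 0,   # geneB: detected in 1 cell
    1, 2, 0, 0, 5, 0, 0, 1,   # geneC: detected in 4 cells
    9, 9, 9, 9, 9, 9, 9, 9,   # geneD: detected in all 8 cells
    0, 0, 0, 0, 0, 0, 7, 7),  # geneE: detected in 2 cells
  nrow = 5, ncol = 8, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("cell", 1:8))
)

# Count the cells in which each gene has a nonzero count.
# Note the "> 0": rowSums(counts) alone would sum reads, not cells.
n_cells_expressing <- Matrix::rowSums(counts > 0)

# Convert to the fraction of cells expressing each gene.
frac_expressing <- n_cells_expressing / ncol(counts)

# Illustrative cutoff (an assumption): keep genes seen in >= 25% of cells.
min_fraction <- 0.25
keep <- frac_expressing >= min_fraction

filtered <- counts[keep, , drop = FALSE]
print(frac_expressing)
print(rownames(filtered))
```

An absolute cutoff (e.g. `n_cells_expressing >= 10`) works the same way and does not change meaning when the number of cells differs between datasets.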


I was thinking of using an arbitrary cutoff of 10 cells; Seurat's documentation uses 3 when reading in a single sample. I believe that at least 10-20% of the genes have no expression in any cell, though I'm not sure whether that actually affects downstream processing in Seurat.

The real problem I have is that Seurat stores the data that's read into it in a few different variables, and I'm not sure whether altering the matrix in one place disrupts the rest of the object.

They state in https://github.com/satijalab/seurat/issues/147 that gene filtration could throw off normalization assumptions, so without knowing where Seurat stores the correct matrix to manipulate, I'm not sure how to approach it or whether the results will be valid.
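One way to sidestep the question of which stored matrix to modify is to filter the raw count matrices before they are wrapped in Seurat objects at all. The sketch below is only one possible approach, not Seurat's documented integration workflow. It pools detection counts across six simulated samples and keeps the same gene set in every sample. The sample names, the simulated data, and the helper `make_sample` are all hypothetical.

```r
library(Matrix)

set.seed(1)
genes <- paste0("gene", 1:50)

# Hypothetical stand-in for one sample's raw count matrix (genes x cells).
make_sample <- function(prefix, n_cells = 40) {
  m <- rsparsematrix(length(genes), n_cells, density = 0.1,
                     rand.x = function(n) as.numeric(rpois(n, 2) + 1))
  m[1:8, ] <- 0  # genes 1-8 are never detected, mimicking all-zero genes
  dimnames(m) <- list(genes, paste0(prefix, "_cell", seq_len(n_cells)))
  m
}

sample_names <- c("WT1", "WT2", "WT3", "KO1", "KO2", "KO3")
samples <- lapply(setNames(nm = sample_names), make_sample)

# The pooled count below assumes identical gene order in every sample.
stopifnot(all(vapply(samples,
                     function(m) identical(rownames(m), genes),
                     logical(1))))

# Cells with a nonzero count per gene, summed over all six samples.
detected <- Reduce(`+`, lapply(samples, function(m) Matrix::rowSums(m > 0)))

# Arbitrary cutoff from the discussion above: at least 10 cells in total.
min_cells <- 10
keep_genes <- names(detected)[detected >= min_cells]

# Apply the same gene set to every sample so they stay comparable.
filtered <- lapply(samples, function(m) m[keep_genes, , drop = FALSE])

cat(length(keep_genes), "of", length(genes), "genes kept\n")
```

Each element of `filtered` could then be passed to `Seurat::CreateSeuratObject()`, so normalization only ever sees the filtered genes. Pooling across samples, rather than filtering each sample separately, is a design choice here: it avoids ending up with a different gene set per sample. Whether this interacts badly with the normalization assumptions raised in the linked issue is not something this sketch addresses.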


Doesn't sctransform have a min_cells parameter that you could use here? It will only consider genes detected in at least min_cells cells. At least standalone sctransform::vst has it. Check the Seurat docs for this; it is probably the easiest way.


Yeah, you can filter with min_cells in Seurat via SCTransform. That would probably change the number of genes in the SCT assay of the object. I used the older normalization and scaling method that doesn't use SCTransform, because SCTransform was giving me errors downstream when I tried it, so I may not have that option.

Thanks for bringing that to my attention though, I may try the SCTransform method again.