3.8 years ago
Firingam
I have to normalize a single-cell RNA-seq (scRNA-seq) dataset with Bioconductor. To do this I decided to rely on SCnorm. Before applying it I need to look into a few details, namely the count-depth relationship. To get this information, I want to use plotCountDepth. This function provides a set of filters that I want to set, but I'm not sure about their biological significance. The main issue is with FilterExpression, which excludes a gene if the median of its expression distribution is below a certain threshold. So what is the correct biological approach to choosing this threshold?
Completely unclear (at least to me) what the issue is. Please try to explain better.
If you have a single-cell data frame, how would you choose which genes to remove from its rows, basing your decision on their medians? Let me explain better: you have an n x m matrix where the n rows are genes and the m columns are cells. I want to explore the count-depth relationship in order to choose the normalization procedure afterwards. Genes whose median falls below a certain threshold will be excluded from the analysis (the median is taken over the gene's expression distribution, where each value is the gene's expression in one cell). I'm uncertain about how to choose this threshold. I'm leaving you the link to plotCountDepth so you can check the FilterExpression argument.
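To make the question concrete, here is a minimal sketch of a median-based gene filter in plain Python. It is not SCnorm's implementation: the function name `filter_genes_by_median`, the `nonzero_only` switch, and the toy counts are all hypothetical. Whether the median should be taken over non-zero counts only is an assumption here, so check the FilterExpression documentation for the exact definition.

```python
from statistics import median


def filter_genes_by_median(counts, threshold, nonzero_only=True):
    """Return the genes whose median expression is >= threshold.

    counts: dict mapping gene name -> list of per-cell counts
            (one row of the genes x cells matrix per gene).
    nonzero_only: if True, the median is taken over non-zero counts
            only (an assumption, see the lead-in).
    """
    kept = []
    for gene, values in counts.items():
        if nonzero_only:
            values = [v for v in values if v > 0]
        # Genes with no usable values are dropped outright.
        if values and median(values) >= threshold:
            kept.append(gene)
    return kept


# Hypothetical toy matrix: 3 genes x 5 cells.
toy_counts = {
    "geneA": [0, 0, 1, 0, 2],
    "geneB": [5, 8, 0, 12, 7],
    "geneC": [0, 0, 0, 0, 0],
}

# Scan a few candidate thresholds to see how many genes survive each one.
for t in (1, 2, 5):
    print(t, filter_genes_by_median(toy_counts, t))
```

Scanning several thresholds like this shows how sensitive the retained gene set is to the cutoff, which is one way to judge whether a given value is reasonable for the data.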
So to clarify, you're trying to determine a threshold for removing genes that are not expressed? In general, this is done by removing genes expressed in very few cells (say < 10 or even < 3 if you have few cells and think you may have rare populations). It's rather arbitrary, but removes most of the genes that don't provide any useful info or have much of a biological impact.
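The detection-based filter described above can be sketched as follows. This is an illustration only, in plain Python: the function name `filter_genes_by_detection`, the `min_cells` parameter, and the toy counts are hypothetical and do not come from any particular package.

```python
def filter_genes_by_detection(counts, min_cells=3):
    """Keep genes detected (count > 0) in at least min_cells cells.

    counts: dict mapping gene name -> list of per-cell counts.
    """
    kept = []
    for gene, values in counts.items():
        # Number of cells in which this gene has a non-zero count.
        n_detected = sum(1 for v in values if v > 0)
        if n_detected >= min_cells:
            kept.append(gene)
    return kept


# Hypothetical toy matrix: 3 genes x 5 cells.
example_counts = {
    "geneA": [0, 0, 1, 0, 2],   # detected in 2 cells
    "geneB": [5, 8, 0, 12, 7],  # detected in 4 cells
    "geneC": [0, 0, 0, 0, 0],   # never detected
}

print(filter_genes_by_detection(example_counts, min_cells=3))
```

A lower `min_cells` keeps genes that may mark rare populations, at the cost of retaining more uninformative rows.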
I have not seen folks filter on actual expression levels, though I guess you could rank markers by median/average expression to help identify those with more "robust" changes.
I applied another strategy, based on batch and cell type. I found an article that explains the ratio and the tissue from which the cells were taken. I normalized and filtered by tissue, which I presume preserves the biological identity.