I have a sample dataset derived from single-cell RNA sequencing with 1800 samples. Some genes have only a few counts. Using WGCNA I can compute modules and even obtain the module membership of each gene for each module. I want to find the number of counts above which a gene can be safely clustered into a module. Would it be valid to: (1) define the genes with low counts, e.g. fewer than 5, by computing max(counts) for each gene in the original dataset and selecting the gene names; (2) create subsets with 80% of the original dataset counts and compute the module membership in each subset; (3) compare module labels between subsets for groups of genes (counts lower than 5, counts >5 and <10, and so on) and select the group for which the module label doesn't change? What would be a more statistically valid way to compute module membership preservation for genes?
The number of counts doesn't decide whether a gene is clustered into one module or another. What matters for WGCNA when placing a gene in one module or another is whether the correlation between the genes is high.
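To illustrate the point, here is a minimal Python sketch (not WGCNA itself, which is an R package) with simulated data. The gene names, the Poisson rates, and the scaling factors are all made up for the example. It only shows that Pearson correlation, the default similarity WGCNA builds on, ignores the overall count level: a low-count gene and a high-count gene that follow the same pattern across cells correlate perfectly, while a high-count gene with an unrelated pattern does not.

```python
import numpy as np

# Simulated data only: 1800 cells, as in the question.
rng = np.random.default_rng(0)
n_cells = 1800

# A shared expression pattern across cells (hypothetical).
signal = rng.poisson(lam=2.0, size=n_cells).astype(float)

gene_low = signal                  # low-count gene following the pattern
gene_high = 50.0 * signal + 10.0   # same pattern at a much higher count level
gene_unrelated = rng.poisson(lam=100.0, size=n_cells).astype(float)  # high counts, independent pattern

# Pearson correlation is unchanged by shifting or positively rescaling a gene.
corr_low_high = np.corrcoef(gene_low, gene_high)[0, 1]
corr_high_unrelated = np.corrcoef(gene_high, gene_unrelated)[0, 1]

print(f"low vs high (same pattern):     {corr_low_high:.3f}")
print(f"high vs unrelated (both large): {corr_high_unrelated:.3f}")
```

In real single-cell data a gene with very few counts carries a noisier pattern, so its estimated correlations are less reliable; the sketch does not model that, it only separates count level from correlation.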
To calculate cluster preservation, WGCNA provides a function (modulePreservation), and there are several tutorials on how to use it.