Question: Finding Variable Genes in Seurat, scRNA-seq
asyndeton1720 wrote:

Hi,

I have a data matrix for scRNA-seq data (Drop-seq). How do I choose the parameters appropriately for the FindVariableGenes function in Seurat? Is there a plot I should be looking at beforehand to determine the correct parameters?

I can provide plots if needed.

Thanks

seurat rna-seq scrna-seq • 3.9k views
modified 5 months ago by Nitin Sharma30 • written 22 months ago by asyndeton1720

While you are waiting for someone to give you an answer, have you checked the manuals/vignettes for Seurat?

modified 22 months ago • written 22 months ago by genomax68k

Yes, the help page for the function says to examine "the plot" first, but it doesn't refer to which plot.

written 22 months ago by asyndeton1720

Did any of you come up with a good answer to this problem?

written 13 months ago by PaulG0
halo22120 wrote:

You can just run Seurat with the parameters from the vignette first. The dispersion vs. average-expression plot can then help you decide on the values for x.low.cutoff, x.high.cutoff and y.cutoff. Once you have settled on the parameters, run FindVariableGenes again with the new values.

written 21 months ago by halo22120

I have some questions about how the dispersion is calculated and cut off. The help text for dispersion.function says: "Function to compute y-axis value (dispersion). Default is to take the standard deviation of all values." Why would it produce negative values? It also seems that the genes are actually filtered on object@hvg.info$gene.dispersion.scaled. Sometimes the plot shows some white lines. Can you explain what they mean?
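On the negative values: the following is only a sketch of one likely explanation, not a confirmed description of Seurat's internals. It assumes that the scaled dispersion is a z-score computed within bins of average expression, so roughly half of the genes in each bin end up below zero. It uses simulated numbers and base R only; the variable names and the choice of 20 bins are assumptions made for illustration.

```r
# Sketch under stated assumptions: per-bin z-scoring of a dispersion value.
# All data here are simulated; nothing is read from a Seurat object.
set.seed(1)
n_genes   <- 2000
gene_mean <- rexp(n_genes, rate = 1)                     # simulated average expression
gene_disp <- log(rgamma(n_genes, shape = 2, rate = 2))   # simulated log(variance/mean), can itself be < 0

# Split genes into 20 equal-width bins of average expression (bin count is an assumption)
bins <- cut(gene_mean, breaks = 20)

# Mean and standard deviation of the dispersion inside each bin
bin_mean <- tapply(gene_disp, bins, mean)
bin_sd   <- tapply(gene_disp, bins, sd)

# z-score every gene against the bin it falls into
idx         <- as.integer(bins)
disp_scaled <- as.numeric((gene_disp - bin_mean[idx]) / bin_sd[idx])

# Bins holding a single gene give NA because the standard deviation is undefined
summary(disp_scaled)
mean(disp_scaled < 0, na.rm = TRUE)   # fraction of genes with a negative scaled dispersion
```

A log of a variance-to-mean ratio is also negative whenever the variance is smaller than the mean, so negative values can appear before any scaling as well.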

written 19 months ago by ovela7710

What do you look for in the plot exactly? I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes, but you can barely tell the difference in the plot.
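Since the plot hides this effect, one option is to tabulate how many genes pass for a grid of cutoffs rather than judging by eye. The sketch below uses simulated values in a data frame shaped like hvg.info; the column name gene.mean, the helper count_passing, and the selection rule (mean between the x cutoffs, scaled dispersion above y.cutoff) are assumptions for illustration and should be checked against your Seurat version.

```r
# Sketch: count genes passing the cutoffs over a grid of x.low.cutoff values.
# The data frame is simulated; with a real object one would presumably use
# object@hvg.info after running FindVariableGenes (column names assumed).
set.seed(42)
n_genes <- 5000
hvg <- data.frame(
  gene.mean              = rexp(n_genes, rate = 5),  # many genes sit near zero expression
  gene.dispersion.scaled = rnorm(n_genes)            # z-score-like dispersion
)

# Hypothetical helper: number of genes inside the mean window and above the dispersion cutoff
count_passing <- function(hvg, x.low, x.high, y) {
  keep <- hvg$gene.mean > x.low &
          hvg$gene.mean < x.high &
          hvg$gene.dispersion.scaled > y
  sum(keep)
}

x_low_grid <- c(0.0125, 0.025, 0.05, 0.1)
n_selected <- sapply(
  x_low_grid,
  function(x) count_passing(hvg, x.low = x, x.high = 3, y = 0.5)
)

# One row per candidate cutoff
data.frame(x.low.cutoff = x_low_grid, n.variable.genes = n_selected)
```

Because most genes have very low average expression, small moves in x.low.cutoff cross a dense region of points, which would explain a large change in the gene count with little visible change in the plot.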

written 19 months ago by igor7.7k

Hi igor. I am now facing the same problem. I don't know how to select the correct values for x.low.cutoff, x.high.cutoff and y.cutoff. I also found that a small change in one of these parameters leads to a huge change in the number of variable genes.

I can get the plot as the tutorial shows, but I don't know how to use that plot to help me select these parameters.

Do you have any suggestions now?

written 16 months ago by lishen07090

Hi @igor , @lishen0709 – did you find a solution to your question about selecting the correct parameters for x and y cutoffs in Seurat's FindVariableGenes?

Thanks!

written 8 months ago by gaelgarcia05190

yes, a bad picture ...

written 7 months ago by linouhao0

Can I ask how to get the cutoff from the FindVariableGenes function?

written 7 months ago by linouhao0
Nitin Sharma30 wrote:

"I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes"

@igor, this is bound to happen, as the number of genes is higher below 0 (as can be seen from the dense plotting).

I was facing the same problem, and after some googling I found this:

Cutoff to find out number of variable genes? https://github.com/satijalab/seurat/issues/634

"All methods for HVG selection have some cutoff parameters, and unfortunately, criteria for 'optimality' are difficult to identify.

If you have UMI data, we suggest identifying HVG on the basis of variance-to-mean ratio, as we demonstrate here: https://satijalab.org/seurat/mca.html

Again, with UMI datasets - we typically do not notice large differences in the analysis depending on the exact number of genes selected- ranging from 2k genes to even the full transcriptome."

If we follow the link mentioned in that discussion, we get this:

Data Preprocessing

We perform standard log-normalization.

mca <- NormalizeData(object = mca, normalization.method = "LogNormalize", scale.factor = 10000)

FindVariableGenes calculates the variance and mean for each gene in the dataset (storing this in object@hvg.info), and sorts genes by their variance/mean ratio (VMR). We have observed that for large-cell datasets with unique molecular identifiers, selecting highly variable genes (HVG) simply based on VMR is an efficient and robust strategy. Here, we select the top 1,000 HVG for downstream analysis.

mca <- FindVariableGenes(object = mca, mean.function = ExpMean, dispersion.function = LogVMR, do.plot = FALSE)
hv.genes <- head(rownames(mca@hvg.info), 1000)
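To make the VMR ranking concrete, here is a minimal base R sketch of the same idea on a simulated count matrix. It is not Seurat's implementation: the matrix, the gene names, the scale factor and the top-50 cutoff are all illustrative assumptions, and the statistics are computed on library-size-normalized (non-log) values.

```r
# Sketch: rank genes by log(variance / mean) and keep the top N.
# Simulated UMI-like counts, genes in rows and cells in columns.
set.seed(7)
n_genes <- 300
n_cells <- 200
lambda  <- rgamma(n_genes, shape = 2, rate = 1)   # per-gene expression level
counts  <- matrix(
  rpois(n_genes * n_cells, lambda = rep(lambda, times = n_cells)),
  nrow = n_genes
)
rownames(counts) <- paste0("gene", seq_len(n_genes))

# Make the first 20 genes overdispersed so there is something to find
counts[1:20, ] <- rnbinom(20 * n_cells, mu = 2, size = 0.2)

# Normalize each cell to a common total (scale factor chosen for illustration)
norm <- t(t(counts) / colSums(counts)) * 1e4

gene_mean <- rowMeans(norm)
gene_var  <- apply(norm, 1, var)

# Drop genes with zero mean, where the ratio is undefined
expressed <- gene_mean > 0
log_vmr   <- log(gene_var[expressed] / gene_mean[expressed])

# Highest variance-to-mean ratio first; keep the top 50 gene names
ranked   <- sort(log_vmr, decreasing = TRUE)
hv_genes <- head(names(ranked), 50)
hv_genes
```

Ranking and taking a fixed number of genes sidesteps the x and y cutoffs entirely, which is the appeal of this approach for UMI data.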

Also, as I found the plot to be pretty useless, I skipped it by adding do.plot = FALSE:

pbmc <- FindVariableGenes(object = pbmc, mean.function = ExpMean, dispersion.function = LogVMR, x.low.cutoff = 0.0125, x.high.cutoff = 3, y.cutoff = 0.5, do.plot = FALSE)

This selects the variable genes defined by the cutoffs but skips the plotting.

modified 5 months ago • written 5 months ago by Nitin Sharma30