Question: Finding Variable Genes in Seurat, scRNA-seq
2.8 years ago by asyndeton17
asyndeton17 wrote:

Hi,

I have a data matrix for scRNA-seq data (Drop-seq). How do I choose the parameters appropriately for the FindVariableGenes function in Seurat? Is there a plot I should be looking at beforehand to determine the correct parameters?

I can provide plots if needed.

Thanks

seurat rna-seq scrna-seq • 6.1k views
modified 16 months ago by CuriusScientist • written 2.8 years ago by asyndeton17

While you are waiting for someone to give you an answer, have you checked the manuals/vignette for Seurat?

modified 2.8 years ago • written 2.8 years ago by genomax

Yes, the help page for the function says to examine "the plot" first, but it doesn't say which plot.

written 2.8 years ago by asyndeton17

Did any of you come up with a good answer to this problem?

written 2.1 years ago by PaulG0
2.7 years ago by halo22, Indianapolis, IN
halo22 wrote:

You can just run Seurat with the parameters from the vignette first. The dispersion vs. average-expression plot can then help you decide on values for x.low.cutoff, x.high.cutoff and y.cutoff. Once you have settled on the parameters, run FindVariableGenes again with the new values.

written 2.7 years ago by halo22

I have some questions about how the dispersion is calculated and cut off. The help page describes dispersion.function as:

"Function to compute y-axis value (dispersion). Default is to take the standard deviation of all values"

If so, why would there be negative values? It also seems that the genes are actually filtered by object@hvg.info$gene.dispersion.scaled. And sometimes the plot shows some white lines; can you explain what they mean?

written 2.5 years ago by ovela7710

What do you look for in the plot exactly? I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes, but you can barely tell the difference in the plot.

written 2.5 years ago by igor

Hi igor. I am now facing the same problem. I don't know how to select correct values for x.low.cutoff, x.high.cutoff and y.cutoff. I also found that a small change in one of these parameters leads to a huge change in the number of variable genes.

I can produce the plot shown in the tutorial, but I don't know how to use it to select these parameters.

Do you have any suggestions now?

written 2.3 years ago by lishen0709

Hi @igor, @lishen0709, did you find a solution to your question about selecting the correct parameters for x and y cutoffs in Seurat's FindVariableGenes?

Thanks!

written 20 months ago by gaelgarcia05210

yes, a bad picture ...

written 19 months ago by linouhao0

Can I ask how to get the cutoffs from the FindVariableGenes function?

written 19 months ago by linouhao0
16 months ago by CuriusScientist, Institute for Cancer Research, Dept. of Tumor Biology, Oslo University Hospital, Oslo, Norway
CuriusScientist wrote:

"I found that changing x.low.cutoff between 0.0125 (in the PBMC 3k tutorial) and 0.1, for example, will have a huge effect on the number of variable genes"

@igor, this is bound to happen, as the number of genes is higher below 0 (as can be seen from the dense plotting).
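Since the plot does not make the difference visible, one option is to count the genes that pass each set of cutoffs instead of eyeballing them. Below is a minimal plain-Python sketch of that idea (Python rather than R so it runs without Seurat). It assumes the selection rule is "mean expression strictly between x.low.cutoff and x.high.cutoff, and scaled dispersion above y.cutoff"; the function name and the data are made up for illustration and are not taken from any real dataset.

```python
def count_selected(hvg, x_low, x_high, y_cut):
    """Count genes whose mean lies in (x_low, x_high) and whose
    scaled dispersion exceeds y_cut (assumed selection rule)."""
    return sum(
        1
        for mean_expr, disp_scaled in hvg
        if x_low < mean_expr < x_high and disp_scaled > y_cut
    )


# Hypothetical (mean expression, scaled dispersion) pairs.
# Most genes sit at low mean expression, as in the dense part of the plot.
hvg = [
    (0.02, 1.2), (0.03, 0.9), (0.05, 0.7), (0.08, 2.1),
    (0.15, 0.6), (0.9, 1.5), (2.0, 0.2),
]

n_loose = count_selected(hvg, x_low=0.0125, x_high=3, y_cut=0.5)
n_strict = count_selected(hvg, x_low=0.1, x_high=3, y_cut=0.5)
print(n_loose, n_strict)
```

In this toy example, raising the lower cutoff from 0.0125 to 0.1 drops four of the six selected genes, even though the two thresholds would sit almost on top of each other on the x-axis.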

I was facing the same problem and, after some googling, I found this Seurat GitHub issue:

Cutoff to find out number of variable genes? https://github.com/satijalab/seurat/issues/634

"All methods for HVG selection have some cutoff parameters, and unfortunately, criteria for 'optimality' are difficult to identify.

If you have UMI data, we suggest identifying HVG on the basis of variance-to-mean ratio, as we demonstrate here: https://satijalab.org/seurat/mca.html

Again, with UMI datasets - we typically do not notice large differences in the analysis depending on the exact number of genes selected- ranging from 2k genes to even the full transcriptome."

If we follow the link mentioned in that discussion, we find this:

Data Preprocessing

We perform standard log-normalization.

mca <- NormalizeData(object = mca, normalization.method = "LogNormalize", scale.factor = 10000)

FindVariableGenes calculates the variance and mean for each gene in the dataset (storing this in object@hvg.info), and sorts genes by their variance/mean ratio (VMR). We have observed that for large-cell datasets with unique molecular identifiers, selecting highly variable genes (HVG) simply based on VMR is an efficient and robust strategy. Here, we select the top 1,000 HVG for downstream analysis.

mca <- FindVariableGenes(object = mca, mean.function = ExpMean, dispersion.function = LogVMR, do.plot = FALSE)
hv.genes <- head(rownames(mca@hvg.info), 1000)
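To make the "sort by VMR and take the top N" idea concrete, here is a minimal plain-Python sketch (not Seurat code). It assumes the score is the log of the sample variance divided by the mean, computed on count-like values; Seurat's own functions work on the normalized data, so the numbers will differ. The gene names, counts, and helper names are hypothetical.

```python
import math
from statistics import mean, variance


def log_vmr(values):
    """Log variance-to-mean ratio for one gene.
    Returns -inf when the mean or variance is zero, so such genes rank last."""
    m = mean(values)
    v = variance(values)  # sample variance
    if m == 0 or v == 0:
        return float("-inf")
    return math.log(v / m)


def top_variable_genes(expr, n):
    """Rank genes by log VMR (highest first) and keep the first n."""
    ranked = sorted(expr, key=lambda gene: log_vmr(expr[gene]), reverse=True)
    return ranked[:n]


# Hypothetical expression values for three genes across five cells.
expr = {
    "geneA": [0, 0, 10, 0, 12],  # on in a few cells only: high VMR
    "geneB": [3, 4, 3, 4, 3],    # nearly constant: low VMR
    "geneC": [1, 0, 2, 1, 0],
}

top = top_variable_genes(expr, 2)
print(top)
```

With this approach the only choice left is N, which matches the point in the GitHub reply that the exact number of genes tends to matter little for UMI data.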

Also, as I found the plot to be pretty useless, I skipped plotting it by adding "do.plot = FALSE":

pbmc <- FindVariableGenes(object = pbmc, mean.function = ExpMean, dispersion.function = LogVMR, x.low.cutoff = 0.0125, x.high.cutoff = 3, y.cutoff = 0.5, do.plot = FALSE)

This selects the variable genes defined by the cutoffs but skips plotting.
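On the earlier question about negative values in gene.dispersion.scaled: as far as I understand it, that column is not the raw dispersion but a z-score of each gene's dispersion relative to other genes with similar mean expression, so any gene below the average of its bin comes out negative. The sketch below illustrates that reading in plain Python; it is not Seurat's actual code, and the function name, bin count, and data are made up for illustration.

```python
from statistics import mean, stdev


def scaled_dispersion(gene_means, gene_disps, num_bin=2):
    """Z-score each gene's dispersion against the genes that fall
    into the same equal-width bin of mean expression."""
    lo, hi = min(gene_means), max(gene_means)
    width = (hi - lo) / num_bin or 1.0  # avoid a zero bin width
    # Bin index per gene; the largest mean is clamped into the last bin.
    bins = [min(int((m - lo) / width), num_bin - 1) for m in gene_means]
    scaled = []
    for b, d in zip(bins, gene_disps):
        peers = [dj for bj, dj in zip(bins, gene_disps) if bj == b]
        if len(peers) < 2 or stdev(peers) == 0:
            # z-score is undefined for a single-gene bin or a bin with no spread
            scaled.append(float("nan"))
        else:
            scaled.append((d - mean(peers)) / stdev(peers))
    return scaled


# Hypothetical per-gene mean expression and dispersion values.
gene_means = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
gene_disps = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]

scaled = scaled_dispersion(gene_means, gene_disps, num_bin=2)
print(scaled)
```

Every raw dispersion here is positive, yet the lowest gene in each bin gets a negative scaled value. Under this reading, y.cutoff is a threshold on that z-score rather than on the raw dispersion.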

modified 16 months ago • written 16 months ago by CuriusScientist