How to determine the cut-off threshold when removing low-variance genes from RNA-Seq data?

2.4 years ago

Chenxi(Michael) • 0

Hi, I am currently using Python to analyze RNA-Seq data from a cancer gene expression profile with 20,000 genes (rows) and 800 samples (columns). The dataset has been normalized, but I don't know what percentage of genes should be removed, or how low a gene's variance must be before it is removed. How do I determine this cut-off threshold? Thank you.

analysis RNA sequence gene • 1.2k views

updated 2.4 years ago by ATpoint 82k • written 2.4 years ago by Chenxi(Michael) • 0

This question is a bit hard to answer without knowing what you plan to do with the data. What questions will you be asking? When is a gene not interesting? Will any of your questions involve knowing that a particular gene does not vary across the data set? Are you looking to filter the data set from 20,000 genes to some much smaller number? Have you plotted the distribution of variances to see if there is a natural cut-off for your purposes?

ADD REPLY • link 2.4 years ago by seidel 11k
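
To make the suggestion about inspecting the variance distribution concrete, here is a minimal sketch in Python with NumPy. It assumes a normalized, log-scale matrix with genes in rows and samples in columns. The function names (`gene_variances`, `variance_summary`, `filter_by_variance`), the `keep_fraction=0.25` value, and the simulated matrix are all illustrative assumptions, not something from this thread and not a recommended threshold.

```python
import numpy as np


def gene_variances(expr):
    """Per-gene sample variance for a genes x samples matrix (rows are genes)."""
    return np.var(expr, axis=1, ddof=1)


def variance_summary(variances, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Quantiles of the variance distribution, to look for a natural cut-off."""
    return {p: float(np.quantile(variances, p)) for p in probs}


def filter_by_variance(expr, keep_fraction=0.25):
    """Keep roughly the top `keep_fraction` most variable genes.

    `keep_fraction` is an arbitrary illustrative value; the right choice
    depends on the downstream question.
    """
    variances = gene_variances(expr)
    cutoff = np.quantile(variances, 1.0 - keep_fraction)
    keep = variances >= cutoff
    return expr[keep], keep, cutoff


# Simulated stand-in for a normalized expression matrix (not real data).
rng = np.random.default_rng(0)
n_genes, n_samples = 2000, 80
gene_sd = rng.gamma(shape=2.0, scale=0.5, size=n_genes)  # per-gene spread
expr = rng.normal(loc=5.0, scale=gene_sd[:, None], size=(n_genes, n_samples))

variances = gene_variances(expr)
summary = variance_summary(variances)
filtered, keep, cutoff = filter_by_variance(expr, keep_fraction=0.25)

print(summary)         # quantiles of the per-gene variances
print(filtered.shape)  # genes retained x samples
```

Printing the quantiles (or plotting a histogram of `variances`) shows whether the distribution has an obvious break. If it does not, any fraction is a judgment call tied to what the filtered genes will be used for.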

How can I select differentially expressed genes by variance in a gene expression profile for clustering purposes?

ADD REPLY • link 2.4 years ago by Chenxi(Michael) • 0

Perform a differential expression analysis. To get started, please read the manual of a tool such as DESeq2: https://bioconductor.org/packages/release/bioc/html/DESeq2.html

ADD REPLY • link 2.4 years ago by ATpoint 82k
