2.4 years ago
Chenxi(Michael)
Hi, I am using Python to do RNA-Seq analysis on a cancer gene expression profile with 20,000 gene rows and 800 sample columns. The dataset has been normalized, but I don't know what percentage of genes should be removed (or how low a gene's variance has to be before it should be removed). How do I determine this cut-off threshold? Thank you.
This question is a bit hard to answer without knowing what you plan to do with the data. What questions will you be asking? When is a gene not interesting? Will any of your questions involve knowing that a particular gene does not vary across the data set? Are you looking to filter the data set from 20,000 genes to some much smaller number? Have you plotted the distribution of variances to see if there is a natural cut-off for your purposes?
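To make the last suggestion concrete, here is a minimal sketch of ranking genes by their variance across samples and keeping the most variable ones. It uses only the Python standard library and a tiny made-up data set; the gene names, the values, and the helper `top_variable_genes` are hypothetical, and the choice of `n_top` is exactly the judgment call discussed above, not a recommended value.

```python
import statistics


def top_variable_genes(expr, n_top):
    """Rank genes by sample variance and return the n_top most variable.

    expr maps gene name -> list of normalized expression values,
    one value per sample. Returns (selected gene names, all variances).
    """
    variances = {gene: statistics.variance(values) for gene, values in expr.items()}
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_top], variances


# Toy data (hypothetical): three nearly flat genes and two variable ones.
expr = {
    "flat_1": [5.0, 5.1, 4.9, 5.0, 5.1, 4.9],
    "flat_2": [2.0, 2.0, 2.1, 1.9, 2.0, 2.0],
    "flat_3": [7.0, 7.0, 7.0, 7.0, 7.0, 7.0],
    "variable_1": [1.0, 9.0, 2.0, 8.0, 1.5, 8.5],
    "variable_2": [3.0, 3.5, 10.0, 9.5, 3.2, 9.8],
}

selected, variances = top_variable_genes(expr, n_top=2)

# List the variances from highest to lowest; a large gap between
# consecutive values is one candidate for a "natural" cut-off.
for gene in sorted(variances, key=variances.get, reverse=True):
    print(f"{gene}\t{variances[gene]:.3f}")

print("kept:", selected)
```

On a real 20,000 x 800 matrix you would more likely hold the data in a pandas DataFrame and use `df.var(axis=1)`, then plot a histogram of the result. The idea is the same: look at the distribution first, then pick the threshold.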
What is the way to select differentially expressed genes by variance in a gene profile for clustering purposes?
Perform a differential expression analysis. To get started, please read the manual of a tool such as DESeq2: https://bioconductor.org/packages/release/bioc/html/DESeq2.html