Question: Filtering out genes in an RNA-seq experiment
5.0 years ago by
ashkan110 wrote:

Hi guys,
I have a set of RNA-seq data. So far I have prepared the data and calculated the raw read count for each gene in each sample, so I have a matrix in which the columns are samples and the rows are genes. Now I want to filter out some of the genes to reduce the false positive rate. Could you please let me know how to do the filtering?

Actually, I have tried counts per million (CPM), calculated for every gene in every sample, but I don't know how to determine the best cutoff value. For example, can I say that if a gene has a read count of 2 or less in at least 10 samples, that gene must be removed?
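
A CPM-based rule of the kind asked about here can be sketched in a few lines. This is only an illustration in plain Python, not the method of any particular package: the function names, the toy counts, and the thresholds (`min_cpm`, `min_samples`) are assumptions made up for the example, and a sensible cutoff depends on library size and on how many samples there are per group.

```python
def counts_per_million(counts):
    """Convert raw counts (dict: gene -> list of per-sample counts) to CPM."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size = total reads assigned to genes in each sample.
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [1e6 * counts[g][j] / lib_sizes[j] if lib_sizes[j] > 0 else 0.0
            for j in range(n_samples)]
        for g in genes
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        g: counts[g]
        for g, values in cpm.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }


# Toy matrix (hypothetical numbers): rows are genes, columns are 3 samples.
raw_counts = {
    "geneA": [500, 600, 550],
    "geneB": [0, 1, 0],
    "geneC": [499500, 599399, 449450],
}

cpm = counts_per_million(raw_counts)
kept = filter_low_expression(raw_counts, min_cpm=1.0, min_samples=2)
print(sorted(kept))
```

In this toy example `geneB` is dropped because it reaches the CPM threshold in only one sample. Filtering on CPM rather than on raw counts keeps the rule comparable between samples with different sequencing depths.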




Filtering is generally performed on the adjusted p-values and fold-changes. Have you used edgeR/DESeq2/etc. to calculate that yet?

Reply written 5.0 years ago by Devon Ryan

@Devon: I have not done the DE analysis yet. Before that, I want to remove the genes that are not expressed. As you know, even genes that are not expressed pick up a few reads, so I want to filter these genes out.

Reply written 5.0 years ago by ashkan (modified 5.0 years ago)

Just do independent filtering after the fact (if you use DESeq2, this is automatic).

Reply written 5.0 years ago by Devon Ryan
5.0 years ago by
United States
alolex910 wrote:

You can use the R function varFilter() from the genefilter package to remove genes that vary little across samples. Non-expressed genes have almost no variance, so they are removed by this filter (it usually cuts my list by half, which is what you would expect from the default settings: genes are ranked by IQR and the least variable 50% are dropped). Be aware that genes expressed at a constant level are removed by the same rule. If you are using a package like DESeq2, I think it does comparable filtering for you, so there is no need to run varFilter() beforehand. Also, DESeq2 can shrink the estimated fold changes for genes with low read counts, since low counts tend to give inflated fold changes, so you shouldn't have to worry much about low counts when using DESeq2.
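
To make the idea of variance-based filtering concrete, here is a minimal sketch in plain Python. It is not the genefilter implementation: it ranks genes by the variance of log2(count + 1) instead of the IQR, and the function name, the `keep_fraction` parameter, and the toy counts are all assumptions for the example.

```python
import math
import statistics


def variance_filter(counts, keep_fraction=0.5):
    """Keep the most variable fraction of genes.

    counts: dict mapping gene -> list of per-sample raw counts.
    Variability is measured as the variance of log2(count + 1).
    """
    log_var = {
        g: statistics.pvariance([math.log2(c + 1) for c in row])
        for g, row in counts.items()
    }
    # Rank genes from most to least variable and keep the top fraction.
    ranked = sorted(log_var, key=log_var.get, reverse=True)
    n_keep = math.ceil(len(ranked) * keep_fraction)
    keep = set(ranked[:n_keep])
    return {g: row for g, row in counts.items() if g in keep}


# Toy matrix (hypothetical numbers): rows are genes, columns are 4 samples.
toy_counts = {
    "flat_zero": [0, 0, 0, 0],
    "flat_high": [100, 100, 100, 100],
    "changing": [10, 200, 15, 180],
    "mild": [50, 60, 55, 70],
}

variable_genes = variance_filter(toy_counts, keep_fraction=0.5)
print(sorted(variable_genes))
```

Note that `flat_high` is removed together with `flat_zero`: a variance filter cannot tell an unexpressed gene from one that is expressed at a constant level, so it is not a pure expression filter.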
