Question: Filtering out genes in an RNA-seq experiment
ashkan wrote (3.8 years ago):

Hi guys,
I have a set of RNA-seq data. So far I have calculated the raw read counts for each gene in each sample, which gives me a matrix in which the columns are samples and the rows are genes. Now I want to filter out some of the genes to reduce the false positive rate. Could you please let me know how to do the filtering?

I have tried counts per million (CPM), calculated for every gene in every sample, but I don't know how to determine the best cutoff value. For example, can I say that a gene must be removed if its read count is 2 or less in at least 10 samples?

thanks,

Behzad
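
The kind of rule asked about here can be sketched in a few lines. This is only an illustration under stated assumptions, not a recommended cutoff: it converts raw counts to CPM and keeps a gene when its CPM reaches `min_cpm` in at least `min_samples` samples. The function names, the toy counts, and both threshold values are hypothetical; a common convention is to set `min_samples` to the size of the smallest experimental group.

```python
def counts_per_million(counts):
    """Convert {gene: [raw count per sample]} to CPM using each sample's library size."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total raw counts in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_by_cpm(counts, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM is >= min_cpm in at least min_samples samples.

    Both thresholds are placeholders (assumptions), not recommendations.
    """
    cpm = counts_per_million(counts)
    return {
        gene: counts[gene]
        for gene, row in cpm.items()
        if sum(value >= min_cpm for value in row) >= min_samples
    }


# Toy matrix: rows are genes, columns are four samples (made-up numbers).
toy_counts = {
    "geneA": [1200000, 1500000, 1300000, 1400000],
    "geneB": [0, 1, 0, 2],
    "geneC": [800000, 500000, 700000, 600000],
}

filtered = filter_by_cpm(toy_counts, min_cpm=1.0, min_samples=2)
print(list(filtered))
```

Filtering on CPM rather than on raw counts keeps the rule comparable between samples with different sequencing depths.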
 


Filtering is generally performed on the adjusted p-values and fold-changes. Have you used edgeR/DESeq2/etc. to calculate that yet?

written 3.8 years ago by Devon Ryan

@Devon: I have not done the DE analysis yet. Before that, I want to remove genes that are not expressed. As you know, even genes that are not expressed pick up a few reads, so I want to filter those genes out.

written 3.8 years ago by ashkan

Just do independent filtering after the fact (if you use DESeq2, this is automatic).
 

written 3.8 years ago by Devon Ryan
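
The idea behind independent filtering can be illustrated roughly as follows. This is a simplified sketch, not DESeq2's actual procedure (which differs in detail): genes are filtered on a statistic that does not depend on the test result under the null (here the mean count), the p-values of the remaining genes are adjusted with Benjamini-Hochberg, and the threshold that yields the most rejections is chosen. All function names, candidate thresholds, and the toy p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity with a running minimum.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filter(mean_counts, pvalues, alpha=0.05, thresholds=(0, 1, 5, 10)):
    """Return the mean-count threshold that maximizes BH rejections at level alpha."""
    best = None
    for t in thresholds:
        kept = [i for i, mean in enumerate(mean_counts) if mean >= t]
        padj = bh_adjust([pvalues[i] for i in kept])
        n_rejected = sum(p < alpha for p in padj)
        # Ties keep the earlier (more lenient) threshold.
        if best is None or n_rejected > best["n_rejected"]:
            best = {"threshold": t, "n_rejected": n_rejected, "kept": kept}
    return best


# Made-up example: four low-count genes with uninformative p-values,
# six well-expressed genes, five of which have small p-values.
mean_counts = [0.5, 0.2, 1.5, 0.8, 50, 120, 300, 80, 45, 200]
pvalues = [0.9, 0.6, 0.7, 0.8, 0.001, 0.012, 0.02, 0.03, 0.5, 0.04]

result = independent_filter(mean_counts, pvalues)
print(result["threshold"], result["n_rejected"])
```

Dropping the low-count genes lowers the multiple-testing burden, which is why more genes can pass the adjusted p-value cutoff after filtering than before.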
alolex (United States) wrote (3.8 years ago):

You can use the R function varFilter() from the genefilter package to remove genes that show little variation across samples. This removes non-expressed genes, whose counts barely vary, from your list (it usually cuts mine by half). If you are using a package like DESeq2, I think it does comparable filtering for you, so there is no need to run varFilter() beforehand. Also, DESeq2 shrinks the estimated fold changes for genes with low read counts, since low counts can inflate fold-change estimates, so you shouldn't have to worry about low counts when using DESeq2.
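
A variance-based filter of this kind can be sketched as below. It is a minimal illustration, not the genefilter implementation: it scores each gene by the interquartile range (IQR) of its expression values and keeps genes whose IQR exceeds a chosen quantile of all IQRs. The function names and toy values are hypothetical, and the cutoff of 0.5 is an assumption chosen to mirror the "cuts the list by half" behaviour described here. Note that such a filter also drops genes that are highly but constantly expressed, so it is not purely an expression filter.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)  # floor, since pos is non-negative
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return quantile(values, 0.75) - quantile(values, 0.25)


def variance_filter(expr, var_cutoff=0.5):
    """Keep genes whose IQR is above the var_cutoff quantile of all gene IQRs."""
    spreads = {gene: iqr(row) for gene, row in expr.items()}
    cutoff = quantile(list(spreads.values()), var_cutoff)
    return {gene: expr[gene] for gene, spread in spreads.items() if spread > cutoff}


# Toy log-scale expression values for four genes in four samples (made up).
toy_expr = {
    "g1": [0.0, 0.0, 0.0, 0.0],  # not expressed
    "g2": [5.0, 5.1, 5.0, 5.1],  # expressed but nearly constant
    "g3": [2.0, 6.0, 3.0, 7.0],
    "g4": [1.0, 1.0, 4.0, 4.0],
}

kept = variance_filter(toy_expr, var_cutoff=0.5)
print(list(kept))
```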
