Question: Got fewer DE genes after applying bioconductor genefilter for edgeR analysis?
1
gravatar for cafelumiere12
4.1 years ago by
United States
cafelumiere1270 wrote:

I'm using edgeR for DE analysis of RNASeq data. I've been using manual filtering suggested in the EdgeR manual till now.

I followed genefilter package and tried to chose a theta that has the optimal number of rejections. Basically this is what I did :  run edgeR once for the data set, then use the p-value results for genefilter calculations and plots.

dge <- DGEList(counts=counts[,!is.na(group)], group=group[!is.na(group)])
d <- dge
d <- calcNormFactors(d)
d <- estimateDisp(d, trend="none", robust=TRUE)
et <- exactTest(d, pair = levels(d$samples$group))
dgeTable = topTags(et, n=nrow(d))

res = data.frame(
        filterstat = rowSums(d$counts[rownames(dgeTable),]),
        pvalue = as.data.frame(dgeTable)$PValue,
        row.names = rownames(dgeTable))

rejBH = filtered_R(alpha=alpha, filter=res$filterstat, test=res$pvalue, theta=theta, method="BH")

theta_chosen = theta[max(which(rejBH==max(rejBH)))]
        use = res$filterstat > quantile(res$filterstat, probs=theta_chosen)
        dgeFilt=dge[use,]
        d <- dgeFilt

 

I used rowSums as filtering criteria, and the after I filter out the genes - in this case I got theta = 0.19, I ran the edgeR exact test again:

### rerun the model
d <- dgeFilt
d <- calcNormFactors(d)
d <- estimateDisp(d, trend="none", robust=TRUE)

et <- exactTest(d, pair = levels(d$samples$group))

My question is - even though the previous calculated max(rejBH) - the max number of rejections is 102 , using the same alpha ( 0.1) I'm only getting 60 DE genes after applying the filter. Actually, without applying any filter, the number of DE genes is 75 (this matches rejBH at 0%.) 

- Why is the new DE genes number not 102? 

- Why am I getting less DE genes even though from the rejection plots it looks like I'll have the max number of rejections at theta = 0.19?

 

Am I doing anything wrong here or did I misunderstand something?

Thanks in advance for any suggestion!

genefilter edger rna-seq R • 2.0k views
ADD COMMENTlink modified 4.1 years ago by Devon Ryan89k • written 4.1 years ago by cafelumiere1270
2
gravatar for Devon Ryan
4.1 years ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

You only need to run exactTest() once and you don't need to subset d. Once you have your filter threshold, you just readjust the pvalues of the included rows in res.

BTW, the reason the numbers differ when you subset d is because you now have a different background distribution used for shrinkage.

ADD COMMENTlink written 4.1 years ago by Devon Ryan89k

Thank you!!

ADD REPLYlink written 4.1 years ago by cafelumiere1270

Hi Devon, this is probably kind of a stupid question but I was wondering  - wouldn't not rerunning the model kind of like "cheating" ? I guess my original thought was that we filter out the genes that do not pass the filter and just run the model as if they weren't there? My understanding was that when using manual filtering suggested in the edgeR manual, the model was fit after the genes were filtered?

Thank you!

ADD REPLYlink written 4.1 years ago by cafelumiere1270

That's not at all a stupid question! Firstly, have a read through the Bourgon et al. paper that layed out the theory behind all of this (they're using microarrays, but it's the same in RNAseq). The group that wrote that went on the make the genefilter Bioconductor package that DESeq2 (also written by some of those authors) uses to do this.

In short, just because we don't have enough signal from a gene to bother testing it for DE doesn't mean that it's not useful for (A) library normalization or (B) variance shrinkage. Yes, the method used in the edgeR vignette prefilters by count (actually, cpm, if I recall correctly) and then fits the remainder. If you search the bioconductor archive, you'll find at least one discussion between the edgeR and DESeq2/genefilter authors about this.
 

ADD REPLYlink written 4.1 years ago by Devon Ryan89k

Thanks a lot Devon!

ADD REPLYlink written 4.1 years ago by cafelumiere1270
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1584 users visited in the last hour