Remove glmLRT rows under a certain pValue
2
1
Entering edit mode
4.6 years ago

Hello everyone, I have another trivial question. I've searched around but haven't been able to find a solid answer.

Very trivial question: How do I remove rows under a certain pValue in a DGELRT object? I have ran glmTreat on a DGEGLM object that was generated by glmQLFit. I have found genes that are highly under or overexpressed, but I have also found junk reads that have pValues of 1 or close to 1. Here are the topTags from an LRT object that was output from calling glmTreat on a DGEGLM object:

  > treat <- glmTreat(DGEList8_GLMFit, contrast= FBvsEC_contrast, lfc=log2(1.5))
  > topTags(treat)
  GeneID   Chr     Start       End Strand Length     logFC unshrunk.logFC   logCPM       PValue
606500 606500 chr16  89561430  89561501      +     72 -19.74088      -19.74600 16.82385 5.418552e-24
26799   26799  chr6  85677294  85677368      -     75 -25.79423      -25.85765 13.96527 1.082539e-23
26806   26806  chr1 173865968 173866028      -     61 -20.87574      -20.88726 16.12826 1.085020e-23
26790   26790 chr18  49491664  49491729      -     66 -29.88506      -30.66800 12.39961 1.431579e-23
26809   26809 chr17  28723430  28723491      +     62 -20.77361      -20.78272 16.01472 2.803338e-23
9297     9297 chr11  62853904  62853968      -     65 -20.12804      -20.13388 15.97089 2.244666e-22
26788   26788 chr16   2155023   2155105      -     83 -24.98016      -25.05564 13.71254 2.657334e-22
692212 692212  chr1  28578743  28578822      -     80 -22.17292      -22.18588 14.83805 4.610559e-22
26769   26769  chr1 173864175 173864217      -     43 -23.65515      -23.70433 13.75147 4.003719e-21
26805   26805  chr1  75787889  75787972      +     84 -24.33920      -24.40447 13.22989 1.191277e-20

FDRs were omitted but they are there. How would remove the hits that have a pValue less than 0.05?

Also, I've read somewhere that the pValues shown here are not the real pValues, and are actually a hypothesis and mean-adjusted pValue that are specific to the glmTreat function itself? If you could kindly explain what glmTreat does, I would greatly appreciate it because I was not able to find any good answers after searching for a few hours.

Damon

pvalue edger r glm qlf • 1.2k views
ADD COMMENT
2
Entering edit mode
4.6 years ago
ATpoint 82k

glmTreat tests against a certain fold change as the null hypothesis, in this case 1.5 instead of the default 0. This is meant to keep those genes that are probably more biologically-meaningful (as low FCs might not actually have any biological effect) or to reduce the number of candidate genes in case of high numbers of DEGs. Still, this requires greater statistical power than a standard test against a FC of 0 so you might lose quite some DEGs when having low numbers of replicates. In any case, you should use the FDRs that are produced after running the output through topTags and not nominal p-values.

Say you have topTags as tt then do tt_signif <- tt[tt$FDR < 0,05,]. The tt is simply a data.frame so standard rules of subsetting in R apply.

ADD COMMENT
0
Entering edit mode

Thank you for the reply. But what if the FDR is low, the log change is high, but the pValue is high for a certain gene? Would that gene still be considered significant?

ADD REPLY
0
Entering edit mode

I don't think this is a realistic example. How would FDR be high and p be low? FDR in simple terms "penalizes" or "down-corrects" p-values to account for multiple-testing, so it is always lower than the nominal p-value. If not something very odd is happening.

ADD REPLY
1
Entering edit mode
4.6 years ago
Gordon Smyth ★ 7.0k

We try to document our software in such a way that you don't need to search for explanations, let alone searching for hours.

Just type ?glmTreat to bring up the help page. If the short explanation on the help page isn't enough, then read the reference that is cited.

Reading the help page would also tell you how to limit to genes with FDR below a certain value. Hint: see the p.value argument.

ATpoint is right that you should be giving attention to the FDR column rather than to the PValue column. FDR accounts for multiple testing while the p-values themselves do not. If FDR < 0.05, then the p-value certainly will be also.

ADD COMMENT
0
Entering edit mode

williams.damon26, just that you know Gordon Smyth is the senior author of edgeR so you can take his answers as a reliable reference. I recommend also to search the Bioconductor support page for answers towards common edgeR questions as the developers are very active over there.

ADD REPLY

Login before adding your answer.

Traffic: 1971 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6