Question: edgeR RNA-seq Results Across Conditions
0
gravatar for asyndeton17
15 months ago by
asyndeton1720
asyndeton1720 wrote:

Hi,

I recently performed RNA-seq differential expression analysis. I first did a comparison between Condition 2 vs. Control. Then I did Condition 3 vs. Control. However, the raw counts do not seem to be substantiating the output data from edgeR. For instance, the Krt17 gene supposedly has a log fold change of 7 in the comparison between Condition 2 vs. Control, and the Krt17 gene does not show up on the list of differentially expressed genes (at an FDR of 0.05 or lower) for Condition 3 vs. Control. Yet, when I go back to the raw counts for the Krt17 gene, this is what I have: Control replicates(0, 0, 0), Condition 2 (25, 27, 25), and Condition 3 (16, 14, 11). Clearly, it seems to be "overexpressed" in both, but it doesn't show up in the other list. How do I fix this or account for such an observation?

For those wondering, I ran my analysis following most of what the Griffith Lab tutorial suggests.

Thanks

edger rna-seq • 558 views
ADD COMMENTlink modified 15 months ago by shoujun.gu340 • written 15 months ago by asyndeton1720

Try normalizing the data to a linear scale first then fit a model with limma.

#filter lowly expressed
y <- DGEList(counts = raw_counts)
y <- y[!rowSums(y$counts == 0) == ncol(raw_counts),] 
A <- aveLogCPM(y)
y2 <- y[A>1,]

#library size factor, quantile, and batch normalizations
design <- model.matrix(~0+group)
cont.matrix <- makeContrasts(group1 - group2, 
    group2 - group3, group1 - group3, levels = design)
y3 <- calcNormFactors(y2, method = "TMM")
dge <- voomWithQualityWeights(y3, design=design, 
    normalization="quantile", plot=FALSE)
rbe <- removeBatchEffect(dge, batch)

#fit to linear model
fit <- lmFit(dge, design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit3 <- eBayes(fit2, trend = TRUE, robust = TRUE)

#examine results
topTable(fit3, coef=1, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=2, num=Inf, sort.by = "P", adjust = "BH")
topTable(fit3, coef=3, num=Inf, sort.by = "P", adjust = "BH")
ADD REPLYlink modified 15 months ago • written 15 months ago by mforde841.2k

Would it work if I filtered lowly expressed counts before creating the edgeR object? If so, what sort of cutoff should I make?

ADD REPLYlink written 15 months ago by asyndeton1720

It's already taken care of in the code. Specifically, the number of instances a gene / transcript feature has a sample count greater than 0 and the average CPM across all samples is atleast 1. Usually, if you used the same seq machine to run all of your samples, you will use the sequencing lane as factor for batch, and this can scale out to include multiple sequencing machines from different institutions, etc.

ADD REPLYlink modified 15 months ago • written 15 months ago by mforde841.2k

Can you post your code here?

ADD REPLYlink written 15 months ago by moushengxu350

For the most part, I am following the Griffith Lab Tutorial: https://github.com/griffithlab/rnaseq_tutorial/blob/master/scripts/Tutorial_Module4_Part4_edgeR.R

ADD REPLYlink written 15 months ago by asyndeton1720
0
gravatar for shoujun.gu
15 months ago by
shoujun.gu340
Rockville/MD
shoujun.gu340 wrote:

is its FDR too big? or you may check your filter criteria to see whether this gene is filtered by not enough reads ?

ADD COMMENTlink modified 15 months ago • written 15 months ago by shoujun.gu340
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1552 users visited in the last hour