disagreement between edgeR logFC and readCounts submitted
Entering edit mode
20 months ago

Hi, I am trying to find differentially bound regions (ChIPseq data) by edgeR. The experiment has two groups (A and B) with only one replicate in each.

I used the following code for the group B vs. group A analysis:

reads <-  readCounts.txt
group <- factor(c("A", "B"))
y <- DGEList(counts = reads, group = group)
keep <- filterByExpr(y)
y <- y[keep,,keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")
bcv <- 0.2
et <- exactTest(y, dispersion=bcv^2, pair = c("A", "B"))
FDR <- p.adjust(et$table$PValue, method="BH")
et$table$FDR <- FDR
results_EXT <- as.data.frame(et$table)
results_EXT_sig_dn <- filter(results_EXT, FDR < 0.05 & logFC < 0)

The problem I am facing is, for some regions in the decreased binding (results_EXT_sig_dn) results, the logFC is negative even when the group B reads are higher than group A . I have also looked at the scaled reads (reads multiplied by normalization factors) which are labelled as "_s"

"chr" "start" "end" "logFC" "logCPM" "PValue" "FDR" "reads_A" "reads_B" "reads_A_s" "reads_B_s"
"chr10" "42683394" "42683726" "-1.627117" "8.793078" "0.0001153746" "0.002275910" "2691" "3136" "2852.46" "2947.84"
"chr11" "92788101" "92788347" "-1.604364" "6.331717" "0.0001889196" "0.003331863" "490" "580" "519.40" "545.20"
"chr13" "73985586" "73985745" "-6.205887" "4.309577" "1.710055e-23" "6.610901e-19" "168" "8" "178.08" "7.52"
"chr22" "46897721" "46897850" "-1.748511" "3.734153" "0.0002410538" "0.004000832" "86" "92" "91.16" "86.48"

In the above data, line 1 is the header, lines 2 & 3 are anomalies, and lines 4 & 5 are as expected

I will be thankful if anyone can explain this anomaly.

edgeR • 966 views
Entering edit mode
20 months ago

to me looks like none of your fold changes seem to match the expectations, take line 5

log2(86/91) would be around -0.08 but you have -1.74 there.

I think you might be matching up the wrong counts, make sure to merge by row.name

Entering edit mode

Thank you for your reply. Please have look at the below and let me know your view. The code I used to join two data frames:

lines <- inner_join(results_EXT_sig_dn, reads)
Joining, by = c("chr", "start", "end")

line 5 from two tables:

"chr" "start" "end" "logFC" "logCPM" "PValue" "FDR"
"chr22" "46897721" "46897850" "-1.74851077922428" "3.73415308103011" "0.000241053755340764" "0.00400083166912345"

"chr" "start" "end" "reads_A" "reads_B"
"chr22" "46897721" "46897850" "86" "92"

group A: 1.06
group B: 0.94

# code to generate normalized reads (reads multiplied with norm.factors)
lines <- mutate(lines, reads_A_s = reads_A * 1.06, reads_B_s = reads_B * 0.94)

# exactTest source code (edgeR)
abundanceA <- mglmOneGroup(y1+matrix(prior.count[j1],ntags,n1,byrow=TRUE),offset=offset.aug[j1],dispersion=dispersion)
abundanceB <- mglmOneGroup(y2+matrix(prior.count[j2],ntags,n2,byrow=TRUE),offset=offset.aug[j2],dispersion=dispersion)
logFC <- (abundanceB-abundanceA)/log(2)

The logFC calculation by exactTest uses 4 variables: dispersion, offset, prior count, and reads. Dispersion depends on BCV (0.2). Offset and prior.count values are dependent on the library size and norm.factors. Reads are as provided in the table.

I tried my code with another set of data (RNAseq with no replicates) and I did not find this anomaly. This gives me confidence that my code works. I am not able to comprehend why this anomaly is happening to certain regions in the downregulated. Other regions in the downregulated and all upregulated regions' logFC direction (positive or negative) is in coherence with the reads (more or less reads in B).

Entering edit mode

I think the discrepancy of a -0.08 log2FoldChange showing up as -1.7 is far more worrisome than whether the gene is up or down-regulated

The latter it may feel like a big deal because it looks like a binary choice: up or down, plus or minus, but remember that is just an artifact of the logarithm switching sign at 1.

A small change around 1 is meaningless because tiny changes can make it look up or down-regulated.

A huge change, such a gene that show little change showing up with 4x fold change ... now that is a big error

I would suggest looking closely at how you combined the data. Whether or not it seems to work elsewhere is not all that comforting; the whole point of bugs is that they only manifest themselves only in some cases,

Entering edit mode

Thank you for your reply. I do not disagree with you on the biological importance of the level of fold change. Both the fold change direction and the fold change level are equally important. It would be excellent if I can find solutions for both. I looked at the source code and I am not able to figure out where it can be corrected. I am seeking for pointers toward solutions.

Entering edit mode

We can't easily troubleshoot because this is about combining data sets, your edgeR data is already combined with coordinates. The problem could also in producing that data.

in general, my advice in such cases is to merge on a single unique column, that way it becomes pretty clear what is going on

just recently, I had a very similar error when I merged gene names onto an edgeR object, I ended up with wrong gene names and noticed it during testing only.

I would go back to the edgeR and closely investigate how you are merging the chromosomal coordinates onto the edgeR result file

Entering edit mode

Hi, I could not figure it out with edgeR. I was reading multiple posts suggesting NOISeq as an option to analyze the "no replicate" data. The noiseq function seems to be specifically designed for "no replicate" data. I didn't find any discrepancies that I found using edgeR for my data. The direction and the level of fold change agree with the reads (normalized) provided. Thank you for the conversation.


Login before adding your answer.

Traffic: 3226 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6