Question: Best way to address different batches of RNA-seq
gravatar for tud55122
3.8 years ago by
United States
tud5512220 wrote:

I did two batches of RNA-seq upon same drug treatment but at different time points on the same cell line. For each time point, I have corresponding DMSO controls. All my Fold Change values are calculated over the corresponding DMSO control. I'm trying to trace the gene expression dynamics upon my drug treatment. I found that if I simply merge two datasets into one master table and plot the RPKM and Fold Change values, the numbers do not reflect the real dynamics I saw by q-PCR. For example: I knew one gene (and a few more) that can be induced highest upon 4-day treatment but the RPKM values and Fold Changes do not show the same trend. So my question is:

What's the best way to address this issue?

Appreciate your help.


ADD COMMENTlink modified 17 months ago by Biostar ♦♦ 20 • written 3.8 years ago by tud5512220

Below are the scripts I used.

> samples=read.table("Samples.txt",sep="\t",header=T)
> counts = readDGE(samples$countf)$counts
> noint = rownames(counts) %in% c("__no_feature","__ambiguous","__too_low_aQual"                               ,"__not_aligned","__alignment_not_unique","no_feature","ambiguous","too_low_aQua                               l","not_aligned","alignment_not_unique")
> counts = counts[!noint,]
> dim(counts)
> cpms = cpm(counts)
> keep = rowSums(cpms>0)>=1
> counts = counts[keep,]
> dim(counts)
> d = DGEList(counts=counts, group=samples$condition)
> d = calcNormFactors(d)
> design <- model.matrix(~0+samples$condition)
> d <- estimateGLMCommonDisp(d,design)
> d <- estimateGLMTrendedDisp(d,design)
Loading required package: splines
> d <- estimateGLMTagwiseDisp(d,design)
> fit <- glmFit(d,design)
> et <- glmLRT(fit, contrast=c(-1,1,0,0,0,0,0,0,0,0))
> tt = topTags(et, n=nrow(d),"none")
> write.table(tt,"EdgeR/test.txt",sep="\t",row.names=T, quote=F)
ADD REPLYlink modified 3.8 years ago by GenoMax95k • written 3.8 years ago by tud5512220

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLYlink written 3.8 years ago by GenoMax95k

Ok, then perhaps you could try to add the batch into the model/design. Although if you processed the DMSO/drug samples at the same time, it shouldn't be very impactful.

ADD REPLYlink written 3.8 years ago by Carlo Yague5.5k

Is your fold change also computed based on RPKM values ? RPKM is not a proper between sample normalization metric so I suggest that you try a different method such as edgeR/DESEQ.

ADD REPLYlink written 3.8 years ago by Carlo Yague5.5k

Thanks for the reply. My fold change is computed based on counts values. I used edgeR to calculate counts and generated RPKM values.

ADD REPLYlink written 3.8 years ago by tud5512220

What was different between the two batches? In addition, as said above you shouldn't use RPKM, use statistics as in DESeq2

ADD REPLYlink written 3.7 years ago by Asaf8.5k

As mentioned by @tud55122 and @Asaf you shouldn't use RPKM values. I have used DESeq2 and there you have option to use batch+ condition information to make comparisons. I am not aware is edgeR ihas similar options. If you have counts data then you can easily follow DESeq2 vignette to see how to do that. I use the 'vst' based normalization which is better than RPKM based normalization. Hope it helps.

ADD REPLYlink written 17 months ago by piyushjo550

Seems you have two timepoints? Two conditions? and then a number of replicates per condition/timepoint - could you describe exactly which samples were sequenced together?

ADD REPLYlink written 17 months ago by kristoffer.vittingseerup3.5k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2224 users visited in the last hour