Question: Normalizing for RNA abundance across replicates from a time course
3.8 years ago, Chloe (Queensland University of Technology) wrote:

Hi all,

I am trying to normalize my read counts for differential gene expression with edgeR.

I have a set of 21 bam files from aligning my reads to a genome, corresponding to 3 replicates at each of my 7 time points.

I would like to do DGE using edgeR, but first I need to normalize for RNA abundance between replicates.

I was told I might be able to use RSEM or edgeR to produce a normalized count matrix. The issue is that my reads were generated using the QuantSeq library prep kit, so only one fragment is produced per transcript (and therefore the read count should be a direct reflection of the number of transcripts). For this reason QuantSeq recommends using HTSeq to produce a count matrix.

Is there a way to produce a count matrix with HTSeq and then normalise across the replicates, without interfering with the fact that the read counts should be a direct reflection of the transcript counts? Can edgeR normalise the count matrix?

I think I have to avoid using FPKM (part of RSEM?) but I am not sure if it is appropriate to use RPKM, TMM, upper quartile, etc. I don't know much about these kinds of counts other than that they exist.

I was trying to work it out with RSEM, but it doesn't seem to accept my BAM files, as they were produced by aligning to a genome rather than a transcriptome.

Thanks, Chloe

3.8 years ago, Jake Warner wrote:

Hi Chloe, you can use HTSeq to generate a count table and then pass it to edgeR. Then, in edgeR, you can group your samples by replicate, normalize (TMM), perform DE tests, etc. TMM only computes a scaling factor per sample; the raw counts themselves are left unchanged. I assume you would compare each time point to its predecessor or to T0.

For example:

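# edgeR workflow:
library(edgeR)
# counts: matrix of raw HTSeq counts (genes in rows, the 21 samples in columns)
group <- factor(c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7)) # group samples by time point
y <- DGEList(counts=counts, group=group)
mean(y$samples$lib.size) # mean library size
y <- calcNormFactors(y) # TMM normalization (stores scaling factors only)
z <- cpm(y, normalized.lib.sizes=TRUE) # counts per million
y <- estimateDisp(y) # estimate dispersions, needed by exactTest
de_T1_T2 <- exactTest(y, pair=c(1,2)) # DE testing, time point 1 vs 2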
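
The `counts` object in the workflow above is assumed to be a gene-by-sample matrix of raw counts. As a rough sketch of how it could be assembled from HTSeq output, assuming one two-column, tab-separated htseq-count file per sample with no header (the helper name `read_htseq_counts` and the file names in the usage comment are hypothetical, not part of edgeR or HTSeq):

```r
# Sketch: combine per-sample htseq-count files into one count matrix.
# Assumes each file has two tab-separated columns (gene ID, count), no header,
# and that every file lists the same genes in the same order.
read_htseq_counts <- function(files, sample_names = basename(files)) {
  per_sample <- lapply(files, function(f) {
    tab <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                      stringsAsFactors = FALSE)
    # htseq-count appends summary rows such as __no_feature; drop them
    tab[!grepl("^__", tab$gene), ]
  })
  genes <- per_sample[[1]]$gene
  # Refuse to combine files whose gene lists differ
  same_genes <- vapply(per_sample,
                       function(tab) identical(tab$gene, genes),
                       logical(1))
  stopifnot(all(same_genes))
  # One column of counts per sample
  counts <- do.call(cbind, lapply(per_sample, function(tab) tab$count))
  dimnames(counts) <- list(genes, sample_names)
  counts
}

# Hypothetical usage (directory and file pattern are placeholders):
# files  <- list.files("htseq_out", pattern = "\\.txt$", full.names = TRUE)
# counts <- read_htseq_counts(files)
```

The column order of the resulting matrix follows the order of `files`, so it needs to match the order used in `group` (three replicates of time point 1, then three of time point 2, and so on).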

There's a lot of good info in the edgeR vignette:


Awesome, thanks! I'll give this a go.

Reply written 3.8 years ago by Chloe