Question: Normalizing for RNA abundance across replicates from a time course
Chloe (Queensland University of Technology) wrote, 5 months ago:

Hi all,

I am trying to normalize my read counts for differential gene expression (DGE) analysis with edgeR.

I have a set of 21 BAM files from aligning my reads to a genome. They correspond to 3 replicates at each of my 7 time points.

I would like to do DGE using edgeR, but first I need to normalize for RNA abundance between replicates.

I was told I might be able to use RSEM or edgeR to produce a normalized count matrix. The issue is that my reads were generated with the QuantSeq library prep kit, so only one fragment is produced per transcript (and therefore the read count should directly reflect the number of transcripts). For this reason, the QuantSeq protocol recommends using HTSeq to produce the count matrix.

Is there a way to produce a count matrix with HTSeq and then normalise across the replicates, without interfering with the fact that the read counts should directly reflect the transcript counts? Can edgeR normalise the count matrix?
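htseq-count writes one file per BAM, with two tab-separated columns (gene ID, count) and a few summary rows at the end whose names start with `__` (for example `__no_feature` and `__ambiguous`). A minimal base-R sketch for combining such per-sample files into the genes x samples `counts` matrix that edgeR expects is below. The helper name `read_htseq_counts`, the file contents, and the sample names are hypothetical, and it assumes every file lists the same genes in the same order.

```r
# Hypothetical helper: merge per-sample htseq-count outputs into one matrix.
read_htseq_counts <- function(files, sample_names = basename(files)) {
  tables <- lapply(files, function(f) {
    read.delim(f, header = FALSE, col.names = c("gene", "count"),
               stringsAsFactors = FALSE)
  })
  genes <- tables[[1]]$gene
  # Every file must list the same genes in the same order
  for (tab in tables) stopifnot(identical(tab$gene, genes))
  # cbind keeps a matrix even when only one file is given
  counts <- do.call(cbind, lapply(tables, function(tab) tab$count))
  rownames(counts) <- genes
  colnames(counts) <- sample_names
  # Drop htseq-count's summary rows (__no_feature, __ambiguous, ...)
  counts[!startsWith(genes, "__"), , drop = FALSE]
}

# Usage with two tiny, made-up files
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("geneA\t10", "geneB\t0", "__no_feature\t5"), f1)
writeLines(c("geneA\t7", "geneB\t3", "__no_feature\t2"), f2)
m <- read_htseq_counts(c(f1, f2), sample_names = c("T1_rep1", "T1_rep2"))
m
```

For the real data, the 21 files would be passed in the same order as the samples are later grouped in edgeR (T1 replicates first, then T2, and so on).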

I think I have to avoid using FPKM (part of RSEM?), but I am not sure whether it is appropriate to use RPKM, TMM, upper quartile, etc. I don't know much about these normalization methods other than that they exist.
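To make the difference between these methods concrete: CPM divides each sample's counts by its library size; TMM and upper-quartile normalization keep that idea but replace the raw library size with an "effective" library size (library size x normalization factor); RPKM/FPKM additionally divide by gene length. Because QuantSeq yields roughly one read per transcript regardless of transcript length, the length division is generally not wanted for this kind of data. A toy sketch in base R; the counts, gene lengths, and normalization factors are all hypothetical (edgeR would estimate real factors with `calcNormFactors()`).

```r
# Hypothetical toy counts: 3 genes x 2 samples (matrix is filled column-wise)
counts <- matrix(c(100, 200, 700,
                   300, 500, 1200),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length_kb <- c(g1 = 1, g2 = 2, g3 = 5)  # hypothetical gene lengths

lib_size <- colSums(counts)                   # s1 = 1000, s2 = 2000

# CPM: scale each sample (column) by its library size only
cpm_raw <- sweep(counts, 2, lib_size, "/") * 1e6

# TMM / upper quartile: same scaling, but by an effective library size.
# These factors are made up; they multiply to 1, as edgeR's factors do.
norm_factors <- c(s1 = 1.25, s2 = 0.8)
cpm_norm <- sweep(counts, 2, lib_size * norm_factors, "/") * 1e6

# RPKM/FPKM: additionally divide by gene length in kb. This assumes longer
# transcripts produce more reads, which does not hold for 3' tag data.
rpkm <- sweep(cpm_raw, 1, gene_length_kb, "/")

cpm_raw
rpkm
```

Only the per-sample (column) scaling differs between CPM, TMM and upper quartile; RPKM is the one that also rescales each gene (row).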

I was trying to work it out with RSEM, but it doesn't seem to accept my BAM files because they were produced by aligning to the genome rather than the transcriptome.

Thanks, Chloe

Jacob Warner wrote, 5 months ago:

Hi Chloe, you can use HTSeq to generate a count table and then pass it to edgeR. In edgeR, you can group your samples by time point (the three replicates of each time point form one group), normalize (TMM), perform DE tests, etc. I assume you would compare each time point to its predecessor or to T0.

For example:

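```r
# edgeR workflow. Assumes `counts` is a genes x samples matrix of HTSeq counts
# with columns ordered T1 rep1-3, T2 rep1-3, ..., T7 rep1-3.
library(edgeR)
group <- factor(rep(1:7, each = 3))        # 7 time points x 3 replicates
y <- DGEList(counts = counts, group = group)
mean(y$samples$lib.size)                    # mean library size
y <- calcNormFactors(y)                     # TMM normalization (default method)
z <- cpm(y, normalized.lib.sizes = TRUE)    # normalized counts per million
y <- estimateDisp(y)                        # exactTest() needs dispersion estimates
de_T1_T2 <- exactTest(y, pair = c(1, 2))    # DE testing: time point 2 vs 1
topTags(de_T1_T2)                           # top DE genes
# etc.
```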

There's a lot of good info in the edgeR vignette: https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
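`exactTest()` compares two groups at a time. For a 7-point time course, the quasi-likelihood GLM functions described in the User's Guide (`glmQLFit()` / `glmQLFTest()`, swapped in here for `exactTest()`) fit all 21 samples in one model and can test each time point against the first, consecutive time points against each other, or any change over time. A minimal sketch, assuming the same column order as above; the simulated counts and gene names are placeholders for the real HTSeq matrix.

```r
library(edgeR)  # also attaches limma

# Placeholder data: simulated counts for 500 hypothetical genes x 21 samples
set.seed(1)
counts <- matrix(rnbinom(500 * 21, mu = 100, size = 10), ncol = 21,
                 dimnames = list(paste0("gene", 1:500), NULL))

# Assumed column order: T1 rep1-3, T2 rep1-3, ..., T7 rep1-3
time <- factor(rep(paste0("T", 1:7), each = 3))
design <- model.matrix(~ time)  # columns: (Intercept), timeT2, ..., timeT7

y <- DGEList(counts = counts, group = time)
keep <- filterByExpr(y, design)           # drop very low-count genes
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)                   # TMM normalization
y <- estimateDisp(y, design)              # dispersion estimates for the GLM
fit <- glmQLFit(y, design)                # quasi-likelihood GLM fit

# Each time point vs the first: coefficient 2 is T2 vs T1, and so on
res_T2_vs_T1 <- glmQLFTest(fit, coef = 2)

# Consecutive time points, e.g. T3 vs T2 (coefficient timeT3 minus timeT2)
res_T3_vs_T2 <- glmQLFTest(fit, contrast = c(0, -1, 1, 0, 0, 0, 0))

# Any change across the time course: test all time coefficients jointly
res_any <- glmQLFTest(fit, coef = 2:7)
topTags(res_any)
```

With an intercept design, every coefficient is a log-fold-change relative to the first time point, so consecutive comparisons are expressed as contrasts between coefficients.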


Chloe replied: Awesome, thanks! I'll give this a go.
Powered by Biostar version 2.3.0