Question

How should effective lengths returned by Salmon be collapsed for Differential Expression?


6.1 years ago

arf1389 ▴ 10

Hi All,

I used Salmon to quantify a set of technical-replicate FASTA files against my reference transcriptome, with the sequence-specific (`--seqBias`) and GC (`--gcBias`) bias corrections enabled.

I know that the tximport documentation recommends processing the effective-length matrix (`txi$length`) as follows for use in edgeR:

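```r
library(edgeR)

# txi is the object returned by tximport() for the Salmon quantifications
cts <- txi$counts
normMat <- txi$length

# Divide each transcript's lengths by their geometric mean across samples,
# so the length correction does not change the overall magnitude of the counts
normMat <- normMat/exp(rowMeans(log(normMat)))

# Log effective library sizes (TMM factor x library size) from length-corrected counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))

y <- DGEList(cts)
# Per-observation offsets: log length factor plus the sample's log effective library size
y$offset <- t(t(log(normMat)) + o)

# y is now ready for the dispersion-estimation functions (see the edgeR User's Guide)
```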
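To make the `normMat` step concrete, here is a small sketch on a made-up 3 x 2 effective-length matrix. The helper name `lengthFactors` and all values are illustrative, not part of tximport or edgeR. Each row is divided by its geometric mean, so only the relative lengths across samples survive. This bears on the mean-versus-sum question: multiplying every length by the same constant leaves the factors, and therefore the offsets computed from them, unchanged. Summing instead of averaging therefore makes no difference when every collapsed sample pools the same number of replicates.

```r
# Made-up effective lengths: 3 transcripts (rows) x 2 samples (columns)
effLen <- matrix(c(800, 820,
                   1500, 1450,
                   300, 310),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("tx1", "tx2", "tx3"), c("s1", "s2")))

# Same operation as the normMat line above: divide each row by its geometric mean
lengthFactors <- function(len) len / exp(rowMeans(log(len)))

normMat <- lengthFactors(effLen)
print(round(normMat, 4))

# After scaling, every row has a geometric mean of 1 (up to rounding error)
print(exp(rowMeans(log(normMat))))

# Multiplying all lengths by one constant (e.g. a sum over two replicates
# instead of their mean) yields exactly the same factors
normMat2 <- lengthFactors(2 * effLen)
print(all.equal(normMat, normMat2))
```

If the collapsed samples pool different numbers of replicates, the constant is no longer shared by every column. In that case summing and averaging do give different factors.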

What I am unsure of is this: if I collapse my technical replicates by summing (or averaging) their counts, how should I collapse the corresponding effective lengths returned by tximport? Should I take the mean of the effective lengths, or the sum?

Tags: RNA-Seq, salmon, alignment


score 1 · Accepted Answer · 2018-03-23

The answer to this question is a feature added in Salmon v0.9.0:

> Added the `quantmerge` command. This allows producing a multi-sample TSV file with aggregated abundance metrics over samples from many different quantification runs.

It can be used to merge the count estimates of technical replicates and produce a new data set containing the merged counts and effective lengths.
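For comparison, a minimal R sketch of collapsing replicate columns by hand. It assumes `counts` and `lengths` are matrices with one column per sequencing run, such as `txi$counts` and `txi$length`. The function `collapseTechReps` and the run-to-sample vector are hypothetical names, not part of Salmon, tximport or edgeR. Counts are summed, because they are additive across runs. Effective lengths use a count-weighted mean per transcript, falling back to a plain mean where all runs have zero counts. That weighting is an assumption of this sketch rather than anything documented by Salmon or tximport. Summing lengths is avoided because effective length is a property of the transcript, not an amount that accumulates across runs.

```r
# Hypothetical helper (assumption, not a library function): collapse
# technical-replicate columns into one column per biological sample.
collapseTechReps <- function(counts, lengths, runToSample) {
  stopifnot(identical(dim(counts), dim(lengths)),
            length(runToSample) == ncol(counts))
  groups <- unique(runToSample)
  dn <- list(rownames(counts), groups)
  cnt <- matrix(0, nrow(counts), length(groups), dimnames = dn)
  len <- matrix(0, nrow(counts), length(groups), dimnames = dn)
  for (k in seq_along(groups)) {
    idx <- runToSample == groups[k]
    w <- counts[, idx, drop = FALSE]
    l <- lengths[, idx, drop = FALSE]
    # Counts add up across technical replicates
    cnt[, k] <- rowSums(w)
    # Assumed rule: count-weighted mean length; plain mean when all counts are 0
    len[, k] <- ifelse(cnt[, k] > 0, rowSums(w * l) / cnt[, k], rowMeans(l))
  }
  list(counts = cnt, length = len)
}

# Made-up example: samples A and B, each sequenced in two runs
counts <- matrix(c(10, 30, 5, 15,
                   0, 0, 4, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("tx1", "tx2"), c("A_r1", "A_r2", "B_r1", "B_r2")))
lengths <- matrix(c(900, 1000, 950, 950,
                    400, 420, 410, 430),
                  nrow = 2, byrow = TRUE, dimnames = dimnames(counts))

res <- collapseTechReps(counts, lengths, c("A", "A", "B", "B"))
print(res$counts)
print(res$length)
```

The returned `counts` and `length` matrices could then take the place of `txi$counts` and `txi$length` in the edgeR snippet from the question.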