SizeFactors for BAM normalization
1
2
17 months ago
Anand Rao ▴ 440

My goal is to make PCA and correlation plots from my RNA-Seq BAM files. Some useful discussions on BioStars, such as this one, have helped guide my steps.

In another post, responding to a question on library size normalization at this BioStars post, user ATpoint indicates size factor calculation must be performed as follows:

## edgeR:: calcNormFactors
tmp.NormFactors <- calcNormFactors(object = raw.counts, method = c("TMM"), doWeighting = FALSE)

## raw library size:
tmp.LibSize <- colSums(raw.counts)

## calculate size factors:
SizeFactors <- tmp.NormFactors * tmp.LibSize / 1000000
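
As a quick check of the arithmetic in the formula quoted above, the same calculation can be done from the shell with awk. The normalization factor (0.95) and library size (20 million reads) are made-up values for illustration, not taken from any real sample.

```shell
#!/bin/sh
# Hypothetical inputs: a TMM normalization factor and a raw library size.
norm_factor=0.95
lib_size=20000000

# size factor = normalization factor * library size / 1e6
size_factor=$(awk -v nf="$norm_factor" -v ls="$lib_size" \
    'BEGIN { printf "%.4f", nf * ls / 1000000 }')
echo "size factor: $size_factor"
```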


In my analyses, I used DESeq2 instead of edgeR, after importing SALMON quantification using tximport, using syntax instructions at BioConductor, as follows:

library(DESeq2)
Design <- DataFrame((cbind(BiolRep, Genotype, TimePoints)))
dim(Design)
#[1] 144   3
rownames(Design) <- colnames(txi.salmon$counts)
design_formula <- ~ TimePoints * Genotype
dds <- DESeqDataSetFromTximport(txi.salmon, Design, design_formula)
NormValues <- estimateSizeFactorsForMatrix(counts(dds))

So my 1st question is this: to use DESeq2-based size factors for converting BAM to BigWig with bamCoverage from deepTools, I would still need to calculate SizeFactors as follows, rather than use just the (inverse of the) NormValues, am I right?

SizeFactors <- NormValues * LibSize / 1000000

And my 2nd question is: with SizeFactors calculated as above, I would then have to use the inverse of those values to obtain my final normalized BAM files as inputs for use with deepTools, with the following syntax, am I right?

bamCoverage -b $BAM_IN -o $BigWig_OUT --normalizeUsing None --scaleFactor $(1/Size_factor) --effectiveGenomeSize $ACGTtotalCount


Could you please confirm or correct the approach I have indicated above? Thanks in advance!

deepTools bamCoverage sizeFactors DESeq2 • 864 views
5
17 months ago

Please use 1/calcNormFactors(object = raw.counts) as the scaling factor. Whether you use TMM or the default RLE is largely immaterial to me. Your bamCoverage command looks fine.
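
A minimal sketch of how the inverse could be passed to bamCoverage for many samples. It assumes the factors were exported from R to a tab-separated file with one `sample<TAB>factor` row per BAM. The file `size_factors.tsv`, the `.bam`/`.bw` naming scheme, and the helper `print_bamcoverage_cmds` are all hypothetical. The commands are only printed, so they can be inspected before being run.

```shell
#!/bin/sh
# Print one bamCoverage command per row of a two-column TSV (sample, factor).
# The scale factor is the inverse of the factor in column 2.
# Rows whose second column is not a positive number (e.g. a header) are skipped.
print_bamcoverage_cmds() {
    awk -F '\t' '$2 + 0 > 0 {
        printf "bamCoverage -b %s.bam -o %s.bw --normalizeUsing None --scaleFactor %.6f\n", $1, $1, 1 / $2
    }' "$1"
}

# Hypothetical example table; in practice this would be written out from R.
tmpdir=$(mktemp -d)
printf 'sampleA\t0.8\nsampleB\t1.25\n' > "$tmpdir/size_factors.tsv"
print_bamcoverage_cmds "$tmpdir/size_factors.tsv"
```

Any further options from the original command can be appended to the format string in the same way.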

0

Thanks, Devon. Just to be doubly sure I understood you right: LibSize is not relevant to, or factored into, the sizeFactor value, only the calcNormFactors values are, yes? (i.e. before its inverse is used with bamCoverage)

2

Correct, you don't need to account for library size.

1

Yes, that is true, as Devon says. The DESeq2 size factors already incorporate the library size, whereas in edgeR you have to account for it manually.

0

Thank you for confirming _/\_

0

On a related topic: for multiBigwigSummary, is it possible to specify --bwfiles and --labels as 2 text files containing the respective lists, rather than explicitly on the command line? I have ~150 input BW files, so the command could become hard to read, hence this query. This is a very minor issue though, if I can even call it that :) TIA!

1

No, there's no way to feed the file names in via a file, since we kind of assume that anyone handling that many files is using something like Snakemake to automatically generate the command. As an aside, it's tough to interpret any plots with that many samples.
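
For anyone without a workflow manager, one possible workaround is to generate the command in the shell from the two lists. This sketch assumes two hypothetical files, `bw_files.txt` and `labels.txt`, with one whitespace-free entry per line in matching order. The output name `results.npz` and the helper `build_summary_cmd` are placeholders too. The command is printed rather than run.

```shell
#!/bin/sh
# Build a multiBigwigSummary command from two list files (one entry per line).
# Entries must not contain whitespace, since they are joined with spaces.
build_summary_cmd() {
    bw_list=$(paste -s -d ' ' "$1")
    label_list=$(paste -s -d ' ' "$2")
    printf 'multiBigwigSummary bins --bwfiles %s --labels %s -o results.npz\n' \
        "$bw_list" "$label_list"
}

# Hypothetical example lists with two entries each.
listdir=$(mktemp -d)
printf 'a.bw\nb.bw\n' > "$listdir/bw_files.txt"
printf 'A\nB\n' > "$listdir/labels.txt"
build_summary_cmd "$listdir/bw_files.txt" "$listdir/labels.txt"
```

Once the printed line looks right, it can be piped to `sh` to run it.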

0

I agree - the plotPCA and heatmap images generated were hard to interpret, I had to use much smaller and meaningful subsets to be able to 'see' anything. Thanks very much for your help.