Question

Design matrix for voom: with or without batch blocking?

5.3 years ago

Sebastian Hesse ▴ 340

To calculate differential expression in a proteomics dataset I would like to use LIMMA. I have two questions about it.

1) So far I have run the LIMMA analysis on my log2-transformed data only. However, it appears to be recommended to perform a voom transformation first, because the data are counts (from MS/MS DIA).

The voom transformation asks for a design matrix, apparently the same one as for LIMMA. Because of batch effects I am blocking for two batch factors in the design matrix for LIMMA, using: design <- model.matrix(~0 + Causative.gene + batch_date.proc + batch_cell.number, data = pData(ExSet)).

Should I use the same matrix for voom, or is it better not to include the batch blocking in it?

2) Could you suggest code to check whether the voom transformation is actually necessary for my data? As far as I understand, voom corrects for heteroscedasticity in the data. How can I check whether my expression data are actually affected by heteroscedasticity? (I found this post about it but don't know how to apply it to my data, with 170 samples and 8 groups to compare: https://datascienceplus.com/how-to-detect-heteroscedasticity-and-rectify-it/)

Thanks a lot! Sebastian

r voom limma proteomics • 2.4k views

updated 4.2 years ago by Gordon Smyth ★ 7.0k • written 5.3 years ago by Sebastian Hesse ▴ 340

score 1 · Answer 1 · 2019-01-18

If your data are count data, they will definitely be heteroscedastic. There are other ways to make count data suitable for limma, e.g. limma-trend, but AFAIK voom is the most robust. One way to see the heteroscedasticity is to plot variance against mean expression for each protein. On the raw count scale, you will see that proteins with higher expression have a higher variance.
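A minimal sketch of such a mean-variance check, in base R only. The count matrix here is simulated (negative binomial) purely as a stand-in, and all object names are hypothetical; with real data you would replace `counts` with your own protein-by-sample count matrix.

```r
# Simulated stand-in for a protein-by-sample count matrix (hypothetical data).
set.seed(1)
n_proteins <- 500
n_samples  <- 12
mu <- rgamma(n_proteins, shape = 2, scale = 200) + 1   # per-protein mean abundance
counts <- matrix(
  rnbinom(n_proteins * n_samples, mu = rep(mu, n_samples), size = 5),
  nrow = n_proteins, ncol = n_samples
)

# Per-protein mean and variance on the raw count scale.
raw_mean <- rowMeans(counts)
raw_var  <- apply(counts, 1, var)

# Per-protein mean and standard deviation after a simple log2 transform.
log_counts <- log2(counts + 0.5)
log_mean   <- rowMeans(log_counts)
log_sd     <- apply(log_counts, 1, sd)

# Side-by-side scatter plots: a visible trend in either panel means the
# variance depends on the mean, i.e. the data are heteroscedastic.
op <- par(mfrow = c(1, 2))
plot(log10(raw_mean + 1), log10(raw_var + 1),
     xlab = "log10(mean count + 1)", ylab = "log10(variance + 1)",
     main = "Raw counts", pch = 16, cex = 0.5)
plot(log_mean, log_sd,
     xlab = "mean log2 count", ylab = "SD of log2 count",
     main = "log2 scale", pch = 16, cex = 0.5)
par(op)
```

If the log2-scale panel still shows a trend, a plain log transform has not removed the mean-variance dependence, which is the situation voom and limma-trend are meant to handle.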

I would use the same design matrix as you will use in limma, including the blocking factors.
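As a rough illustration of passing one design matrix, batch terms included, to both voom and lmFit, here is a sketch on simulated data. The column names mirror those in the question, but the annotation table `pheno`, the factor levels, and the counts are all made up for the example.

```r
library(limma)

set.seed(2)
# Hypothetical sample annotation with one factor of interest and two batch factors.
pheno <- data.frame(
  Causative.gene    = factor(rep(c("geneA", "geneB", "geneC"), each = 4)),
  batch_date.proc   = factor(rep(c("d1", "d2"), times = 6)),
  batch_cell.number = factor(rep(c("low", "low", "high", "high"), times = 3))
)

# Simulated protein-by-sample counts (stand-in for real data).
n_proteins <- 200
mu <- rgamma(n_proteins, shape = 2, scale = 200) + 1
counts <- matrix(
  rnbinom(n_proteins * nrow(pheno), mu = rep(mu, nrow(pheno)), size = 5),
  nrow = n_proteins,
  dimnames = list(paste0("prot", seq_len(n_proteins)),
                  paste0("s", seq_len(nrow(pheno))))
)

# One design matrix, including the batch terms.
design <- model.matrix(
  ~0 + Causative.gene + batch_date.proc + batch_cell.number,
  data = pheno
)

# The same design goes into voom (to estimate precision weights) ...
v <- voom(counts, design, plot = FALSE)

# ... and into the linear model fit.
fit <- lmFit(v, design)
cm  <- makeContrasts(Causative.genegeneB - Causative.genegeneA, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, number = 5)
```

Using the same design in both steps means the weights are estimated from residuals of the model that is actually fitted afterwards.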

score 0 · Answer 2 · 2020-01-20

See Chapter 15 (RNA-seq Data) in the limma User's Guide. The comments there apply to proteomics counts as well as RNA-seq.

Heteroscedasticity is examined in limma by either a voom plot (for the voom pipeline) or plotSA (for the limma-trend pipeline). Standard univariate heteroscedasticity analyses (like the link you give) are not useful in the proteomics context.
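A sketch of how the two diagnostic plots could be produced, on simulated counts with a hypothetical two-group design. The log-CPM values for the limma-trend branch are computed by hand here as a simple stand-in; the User's Guide should be consulted for the recommended way to obtain them.

```r
library(limma)

set.seed(3)
# Hypothetical two-group design and simulated counts (stand-in for real data).
group  <- factor(rep(c("ctrl", "case"), each = 6))
design <- model.matrix(~0 + group)
n_proteins <- 300
mu <- rgamma(n_proteins, shape = 2, scale = 200) + 1
counts <- matrix(
  rnbinom(n_proteins * length(group), mu = rep(mu, length(group)), size = 5),
  nrow = n_proteins
)

# voom pipeline: plot = TRUE draws the mean-variance trend that voom
# uses to derive the precision weights.
v <- voom(counts, design, plot = TRUE)

# limma-trend pipeline: fit on log-CPM values, let eBayes model the
# mean-variance trend, then inspect it with plotSA.
lib_size <- colSums(counts)
logcpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)  # hand-rolled log-CPM
fit_trend <- eBayes(lmFit(logcpm, design), trend = TRUE)
plotSA(fit_trend)
```

A clearly non-flat curve in either plot indicates that the variance depends on the mean expression level.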

As @i.subdbery has remarked, you certainly need to transform. Count data are always heteroscedastic. The only choice is between voom and limma-trend.