Question

Choosing the right batch effect control algorithm for bulk RNA-seq data meta-analysis

10 months ago

LauferVA 4.2k

Hello biostars,

I am putting together a meta-analysis of glioblastoma bulk RNA-seq data from many different sources.

In this particular case I need to compare control to treated samples, and I have a problem of perfect separation (complete confounding) between overarching Study, contributing Institution, and PatientID. This prevents me from cleanly partialing out variation due to Study or Institution.
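To make the separation problem concrete, here is a minimal sketch on a made-up sample sheet. All sample names, column positions, and the helper `is_nested` are hypothetical and only for illustration. It checks whether each patient falls within a single institution, and each institution within a single study, which is the situation where Study cannot be estimated separately once PatientID is in the model.

```python
from collections import defaultdict

# Hypothetical sample sheet: (sample_id, study, institution, patient_id, condition)
samples = [
    ("s1", "StudyA", "Inst1", "P01", "control"),
    ("s2", "StudyA", "Inst1", "P01", "treated"),
    ("s3", "StudyA", "Inst1", "P02", "control"),
    ("s4", "StudyA", "Inst1", "P02", "treated"),
    ("s5", "StudyB", "Inst2", "P03", "control"),
    ("s6", "StudyB", "Inst2", "P03", "treated"),
]

# Column positions in the tuples above (assumed layout)
STUDY, INSTITUTION, PATIENT = 1, 2, 3


def is_nested(rows, inner, outer):
    """Return True if every level of `inner` occurs within exactly one level of `outer`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[inner]].add(row[outer])
    return all(len(levels) == 1 for levels in seen.values())


patient_in_institution = is_nested(samples, PATIENT, INSTITUTION)
institution_in_study = is_nested(samples, INSTITUTION, STUDY)

print("PatientID nested in Institution:", patient_in_institution)
print("Institution nested in Study:", institution_in_study)
```

If both checks come back true, the Study and Institution indicators are linear combinations of the PatientID indicators, so a design with all three as fixed effects is rank deficient.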

As such, I plan to use one or more of the batch-control algorithms for RNA-seq that have been proposed, for instance RUV, sva, and PEER.

I am writing to ask:

1) Please share any experience you have with these. Are there cases in which one excels?

2) Equally useful would be head-to-head comparisons of these. This manuscript seems to suggest a slight preference for SVA, but I have not seen a head-to-head comparison of all three.

PEER SVA RUV RNA-seq • 488 views

updated 10 months ago by Papyrus ★ 2.9k • written 10 months ago by LauferVA 4.2k

Thanks for sharing that paper!

I once used SVA and RUV together, and mostly checked that the surrogate variables (SVs) found by the two methods were similar/correlated (with each other, and then, maybe, with known clinical/technical variables). My main doubt with these methods has always been deciding how many SVs to include. For example, you could run SVA with both the "leek" and "be" methods for estimating the number of SVs and use the smaller estimate, or look at the factors you then get with RUV and how they correlate with the SVs. But yes, this is completely arbitrary =)
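The kind of check I mean can be sketched roughly like this, assuming the factors from each method have already been exported as one list of per-sample values per factor. The numbers below are toy values, and the names `sva_factors`, `ruv_factors`, and `cross_correlation` are made up for illustration, not part of either package.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(factors_a, factors_b):
    """Matrix of correlations: rows are factors in A, columns are factors in B."""
    return [[pearson(a, b) for b in factors_b] for a in factors_a]


# Toy factors for six samples (hypothetical values, one inner list per factor)
sva_factors = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
]
ruv_factors = [
    [-2.0, -4.0, -6.0, -8.0, -10.0, -12.0],
    [0.9, -1.1, 1.2, -0.8, 1.0, -1.0],
]

corr = cross_correlation(sva_factors, ruv_factors)

# The sign of a latent factor is arbitrary, so compare absolute correlations.
for i, row in enumerate(corr):
    print(f"SV{i + 1}:", [round(abs(r), 2) for r in row])
```

The same function works for comparing the factors against numerically coded known covariates (batch date, RIN, and so on), which is the second half of the check.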

10 months ago by Papyrus ★ 2.9k