Checking potential batch effects with svaseq
1
2
Entering edit mode
4.8 years ago

I tried to validate whether the batch effects in my rna-seq experiment are real or not. You can find more info regarding the experiment in this post.

Shortly: Because of different time points when preparing the RNA and libraries I assumed that I have 3 potential batch effects in total.

I used svaseq from the sva-package to accomplish this. During the procedure I was wondering about some details.

1. using the function num.sv led to 10 estimated surrogate variables. Interestingly, svaseq blow up with the following error: error in density.default(x, adjust = adj) : 'x' contains missing values Using 9 variables worked fine. Has someome an idea what is going on? I checked my matrix and it does not contain any NA's.

Edit (20160705): It seems that the function num.sv() from sva has a bug. I tried it on several experiments and svaseq() always spit out the same error. If I defined the n.sv-argument within svaseq as num.sv-1, it always works perfectly fine.

2. If one does not use num.sv and because of that does not supply a number to the n.sv-argument of svaseq, it estimates the number of surrogate variables for its own. Interestingly this differs completely from the num.sv function because in my case I got 6 instead of 10 surrogate variables which confused me a bit. Can someone explain this difference?

Using my own estimated batch effects and the 9 surrogate variables estimated by num.sv (9 because 10 didn't work so far -> as described above) led to the same biggest hit. So both result lists are different but e.g. the top hit is the same and there are also a lot of other overlaps. Thinking very naive I would say this is a good hint, that svaseq found my batch-effects in a proper way and that they are real. I mean my batch effects are factors and the sv's from svaseq are continuous. It is unlikely that both models will give me the same top hit by chance, isn't it? What would you think?

I also tried to associate my batch effects with the estimated ones by svaseq. Unfortunately, the power is to low to get reliable results. I checked some plots (box plots, stripcharts) and here the association between the batch-effects and the SV's is visible but not fully convincing.

RNA-Seq R • 4.1k views
0
Entering edit mode

Hi Tobias.Wohland,

I am facing the same problem and I am not sure how to proceed. I tried your way, n.sv-1, but it did not work again. When I am calculating n.sv using "leek" method, the number of surrogate variables were 84. with the method "be", the number of surrogate variables were 1.

0
Entering edit mode

I suggests you add the code you used to reach the errors. Otherwise anyone trying to help will find very difficult to figure out what might be going wrong. It is possible that the function you are using has indeed a bug, but it is also possible that you are not using it right. Without the code it is impossible to tell.

0
Entering edit mode

0
Entering edit mode
2.2 years ago
bookerbn • 0

I encounter a similar error when working with sva and smartsva.cpp and I found that all my categorical variables read in as characters. Once I changed them all to factors the function worked without issues. I hope this helps! Cheers!