error in differential analysis using GLM test
4.7 years ago

Hello. I am trying to use SplAdder for alternative splicing analysis. I have three barley replicates each for the control and treated conditions, i.e. c1, c2, c3 and t1, t2, t3. I ran SplAdder successfully and used the RNA-Seq data to quantify and confirm alternative splicing events. To perform the differential analysis, I then ran the GLM test using the script spladder_test.py with the following command line:

python2.7 spladder_test.py -o splAdder_result -a r1c.sorted, r2c.sorted, r3c.sorted -b r1t.sorted, r2t.sorted, r3t.sorted


The following error appears:

/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py:656: RuntimeWarning: divide by zero encountered in double_scalars
  (self.df_resid))
Traceback (most recent call last):
  File "spladder_test.py", line 873, in <module>
    main()
  File "spladder_test.py", line 855, in main
    pvals = run_testing(cov, dmatrix0, dmatrix1, sf, CFG)
  File "spladder_test.py", line 627, in run_testing
    (disp_raw, disp_raw_conv) = estimate_dispersion(cov, dmatrix1, sf, CFG)
  File "spladder_test.py", line 327, in estimate_dispersion
    (disp_raw, disp_raw_conv, _) = estimate_dispersion_chunk(gene_counts, matrix, sf, CFG, sp.arange(gene_counts.shape[0]), log=CFG['verbose'])
  File "spladder_test.py", line 268, in estimate_dispersion_chunk
    result = modNB.fit()
  File "/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 903, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1007, in _fit_irls
    raise PerfectSeparationError(msg)
statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available


How can I fix it?

4.7 years ago

In plain English, this error message indicates that the conditions you are comparing in the regression model are 'very well' separated, so at first glance the error could be viewed as a positive result. In biology, however, 'perfect' separation between conditions is never expected, because the complexity of biology far exceeds our current understanding of it. It is akin to a ROC analysis in which a model returns an AUC of 1.0 (100%): that is suspicious and is generally only observed with low sample numbers.

Thus, this error is most likely a consequence of your low sample numbers (3 versus 3). To avoid it, you would have to increase your sample size and then re-run the test.
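
To make the AUC analogy concrete, here is a minimal sketch in Python 3 (not the Python 2.7 that spladder_test.py runs under). It is only an illustration of the general idea, not the check that statsmodels performs internally. The function names and the example values are hypothetical and do not come from your data. It shows that with 3 versus 3 samples, two groups with no real difference end up fully separated about 10% of the time by chance alone, assuming continuous values with no ties.

```python
from math import comb  # requires Python 3.8+


def auc(negatives, positives):
    """AUC as the fraction of (negative, positive) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = sum((p > n) + 0.5 * (p == n) for n in negatives for p in positives)
    return wins / (len(negatives) * len(positives))


def chance_of_perfect_separation(n_per_group):
    """Probability that two equal-sized groups are fully separated by chance.

    Assumes no real difference between the groups and no tied values:
    only 2 of the C(2n, n) possible group labelings put one group
    entirely below the other.
    """
    return 2 / comb(2 * n_per_group, n_per_group)


# Hypothetical values for one event in 3 control and 3 treated replicates.
control = [0.21, 0.25, 0.23]
treated = [0.40, 0.47, 0.44]

print("AUC:", auc(control, treated))  # every treated value exceeds every control value

for n in (3, 5, 10):
    print(n, "per group:", chance_of_perfect_separation(n))
```

With 3 replicates per group the chance is 2/20 = 0.1, whereas with 10 per group it drops to 2/184756, which is why a 'perfect' result is far less suspicious with more samples.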

If you still believe that something else is going on, unrelated to what I have written, then I encourage you to post on the GitHub issues page: https://github.com/ratschlab/spladder/issues

Kevin