Question

Error in differential analysis using GLM test



6.2 years ago

bisht20diksha

Hello. I am trying to use SplAdder for alternative splicing analysis. I have three replicates of barley for each condition, control and treated, i.e. c1, c2, c3 and t1, t2, t3. I ran SplAdder successfully and got the output: I have quantified and confirmed alternative splicing events based on the RNA-Seq data. Now, to perform differential analysis, I ran the GLM test using the script spladder_test.py with the following command line:

python2.7 spladder_test.py -o splAdder_result -a r1c.sorted, r2c.sorted, r3c.sorted -b r1t.sorted, r2t.sorted, r3t.sorted
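One detail worth checking in the command above: a POSIX shell splits arguments on whitespace, so with a space after each comma, -a receives only "r1c.sorted," and the other file names become separate, unrelated arguments. The minimal sketch below uses only Python's standard shlex module to tokenize the command the way the shell would; value_after is a hypothetical helper written for this illustration. How spladder_test.py handles the leftover arguments is not something this sketch checks.

```python
import shlex

# The command exactly as posted, with a space after each comma.
posted = ("python2.7 spladder_test.py -o splAdder_result "
          "-a r1c.sorted, r2c.sorted, r3c.sorted "
          "-b r1t.sorted, r2t.sorted, r3t.sorted")

def value_after(tokens, flag):
    """Hypothetical helper: return the single token passed as the value of `flag`."""
    return tokens[tokens.index(flag) + 1]

# shlex.split tokenizes on whitespace, as a POSIX shell would.
tokens = shlex.split(posted)
print(value_after(tokens, "-a"))  # only the first file (with a trailing comma)
print(value_after(tokens, "-b"))

# The same command with the spaces after the commas removed.
fixed = posted.replace(", ", ",")
tokens_fixed = shlex.split(fixed)
print(value_after(tokens_fixed, "-a"))  # all three condition-A files in one token
print(value_after(tokens_fixed, "-b"))
```

If the tokens confirm this, re-running with the lists written without spaces (e.g. -a r1c.sorted,r2c.sorted,r3c.sorted) would be a cheap first thing to try, assuming the script splits each option value on commas.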

The following error appears:

/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py:656: RuntimeWarning: divide by zero encountered in double_scalars
  (self.df_resid))
Traceback (most recent call last):
  File "spladder_test.py", line 873, in <module>
    main()
  File "spladder_test.py", line 855, in main
    pvals = run_testing(cov, dmatrix0, dmatrix1, sf, CFG)
  File "spladder_test.py", line 627, in run_testing
    (disp_raw, disp_raw_conv) = estimate_dispersion(cov, dmatrix1, sf, CFG)
  File "spladder_test.py", line 327, in estimate_dispersion
    (disp_raw, disp_raw_conv, _) = estimate_dispersion_chunk(gene_counts, matrix, sf, CFG, sp.arange(gene_counts.shape[0]), log=CFG['verbose'])
  File "spladder_test.py", line 268, in estimate_dispersion_chunk
    result = modNB.fit()
  File "/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 903, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/home/aasim/.local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1007, in _fit_irls
    raise PerfectSeparationError(msg)
statsmodels.tools.sm_exceptions.PerfectSeparationError: Perfect separation detected, results not available

How can I fix it?


updated 6.2 years ago by Kevin Blighe

Answer · 2018-02-28

In plain English, this error message indicates that the conditions you are comparing in the regression model are 'very well' separated, so in that sense the error can be viewed as positive. However, in biology, 'perfect' separation between different conditions is never expected, because the complexity of biology far exceeds our current level of understanding of it. It's akin to ROC analysis where a model returns an AUC of 1.0 (100%), which is suspicious and is generally only observed with low sample numbers.

Thus, this error is most likely a consequence of your low sample numbers. To avoid it, you would have to increase your sample size (n) and then re-run the analysis.
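To make the small-sample point concrete, here is a minimal sketch in plain Python (no statsmodels). It assumes a hypothetical, simplified design with only an intercept and a condition indicator, far simpler than the design matrices spladder_test.py actually builds (dmatrix0 and dmatrix1 in the traceback). For such a design, the maximum-likelihood fitted mean of a log-link Poisson or negative-binomial GLM in each condition is that condition's average count. It further assumes that the statsmodels check behind the error amounts to "fitted values equal observed values", and that the residual degrees of freedom are observations minus parameters, which is the df_resid that the divide-by-zero warning in the traceback refers to.

```python
def fitted_group_means(counts_a, counts_b):
    """Fitted values for a hypothetical intercept + condition design.

    With one mean per condition, the ML fitted value for every sample in a
    condition is simply that condition's average count.
    """
    mean_a = sum(counts_a) / float(len(counts_a))
    mean_b = sum(counts_b) / float(len(counts_b))
    return [mean_a] * len(counts_a) + [mean_b] * len(counts_b)

def separation_check(counts_a, counts_b, n_params=2):
    """Return (df_resid, perfect_fit) for the simplified two-group model."""
    observed = list(counts_a) + list(counts_b)
    fitted = fitted_group_means(counts_a, counts_b)
    # Residual degrees of freedom: observations minus fitted parameters.
    df_resid = len(observed) - n_params
    # A "perfect" fit: every fitted value reproduces its observation exactly.
    perfect_fit = all(abs(f - o) < 1e-8 for f, o in zip(fitted, observed))
    return df_resid, perfect_fit

# One sample per condition: no residual degrees of freedom, exact fit.
print(separation_check([10], [25]))                   # (0, True)
# Three varying replicates per condition: no exact fit.
print(separation_check([10, 14, 9], [25, 31, 22]))    # (4, False)
# Three replicates, but constant counts within each condition: exact fit again.
print(separation_check([0, 0, 0], [7, 7, 7]))         # (4, True)
```

The last case shows that more replicates only help if the counts actually vary within each condition; events with identical (e.g. all-zero) counts in a condition can still be fitted exactly.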

If you still believe that something else is going on, unrelated to what I have written, then I encourage you to post on the GitHub issues page: https://github.com/ratschlab/spladder/issues

Kevin