Question

CNVkit: CSV error when testing installation

7.9 years ago

opm • 0

I am having issues getting CNVkit up and running.

I installed from source to a local directory because I do not have root privileges on our cluster:

git clone https://github.com/etal/cnvkit.git
cd cnvkit
python setup.py build
python setup.py install --prefix=$HOME/local

I installed all of the Python and R dependencies, and made sure that my PYTHONPATH and R_LIBS environment variables were set correctly.

When I run make in the test/ directory, I get the following output/error:

python ../cnvkit.py segment -t .01 build/p2-5_5.cnr -o build/p2-5_5.cns
Dropped 1 outlier bins:
  chromosome     start       end    gene    log2    weight
0      chr16  29466010  29466278  BOLA2B -26.849  0.437268
Traceback (most recent call last):
  File "../cnvkit.py", line 13, in <module>
    args.func(args)
  File "/path/to/cnvkit/cnvlib/commands.py", line 714, in _cmd_segment
    rlibpath=args.rlibpath)
  File "/path/to/cnvkit/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    sample_id=cnarr.sample_id)
  File "/path/to/cnvkit/cnvlib/tabio/__init__.py", line 69, in read
    dframe = reader(infile, **kwargs)
  File "/path/to/cnvkit/cnvlib/tabio/seg.py", line 48, in read_seg
    for sid, dframe in results:
  File "/path/to/cnvkit/cnvlib/tabio/seg.py", line 102, in parse_seg
    engine="python",
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 805, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1608, in __init__
    self.columns, self.num_original_columns = self._infer_columns()
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1907, in _infer_columns
    line = self._buffered_line()
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1975, in _buffered_line
    return self._next_line()
  File "/path/to/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 2006, in _next_line
    orig_line = next(self.data)
_csv.Error: line contains NULL byte
make: *** [build/p2-5_5.cns] Error 1

I get a very similar error (_csv.Error: line contains NULL byte) when I try to run CNVkit on my own data.

Any help is greatly appreciated! I can't seem to figure out what is causing the problem.


updated 7.9 years ago by Eric T. • written 7.9 years ago by opm

score 2 · Accepted Answer · 2016-07-06

My guess is that the R part of segmentation is failing, and returning an empty or corrupted file instead of a dataframe of segments.
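
If you can get hold of a file you suspect is empty or corrupted (for example the .cnr input, or a saved copy of whatever the R step wrote), a quick check for size and NUL bytes might look like the sketch below. The function name inspect_file is made up for illustration and is not part of CNVkit; the code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import os
import sys


def inspect_file(path, chunk_size=1 << 20):
    """Return (size_in_bytes, nul_count, first_nul_offset) for `path`.

    Hypothetical helper, not part of CNVkit. first_nul_offset is None
    when the file contains no NUL bytes.
    """
    size = os.path.getsize(path)
    nul_count = 0
    first_nul = None
    offset = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            if first_nul is None:
                idx = chunk.find(b"\x00")
                if idx != -1:
                    first_nul = offset + idx
            nul_count += chunk.count(b"\x00")
            offset += len(chunk)
    return size, nul_count, first_nul


if __name__ == "__main__":
    # Usage: python check_nul.py build/p2-5_5.cnr [more files...]
    for path in sys.argv[1:]:
        size, nul_count, first_nul = inspect_file(path)
        print("{0}: {1} bytes, {2} NUL bytes, first at offset {3}".format(
            path, size, nul_count, first_nul))
```

A size of 0 would point to an empty file, and a nonzero NUL count would match the _csv.Error in the traceback.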

For diagnosing the issue:

Check that you're able to load the "PSCBS" library in R.
Run the unit test suites in the same test/ directory with make test. This runs two sets of tests, with the R-based functionality isolated in the second set. This should help pinpoint which piece of functionality is failing.
Try manually segmenting a test file using the Python-based "haar" method instead of the default CBS: cnvkit.py segment -m haar build/p2-5_5.cnr -o build/p2-5_5.cns. If all else fails, you can use this segmentation method for your own work.
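
For the first check, one way to test whether PSCBS loads in the same environment CNVkit runs in is to start a fresh R process from Python. This is only a sketch: r_package_loads is a hypothetical helper, and it assumes Rscript is on your PATH and picks up the same R_LIBS setting as your shell.

```python
from __future__ import print_function

import subprocess


def r_package_loads(package, rscript="Rscript"):
    """Try library(<package>) in a fresh R process.

    Hypothetical helper, not part of CNVkit. Returns (ok, output), where
    output is whatever R printed, or a note if Rscript could not be run.
    """
    cmd = [rscript, "-e", "library({0})".format(package)]
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
        return True, out.decode("utf-8", "replace")
    except subprocess.CalledProcessError as err:
        # R started but exited with an error, e.g. the package is missing.
        return False, err.output.decode("utf-8", "replace")
    except OSError as err:
        # The Rscript executable itself could not be launched.
        return False, "could not run {0}: {1}".format(rscript, err)


if __name__ == "__main__":
    ok, output = r_package_loads("PSCBS")
    print("PSCBS loads:", ok)
    print(output)
```

If this reports False, the R output should say whether the package is missing or one of its own dependencies failed to load.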

I've just changed the development version of CNVkit to report the faulty line when parsing segmentation data. Could you pull the latest from GitHub, re-run make test, and post the new error message here if it's any different?

It looks like you're using the development version of CNVkit, which could contain surprises. I'm not sure which versions of pandas, R, and PSCBS you're using; if the tips above don't work for you, could you post the versions here?
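
To collect the Python-side versions, something along these lines should work when run with the same interpreter that runs cnvkit.py. It is a sketch, and python_versions is a made-up name; the default module list (pandas, numpy) is my assumption about what is most relevant here. On the R side, R.version.string and packageVersion("PSCBS") in an R session give the matching information.

```python
from __future__ import print_function

import importlib
import sys


def python_versions(modules=("pandas", "numpy")):
    """Return a dict of version strings for Python and the named modules.

    Hypothetical helper, not part of CNVkit. Modules that fail to import
    are reported as "not installed"; modules without a __version__
    attribute are reported as "unknown".
    """
    report = {"python": sys.version.split()[0]}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
        else:
            report[name] = getattr(mod, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in sorted(python_versions().items()):
        print("{0}: {1}".format(name, version))
```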

To install the most recent stable version of CNVkit (v0.7.11), I recommend either of these options for you:

Anaconda (or the minimal bundle, Miniconda) can be installed and run under a user account without root access, and includes a good set of precompiled dependencies such as pandas. After installing Anaconda, running conda install cnvkit -c bioconda pulls in all of the dependencies automatically, including the R packages, so there's less risk of surprises.
Otherwise, pip install cnvkit --user should install CNVkit and its Python dependencies under your own account. I recommend using virtualenv (directly or via e.g. virtualenv-burrito) to keep the installation clean and isolated; then you don't need to use the "--user" flag or fiddle with paths.