Error encountered while running CoNIFER with the demo data set
6.8 years ago

Hi all

I am trying to run CoNIFER using the demo data set that comes bundled with it. This is the command I ran:

python conifer.py  analyze --probes conifer_v0.2.2/sampledata/probes.txt --rpkm_dir conifer_v0.2.2/sampledata/RPKM_data/ --output /conifer_v0.2.2/sampledata/test_Run_analysis.hdf5 --svd 6 --write_svals /conifer_v0.2.2/sampledata/test_Run_singular_values.txt

Error:

[INIT] Finished reading RPKM files. Total number of samples in experiment: 26 (0 failed to read properly)
[INIT] Attempting to process chromosomes:  chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY
[RUNNING: chr1] Now on: chr1
[RUNNING: chr1] Found 19822 probes; probeID range is [0-19822]
[RUNNING: chr1] Calculating median RPKM
[RUNNING: chr1] Masking 412 probes with median RPKM < 1.000000
Traceback (most recent call last):
  File "conifer_v0.2.2/conifer.py", line 682, in <module>
    args.func(args)
  File "/conifer_v0.2.2/conifer.py", line 157, in CF_analyze
    probeIDs = np.array(map(operator.itemgetter("probeID"),chr_probes))[probe_mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 19822 but corresponding boolean dimension is 19821
Closing remaining open files:/conifer_v0.2.2/sampledata/test_Run_analysis.hdf5...done

A Google search for the error pointed to a NumPy version-compatibility problem; however, I have the latest version (see the pip freeze output below):

$pip freeze
adium-theme-ubuntu==0.3.4
cycler==0.10.0
decorator==4.0.11
functools32==3.2.3.post2
matplotlib==2.0.2
netifaces==0.10.4
networkx==1.11
numexpr==2.6.2
numpy==1.13.1
pygobject==3.20.0
pyparsing==2.2.0
pysam==0.8.3
python-dateutil==2.6.0
pytz==2017.2
scour==0.32
six==1.10.0
subprocess32==3.2.7
tables==3.2.3.1
unity-lens-photos==1.0
2.9 years ago
temp_1

I spent some time working on this and found out how to solve it.

The problem seems to arise from version incompatibilities.

Fixed scripts can be found at https://github.com/fuzhican/CoNIFER

The dependency versions I used with CoNIFER 0.2.2:

  - pytables=3.5.2=py27h9f153d1_2
  - python=2.7.18=h15b4118_1
  - hdf5=1.10.5=nompi_h3c11f04_1104
  - numpy=1.16.5=py27h95a1406_0
  - numexpr=2.7.1=py27hb3f55d8_0
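A small sketch for checking an environment against the versions listed above before running CoNIFER. The `KNOWN_GOOD` dictionary and the `version_mismatches` helper are hypothetical names chosen for illustration; the installed versions would have to be collected separately, for example from `numpy.__version__`, `tables.__version__`, `numexpr.__version__`, `platform.python_version()` and, for HDF5, `conda list`.

```python
# Versions reported to work for CoNIFER 0.2.2 in this answer (conda build strings omitted).
KNOWN_GOOD = {
    "python": "2.7.18",
    "tables": "3.5.2",   # conda package name: pytables
    "hdf5": "1.10.5",
    "numpy": "1.16.5",
    "numexpr": "2.7.1",
}


def version_mismatches(installed, expected=KNOWN_GOOD):
    """Return {name: (installed, expected)} for each package that differs or is missing.

    `installed` is a plain dict of package name -> version string (hypothetical input format).
    """
    mismatches = {}
    for name, want in sorted(expected.items()):
        have = installed.get(name)  # None if the package was not reported at all
        if have != want:
            mismatches[name] = (have, want)
    return mismatches
```

Exact string matching is deliberate here: the thread reports specific working combinations rather than minimum versions, so anything other than the listed version is flagged. With the pip freeze from the question, for instance, numpy 1.13.1 would show up as a mismatch.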

The solution is as follows:

Migrate the scripts from PyTables 2.x to 3.x with pt2to3:

pt2to3 -i conifer.py
pt2to3 -i conifer_functions.py
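pt2to3 rewrites the PyTables 2.x camelCase API names to their 3.x snake_case equivalents. The toy sketch below only illustrates that kind of rename on a hand-picked subset of names (`openFile`, `createArray`, `createGroup`, `getNode`); it is not pt2to3's implementation, and the `migrate_line` helper is hypothetical. Use the real tool for the actual migration.

```python
import re

# Small, illustrative subset of PyTables 2.x -> 3.x renames (pt2to3 knows many more).
RENAMES = {
    "openFile": "open_file",
    "createArray": "create_array",
    "createGroup": "create_group",
    "getNode": "get_node",
}

# Match only whole identifiers, so e.g. "reopenFile" is left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_line(line):
    """Rewrite old camelCase PyTables names in one line of source code."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], line)
```

For example, `migrate_line('h5file = tables.openFile(path, mode="w")')` returns `'h5file = tables.open_file(path, mode="w")'`.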

Then edit line 142 of conifer.py to:

rpkm = RPKM_data[start_probeID - 1:stop_probeID,:]

I also encountered another error and fixed it; I am posting the fix here for anyone who needs it.

CoNIFER NameError: global name 'samples' is not defined

This error is raised by line 112 of conifer_functions.py:

return np.loadtxt(samples[s], dtype=np.float, delimiter="\t", skiprows=0, usecols=[2])

Line 112 of conifer_functions.py should be edited to:

return np.loadtxt(rpkm_filename, dtype=np.float, delimiter="\t", skiprows=0, usecols=[2])
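A standalone sketch of what the corrected line does: read the third tab-separated column (the RPKM values) from a single sample file, using the function's own argument instead of the undefined global `samples`. The name `load_rpkm_column` is hypothetical, not the function name in conifer_functions.py. It passes the builtin `float` instead of `np.float`, which was deprecated in NumPy 1.20 and removed in 1.24; on the older NumPy versions used in this thread, both spellings behave the same.

```python
import numpy as np


def load_rpkm_column(rpkm_filename):
    """Read column index 2 (RPKM values) from one tab-separated sample file."""
    # Use the argument that was passed in; the buggy version referenced an
    # undefined global `samples`, which raised the NameError.
    return np.loadtxt(rpkm_filename, dtype=float, delimiter="\t", skiprows=0, usecols=[2])
```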

The solution was inspired by Rodrigo's answer. Thanks, Rodrigo!

6.8 years ago
Rodrigo

I think this is a bug in the CoNIFER source code. The error you get,

Traceback (most recent call last):
  File "conifer_v0.2.2/conifer.py", line 682, in <module>
    args.func(args)
  File "/conifer_v0.2.2/conifer.py", line 157, in CF_analyze
    probeIDs = np.array(map(operator.itemgetter("probeID"),chr_probes))[probe_mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 19822 but corresponding boolean dimension is 19821

basically means that the two sequences chr_probes and probe_mask do not have the same length: chr_probes has one more item than probe_mask.
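To see this error in isolation, here is a minimal NumPy reproduction with made-up data. `keep_expressed` is a hypothetical toy stand-in for the masking step, not CoNIFER code: `probe_ids` plays the role of chr_probes and `median_rpkm` the role of the per-probe medians. Whether CoNIFER's mask marks probes to keep or to drop is assumed here; the length requirement is the same either way.

```python
import numpy as np


def keep_expressed(probe_ids, median_rpkm, min_rpkm=1.0):
    """Return the probe IDs whose median RPKM is at least min_rpkm."""
    probe_mask = np.asarray(median_rpkm) >= min_rpkm  # one boolean per median value
    return np.asarray(probe_ids)[probe_mask]           # mask length must equal len(probe_ids)


probe_ids = np.arange(1, 6)  # 5 probes

# Same lengths: works.
print(keep_expressed(probe_ids, [0.5, 2.0, 3.0, 1.5, 4.0]))  # [2 3 4 5]

# One median short (as when rpkm is missing a row): NumPy raises IndexError.
try:
    keep_expressed(probe_ids, [0.5, 2.0, 3.0, 1.5])
except IndexError as err:
    print("IndexError:", err)
```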

Looking at conifer.py, a likely source of the problem is line 142,

rpkm = RPKM_data[start_probeID:stop_probeID,:]

and line 139,

stop_probeID = chr_probes[-1]['probeID']

where rpkm determines the length of median, which in turn determines the length of probe_mask.

I don't have some of this module's requirements installed, so I can't run the tests, but my best guess is that line 142 of conifer.py should be edited to

rpkm = RPKM_data[start_probeID:stop_probeID + 1,:]

Can you try that and let me know if it works?
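The off-by-one comes from Python's half-open slices: `a[start:stop]` excludes row `stop`, so an inclusive probe range needs one extra row. The sketch below uses a toy matrix and made-up probe IDs to compare the original line 142, the edit suggested here, and temp_1's edit; whether CoNIFER's probe IDs are 0-based or 1-based row positions is an assumption, not something confirmed in this thread.

```python
import numpy as np

# Toy RPKM matrix: 10 probes (rows) x 3 samples (columns); values are arbitrary.
RPKM_data = np.arange(30).reshape(10, 3)

start_probeID, stop_probeID = 3, 7           # inclusive probe ID range, like chr_probes
n_probes = stop_probeID - start_probeID + 1  # 5 probes on this "chromosome"

# Python slices exclude the stop index, so this returns one row too few.
rows_original = RPKM_data[start_probeID:stop_probeID, :]

# Inclusive slice if probe IDs are 0-based row indices (the edit suggested here).
rows_zero_based = RPKM_data[start_probeID:stop_probeID + 1, :]

# Inclusive slice if probe IDs are 1-based row numbers (temp_1's edit).
rows_one_based = RPKM_data[start_probeID - 1:stop_probeID, :]

print(n_probes, rows_original.shape[0], rows_zero_based.shape[0], rows_one_based.shape[0])
# 5 4 5 5
```

Both edits return five rows, but not the same five rows, so the right one depends on how probe IDs map to rows. Note also that a slice whose stop runs past the end of the array is silently truncated rather than raising, so an off-by-one can still surface later as the same length mismatch.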


I have exactly the same problem with the sample data; changing line 142 in conifer.py did not help, and the error is thrown again. Here is the output:

Traceback (most recent call last):
  File "conifer.py", line 683, in <module>
    args.func(args)
  File "conifer.py", line 158, in CF_analyze
    probeIDs = np.array(map(operator.itemgetter("probeID"),chr_probes))[probe_mask]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 465 but corresponding boolean dimension is 464
Closing remaining open files:analysis.hdf5...done

Hello!

I tried your method and it worked for all chromosomes in the sample data set except chrY, but that is understandable.

Note: I had to modify conifer.py and conifer_functions.py to make them compatible with PyTables 3.x. I did not change the versions of any other tools used by CoNIFER. I hope this helps.

6.3 years ago
lech.nieroda

I don't know if you are still looking for a solution, but I ran into exactly the same issue and found a fix. The error is caused by sloppy programming and by incompatibilities between certain PyTables, NumPy, numexpr and HDF5 versions. For example, PyTables 3.x has major API changes and has removed functions that CoNIFER uses, but an older 2.x version in turn breaks with numexpr 2.2 and above. Also, the PyTables installation script is broken: it treats 1.13 as a lower version than 1.4, so you have to use NumPy 1.9. Furthermore, HDF5 1.10 breaks PyTables functions, so you have to downgrade to HDF5 1.8.

The short answer: use HDF5 1.8, PyTables 2.4.0, NumPy 1.9.3 and numexpr 2.1 to run CoNIFER 0.2.2.
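The version-comparison problem described in this answer (1.13 treated as lower than 1.4) is the classic result of comparing version strings as plain text. The sketch below is not PyTables' setup code, and `parse_version` is a hypothetical helper; it only shows the failure mode and a numeric comparison that avoids it. For real-world version strings with suffixes such as `rc1`, the `packaging` library's `packaging.version.Version` is the usual tool.

```python
def parse_version(version):
    """Split a dotted release string like '1.13.1' into a tuple of ints: (1, 13, 1).

    Only handles purely numeric components (no 'rc1', 'dev', etc.).
    """
    return tuple(int(part) for part in version.split("."))


# String comparison goes character by character: '1' == '1', '.' == '.', then '1' < '4'.
string_says_older = "1.13" < "1.4"                                 # True (wrong)

# Tuple comparison goes component by component: 1 == 1, then 13 > 4.
numeric_says_older = parse_version("1.13") < parse_version("1.4")  # False (correct)
```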


Apparently HDF5 1.8 is not supported by PyTables 2.4.0: when I try to install PyTables with HDF5 1.8, it says it is not supported. Did you face the same issue?

5.2 years ago

Re: "The short answer: use hdf5 1.8, pytables 2.4.0, numpy 1.9.3, numexpr 2.1 in order to run conifer 0.2.2": I'm sorry, but which Python module for HDF5 1.8 should I install with pip? The others (pytables 2.4.0, numpy 1.9.3, numexpr 2.1) are fine, but I can't find the last one. Thank you very much.


HDF5 is a C library rather than a pip package; version 1.8 can be downloaded from https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/

4.9 years ago

Hi,

Can someone help me? I am trying out CoNIFER: it runs with the test samples, but when I try it with my own samples I get this error:

[RUNNING: chr1] Now on: chr1
[RUNNING: chr1] Found 39 probes; probeID range is [40-110]
[RUNNING: chr1] Calculating median RPKM
[RUNNING: chr1] Masking 0 probes with median RPKM < 1.000000
Traceback (most recent call last):
  File "/home/mferreira/programs/conifer_v0.2.2/conifer.py", line 682, in <module>
    args.func(args)
  File "/home/mferreira/programs/conifer_v0.2.2/conifer.py", line 157, in CF_analyze
    probeIDs = np.array(map(operator.itemgetter("probeID"),chr_probes))[probe_mask]
IndexError: index 39 is out of bounds for axis 1 with size 39
Closing remaining open files:analysis.hdf5...done

I am only testing with 3 samples. Could the error be due to the low number of samples?

Thank you!


Can you share the command that you are using?
