Hello,
I've come across an issue when running CNVkit genemetrics in the latest stable release. The command appears to complete successfully but only returns a small portion of the expected output. This was not an issue with my previous installation of CNVkit, which was 0.9.3. I have also managed to fix this issue with a small change to a function of the CopyNumArray class.
I'm hoping to get in touch with the developer here, as I didn't see a way to contact him directly. This seems like the kind of issue that would have shown up during testing (and the piece of code I changed isn't different between versions), so I'm wondering if the problem is with some dependency instead. This new CNVkit installation is through conda, so maybe that could be a contributing factor.
To go into more detail, the issue was in the by_gene function. At one point, we are essentially trying to extract certain lines from a pandas dataframe given a list of indices. The problem is that the code as written uses iloc instead of loc to specify which rows to extract. The difference is that loc returns the rows whose row name (index label) matches the argument, while iloc returns the rows whose actual row number (position) matches the argument. So the problem is that by_gene works with a list of indices that represent row names, but it tries to extract lines from the dataframe based on row number. The output you get from genemetrics is then limited to the few cases where row name and row number happen to be identical.
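To make the loc/iloc difference concrete, here is a minimal toy sketch. It is not the actual CNVkit code: the table, the column names, and the filtering step are all made up for illustration, and I'm only assuming that by_gene ends up in a similar situation where the index labels no longer match the row positions.

```python
import pandas as pd

# Hypothetical toy table; not CNVkit's real data or column layout.
table = pd.DataFrame({
    "gene": ["A", "B", "C", "D", "E"],
    "log2": [0.1, -0.4, 0.8, 0.0, -1.2],
})

# Dropping a row keeps the original index labels, so labels and
# positions diverge: the labels are now 0, 2, 3, 4 at positions 0, 1, 2, 3.
filtered = table[table["gene"] != "B"]

# A list of row names (index labels) we want to pull out.
wanted = [0, 2, 3]

# loc selects by label: rows labelled 0, 2, 3.
by_label = filtered.loc[wanted, "gene"].tolist()

# iloc selects by position: the rows at positions 0, 2, 3,
# which carry the labels 0, 3, 4.
by_position = filtered["gene"].iloc[wanted].tolist()

print(by_label)     # ['A', 'C', 'D']
print(by_position)  # ['A', 'D', 'E']
```

Only gene A, where label and position coincide, comes out the same either way, which matches the kind of partial output described above.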
Again, none of this code seems to have changed between the versions of CNVkit that I have used, so it would seem that this used to work as expected. My colleague has created an issue on the GitHub page about this as well, so hopefully we can get in contact with the dev and figure out what's going on. I'm happy to provide more details and some files if necessary.
Adam