8.0 years ago
lien
Hi all,
I downloaded a Python script that was part of the Supplementary data from the paper 'Genome-wide copy number analysis of single cells' (Baslan et al., Nature Protocols 2012). This script computes bin boundaries for the entire human genome and takes 3 different files as input. I have checked these files, and they look OK. However, the output file I get only lists boundaries for (part of) chromosome 1 and not for all chromosomes. I've looked at the script, but cannot find where it is going wrong. I've added a link to the script with the Gist below:
Any thoughts will be appreciated, Thanks.
Does hg19.goodzones.bwa.k50.bed contain more than the first chromosome, and is it sorted to match the order of hg19.chrom.sizes.txt?
hg19.goodzones.bwa.k50.bed contains all the chromosomes of the genome (1-22, X and Y).
I have sorted the 3 files that are used as input:
The Python script seems to start out correctly, but at some point it looks like it just stops writing to the output file. Free disk space is not the issue, as more than 2TB is still available.
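For anyone wanting to verify the ordering question above rather than eyeball it, here is a minimal sketch in Python 3. It assumes both files are whitespace-delimited with the chromosome name in the first column; the function names (chrom_order, check_order, check_files) are made up for illustration and are not part of the Baslan et al. script.

```python
from itertools import groupby


def chrom_order(lines):
    """Return chromosome names as consecutive runs, in file order."""
    names = (
        line.split()[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    )
    # groupby collapses consecutive repeats, so a chromosome that shows up
    # twice in the result is split across non-adjacent blocks of the file.
    return [name for name, _ in groupby(names)]


def check_order(bed_lines, sizes_lines):
    """Compare chromosome grouping and order between a BED file and a sizes file."""
    bed_runs = chrom_order(bed_lines)
    sizes = chrom_order(sizes_lines)

    # Chromosomes that appear in more than one run (file not grouped).
    split = sorted({c for c in bed_runs if bed_runs.count(c) > 1})

    # Order of first appearance in the BED file.
    bed_first = []
    for c in bed_runs:
        if c not in bed_first:
            bed_first.append(c)

    missing = [c for c in sizes if c not in bed_first]
    common_bed = [c for c in bed_first if c in sizes]
    common_sizes = [c for c in sizes if c in bed_first]
    return {
        "split": split,
        "missing": missing,
        "same_order": common_bed == common_sizes,
    }


def check_files(bed_path, sizes_path):
    """Convenience wrapper that reads the two files from disk."""
    with open(bed_path) as bed, open(sizes_path) as sizes:
        return check_order(bed, sizes)
```

Calling check_files("hg19.goodzones.bwa.k50.bed", "hg19.chrom.sizes.txt") returns a small dictionary. A non-empty "split" list or "same_order" being False would suggest the two files are not sorted consistently, which is one possible reason (not a confirmed one) for a loop giving up after chromosome 1.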
You could try adding a few print("1") sorts of statements inside each instance of:
That'd allow you to at least see where it's breaking its loop over regions.
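A slightly more informative variant of the print("1") idea is to log every time the loop moves to a new chromosome, so the last message shows how far it got. The sketch below is Python 3 and does not reproduce the actual script's loop (which looks like Python 2 from the print statements); trace_regions is a hypothetical helper that assumes BED-like lines with chromosome, start, and end in the first three columns.

```python
import sys


def trace_regions(lines, log=sys.stderr):
    """Yield (chrom, start, end) from BED-like lines, logging chromosome changes."""
    current = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) < 3:
            # Report malformed or blank lines instead of failing silently.
            print(f"line {lineno}: skipped, fewer than 3 fields: {line!r}", file=log)
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom != current:
            # Writing to stderr keeps these messages out of the real output.
            print(f"line {lineno}: entering {chrom}", file=log)
            current = chrom
        yield chrom, start, end
```

Wrapping the input with something like for chrom, start, end in trace_regions(open("hg19.goodzones.bwa.k50.bed")): leaves the loop body unchanged while printing one line per chromosome to stderr. If the messages stop at chr1, the loop is exiting early; if they run through chrY, the problem is more likely on the writing side.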
I see you already added statements like print chromarray; what's the output of those?
I get a lot (!) of the errors below:
So I guess there must be something wrong with the input files I'm using, but I have checked these already and they look okay.