Is cnvkit coverage output split into non-overlapping segments?
5.4 years ago
fbrundu ▴ 330

I am trying to compute breadth of coverage for exome data using the cnvkit coverage command.

The output is in the form

chromosome  start   end gene    depth   log2
1   12098   12258   LOC102725121,DDX11L1    396.431 8.63093
1   12553   12721   LOC102725121,DDX11L1    402.667 8.65344
1   13331   13701   LOC102725121,DDX11L1    551.632 9.10756

reporting the coverage depth for each segment defined by the first, second and third fields. At first glance the segments appear to be non-overlapping (given the cnvkit pipeline, I would expect so), but I am not sure, because I could not find a detailed description of the output of the coverage command in the documentation.

If so, it would speed up the computation of coverage breadth. Are the segments non-overlapping?
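To make the motivation concrete: if the bins never overlap, breadth can be approximated in a single pass by summing bin lengths, with no interval merging needed. Below is a minimal sketch, not a cnvkit feature. The function name `breadth_of_coverage` and the depth threshold argument are hypothetical, and it assumes a tab-separated file with a header line and the column order shown above (chromosome, start, end, gene, depth, log2). Since depth is reported per bin, this is a bin-level approximation rather than a per-base breadth.

```shell
# Hypothetical helper: fraction of targeted bases that fall in bins whose
# reported depth is at least MIN_DEPTH (default 1).
# Assumes bins do not overlap, so bin lengths can simply be summed.
# Usage: breadth_of_coverage FILE [MIN_DEPTH]
breadth_of_coverage() {
    awk -F'\t' -v min="${2:-1}" '
        NR == 1 { next }                    # skip the header line
        { len = $3 - $2; total += len }     # bin length, end minus start
        $5 + 0 >= min + 0 { covered += len }   # bin counts as covered
        END { if (total > 0) printf "%.4f\n", covered / total }
    ' "$1"
}
```

For example, `breadth_of_coverage sample.targetcoverage.cnn 20` (file name is a placeholder) would print the fraction of targeted bases lying in bins with depth of at least 20.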

5.4 years ago
fbrundu ▴ 330

There are no overlaps (at least not in my results). You can check yours with:

# Report any bin that starts before the previous bin on the same chromosome ends
for file in $( find . -name "*.sort.targetcoverage.cnn" ); do
    tail -n +2 "$file" | sort -k1,1 -k2,2n | \
        awk 'BEGIN{ chr=0 } { if($1==chr) { if($2<last) {print "Problem\t" $0 "\nLast\t" last_rec; } } chr=$1; last=$3; last_rec=$0; }'
done

If it gives you no output, no segment starts before the preceding segment on the same chromosome ends.

5.4 years ago
Eric T. ★ 2.8k

Yes, the outputs should be non-overlapping if you follow the standard pipeline (e.g. "quick start" guide in the docs). Any overlapping targets are normally merged and re-subdivided as necessary in the target command, which is also automatically run as a step within the batch command.

The coverage and segment commands do not strictly require that the bins and segments be non-overlapping, and if you skip target in a manually constructed pipeline and directly give the coverage command a BED of overlapping target intervals, it's possible to get overlapping segments. But following the standard pipeline, you will not see overlaps.
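If you do have a BED with overlapping target intervals and want to see what collapsing them looks like, here is a rough sketch using sort and awk. It is only an illustration of merging overlapping intervals, not the algorithm the target command uses (it does no re-subdividing and drops every column after the third). The function name `merge_overlapping_bed` is hypothetical, and a tab-separated BED with at least three columns is assumed.

```shell
# Hypothetical helper: collapse overlapping intervals in a BED file.
# Intervals that merely touch (start equal to the previous end) are kept
# separate, since BED coordinates are half-open.
# Usage: merge_overlapping_bed FILE
merge_overlapping_bed() {
    sort -k1,1 -k2,2n "$1" | awk -F'\t' -v OFS='\t' '
        # Same chromosome and starts inside the current interval: extend it
        NR > 1 && $1 == chr && $2 + 0 < cur_end + 0 {
            if ($3 + 0 > cur_end + 0) cur_end = $3
            next
        }
        # Otherwise emit the finished interval (if any) and start a new one
        NR > 1 { print chr, cur_start, cur_end }
        { chr = $1; cur_start = $2; cur_end = $3 }
        END { if (NR > 0) print chr, cur_start, cur_end }
    '
}
```

Running the overlap check from the earlier answer on the merged output should then print nothing.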


