CNV-Seq read count normalisation
4.8 years ago
nanana

I've been using CNV-Seq to detect CNVs in a tumour/normal pair.

CNV-Seq produces 2 different files:

A .count file, e.g.:

chromosome  start    end      test   ref
X           1        1000000  46775  114751
X           500001   1500000  51545  130859
X           1000001  2000000  48616  126085
X           1500001  2500000  49244  126727

And a .cnv file:

"chromosome"    "start" "end"   "test"  "ref"   "position"      "log2"  "p.value"       "cnv"   "cnv.size"      "cnv.log2"      "cnv.p.value"
"X"     1       1000000 46775   114751  5e+05   -0.0481481369630764     8.39828906687997e-11    0       NA      NA      NA
"X"     500001  1500000 51545   130859  1e+06   -0.0975597262049925     4.48810759735315e-38    0       NA      NA      NA
"X"     1000001 2000000 48616   126085  1500000 -0.128344519593103      8.97317524652341e-64    0       NA      NA      NA
"X"     1500001 2500000 49244   126727  2e+06   -0.117155042936424      1.1243550712914e-53     0       NA      NA      NA
"X"     2000001 3000000 45486   130448  2500000 -0.273431318743759      5.73887669762662e-268   0       NA      NA      NA

My understanding was that the read counts in these files (the test and ref columns in both) were normalised counts, i.e. corrected for the difference in sequencing depth between the tumour and normal BAM files.

However, plotting these read counts suggests this isn't the case: the depth in the normal sample (ref) is consistently ~2x that in the tumour sample (test), which matches the depths we sequence to. This seems odd, as the .cnv file contains the "final" CNV calls (with associated p-values etc.).
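
One way to test this using only the rows pasted above is to compare the log2 column against a log2 ratio computed directly from the test and ref counts. The following is a minimal sketch, not CNV-Seq's own code: the function name `implied_offsets` is hypothetical, and reading a constant offset as a global depth correction is an inference from these five windows rather than something confirmed by the CNV-Seq documentation.

```python
import math

# Rows copied from the .cnv excerpt above: (test, ref, reported log2)
rows = [
    (46775, 114751, -0.0481481369630764),
    (51545, 130859, -0.0975597262049925),
    (48616, 126085, -0.128344519593103),
    (49244, 126727, -0.117155042936424),
    (45486, 130448, -0.273431318743759),
]


def implied_offsets(rows):
    """Reported log2 minus the log2 ratio of the counts, per window.

    Hypothetical helper for illustration only.
    """
    return [reported - math.log2(test / ref) for test, ref, reported in rows]


offsets = implied_offsets(rows)
for (test, ref, reported), offset in zip(rows, offsets):
    raw = math.log2(test / ref)
    print(f"raw log2={raw:.4f}  reported log2={reported:.4f}  offset={offset:.4f}")

# If test/ref were already depth-normalised, every offset would be close to 0.
# A constant non-zero offset would instead suggest that the counts are raw and
# that a single genome-wide factor is applied when the log2 ratio is computed.
mean_offset = sum(offsets) / len(offsets)
implied_scale = 2 ** mean_offset
print(f"mean offset={mean_offset:.4f}  implied scale factor={implied_scale:.3f}")
```

If the offsets come out essentially identical across windows, the test and ref columns are most likely raw counts, with the depth difference handled in the log2 column instead. Running the same check over the whole .cnv file rather than five windows would be a stronger test.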

Does anyone have any insight into this? I've emailed the corresponding author of the original paper, but it seems that he has moved on.

