Question

CNVkit with same data for control and tumor sample

2.4 years ago

Jan • 0

Hi, I am optimizing a pipeline for CNV analysis of WES data. I was getting quite strange output from CNVkit, so I tried running the same data as both the tumor (sample) and the control (from which the reference is built). To my surprise, the output contained quite variable bins and segments!

From the CNVkit docs, I assumed that the fix step would account for the differences in log2 ratios between the sample and the control, and hence produce zeros everywhere:

"The corresponding “expected” normalized log2 read-depth values from the reference are then subtracted for each set of bins."

Header and first row of sample.targetcoverage.cnn:

chromosome start end gene depth log2

chr1 65509 65629 ensembl_gene_id=ENSG00000186092;gene_symbol=OR4F5 32.175 5.00787

Header and first row of reference.cnn (built from the same BAM as the sample):

chromosome start end gene log2 depth gc rmask spread

chr1 65509 65629 ensembl_gene_id=ENSG00000186092;gene_symbol=OR4F5 -0.144045 32.175 0.333333 0.213561

Header and first row of the resulting sample.cnr:

chromosome start end gene log2 depth weight

chr1 65509 65629 ensembl_gene_id=ENSG00000186092;gene_symbol=OR4F5 -0.243448 32.175 0.953424
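
The three rows above can be checked by hand. The following is a minimal Python sketch, not CNVkit code: it uses only the values quoted in this post, and the variable names are made up for the illustration. It checks that the log2 column of the coverage file equals log2(depth), and compares a plain sample-minus-reference subtraction of the two quoted log2 values with the value that ended up in sample.cnr.

```python
import math

# Values copied from the rows quoted above (first bin, chr1:65509-65629).
sample_depth = 32.175
sample_log2 = 5.00787        # from sample.targetcoverage.cnn
reference_log2 = -0.144045   # from reference.cnn
cnr_log2 = -0.243448         # from sample.cnr

# The coverage file's log2 column should match log2(depth)
# to the precision printed in the file.
depth_log2 = math.log2(sample_depth)
print(f"log2(depth)        = {depth_log2:.5f}")

# Naive expectation: raw sample log2 minus reference log2.
naive_diff = sample_log2 - reference_log2
print(f"sample - reference = {naive_diff:.6f}")

# Gap between the naive difference and the value fix actually wrote.
gap = naive_diff - cnr_log2
print(f"naive - cnr        = {gap:.6f}")
```

The naive difference is far from the reported -0.243448, so the two quoted log2 columns are evidently not subtracted as-is. Note also that the reference row has the same depth (32.175) but a log2 of -0.144045, so the reference log2 is not simply log2(depth) either. This seems consistent with the CNVkit documentation describing fix as correcting for biases such as GC content in addition to normalizing against the reference, though the sketch does not reproduce those steps.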

I noticed that reference.cnn appears to be one column short: the header has nine column names, but the row has only eight values.
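
The column mismatch can be verified mechanically. The sketch below is a small Python illustration with a hypothetical helper, `split_fields`, applied to the reference.cnn header and row exactly as pasted above. It also shows why the check should be repeated on the real tab-delimited file: an empty field is invisible once tabs are collapsed to spaces. The tab-delimited row built in the second half is an assumption made purely for illustration; which column, if any, is actually empty cannot be determined from the pasted text.

```python
def split_fields(line: str) -> list[str]:
    """Split on tabs when present (keeping empty fields), else on whitespace."""
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split()

# Header and row as pasted in the post (whitespace-separated).
header = "chromosome start end gene log2 depth gc rmask spread"
row = ("chr1 65509 65629 ensembl_gene_id=ENSG00000186092;gene_symbol=OR4F5 "
       "-0.144045 32.175 0.333333 0.213561")

names, values = split_fields(header), split_fields(row)
print(len(names), "column names,", len(values), "values")

# ASSUMPTION for illustration only: a tab-delimited version of the same row
# in which the rmask field is empty. The real file may differ.
tabbed = "\t".join(values[:7] + [""] + values[7:])
fields = split_fields(tabbed)
empty = [name for name, value in zip(names, fields) if value == ""]
print(len(fields), "fields; empty columns:", empty)
```

Running the same helper on the first two lines of the actual reference.cnn would show whether the file really has fewer fields than column names, or whether one field is simply blank.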

Any insights would be welcome!

CNV CNVkit • 566 views
