I realized library size may be an issue, and most CNV tools seem to ignore it.
Say we try to call CNVs in a tumor tissue whose genome is almost entirely doubled. If we directly compare the depth of each bin between tumor and normal control, and the library sizes (FASTQ sizes) of tumor and normal tissue are equal, the genome-doubled tumor will still show the same per-bin depth as the normal tissue. If a CNV caller ignores this library size issue, the CNVs called from genome-doubled tissue will be incorrect, because the estimated diploid baseline is actually a doubled genome. Can anyone share some insight on this? Thanks
The library preparation and quantification that determines the library size is normally done based on the amount of DNA, not the number of cells.
Exactly. If tumor and normal tissue both require the same amount of DNA during library prep and yield roughly the same FASTQ size, the DNA from the genome-doubled tumor is effectively "diluted" by that fixed input amount. The doubled regions of the tumor will then have the same depth as in normal tissue, and no CNV will be called. Am I right?
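To make the "dilution" argument concrete, here is a minimal sketch. It assumes noise-free sequencing, a pure tumor sample, equal-sized bins, and reads sampled in proportion to copy number; the function names and the read total are hypothetical and not taken from any CNV tool.

```python
import math

def expected_depth(copy_numbers, total_reads):
    """Expected reads per bin when reads are sampled in proportion to copy number."""
    total_copies = sum(copy_numbers)
    return [total_reads * cn / total_copies for cn in copy_numbers]

def log2_ratios(tumor_cn, normal_cn, total_reads=1_000_000):
    """Per-bin log2(tumor/normal) depth ratio at equal library size."""
    tumor = expected_depth(tumor_cn, total_reads)
    normal = expected_depth(normal_cn, total_reads)
    return [math.log2(t / n) for t, n in zip(tumor, normal)]

normal = [2] * 10    # diploid control, 10 bins
doubled = [4] * 10   # perfectly doubled tumor genome

# Every bin has the same expected depth in both samples,
# so all log2 ratios are zero and no gain is visible.
ratios = log2_ratios(doubled, normal)
print(ratios)
```

Under these assumptions the fixed read total cancels the doubling, which is the effect described above.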
A perfectly doubled genome couldn't be distinguished from normal, but chromosomal instability by definition results in many gains and losses. In simple terms, algorithms designed for this problem recognize that the baseline cannot be a copy number of 2 when there are extensive losses that would then correspond to copy numbers of 1, 0, -1, and -2, since negative copy numbers are impossible.
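A toy sketch of that reasoning, not any particular caller's algorithm: it assumes noise-free depth ratios, that the most common level is the baseline, and that the ratio change per single copy (`step`) is already known. All names and numbers here are hypothetical. Real tools fit purity and ploidy jointly and must handle noise.

```python
import statistics

def implied_copy_numbers(ratios, baseline_cn, step):
    """Copy numbers implied by assigning `baseline_cn` to the median ratio level."""
    base = statistics.median(ratios)
    return [baseline_cn + round((r - base) / step) for r in ratios]

def feasible_baselines(ratios, step, candidates=(2, 3, 4, 5)):
    """Keep only baselines that never imply a negative copy number."""
    return [b for b in candidates
            if min(implied_copy_numbers(ratios, b, step)) >= 0]

# Tumor/normal depth ratios per bin: mostly baseline, plus four levels of loss.
ratios = [1.0] * 6 + [0.75, 0.5, 0.25, 0.0]
step = 0.25  # assumed ratio change per single copy

# Baseline 2 implies copy numbers 1, 0, -1, -2 for the loss bins: impossible.
print(implied_copy_numbers(ratios, 2, step))
# Baselines of 4 or more keep every bin non-negative.
print(feasible_baselines(ratios, step))
```

The non-negativity check only rules out baselines that are too low; it does not by itself choose among the higher ones.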
I see what you mean, but I think what CY means by library size is simply total sequencing read coverage. But maybe I was interpolating too aggressively.
Exactly, by library size I mean the total sequencing depth or FASTQ size.