Question

Library size normalization during CNV calling from genomically doubled tumor tissue

5.0 years ago

CY ▴ 750

I realized that library size may be an issue for CNV calling, and most CNV tools seem to ignore it.

Say we try to call CNVs from a tumor tissue whose genome is almost entirely doubled. If we directly compare the depth of each bin between the tumor and the normal control, and the library sizes (FASTQ sizes) of the tumor and normal samples are equal, the genomically doubled tumor will still show the same depth as the normal tissue. If a CNV caller ignores this library size issue, the CNVs called from the genome-doubled tissue will be incorrect, because the estimated diploid baseline is actually a doubled genome. Can anyone share some insight on this? Thanks
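
A minimal sketch of the scenario in the question, under simplifying assumptions: reads are spread across bins in proportion to DNA content, both libraries have the same total read count, and the tumor is 100% pure. The function names and the toy four-bin genome are hypothetical and only serve the illustration.

```python
def bin_depths(copy_numbers, total_reads):
    """Spread a fixed number of reads across bins in proportion to copy number."""
    total_copies = sum(copy_numbers)
    return [total_reads * c / total_copies for c in copy_numbers]


def depth_ratios(tumor_cn, normal_cn, total_reads=1_000_000):
    """Tumor/normal depth ratio per bin when both libraries have equal size."""
    tumor = bin_depths(tumor_cn, total_reads)
    normal = bin_depths(normal_cn, total_reads)
    return [t / n for t, n in zip(tumor, normal)]


normal = [2, 2, 2, 2]             # diploid control, four toy bins
doubled = [4, 4, 4, 4]            # perfectly doubled tumor genome
doubled_with_loss = [4, 4, 2, 4]  # doubled genome with one region back at 2 copies

# Perfect doubling is invisible: every ratio is 1.0.
print(depth_ratios(doubled, normal))

# Any departure from perfect doubling shows up as a relative change:
# the third bin falls below 1, the others rise above 1.
print(depth_ratios(doubled_with_loss, normal))
```

Depth alone only carries relative information, so the absolute baseline (2 copies or 4) has to come from somewhere else.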

CNV • 1.7k views

updated 5.0 years ago by markus.riester ▴ 550 • written 5.0 years ago by CY ▴ 750

The library preparation and quantification that determine the library size are normally based on the amount of DNA, not the number of cells.

5.0 years ago by igor 13k

Exactly. If tumor and normal tissue both require the same amount of DNA during library prep and yield FASTQ files of roughly the same size, the DNA molecules from the genomically doubled tumor are "diluted" by that fixed input amount. The doubled regions of the tumor will then have the same depth as in normal tissue, and no CNV will be called. Am I right?

5.0 years ago by CY ▴ 750

A perfectly doubled genome couldn't be distinguished from normal, but chromosomal instability by definition results in many gains and losses. In simple terms, algorithms designed for this problem see that the baseline cannot be a copy number of 2 when there are extensive losses that, under a diploid baseline, would correspond to copy numbers 1, 0, -1, -2; negative copy numbers are impossible, so the baseline must be higher.
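
The argument about impossible negative copy numbers can be sketched as follows. This is a toy illustration, not how any particular caller is implemented: it assumes segment depth levels are expressed relative to the most common (baseline) level, that adjacent levels differ by exactly one copy, and that the sample is pure. The function names are hypothetical.

```python
def assign_copy_numbers(levels, baseline_cn, baseline_level=1.0):
    """Map equally spaced depth levels to integer copy numbers.

    Assumes the smallest gap between distinct levels is one copy.
    """
    ordered = sorted(set(levels))
    step = min(b - a for a, b in zip(ordered, ordered[1:]))
    return [baseline_cn + round((lvl - baseline_level) / step) for lvl in levels]


def plausible_baselines(levels, candidates=(2, 3, 4)):
    """Keep only baselines that never require a negative copy number."""
    return [c for c in candidates if min(assign_copy_numbers(levels, c)) >= 0]


# Baseline level plus four distinct loss levels below it.
levels = [1.0, 0.75, 0.5, 0.25, 0.0]

print(assign_copy_numbers(levels, 2))  # [2, 1, 0, -1, -2] -> impossible
print(assign_copy_numbers(levels, 4))  # [4, 3, 2, 1, 0]   -> consistent
print(plausible_baselines(levels))     # [4]
```

Four distinct loss levels below the baseline rule out a baseline of 2 or 3, so the most common state has to be at least 4 copies.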

5.0 years ago by markus.riester ▴ 550

See what you mean, but I think what CY means with library size is simply total sequencing read coverage. But maybe I was interpolating too aggressively.

5.0 years ago by markus.riester ▴ 550

Exactly, by library size I mean the total sequencing depth or fastq size.

5.0 years ago by CY ▴ 750

score 0 · Answer 1 · 2019-08-12

5.0 years ago

markus.riester ▴ 550

Every purity- and ploidy-aware copy number caller takes this into account. Have a look at the ASCAT or ABSOLUTE papers.

5.0 years ago by markus.riester ▴ 550

Yes, they estimate ploidy, but it is based on allele frequencies, not library size.

5.0 years ago by igor 13k

? Allele frequencies are used in these algorithms to eliminate wrong purity and ploidy combinations.
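
A rough sketch of how allele frequencies can rule out purity/ploidy combinations. The expected log ratio and B-allele frequency (BAF) formulas below loosely follow the kind of model described in the ASCAT paper, with any platform-specific scaling left out; the candidate grid, tolerance, and function names are assumptions made for this illustration, not the actual implementation of any tool.

```python
import math
from itertools import product


def expected_signal(n_a, n_b, purity, ploidy):
    """Expected (log ratio, BAF) for allele counts n_a, n_b; None if no DNA remains."""
    total = 2 * (1 - purity) + purity * (n_a + n_b)
    if total == 0:
        return None
    average = 2 * (1 - purity) + purity * ploidy
    log_ratio = math.log2(total / average)
    baf = (1 - purity + purity * n_b) / total
    return log_ratio, baf


def explains(segment, purity, ploidy, max_cn=6, tol=0.05):
    """True if some non-negative integer (n_a, n_b) reproduces the observed segment."""
    obs_log_ratio, obs_baf = segment
    for n_a, n_b in product(range(max_cn + 1), repeat=2):
        expected = expected_signal(n_a, n_b, purity, ploidy)
        if expected is None:
            continue
        if abs(expected[0] - obs_log_ratio) < tol and abs(expected[1] - obs_baf) < tol:
            return True
    return False


def surviving_candidates(segments, candidates):
    """Keep (purity, ploidy) pairs that can explain every observed segment."""
    return [c for c in candidates if all(explains(s, *c) for s in segments)]


# Observed (log ratio, BAF) for two toy segments of a pure, doubled tumor:
# a 3+1 segment at baseline depth and a 2+0 segment at half depth.
segments = [(0.0, 0.25), (-1.0, 0.0)]
candidates = [(1.0, 2), (0.5, 2), (1.0, 4)]  # (purity, ploidy)

print(surviving_candidates(segments, candidates))  # [(1.0, 4)]
```

In this toy case the pure diploid solution cannot produce a BAF of 0.25 at baseline depth, and the 50% pure diploid solution cannot produce a BAF of 0 at half depth, so only the tetraploid solution survives. Library size never enters the calculation.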

5.0 years ago by markus.riester ▴ 550

I just wanted to clarify that library size is not used, which is what the original question was interested in. I was not disagreeing with anything that was stated.

5.0 years ago by igor 13k