Question

PureCN mappingbiasfile and minimum number of NormalDB samples question

3.7 years ago

cg_ref_database ▴ 20

I am using the R-package PureCN to process predominantly tumor-only samples, but I do have 4 tumor-normal pairs.

I understand that one processed matched normal sample (ideally derived from young and healthy individuals, sample processed using the same methods and probe kit as the tumor samples, same sequencing machine, etc.) is better than not having a technically matched normal sample. I have 4 technically matched normal samples and I was able to create a NormalDB, but does anyone know if 4 is sufficient?

I have run the "minimal test run" (Section 4.3 of PureCN Best Practices 19 July 2020) and PureCN.R did produce a number of files:

02_amplification_pvalues.csv
02_chromosomes.pdf
02.csv
02_dnacopy.seg
02_genes.csv
02_local_optima.pdf
02.log
02_loh.csv
02.pdf
02.rds
02_segmentation.pdf
02_variants.csv

So I believe that the minimal test ran as expected. However, to proceed with the "Production pipeline run," the code in section 4.3 indicates that the run is to be executed with a mappingbiasfile.

Assuming that 4 samples are sufficient to create a NormalDB, I have a question about generating the mappingbiasfile, which is created alongside the NormalDB file. Specifically, can 4 samples be used to create the mappingbiasfile? If so, can the default values of PureCN's calculateMappingBiasVcf function be used as is, or do some of them have to be changed?

calculateMappingBiasVcf <- function(normal.panel.vcf.file, min.normals = 2,
                                min.normals.betafit = 7,
                                min.median.coverage.betafit = 5,
                                yieldSize = 5000, genome)

PureCN question • 1.1k views

updated 3.7 years ago by markus.riester ▴ 550 • written 3.7 years ago by cg_ref_database ▴ 20

Hi, I have a question of my own rather than an answer to yours, sorry.

Which variant caller did you use to generate your VCF?

According to the PureCN Best Practices, several callers can be used: Mutect, VarScan, and FreeBayes.

From what I have read about tumor-only calling, Mutect (not Mutect2) cannot run in tumor-only mode.

If you could share that information, it would be very helpful.

3.4 years ago by jeongmeani • 0

Mutect works fine without normals. You need to do extensive filtering though. I think that’s why they don’t recommend it.

3.4 years ago by markus.riester ▴ 550

score 0 · Answer 1 · 2020-08-01

Four normals should work for coverage normalization and are certainly better than nothing for mapping bias estimation. I usually recommend 10 normals, ideally from 2-3 different batches, but I have gotten good results with fewer.

For mapping bias, 7 is the minimum number of normals needed to fit beta-binomial distributions to the allelic fractions of heterozygous SNPs (the min.normals.betafit default). With fewer normals, the overdispersion is currently not estimated. Since this overdispersion is not yet used consistently, the impact is still minimal. I would not change the defaults. The upcoming version will be a little smarter about borrowing information from neighboring SNPs, so stay tuned.
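
To make the thresholds concrete, here is a small base-R sketch of how the two defaults from the signature quoted in the question would play out for a 4-sample panel. This is not PureCN's code: the helper name mapping_bias_level and its return labels are made up for illustration, and it assumes the thresholds apply to the number of panel normals that are heterozygous at a given SNP.

```r
# Illustrative only, NOT PureCN's implementation.
# Defaults are copied from the calculateMappingBiasVcf signature in the question.
# mapping_bias_level() is a hypothetical helper name.
mapping_bias_level <- function(n_het_normals,
                               min.normals = 2,
                               min.normals.betafit = 7) {
  # n_het_normals: number of panel normals heterozygous at one SNP
  # (assumed interpretation of the two thresholds).
  ifelse(n_het_normals >= min.normals.betafit, "bias + overdispersion",
         ifelse(n_het_normals >= min.normals, "bias only", "not estimated"))
}

# With a 4-sample panel, a SNP can be heterozygous in at most 4 normals.
het_counts <- 0:4
levels_4 <- mapping_bias_level(het_counts)
print(data.frame(het_normals = het_counts, estimate = levels_4))
```

Under these assumptions no SNP in a 4-sample panel can reach the beta-binomial threshold, so lowering min.normals.betafit would only force fits on very few data points. Keeping the defaults means passing just normal.panel.vcf.file and genome to calculateMappingBiasVcf.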

If you are using off-target reads and have ultra deep sequencing data, I would recommend using the current GitHub version.