Question: PureCN mappingbiasfile and minimum number of NormalDB samples
cg_ref_database wrote (11 days ago):

I am using the R-package PureCN to process predominantly tumor-only samples, but I do have 4 tumor-normal pairs.

I understand that even a single process-matched normal sample (ideally from a young, healthy individual, processed with the same methods and probe kit as the tumor samples, sequenced on the same machine, etc.) is better than having no technically matched normal at all. I have 4 technically matched normal samples and was able to create a NormalDB, but does anyone know whether 4 is sufficient?

I ran the "minimal test run" (Section 4.3 of the PureCN Best Practices, 19 July 2020), and PureCN.R produced the following files:

02_amplification_pvalues.csv
02_chromosomes.pdf
02.csv
02_dnacopy.seg
02_genes.csv
02_local_optima.pdf
02.log
02_loh.csv
02.pdf
02.rds
02_segmentation.pdf
02_variants.csv

So I believe the minimal test ran as expected. However, to proceed with the "Production pipeline run", the code in Section 4.3 indicates that PureCN.R should be run with a mappingbiasfile.

Assuming that 4 samples are sufficient to create a NormalDB, I have a question about generating the mappingbiasfile that is created alongside the NormalDB. Specifically, can 4 samples be used to create the mappingbiasfile? If so, can the default values of PureCN's calculateMappingBiasVcf function be used as is, or do some of them need to be changed?

calculateMappingBiasVcf <- function(normal.panel.vcf.file, min.normals = 2,
                                min.normals.betafit = 7,
                                min.median.coverage.betafit = 5,
                                yieldSize = 5000, genome)
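To make the three thresholds in this signature concrete, here is a minimal base-R sketch of how they might gate the estimate for a single SNP when only 4 normals are available. The helper `gate_snp()` is hypothetical: it only reuses the argument names above and is not PureCN's internal code. It assumes that `min.normals` counts the normals in which the SNP is heterozygous, and that both `betafit` thresholds must be met before a beta-binomial fit is attempted.

```r
# Hypothetical helper, NOT part of PureCN: it only reuses the argument names
# of calculateMappingBiasVcf() to show how the thresholds could gate one SNP.
# het_in_normal: logical vector, TRUE where the SNP is heterozygous in a normal
# depth:         total read depth at the SNP in each normal
gate_snp <- function(het_in_normal, depth,
                     min.normals = 2,
                     min.normals.betafit = 7,
                     min.median.coverage.betafit = 5) {
  n_het <- sum(het_in_normal)
  # Too few heterozygous normals: no position-specific bias estimate
  if (n_het < min.normals) {
    return("no position-specific estimate")
  }
  # Median depth over the normals in which the SNP is heterozygous
  med_cov <- median(depth[het_in_normal])
  if (n_het >= min.normals.betafit && med_cov >= min.median.coverage.betafit) {
    return("bias + beta-binomial fit")
  }
  "bias only (no overdispersion)"
}

# Four normals with the SNP heterozygous in three of them
gate_snp(het_in_normal = c(TRUE, TRUE, TRUE, FALSE),
         depth = c(40, 55, 38, 61))
# [1] "bias only (no overdispersion)"
```

With the default thresholds and only 4 normals, no SNP in this sketch can reach `min.normals.betafit = 7`. A bias value can still be computed, but the beta-binomial step is skipped.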
markus.riester wrote (10 days ago):

4 normals should work for coverage normalization and are certainly better than nothing for mapping bias estimation. I usually recommend 10 normals, ideally from 2-3 different batches, but I have gotten good results with fewer than that.

For mapping bias, 7 is the minimum number of normals needed to fit beta-binomial distributions to the allelic fractions of heterozygous SNPs (min.normals.betafit = 7). With fewer normals, PureCN currently does not estimate the overdispersion. This overdispersion is not yet used consistently, so the impact is minimal. I would not change the defaults. The upcoming version will be a little smarter about borrowing information from neighboring SNPs, so stay tuned.
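The role of the 7-normal minimum can be illustrated with a rough method-of-moments estimate of beta-binomial overdispersion from the alt and total read counts of one heterozygous SNP across normals. This is a hypothetical sketch: `estimate_bias_rho()` is not a PureCN function, PureCN's actual fitting procedure may differ, and expressing bias as the pooled allelic fraction divided by 0.5 is an assumption made for illustration. Below `min.normals.betafit`, the sketch returns `NA` for the overdispersion, mirroring the behavior described above.

```r
# Hypothetical sketch, NOT PureCN's implementation: pooled mapping bias plus a
# rough method-of-moments beta-binomial overdispersion (rho) for one SNP.
# alt:   alt-allele read counts in the normals where the SNP is heterozygous
# depth: total read depth at the SNP in those same normals
estimate_bias_rho <- function(alt, depth, min.normals.betafit = 7) {
  stopifnot(length(alt) == length(depth), all(depth > 0))
  k <- length(alt)
  p <- sum(alt) / sum(depth)   # pooled allelic fraction across normals
  bias <- p / 0.5              # assumption: bias expressed relative to 0.5
  # Below the threshold (or with a degenerate p), skip the overdispersion fit
  if (k < min.normals.betafit || p <= 0 || p >= 1) {
    return(list(bias = bias, rho = NA_real_))
  }
  # Beta-binomial: Var(alt_i) = n_i p (1 - p) (1 + (n_i - 1) rho), so each
  # standardized squared residual has expectation ~ 1 + (n_i - 1) rho.
  # Subtracting k - 1 (not k) roughly accounts for p being estimated.
  z2 <- (alt - depth * p)^2 / (depth * p * (1 - p))
  rho <- (sum(z2) - (k - 1)) / sum(depth - 1)
  list(bias = bias, rho = min(max(rho, 0), 1))  # clamp to [0, 1]
}

# Four normals: bias is estimated, overdispersion is not
res4 <- estimate_bias_rho(alt = c(18, 25, 20, 30), depth = c(40, 52, 41, 60))
res4$rho
# [1] NA

# Seven normals with widely spread allelic fractions: rho comes out > 0
res7 <- estimate_bias_rho(alt = c(5, 35, 10, 30, 8, 32, 20), depth = rep(40, 7))
```

With only four normals, the bias estimate is still available but the overdispersion is not. Because the overdispersion is not yet used consistently, keeping `min.normals.betafit` at its default costs little.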

If you are using off-target reads and have ultra-deep sequencing data, I would recommend using the current GitHub version.
