Question: PureCN mappingbiasfile and minimum number of samples for a NormalDB
cg_ref_database wrote (11 days ago):

I am using the R-package PureCN to process predominantly tumor-only samples, but I do have 4 tumor-normal pairs.

I understand that even one technically matched normal sample is better than none. Ideally it comes from a young, healthy individual and is processed with the same methods and probe kit as the tumor samples, on the same sequencing machine, and so on. I have 4 technically matched normal samples and was able to create a NormalDB, but does anyone know whether 4 is sufficient?

I have run the "minimal test run" (Section 4.3 of the PureCN Best Practices, 19 July 2020), and PureCN.R produced a number of output files.

So I believe the minimal test ran as expected. However, the "Production pipeline run" code in Section 4.3 executes the run with a mappingbiasfile.

Assuming that 4 samples are sufficient to create a NormalDB, I have a question about generating the mappingbiasfile that is created from the NormalDB. Specifically, can 4 samples be used to create the mappingbiasfile? If so, can the default argument values of PureCN's calculateMappingBiasVcf function be used as is, or do some of them have to be changed?

# Usage of calculateMappingBiasVcf() with its default argument values
calculateMappingBiasVcf(normal.panel.vcf.file, min.normals = 2,
                        min.normals.betafit = 7,
                        min.median.coverage.betafit = 5,
                        yieldSize = 5000, genome)
markus.riester wrote (10 days ago):

Four normals should work for coverage normalization and are certainly better than nothing for mapping bias estimation. I usually recommend 10 normals, ideally from 2 to 3 different batches, but I have gotten good results with fewer.

For mapping bias, 7 normals (the min.normals.betafit default) is the minimum needed to fit beta-binomial distributions to the allelic fractions of heterozygous SNPs, so with fewer normals the overdispersion is currently not estimated. This overdispersion estimate is not yet used consistently, so the impact is minimal. I would not change the defaults. The upcoming version will be a little smarter about borrowing information from neighboring SNPs; stay tuned.
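The threshold reasoning above can be illustrated with a small base-R sketch. classify_snp() is a hypothetical helper written only for illustration; it is not PureCN's internal code. It assumes that a position-specific mapping bias needs at least min.normals normals heterozygous at the SNP, and that a beta-binomial fit additionally needs min.normals.betafit such normals with a median coverage of at least min.median.coverage.betafit. The default values are taken from the calculateMappingBiasVcf() signature quoted in the question.

```r
# Hypothetical illustration, NOT PureCN code: classify SNP positions by how
# many normals are heterozygous there, using the calculateMappingBiasVcf()
# default thresholds. The exact semantics of each threshold are assumed.
classify_snp <- function(n.het.normals, median.coverage,
                         min.normals = 2,
                         min.normals.betafit = 7,
                         min.median.coverage.betafit = 5) {
  # Assumed: a beta-binomial (overdispersion) fit needs enough normals
  # AND enough median coverage at the position
  fit <- n.het.normals >= min.normals.betafit &
         median.coverage >= min.median.coverage.betafit
  ifelse(fit, "bias + beta-binomial fit",
         ifelse(n.het.normals >= min.normals, "bias only", "too few normals"))
}

# With a 4-sample NormalDB, at most 4 normals can be heterozygous at a SNP,
# so no position can reach min.normals.betafit = 7.
classify_snp(n.het.normals = c(1, 3, 4), median.coverage = c(30, 30, 30))
# returns c("too few normals", "bias only", "bias only")

# With 8 heterozygous normals, the fit also depends on median coverage.
classify_snp(n.het.normals = c(8, 8), median.coverage = c(40, 3))
# returns c("bias + beta-binomial fit", "bias only")
```

This shows why, with 4 normals, the defaults still yield position-specific bias estimates for SNPs seen in at least 2 normals while the overdispersion step is simply skipped.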

If you are using off-target reads and have ultra deep sequencing data, I would recommend using the current GitHub version.
