I am working on a project where we sample short (8-20 bp), highly repetitive regions in tissue samples and compare them to detect mutations (primarily indels). One challenge is that our sequencing approach (next-generation sequencing, using molecular tags to correct PCR errors) yields a distribution of alleles for each sample rather than a single consensus allele. What kinds of mathematical models are commonly used for comparing these sorts of distributions? (We have a technique that works reasonably well, but suspect there must be something better.) Beyond this, is there a good methodology for discarding samples whose read counts are too low to be trustworthy? (We want to avoid false positives.)
Below is an example. There are 8 different sources (A-H) and 2 samples from each (1-2). At least one sample from one of the sources carries a real mutation, and in at least one sample from one of the sources a false mutation is usually called because of a low read count.
Genotype A-1 A-2 B-1 B-2 C-1 C-2 D-1 D-2 E-1 E-2 F-1 F-2 G-1 G-2 H-1 H-2
ACCCCCCCCCCC 6 11 4 7 8 18 10 7 3
ACCCCCCCCCCCC 21 4 57 34 29 37 14 59 79 19
ACCCCCCCCCCCCC 4 2 3 3 2
ACCCCCCCCCCTC 2
ACCCGCCCCCCCCGCC 25 2 9 12 27 20 70 33 15 4 14 34 36 16
CCCCCCCCCCCC 2 7 18 2 9 12
CCCCCCCCCCCCC 3 21 11 2 17 23
CCCCCCCCCCCCCC 2
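For concreteness, one commonly used option for the two questions above is a likelihood-ratio (G) test comparing the paired count vectors, combined with a simple total-read-count cutoff. The sketch below is illustrative only: the `MIN_READS` threshold, the pseudocount, and the helper names are my assumptions, not anything from our pipeline.

```python
import math

# MIN_READS is an assumed depth cutoff, not a recommended value; it should be
# tuned on known-negative sample pairs.
MIN_READS = 30

def g_statistic(counts_a, counts_b, pseudocount=0.5):
    """Likelihood-ratio (G) statistic comparing two allele-count vectors
    over the same ordered set of genotypes. Under the null hypothesis that
    both samples draw from one shared allele distribution, G is roughly
    chi-squared distributed with (number of genotypes - 1) degrees of
    freedom. The pseudocount keeps zero counts from blowing up the logs."""
    a = [c + pseudocount for c in counts_a]
    b = [c + pseudocount for c in counts_b]
    n_a, n_b = sum(a), sum(b)
    g = 0.0
    for c_a, c_b in zip(a, b):
        col = c_a + c_b
        # Expected counts if both samples shared one allele distribution.
        e_a = col * n_a / (n_a + n_b)
        e_b = col * n_b / (n_a + n_b)
        g += 2.0 * (c_a * math.log(c_a / e_a) + c_b * math.log(c_b / e_b))
    return g

def enough_reads(counts, min_reads=MIN_READS):
    """Crude depth filter: skip any sample whose total read count falls
    below min_reads before calling a mutation."""
    return sum(counts) >= min_reads

# Usage on two hypothetical count vectors (one count per genotype row,
# same genotype order in both vectors):
s1 = [6, 21, 4, 0, 25, 0, 0, 0]
s2 = [11, 4, 2, 0, 2, 7, 21, 0]
if enough_reads(s1) and enough_reads(s2):
    print("G =", round(g_statistic(s1, s2), 2))
else:
    print("skip: too few reads")
```

Identical vectors give G = 0, and divergence grows with G; the statistic can be compared against a chi-squared quantile for a p-value. This is only a starting point, so I would still like to hear what models others use.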
Thanks for any advice!