Statistical Basis for Calling Mutations
6.4 years ago
west.alex • 0

I am working on a project in which we sample short (8-20 bp), highly repetitive regions in tissue samples and compare them to detect mutations (primarily indels). One challenge is that our read-out (next-generation sequencing, with molecular tags to correct PCR errors) yields a distribution of alleles for each sample rather than a single consensus allele. What kinds of mathematical models are commonly used to compare distributions of this sort? (We have a technique that works well, but we suspect there are better ones.) Beyond this, is there a good method for excluding samples that have too few reads to be trustworthy? (We want to avoid false positives.)

Below is an example. There are 8 sources (A-H), with 2 samples from each (1-2). At least one sample from one of the sources carries a mutation, and in at least one sample from one of the sources a false mutation is usually called because of a low number of reads.

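Genotype          A-1 A-2 B-1 B-2 C-1 C-2 D-1 D-2 E-1 E-2 F-1 F-2 G-1 G-2 H-1 H-2
ACCCCCCCCCCC        6   -   -   -  11   4   -   -   -   -   7   8  18  10   7   3
ACCCCCCCCCCCC      21   4   -   -  57  34   -   -   -   -  29  37  14  59  79  19
ACCCCCCCCCCCCC      -   -   -   -   4   -   -   -   -   -   -   2   3   3   2   -
ACCCCCCCCCCTC       -   -   -   -   -   -   -   -   -   -   -   -   -   2   -   -
ACCCGCCCCCCCCGCC   25   2   9  12  27  20  70  33  15   4   -   -  14  34  36  16
CCCCCCCCCCCC        -   -   2   7   -   -   -   -  18   2   9  12   -   -   -   -
CCCCCCCCCCCCC       -   -   3  21   -   -   -   -  11   2  17  23   -   -   -   -
CCCCCCCCCCCCCC      -   -   -   2   -   -   -   -   -   -   -   -   -   -   -   -

(- = empty cell)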
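
To make the question concrete, here is a rough sketch of the kind of comparison I have in mind. It is not our actual technique, only a simple baseline for illustration: it measures the total variation distance between the allele frequencies of two samples, estimates a p-value by shuffling the pooled reads between them, and refuses to compare anything below a read-depth cutoff. The function names and the `MIN_READS = 20` cutoff are placeholders I made up; the counts are the A-1 and A-2 columns as I read them from the table.

```python
import random
from collections import Counter

MIN_READS = 20  # hypothetical depth cutoff, chosen only for illustration


def tv_distance(a, b):
    """Total variation distance between two allele-count dicts (0 = identical)."""
    na, nb = sum(a.values()), sum(b.values())
    alleles = set(a) | set(b)
    return 0.5 * sum(abs(a.get(x, 0) / na - b.get(x, 0) / nb) for x in alleles)


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Shuffle the pooled reads between the two samples, keeping depths fixed."""
    rng = random.Random(seed)
    reads = list(Counter(a).elements()) + list(Counter(b).elements())
    na = sum(a.values())
    observed = tv_distance(a, b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(reads)
        pa, pb = Counter(reads[:na]), Counter(reads[na:])
        # small tolerance so float rounding does not hide exact ties
        if tv_distance(pa, pb) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def compare(a, b, min_reads=MIN_READS):
    """Return (distance, p-value), or None if either sample is too shallow."""
    if min(sum(a.values()), sum(b.values())) < min_reads:
        return None
    return tv_distance(a, b), permutation_pvalue(a, b)


# Counts read from columns A-1 and A-2 of the example table.
a1 = {"ACCCCCCCCCCC": 6, "ACCCCCCCCCCCC": 21, "ACCCGCCCCCCCCGCC": 25}
a2 = {"ACCCCCCCCCCCC": 4, "ACCCGCCCCCCCCGCC": 2}

print(compare(a1, a2))  # A-2 has only 6 reads, so this pair is skipped
```

With this cutoff the A-1/A-2 pair is skipped rather than called, which is the behaviour we want for shallow samples. What I do not know is how to choose the cutoff and the distance measure in a principled way.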

 

Thanks for any advice!

statistics math mutation calling • 1.4k views