Question: Statistical Basis for Calling Mutations
Asked 3.8 years ago by west.alex (United States):

I am working on a project in which we sample short (8-20 bp), highly repetitive regions in tissue samples and compare them to detect mutations (primarily indels). One challenge is that our read-out (next-generation sequencing, with molecular tags to correct PCR errors) yields a distribution of alleles for each sample rather than a single consensus allele. What kinds of mathematical models are commonly used to compare distributions of this sort? (We have a technique that works well, but we suspect there are better ones.) Beyond this, is there a good methodology for excluding samples with too few reads to be trustworthy? (We want to avoid false positives.)

Below is an example. There are 8 sources (A-H), with 2 samples from each (1-2). At least one sample from one of the sources carries a real mutation, and at least one sample from one of the sources usually shows a false mutation caused by a low number of reads.

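Genotype          A-1  A-2  B-1  B-2  C-1  C-2  D-1  D-2  E-1  E-2  F-1  F-2  G-1  G-2  H-1  H-2
ACCCCCCCCCCC        6    -    -    -   11    4    -    -    -    -    7    8   18   10    7    3
ACCCCCCCCCCCC      21    4    -    -   57   34    -    -    -    -   29   37   14   59   79   19
ACCCCCCCCCCCCC      -    -    -    -    4    -    -    -    -    -    -    2    3    3    2    -
ACCCCCCCCCCTC       -    -    -    -    -    -    -    -    -    -    -    -    -    2    -    -
ACCCGCCCCCCCCGCC   25    2    9   12   27   20   70   33   15    4    -    -   14   34   36   16
CCCCCCCCCCCC        -    -    2    7    -    -    -    -   18    2    9   12    -    -    -    -
CCCCCCCCCCCCC       -    -    3   21    -    -    -    -   11    2   17   23    -    -    -    -
CCCCCCCCCCCCCC      -    -    -    2    -    -    -    -    -    -    -    -    -    -    -    -

(A dash marks an empty cell.)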
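
To make the two questions concrete, here is a minimal Python sketch of one very simple baseline. It is not necessarily the technique mentioned above, only an illustration of the kind of comparison being asked about. It assumes the table entries are read counts, copies the columns for sources A, B and D from the table, skips any sample below a depth cutoff, and otherwise scores the two samples of a source by the Jensen-Shannon divergence of their allele frequencies. The names `COUNTS`, `MIN_DEPTH`, `depth`, `js_divergence` and `compare` are made up for this example, and the cutoff of 20 reads is an arbitrary placeholder rather than a recommended value.

```python
from math import log2

# Assumed read counts per genotype, copied from the table above (sources A, B, D).
COUNTS = {
    "A-1": {"ACCCCCCCCCCC": 6, "ACCCCCCCCCCCC": 21, "ACCCGCCCCCCCCGCC": 25},
    "A-2": {"ACCCCCCCCCCCC": 4, "ACCCGCCCCCCCCGCC": 2},
    "B-1": {"ACCCGCCCCCCCCGCC": 9, "CCCCCCCCCCCC": 2, "CCCCCCCCCCCCC": 3},
    "B-2": {"ACCCGCCCCCCCCGCC": 12, "CCCCCCCCCCCC": 7,
            "CCCCCCCCCCCCC": 21, "CCCCCCCCCCCCCC": 2},
    "D-1": {"ACCCGCCCCCCCCGCC": 70},
    "D-2": {"ACCCGCCCCCCCCGCC": 33},
}

# Hypothetical depth cutoff; an assumption for illustration, not a tuned value.
MIN_DEPTH = 20


def depth(sample):
    """Total number of reads observed for one sample."""
    return sum(COUNTS[sample].values())


def frequencies(counts):
    """Convert raw allele counts into allele frequencies."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def js_divergence(counts_p, counts_q):
    """Jensen-Shannon divergence (base 2, so between 0 and 1) of two allele distributions."""
    p, q = frequencies(counts_p), frequencies(counts_q)
    total = 0.0
    for allele in set(p) | set(q):
        pa, qa = p.get(allele, 0.0), q.get(allele, 0.0)
        m = 0.5 * (pa + qa)  # frequency in the mixture of the two samples
        if pa > 0:
            total += 0.5 * pa * log2(pa / m)
        if qa > 0:
            total += 0.5 * qa * log2(qa / m)
    return total


def compare(sample_a, sample_b, min_depth=MIN_DEPTH):
    """Return the divergence, or None if either sample is below the depth cutoff."""
    if min(depth(sample_a), depth(sample_b)) < min_depth:
        return None
    return js_divergence(COUNTS[sample_a], COUNTS[sample_b])


if __name__ == "__main__":
    for source in ("A", "B", "D"):
        s1, s2 = f"{source}-1", f"{source}-2"
        print(source, depth(s1), depth(s2), compare(s1, s2))
```

A divergence on frequencies alone ignores sampling noise, which is exactly why the depth cutoff is needed here. A test that works on the raw counts (for example a chi-square or exact test on the genotype-by-sample contingency table) would account for depth directly, and that is the kind of alternative the question is asking about.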

 

Thanks for any advice!
