27 days ago

kristina.mahan

I UV-mutated a haploid algal genome and want to view the variants in the BAM file in IGV. What "Coverage allele-fraction threshold" should I use to look at variants?

It depends on your sequencing coverage and your desired false discovery rate (FDR).

The sequencing coverage is >200x

At each genomic position you have a specific coverage, say X. Y of the X reads may support an alternative variant, and the remaining X - Y support the reference. You need a statistical test (there are dozens, some more sophisticated than others) of how unusual it is to see Y alternative reads given your sequencing machine's error rate Z. You can simply use a binomial test. That gives you one p-value per position. You then put these p-values into an FDR correction procedure and get your approximate threshold.

This procedure has its drawbacks and is not exactly correct, but may be useful.
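A rough sketch of the procedure described above, using only the Python standard library. The per-position counts are made up for illustration, the error rate Z = 0.001 and the 5% FDR are assumptions, and Benjamini-Hochberg is just one possible choice of FDR correction (the answer above does not name a specific one).

```python
from math import comb

def binom_upper_tail(y, x, z):
    """P(Y >= y) for Y ~ Binomial(x, z): chance of y or more error reads out of x."""
    return sum(comb(x, k) * z**k * (1 - z)**(x - k) for k in range(y, x + 1))

def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True if it passes BH at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        passed[i] = rank <= cutoff_rank
    return passed

# Hypothetical (coverage X, alternative reads Y) at four positions.
positions = [(200, 0), (210, 1), (205, 5), (220, 198)]
ERROR_RATE = 0.001  # assumed per-base substitution error rate Z

# Extremely small p-values may underflow to 0.0, which is harmless for ranking.
p_values = [binom_upper_tail(y, x, ERROR_RATE) for x, y in positions]
calls = benjamini_hochberg(p_values, fdr=0.05)

for (x, y), p, call in zip(positions, p_values, calls):
    print(f"X={x} Y={y} allele fraction={y / x:.3f} p={p:.3g} significant={call}")
```

The smallest allele fraction among the positions that pass gives an approximate threshold for the viewer. On a real genome there are millions of positions, so the correction is much stricter than in this four-position toy example.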

Can you point me to some statistical tests?

For example, say your probability of a substitution error is 0.1% (on Illumina machines it is a very small number). You see 5 reads with nucleotide A and 195 reads with nucleotide B at some position. You apply a binomial test and get a p-value ( https://en.wikipedia.org/wiki/Binomial_test ).

This is the simplest test. People also model errors with more efficient regression models with various link functions, but you can start with the binomial test.
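The 5-versus-195 example can be checked in a few lines of Python. This is a minimal sketch that assumes a one-sided test (the chance of seeing 5 or more error reads out of 200 when the error rate is 0.1%); the function name is made up for illustration.

```python
from math import comb

def binom_upper_tail(y, x, z):
    """P(Y >= y) for Y ~ Binomial(x, z), i.e. a one-sided binomial test p-value."""
    return sum(comb(x, k) * z**k * (1 - z)**(x - k) for k in range(y, x + 1))

alt_reads = 5        # reads supporting nucleotide A
ref_reads = 195      # reads supporting nucleotide B
coverage = alt_reads + ref_reads
error_rate = 0.001   # 0.1% substitution error probability

p_value = binom_upper_tail(alt_reads, coverage, error_rate)
print(f"allele fraction = {alt_reads / coverage:.3f}, p-value = {p_value:.3g}")
```

With these numbers the p-value is very small, so 5 reads out of 200 (an allele fraction of 2.5%) is already hard to explain by sequencing error alone under this simple model.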