Calculate allele frequency for structural variant breakpoints
0
0
Entering edit mode
4.9 years ago
nanana ▴ 110

I have a list of high quality structural variant breakpoints that are the product of multiple different SV callers. While some of the approaches I use estimate the allele frequency for called breakpoints, others do not, so in some cases I have high quality variants that lack allele frequency estimates.

Because of this (and **), I would like to be able to calculate allele frequencies for a list of genomic coordinates.

Allele frequencies in this context refer to the fraction of reads supporting an alternate allele. A homozygous variant supported by all reads would have an allele frequency of 1, whereas a heterozygous would be 0.5 (assuming 100% purity of sample).

This should be possible by looking at the alignment file (.bam) and counting the number of reads supporting and disagreeing for each variant call. E.g. AR/RR+AR (alternate reads/referece reads + alternate reads).

I am currently thinking that I will have to write my own tool to do this (probably using PySam, but if anyone can suggest an existing approach, or an easy way of achieving this I would be happy to not re-invent this particular wheel!

** Also because I see a high discrepancy between the allele frequency values given by different approaches (DELLY, LUMPY+SVtyper, Meerkat, novobreak) for the same variant

genome next-gen • 2.4k views
0
Entering edit mode

Can't you do this using samtools mpileup?