I would like to announce MAGERI, our recently published software tool designed to facilitate the detection of ultra-rare variants in high-throughput sequencing datasets prepared using molecular barcoding.
The ability to detect ultra-rare variants present at ~0.1% frequency in a sample is a key requirement for circulating tumor DNA screening and for studying rare tumor subpopulations and rare drug-resistant variants in viral populations.
However, even for sequencing datasets of top-tier quality, the sequencing error rate is far higher than what accurate rare variant calling requires. Recent developments in molecular barcoding make it possible to eliminate sequencing errors by tagging each input molecule with a unique molecular identifier (UMI) [Marx. Nature Methods 2017]. Reads sharing the same UMI can then be assembled into a consensus sequence, which corrects sequencing errors. Still, residual PCR errors introduced during the first PCR cycles and during UMI tag attachment can decrease the accuracy of variant calling. Moreover, to the best of my knowledge, there has so far been no dedicated variant caller that models error rates in consensus sequences of UMI-tagged read groups. MAGERI aims to solve this problem by implementing a consensus assembly, alignment and variant calling pipeline optimized for UMI-tagged data [Shugay et al. PLoS Comput Biol 2017].
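To make the idea of UMI-based consensus assembly more concrete, here is a minimal Python sketch. It is not MAGERI's actual algorithm: it assumes reads are already paired with their UMI, start at the same position and contain no indels, and it uses a plain per-position majority vote. The function names (`group_by_umi`, `consensus`) and the `min_reads` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs into {umi: [sequences]}."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return groups


def consensus(seqs, min_reads=3):
    """Per-position majority vote over the reads of one UMI group.

    Returns None when the group has fewer than min_reads reads,
    since a vote over so few reads cannot correct errors reliably.
    Only the prefix shared by all reads is considered.
    """
    if len(seqs) < min_reads:
        return None
    length = min(len(s) for s in seqs)
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0]
        for i in range(length)
    )


# Toy input: (UMI, read sequence) pairs, illustrative only.
reads = [
    ("AAGT", "ACGTACGT"),
    ("AAGT", "ACGTACGT"),
    ("AAGT", "ACGAACGT"),  # sequencing error in one read
    ("CCTA", "ACGTTCGT"),  # a real variant: present in every read of the group
    ("CCTA", "ACGTTCGT"),
    ("CCTA", "ACGTTCGT"),
    ("GGGA", "ACGTACGT"),  # singleton UMI, too few reads for a consensus
]

consensuses = {
    umi: consensus(seqs) for umi, seqs in group_by_umi(reads).items()
}
```

In this toy example the isolated error in the first group is outvoted, while the variant in the second group survives because all reads from that molecule carry it. An error introduced in the first PCR cycles would look like the second case, which is why consensus assembly alone is not enough and a dedicated error model is still needed for variant calling.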
Note that the datasets containing rare variants with known frequency and a control dataset from healthy donor plasma DNA are publicly available at SRA; see this repository for metadata and analysis scripts/templates. We hope that these benchmark datasets will be of use to the community, especially to researchers developing software tools for UMI-tagged data processing and rare variant calling.