Calling variants with UMI-collapsed duplex consensus reads
19 days ago
blid11


I am trying to call variants with a panel that uses UMI adapters. As with Illumina's TSO500 pipeline, UMI-tagged DNA fragments are amplified, and I can use a tool such as fgbio's CallDuplexConsensusReads to collapse all reads sharing a given UMI into a consensus read (to eliminate PCR errors). For example, 5 forward reads and 5 reverse reads can be collapsed into a forward and a reverse consensus read, respectively, which can then be collapsed into a duplex consensus read. Ideally, at least 1 forward read and 1 reverse read would contribute to each duplex consensus read, which can be specified by setting the min-reads parameter of CallDuplexConsensusReads to 2 1 1 (2 reads total, 1 fwd, 1 rev).

We had some lower-quality data, and the supplier of the panel told us to use CallDuplexConsensusReads with min-reads set to 1 1 0, meaning that a single read (either forward or reverse) is enough to produce a consensus read. The result is that most reads are retained, and the output is a mix of what are basically raw reads and UMI-collapsed duplex consensus reads, although some of these "duplex" consensus reads may be built from only a forward or only a reverse consensus read.
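To make the difference between the two settings concrete, here is a minimal sketch of how a three-value min-reads threshold could be applied to one UMI family. It is not fgbio's code: the function name passes_min_reads and the example families are hypothetical, and it assumes the second value is checked against the strand with more reads and the third against the strand with fewer.

```python
from typing import Tuple


def passes_min_reads(strand_a: int, strand_b: int,
                     min_reads: Tuple[int, int, int]) -> bool:
    """Return True if a UMI family meets a (total, strand, strand) threshold.

    Assumption (not taken from fgbio's source): the second value is compared
    with the better-supported strand and the third with the other strand.
    """
    min_total, min_high, min_low = min_reads
    high = max(strand_a, strand_b)
    low = min(strand_a, strand_b)
    return (strand_a + strand_b) >= min_total and high >= min_high and low >= min_low


# Hypothetical UMI families: (reads on one strand, reads on the other strand)
families = {
    "duplex 5+5": (5, 5),
    "duplex 1+1": (1, 1),
    "single-strand 3+0": (3, 0),
    "singleton 1+0": (1, 0),
}

for name, (a, b) in families.items():
    strict = passes_min_reads(a, b, (2, 1, 1))
    lenient = passes_min_reads(a, b, (1, 1, 0))
    print(f"{name}: passes 2 1 1 = {strict}, passes 1 1 0 = {lenient}")
```

Under these assumptions, single-strand families and singletons pass only the 1 1 0 setting, which is why that output mixes essentially raw reads with real duplex consensus reads.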

Using CallDuplexConsensusReads with min-reads set to 2 1 1 means that only a fraction of the original reads are used: e.g. out of 130 million reads, 16 million are collapsed into ~500k consensus reads with at least 1 fwd and 1 rev read. The mean UMI-collapsed coverage of targets (by sample) is in the range of 50-127; these are, however, "true" duplex consensus reads. Also, some targets have relatively good UMI-collapsed coverage (25-800).
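As a rough check on those numbers, the arithmetic can be written out as below. The variable names are only illustrative, and since the ~500k figure is approximate, so is the resulting family size.

```python
# Figures quoted in the post (the consensus read count is approximate)
total_raw_reads = 130_000_000
reads_in_duplex_families = 16_000_000
duplex_consensus_reads = 500_000

# Share of the raw reads that end up in a family with at least 1 fwd and 1 rev read
fraction_used = reads_in_duplex_families / total_raw_reads

# Average number of raw reads collapsed into each duplex consensus read
mean_reads_per_consensus = reads_in_duplex_families / duplex_consensus_reads

print(f"Fraction of raw reads used: {fraction_used:.1%}")                    # 12.3%
print(f"Mean raw reads per consensus read: {mean_reads_per_consensus:.0f}")  # 32
```

So roughly 12% of the raw reads contribute to a true duplex consensus read, at about 32 raw reads per consensus read on average.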

In summary, I am wondering whether it would be best to call variants on matched tumour-normal samples using the raw reads, the "true" collapsed reads (from min-reads set to 2 1 1), or the mix of raw and collapsed reads (from min-reads set to 1 1 0). The 1 1 0 setting was suggested as an interim solution for the low-quality data; however, I feel that calling variants on a mix of raw and collapsed reads is not ideal.

Mutect2 CallDuplexConsensusReads fgbio UMI
19 days ago

It's not my primary expertise, but your feeling coincides with mine: I would think that using UMI groups that are built from only a single read is risky if you are dealing with very rare variants. If using only the collapsed reads means that you end up with a small number of variants, that is a limitation of the data.

Of course, it depends on what your end goals are, and on the cost of missing variants versus the cost of false positives. Generally, with variant calling, people are more worried about false positives than false negatives, but I guess there might be cases where this isn't true.


