Question

cfDNA ultra-deep sequencing with UMIs


3.9 years ago

Iñaki • 0

Dear all,

I am analyzing sequencing data from a capture panel that uses UMIs. I did ultra-deep sequencing to detect variants at very low VAF (less than 1%). The input is already-fragmented DNA from plasma. After running my pipeline with fgbio, I realized that I have a multitude of reads with the same start/end and different UMIs, so they are called as different families. This does not seem too unusual to me for wild-type reads. But for ultra-rare mutations, it is difficult for me to understand how there can be 5 reads (fragments) with the same start/end and different UMIs (different at all positions, not just one base). That would mean that these 5 DNA fragments were cut at exactly the same positions. Does anyone have an explanation for this? Is there anything incorrect in our pipeline?

Best,

Iñaki

sequencing liquid ultra-deep UMIs biopsy • 1.5k views

ADD COMMENT • link updated 3.9 years ago by i.sudbery 22k • written 3.9 years ago by Iñaki • 0


Or there was an error in library prep perhaps?

ADD REPLY • link 3.9 years ago by GenoMax 154k

score 1 · Answer 1 · 2021-12-28

I would check the sequence of the UMIs. Are they similar to each other? (So your 5 reads might have UMIs ATGATG, ATTATG, ATCATG, CTCATG, ATGATC, each of which is only 1 base different from another UMI in the set, and most of them from ATGATG.) We wrote UMI-tools because we noticed that in deep sequencing many of the fragments mapping to the same coordinates had very similar UMI sequences, and we assume these are caused by PCR or sequencing errors.
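As a rough illustration of that check, here is a minimal Python sketch that clusters the UMIs seen at one start/end position by Hamming distance. The UMI list and the function names (`hamming`, `cluster_umis`) are made up for the example, and the single-linkage clustering is a simplification that ignores read counts; it is not what UMI-tools or fgbio actually run.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(umis, max_dist=1):
    """Group UMIs into connected components, linking any pair that is
    within max_dist mismatches (single linkage, counts ignored)."""
    umis = list(dict.fromkeys(umis))  # de-duplicate, keep order
    parent = {u: u for u in umis}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(umis, 2):
        if hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for u in umis:
        clusters.setdefault(find(u), []).append(u)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical UMIs observed at a single start/end position.
example_umis = ["ATGATG", "ATTATG", "ATCATG", "CTCATG", "ATGATC", "GGCCTA"]
clusters = cluster_umis(example_umis)
for members in clusters:
    print(members)
```

If most of the UMIs at a position fall into one cluster like this, they are more likely error copies of one molecule than independent fragments. If they are different at every base, as described in the question, this check would leave them in separate clusters.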

By default fgbio does not do any error correction of the UMIs it detects, but the documentation suggests that it does implement the adjacency algorithm we proposed with UMI-tools, although I've never tried it. Depending on your downstream applications you might also use UMI-tools itself. However, UMI-tools does not have an equivalent of fgbio's CallMolecularConsensusReads, if you are using that, although it may be possible to format the output of umi_tools group to use as input to CallMolecularConsensusReads.
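To give a feel for what count-aware UMI grouping does, below is a simplified Python sketch of the directional variant of the adjacency idea. A UMI is absorbed into a more abundant neighbour when the two differ by at most one base and the neighbour's count is at least twice its own minus one. The counts, the function name `directional_groups`, and the tie-breaking order are assumptions for illustration; the real UMI-tools and fgbio implementations differ in detail.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(counts, max_dist=1):
    """Group UMIs at one position using a directional adjacency rule.

    counts maps UMI sequence -> number of reads carrying it.
    An edge u -> v exists when the UMIs are within max_dist mismatches
    and counts[u] >= 2 * counts[v] - 1.
    """
    # Visit the most abundant UMIs first (ties broken alphabetically).
    order = sorted(counts, key=lambda u: (-counts[u], u))
    edges = {
        u: [
            v for v in order
            if v != u
            and len(u) == len(v)
            and hamming(u, v) <= max_dist
            and counts[u] >= 2 * counts[v] - 1
        ]
        for u in order
    }

    assigned = set()
    groups = []
    for root in order:
        if root in assigned:
            continue
        # Breadth-first walk along directed edges from this root.
        group = [root]
        assigned.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in edges[u]:
                if v not in assigned:
                    assigned.add(v)
                    group.append(v)
                    queue.append(v)
        groups.append(group)
    return groups


# Hypothetical read counts per UMI at a single start/end position.
example_counts = {
    "ATGATG": 100, "ATCATG": 10, "ATTATG": 3,
    "CTCATG": 2, "ATGATC": 1, "GGCCTA": 40,
}
groups = directional_groups(example_counts)
for group in groups:
    print(group[0], "<-", group[1:])
```

The first UMI in each group is the presumed original molecule. Two similar UMIs with comparable counts (say 10 and 9 reads) are kept apart by the count rule, which is the point of making the network directional.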