Hi,
I have paired-end RRBS data with a 6 bp UMI encoded in the middle of the read header. The FASTQ files were made by our sequencing core using BCLconvert, and the headers look like this (the UMI is TAGCGC, the index is CGTCTAAC):
@A01685:159:H2YHFDSX7:4:2463:10655:16971:TAGCGC 1:N:0:CGTCTAAC
After aligning (with Bismark) and sorting (samtools sort), the read name looks like this:
A01685:159:H2YHFDSX7:4:2463:10655:16971:TAGCGC_1:N:0:CGTCTAAC
I would like to deduplicate these reads with UMI-tools dedup, but the documentation states that the UMI needs to be encoded at the end of the read name.
I tried running UMI-tools dedup on these reads, but it does not recognize the UMI. How do I specify the location of the UMI in the header here? I cannot figure it out from the documentation.
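In case it helps to show what I mean, this is a rough sketch of the renaming I am considering as a workaround. It is only a string-level illustration, under the assumption (as far as I understand the documentation) that dedup reads the UMI from the text after the last separator in the read name, which is an underscore by default. The function name `move_umi_to_end` and the `umi_length` parameter are my own and not part of UMI-tools or Bismark; applying this to every record of the BAM would need something extra (for example pysam), which is not shown here.

```python
def move_umi_to_end(read_name: str, umi_length: int = 6) -> str:
    """Move a UMI stored as the last colon-separated field of the
    instrument part of a read name to the very end of the name.

    Hypothetical helper, not part of UMI-tools or Bismark. It assumes
    the name looks like '<instrument fields>:<UMI>_<comment>', where the
    underscore comes from the aligner replacing the space in the header.
    """
    # Split off the comment that was joined on with an underscore.
    core, _, comment = read_name.partition("_")
    # The UMI is assumed to be the last colon-separated field of the core.
    prefix, colon, umi = core.rpartition(":")
    if not colon or len(umi) != umi_length or set(umi) - set("ACGTN"):
        raise ValueError(f"no {umi_length} bp UMI found in {read_name!r}")
    # Rebuild the name so the UMI is the final underscore-separated field.
    parts = [prefix]
    if comment:
        parts.append(comment)
    parts.append(umi)
    return "_".join(parts)


old_name = "A01685:159:H2YHFDSX7:4:2463:10655:16971:TAGCGC_1:N:0:CGTCTAAC"
new_name = move_umi_to_end(old_name)
print(new_name)
```

With the read name above, this would give A01685:159:H2YHFDSX7:4:2463:10655:16971_1:N:0:CGTCTAAC_TAGCGC, so the UMI ends up as the last field. I do not know whether this is the recommended route or whether there is an option that avoids renaming altogether.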
Cheers!