4 months ago
I have never used UMIs before and have been asked to analyse data that contains them. I have been reading about how to extract the UMIs and then use them to deduplicate reads after alignment.

Can someone clarify what the command should be if the UMI is the last 8 bases of read 2 (paired-end data)? I have found UMI-tools, but I'm struggling with the documentation. I found the post "Transferring UMI from paired-end read 2 to header of read 1", where the poster has only the UMI sequence in read 2 (no other sequence) and uses the command below. I found it confusing because --read2-in is actually given read 1 (does this not change the order of the reads?). Also, with this command, is UMI-tools looking at the FIRST 12 bases? Is there a way to look at the LAST n bases of read 2?

umi_tools extract -I read2.fastq.gz -S read2_processed.fastq.gz --read2-in=read1.fastq.gz --read2-out=read1_processed.fastq.gz --bc-pattern=NNNNNNNNNNNN
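
To make the question concrete, here is a minimal Python sketch of the operation I am after, for a single read pair. It is not UMI-tools and says nothing about how UMI-tools is implemented. The names FastqRecord, add_umi_to_name and move_umi_from_read2_end are made up for illustration, and appending the UMI to the read ID with an underscore is my assumption about what the output should look like.

```python
from typing import NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical helper type, for illustration only)."""
    name: str       # header line without the leading '@'
    sequence: str   # base calls
    quality: str    # per-base quality string, same length as sequence


def add_umi_to_name(name: str, umi: str) -> str:
    """Append '_<UMI>' to the read ID, keeping any description after the first space."""
    read_id, sep, description = name.partition(" ")
    return f"{read_id}_{umi}{sep}{description}"


def move_umi_from_read2_end(
    read1: FastqRecord, read2: FastqRecord, umi_length: int = 8
) -> Tuple[FastqRecord, FastqRecord]:
    """Take the LAST umi_length bases of read 2 as the UMI.

    The UMI is trimmed from read 2 (sequence and quality) and appended to the
    read ID of both mates, so the pair stays in sync and in the same order.
    """
    if umi_length <= 0:
        raise ValueError("umi_length must be positive")
    if len(read2.sequence) < umi_length:
        raise ValueError("read 2 is shorter than the UMI length")

    umi = read2.sequence[-umi_length:]

    tagged_read1 = FastqRecord(
        add_umi_to_name(read1.name, umi), read1.sequence, read1.quality
    )
    trimmed_read2 = FastqRecord(
        add_umi_to_name(read2.name, umi),
        read2.sequence[:-umi_length],
        read2.quality[:-umi_length],
    )
    return tagged_read1, trimmed_read2


# Toy read pair: the last 8 bases of read 2 (AACCGGTT) play the role of the UMI.
r1 = FastqRecord("pair1 1:N:0", "ACGTACGTACGT", "IIIIIIIIIIII")
r2 = FastqRecord("pair1 2:N:0", "GGGGCCCCAACCGGTT", "IIIIIIIIFFFFFFFF")
out1, out2 = move_umi_from_read2_end(r1, r2, umi_length=8)
print(out1.name, out1.sequence)
print(out2.name, out2.sequence)
```

So read 1 should be left untouched apart from its name, and read 2 should lose its last 8 bases. What I cannot work out is which umi_tools extract options give this behaviour.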
