I was playing around with the test data of MISEG (for the purpose of analyzing TCR data with UMIs). The barcode.txt file records the adapter sequence plus the UMI (marked as N). According to the manual, the adapter sequence is written in either lower or upper case, indicating fuzzy or seed search respectively.
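To make the notation concrete, here is a minimal Python sketch of how I read such a pattern. It is not the tool's actual implementation; the pattern and read below are made up, the function names `split_pattern` and `extract_umi` are hypothetical, and I am assuming that only upper-case N marks a UMI position.

```python
def split_pattern(pattern):
    """Group a barcode pattern into runs of (kind, start, text).

    Assumed convention: 'N' = UMI position, other upper-case letters =
    seed part, lower-case letters = fuzzy part.
    """
    def kind(ch):
        if ch == "N":
            return "umi"
        return "seed" if ch.isupper() else "fuzzy"

    segments = []
    for i, ch in enumerate(pattern):
        k = kind(ch)
        if segments and segments[-1][0] == k:
            # Extend the current run of the same kind.
            prev_kind, start, text = segments[-1]
            segments[-1] = (prev_kind, start, text + ch)
        else:
            segments.append((k, i, ch))
    return segments


def extract_umi(pattern, read):
    """Collect the read bases that sit under the N positions of the pattern."""
    return "".join(read[i] for i, ch in enumerate(pattern) if ch == "N")


# Made-up example, not taken from the real barcode.txt.
pattern = "acgtACGTNNNNNNtt"
read = "acgtACGTGGATCCtt"

for segment in split_pattern(pattern):
    print(segment)
print(extract_umi(pattern, read))
```

This only shows which positions belong to which part of the pattern; it says nothing about how many mismatches the fuzzy part tolerates or how the seed is located in a read.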
I am curious about the portion of the sequence before the UMI.
As far as I understand, the sequence before the UMI should be the i7 index (library index). So is the whole stretch (around 20 bases) before the UMI the i7 index? Shouldn't the library index already have been removed during demultiplexing?
Why are there two search modes, fuzzy and seed? What is the intuitive explanation for this, and which sequences correspond to each of these two parts?