I have complete sequences from 10x Chromium that look like the following:
|----my 16 bp cell barcode----|---8bp 10x UMI---|--SO--|-------------------cDNA (100 bp)----------------------|
SO = switch oligo
The reads are demultiplexed and don't have the Illumina barcodes in the sequence. I want to map the cDNA (the sequence of interest), but I also want to preserve the data provided by my barcode, the 10x UMI, and the switch oligo (about 40 bp in total). This is so I can identify duplicates and determine which cell each read originated from.
I have found trimming software (cutadapt and trimmomatic) that can delete the 40 bp. However, because I want to retain this information, I do not want to simply trim the sequences. On the other hand, keeping the sequences has led to 70-90% of reads not mapping because of the 40 bp of non-genetic information.
I have thought about 'cutting and pasting' the information into the header, but am not aware of a program that already does this.
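To make the idea concrete, here is a rough Python sketch of what I have in mind. It assumes plain four-line FASTQ records and a fixed 40 bp prefix (16 bp barcode + 8 bp UMI + a 16 bp switch oligo; the 16 bp is my assumption, since the total is only "about 40 bp"). The function name `move_prefix_to_header` is made up for illustration.

```python
from typing import Iterable, Iterator, Tuple

BARCODE_LEN = 16  # cell barcode length, from the read layout
UMI_LEN = 8       # 10x UMI length, from the read layout
PREFIX_LEN = 40   # barcode + UMI + switch oligo; assumed, the total is only approximate


def move_prefix_to_header(
    lines: Iterable[str], prefix_len: int = PREFIX_LEN
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) with the prefix moved into the read name."""
    # Group the input four lines at a time; an incomplete trailing record is dropped.
    for header, seq, plus, qual in zip(*[iter(lines)] * 4):
        header, seq, plus, qual = (s.rstrip("\n") for s in (header, seq, plus, qual))

        barcode = seq[:BARCODE_LEN]
        umi = seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        switch_oligo = seq[BARCODE_LEN + UMI_LEN:prefix_len]

        # Append the tags to the read name (the part before the first space),
        # leaving any existing header comment untouched after it.
        name, _, comment = header.partition(" ")
        new_header = f"{name}_{barcode}_{umi}_{switch_oligo}"
        if comment:
            new_header += " " + comment

        # Trim sequence and quality by the same amount so they stay the same length.
        yield new_header, seq[prefix_len:], plus, qual[prefix_len:]
```

As far as I understand, aligners carry the read name (up to the first space) through to the output, so tags appended this way should survive mapping. I would still prefer an existing, tested tool over maintaining something like this myself.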
How do I align these reads while preserving the data provided by the 40 bp of barcodes?