I am building small RNA libraries designed like this:
[5' adapter]-[6N umi] --- [insert] --- [6N umi]-[3' adapter]
I collect paired-end data but map and analyze reads 1 and 2 separately. I'd like to combine the 6N UMIs from each read into a single 12N sequence and use that to deduplicate.
For example, can I start with this:
@ Read1_raw
nnnnnnTCCAGGCCCTGCCTCTCATGGGCCCAAGACCCAGTGTGGGGTAGAATGGC
@ Read2_raw
NNNNNNGTAGGCAAGGTGGGAATATAGAAGTACAAGTACAGAGAAGGGGACCTGCCC
and get something like this after umi extraction:
@ Read1_processed / nnnnnnNNNNNN
TCCAGGCCCTGCCTCTCATGGGCCCAAGACCCAGTGTGGGGTAGAATGGC
@ Read2_processed / nnnnnnNNNNNN
GTAGGCAAGGTGGGAATATAGAAGTACAAGTACAGAGAAGGGGACCTGCCC
I've previously used UMI-tools to extract UMIs from one read and transfer them to the other, and I know it's able to handle split UMIs by transferring each UMI to either R1 or R2, but I'm having trouble figuring out if it can merge split UMIs and apply them to both reads. Is there a way to do this with UMI-tools? If not, any suggestions on how to approach this?
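For reference, the transformation described above can be sketched in plain Python, independent of any particular tool. This is only a minimal illustration under some assumptions: each read is given as a (name, sequence, quality) tuple, the UMI is the first 6 bases of each read, and the combined UMI is appended to the read ID with an underscore (chosen here because, as far as I know, that is the separator UMI-tools looks for by default when deduplicating). The function name `merge_split_umi` is made up for this example.

```python
def merge_split_umi(read1, read2, umi_len=6):
    """Move the leading UMI of each mate into a combined UMI on both read names.

    read1 / read2 are (name, sequence, quality) tuples for the two mates.
    Returns the two processed reads as tuples of the same shape.
    """
    name1, seq1, qual1 = read1
    name2, seq2, qual2 = read2
    if len(seq1) < umi_len or len(seq2) < umi_len:
        raise ValueError("read is shorter than the UMI length")

    # Combined UMI: R1 UMI followed by R2 UMI (6N + 6N = 12N by default).
    umi = seq1[:umi_len] + seq2[:umi_len]

    def tag(name):
        # Append the UMI to the read ID (text before the first space)
        # and keep any trailing comment field unchanged.
        read_id, sep, comment = name.partition(" ")
        return f"{read_id}_{umi}{sep}{comment}"

    # Trim the UMI bases and their quality scores from both mates.
    out1 = (tag(name1), seq1[umi_len:], qual1[umi_len:])
    out2 = (tag(name2), seq2[umi_len:], qual2[umi_len:])
    return out1, out2
```

Applied to each read pair while walking the R1 and R2 FASTQ files in step, this leaves both mates carrying the same 12N tag, so they can still be mapped and deduplicated separately.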
Thanks so much, that does just what I wanted. Much appreciated.