How to combine UMIs split across reads 1 and 2 into a single UMI for deduplication?
4.3 years ago
SethG • 0

I am building small RNA libraries designed like this:

[5' adapter]-[6N umi] --- [insert] --- [6N umi]-[3' adapter]

I collect paired-end data but map and analyze reads 1 and 2 separately. I'd like to combine the 6N UMIs from each read into a single 12N sequence and use that to deduplicate.

For example, can I start with this:

@ Read1_raw
nnnnnnTCCAGGCCCTGCCTCTCATGGGCCCAAGACCCAGTGTGGGGTAGAATGGC

@ Read2_raw
NNNNNNGTAGGCAAGGTGGGAATATAGAAGTACAAGTACAGAGAAGGGGACCTGCCC

and get something like this after UMI extraction:

@ Read1_processed / nnnnnnNNNNNN
TCCAGGCCCTGCCTCTCATGGGCCCAAGACCCAGTGTGGGGTAGAATGGC

@ Read2_processed / nnnnnnNNNNNN
GTAGGCAAGGTGGGAATATAGAAGTACAAGTACAGAGAAGGGGACCTGCCC

I've previously used UMI-tools to extract UMIs from one read and transfer them to the other. I know it can handle split UMIs by transferring each UMI to either R1 or R2, but I can't tell whether it can merge the two halves and apply the combined UMI to both reads. Is there a way to do this with UMI-tools? If not, any suggestions on how to approach this?

umitools umi • 1.4k views
4.3 years ago

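Yep, UMI-tools can do this. Simply provide both --bc-pattern and --bc-pattern2, and pass your read 2 input and output files to --read2-in and --read2-out. The UMIs extracted from the two reads are concatenated and appended to the read names of both reads: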

umi_tools extract --stdin=read1.fq.gz \
                  --read2-in=read2.fq.gz \
                  --bc-pattern=NNNNNN \
                  --bc-pattern2=NNNNNN \
                  --stdout=read1.processed.fq \
                  --read2-out=read2.processed.fq
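
To make the effect of the two patterns concrete, here is a minimal Python sketch of the transformation on a single read pair. It is an illustration only, not UMI-tools' actual code: the function name `merge_split_umi`, the (name, sequence, quality) tuple layout, and the example reads are all assumptions made up for this sketch. The `_UMI` suffix on the read name mirrors the default underscore separator that `umi_tools extract` uses.

```python
def merge_split_umi(read1, read2, umi_len=6):
    """Illustrative only: combine a UMI split across both mates.

    Each read is a hypothetical (name, sequence, quality) tuple.
    The first `umi_len` bases of each mate are removed, joined into
    one UMI, and that UMI is appended to the name of BOTH mates.
    """
    umi = read1[1][:umi_len] + read2[1][:umi_len]

    def strip_and_tag(read):
        name, seq, qual = read
        # Keep any comment after the first space, tag only the read ID.
        read_id, _, comment = name.partition(" ")
        new_name = f"{read_id}_{umi}"
        if comment:
            new_name += f" {comment}"
        # Trim the UMI bases and their quality scores together.
        return new_name, seq[umi_len:], qual[umi_len:]

    return strip_and_tag(read1), strip_and_tag(read2)


# Made-up example pair (not real data): 6 nt UMI + 9 nt of insert.
r1 = ("frag1", "ACGTAC" + "TCCAGGCCC", "I" * 15)
r2 = ("frag1", "TTGCAA" + "GTAGGCAAG", "I" * 15)
p1, p2 = merge_split_umi(r1, r2)
print(p1)  # ('frag1_ACGTACTTGCAA', 'TCCAGGCCC', 'IIIIIIIII')
print(p2)  # ('frag1_ACGTACTTGCAA', 'GTAGGCAAG', 'IIIIIIIII')
```

Because both mates end up carrying the same 12-nt UMI in their names, each read file can be mapped on its own and still be deduplicated on the combined UMI.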

Thanks so much, that does just what I wanted. Much appreciated.
