Question: Demultiplexing and deduplicating using barcodes and UMIs in the "mate" read
lech.kaczmarczyk wrote (10 months ago):

Hi All,

I'd like to demultiplex and deduplicate reads using in-line barcodes from read2 and UMIs from both read1 and read2. The format is as follows (read2 is a short read containing only the barcode and UMI):

read1: UMI-primer-READ
read2: barcode-UMI

UMI-tools seems suitable, but I couldn't find out how to group reads by combinations of UMIs. Do you know of a tool that can handle both UMIs and barcodes, and can group reads by the combination of UMIs from both reads of a pair?
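As a minimal sketch of what grouping by UMI combination could mean, the Python below keeps one read pair per (barcode, read1 UMI, read2 UMI) key. It assumes the barcode and both UMIs have already been cut out of the reads; the tuple layout and the name `dedup_by_umi_pair` are hypothetical. It uses exact matching only and ignores mapping position (arguably acceptable for a single amplicon where all reads start at the same place), whereas umi_tools dedup also uses the alignment position and by default clusters similar UMIs (the directional method) to absorb sequencing errors.

```python
from collections import defaultdict


def dedup_by_umi_pair(pairs):
    """Keep the first read pair seen for each (barcode, UMI1, UMI2) combination.

    pairs: iterable of (read_id, barcode, umi_r1, umi_r2) tuples (hypothetical layout).
    Returns {barcode: [read_id, ...]} listing the read pairs that survive.
    """
    seen = set()
    kept = defaultdict(list)
    for read_id, barcode, umi_r1, umi_r2 in pairs:
        key = (barcode, umi_r1, umi_r2)  # exact match on all three parts
        if key in seen:
            continue  # same sample and same UMI pair: treated as a PCR duplicate
        seen.add(key)
        kept[barcode].append(read_id)
    return dict(kept)


pairs = [
    ("r1", "TTT", "AAAAAA", "CCCCC"),
    ("r2", "TTT", "AAAAAA", "CCCCC"),  # duplicate of r1
    ("r3", "TTT", "AAAAAA", "GGGGG"),  # same read1 UMI, different read2 UMI
    ("r4", "ATT", "AAAAAA", "CCCCC"),  # same UMIs as r1, different sample
]
print(dedup_by_umi_pair(pairs))  # {'TTT': ['r1', 'r3'], 'ATT': ['r4']}
```

Using both UMIs in the key means two molecules that happen to share one UMI are still kept apart.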

Any pointers on where to start would be highly appreciated.

Cheers, Lech

(posted by lech.kaczmarczyk; edited by i.sudbery)

Could you post a few lines as an example? Do you have an index list?

(reply by Gabriel R.)

The barcodes are embedded in the gene-specific RT primers, which contain partial Illumina adapters. Examples are below: the N stretches are the UMIs, and in the RT primers the barcode sits immediately downstream of the UMI. I don't have the reads yet, as I'd like to have the strategy thought through before starting. The experiment aims to assess T->C conversions (SLAMseq) within the amplicon.

Reverse (RT) primers:

5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG
5'GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG

Forward primer:

5'CACGACGCTCTTCCGATCTNNNNNNGACGTGGACATCCGCAAAGACC
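If the barcode is the three bases right after the NNNNN UMI in each RT primer (an assumption based on the 8-cycle read2 plan, not something stated explicitly here), it is worth checking the pairwise Hamming distances between barcodes before ordering primers. The sketch below copies the three RT primers from the post; `barcode_of` and `hamming` are illustrative helpers. For TTT, ATT and AGT the minimum distance is 1, so a single sequencing error could turn one sample's barcode into another's, and a demultiplexer could not safely allow mismatches.

```python
from itertools import combinations

# Reverse (RT) primers from the post, 5'->3', without the "5'" prefix.
RT_PRIMERS = [
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG",
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNATTCTCCTGCTTGCTGATCCACATCTGCTG",
    "GTTCAGACGTGTGCTCTTCCGATCTNNNNNAGTCTCCTGCTTGCTGATCCACATCTGCTG",
]
BARCODE_LEN = 3  # assumption: 5-nt UMI + 3-nt barcode = 8 read2 cycles


def barcode_of(primer, barcode_len=BARCODE_LEN):
    """Return the bases immediately after the run of N's (the UMI)."""
    end = primer.index("N")
    while primer[end] == "N":
        end += 1
    return primer[end:end + barcode_len]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


barcodes = [barcode_of(p) for p in RT_PRIMERS]
min_dist = min(hamming(a, b) for a, b in combinations(barcodes, 2))
print(barcodes, min_dist)  # ['TTT', 'ATT', 'AGT'] 1
```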

(reply by lech.kaczmarczyk)

But that assumes the adapters are sequenced. Will there be sequencing cycles dedicated to the indices?

(reply by Gabriel R.)

My plan is to sequence only the UMI and index in read2 (reverse), so 8-9 cycles. Is there a problem with that? If this is an inefficient approach for some reason (e.g. cost), please let me know.

(reply by lech.kaczmarczyk)

I don't get it. You will simply prime next to read2 and hope to run into the barcode+UMI?

(reply by Gabriel R.)

Sorry, maybe I was not clear (or maybe I'm the one who doesn't get it):

My barcode+UMI will be located just upstream of the following primer:

Multiplexing Read 2 Sequencing Primer 5' GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

In that case, I get the barcode+UMI sequence in the first 8 cycles, don't I?
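This reasoning can be checked with a small sketch: the 3' end of the Read 2 sequencing primer matches the adapter at the 5' end of the RT primers, so read2 should start reporting bases right after that overlap. The Python below assumes the final library carries the full adapter, so the sequencing primer can anneal, and that read2 reads the RT primer's own sequence 5'->3' beyond the overlap; `read2_bases` is a hypothetical helper. For the first RT primer, the first 8 cycles cover the 5-nt UMI plus the next three bases.

```python
# Sequences copied from the thread (5'->3').
READ2_SEQ_PRIMER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
RT_PRIMER = "GTTCAGACGTGTGCTCTTCCGATCTNNNNNTTTCTCCTGCTTGCTGATCCACATCTGCTG"


def read2_bases(rt_primer, seq_primer, n_cycles):
    """Return the first n_cycles bases read2 is expected to report.

    Finds the longest 5' prefix of rt_primer that equals the 3' end of
    seq_primer; sequencing is assumed to continue from the next base.
    """
    for k in range(min(len(rt_primer), len(seq_primer)), 0, -1):
        if seq_primer.endswith(rt_primer[:k]):
            return rt_primer[k:k + n_cycles]
    raise ValueError("RT primer does not overlap the sequencing primer")


print(read2_bases(RT_PRIMER, READ2_SEQ_PRIMER, 8))  # NNNNNTTT
```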

(reply by lech.kaczmarczyk)

If you have dedicated cycles, it's OK; I was wondering about the output format.

(reply by Gabriel R.)
i.sudbery (Sheffield, UK) wrote (10 months ago):

Unfortunately, demultiplexing is not something that UMI-tools does. Are you sure you need to demultiplex? For applications where you just need to quantify the number of reads per gene per cell (like most single-cell RNA-seq experiments), you can go directly to per-cell quantification without demultiplexing first (you would do this with umi_tools whitelist -> umi_tools extract -> read mapping -> umi_tools count).
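The point about skipping demultiplexing is that the barcode can simply remain a grouping key during counting. A minimal Python sketch of that idea, assuming each mapped read has already been reduced to a (cell barcode, gene, UMI) tuple (a hypothetical input format): it counts distinct UMIs per barcode and gene without ever writing per-barcode files. Unlike umi_tools count, it treats UMIs as distinct on exact sequence only and does no error-aware UMI clustering.

```python
from collections import defaultdict


def count_umis(assigned_reads):
    """Count distinct UMIs per (cell_barcode, gene) from (cell_barcode, gene, umi) tuples."""
    umis = defaultdict(set)
    for cell, gene, umi in assigned_reads:
        umis[(cell, gene)].add(umi)  # reads from the same molecule collapse here
    return {key: len(found) for key, found in umis.items()}


reads = [
    ("AAAC", "Actb", "GGTTA"),
    ("AAAC", "Actb", "GGTTA"),  # PCR duplicate
    ("AAAC", "Actb", "CCATG"),
    ("TTGC", "Actb", "GGTTA"),  # same UMI, different cell
]
print(count_umis(reads))  # {('AAAC', 'Actb'): 2, ('TTGC', 'Actb'): 1}
```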

If you do want a BAM file per barcode, then you need to handle the UMIs and the barcodes in two separate steps. First, remove the UMIs using umi_tools extract in paired-end mode, passing read2 to its standard input and read1 to its --read2-in option. Then demultiplex using a dedicated demultiplexer; I use reaper from the Enright lab. Then align the read1s, and finally run umi_tools dedup. This, funnily enough, is the workflow that umi_tools was originally designed for.
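To make the order of operations concrete, here is a toy Python sketch of the first two steps: moving the UMIs into the read name (umi_tools extract appends the UMI to the read name after an underscore by default) and then splitting pairs by barcode. It is not umi_tools or reaper; the read layout (read1 starting with a 6-nt UMI, read2 = 5-nt UMI + 3-nt barcode), the exact name format and `extract_and_demux` are assumptions for illustration. Keeping the UMI in the read name is what lets umi_tools dedup recover it after alignment.

```python
from collections import defaultdict

UMI1_LEN = 6  # assumed: read1 starts with the 6-nt UMI from the forward primer
UMI2_LEN = 5  # assumed: read2 = 5-nt UMI followed by the barcode
BC_LEN = 3    # assumed barcode length


def extract_and_demux(pairs, barcodes):
    """Move UMIs into read names, then group trimmed read1s by read2 barcode.

    pairs: iterable of (name, seq1, seq2). Returns {barcode or None: [(name, seq1)]},
    where None collects pairs whose barcode is not in `barcodes`.
    """
    by_sample = defaultdict(list)
    for name, seq1, seq2 in pairs:
        umi1, insert = seq1[:UMI1_LEN], seq1[UMI1_LEN:]  # step 1: extract UMIs
        umi2 = seq2[:UMI2_LEN]
        barcode = seq2[UMI2_LEN:UMI2_LEN + BC_LEN]
        new_name = f"{name}_{umi2}{umi1}"  # UMI kept for dedup after alignment
        key = barcode if barcode in barcodes else None  # step 2: demultiplex
        by_sample[key].append((new_name, insert))
    return dict(by_sample)


pairs = [
    ("q1", "AAAAAAGACGTGGACATC", "CCCCCTTT"),
    ("q2", "GGGGGGGACGTGGACATC", "TTTTTAGT"),
    ("q3", "CCCCCCGACG", "AAAAACCC"),  # barcode CCC is not a known sample
]
for sample, recs in extract_and_demux(pairs, {"TTT", "ATT", "AGT"}).items():
    print(sample, recs)
```

Each per-sample list would then be written to its own FASTQ, aligned, and passed to umi_tools dedup.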

(answer by i.sudbery)

Thanks a lot, this is very informative. As this is not scRNA-seq and the barcodes represent individual samples, a separate BAM file per barcode would be ideal. I was also thinking about using the FASTX barcode splitter after merging each pair into a single read (after trimming, etc.). I will try your solution!

(reply by lech.kaczmarczyk)