Tool: umitools - working with UMI incorporated data
5.3 years ago by Joe Brown (Bigelow Lab)

umitools helps process sequencing data in which each read carries a unique molecular identifier (UMI). It assumes the UMI is part of the read sequence itself.

Given the UMI design as an IUPAC pattern, strip the UMI from the 5' end of each read in the FASTQ:

umitools trim --end 5 unprocessed_fastq.gz NNNNNV > out.fq
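
To make the trimming step concrete, here is a minimal Python sketch of the idea, not the actual umitools code. It assumes the IUPAC pattern is matched against the first bases of the read and that the stripped UMI is stored in the read name; the `:UMI_<seq>` suffix and the function names are hypothetical choices for illustration.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(pattern):
    """Compile an IUPAC pattern such as NNNNNV into a regular expression."""
    return re.compile("".join("[%s]" % IUPAC[c] for c in pattern.upper()))


def trim_5prime(name, seq, qual, pattern):
    """Strip the UMI from the 5' end and record it in the read name.

    Returns None when the read does not start with a valid UMI.
    The ":UMI_<seq>" suffix is an assumed naming scheme (hypothetical).
    """
    n = len(pattern)
    umi = seq[:n]
    if not iupac_to_regex(pattern).fullmatch(umi):
        return None
    return "%s:UMI_%s" % (name, umi), seq[n:], qual[n:]
```

With the pattern NNNNNV, a read starting with ACGTAG is trimmed by six bases, while a read starting with ACGTAT is rejected because V excludes T.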

The UMI sequence of each read is appended to the read name and is used again after the reads are mapped. Reads with duplicate UMIs at any given start site need to be removed:

umitools rmdup unprocessed.bam out.bam > before_after.bed
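
The duplicate-removal idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the umitools implementation: reads are given as plain tuples rather than BAM records, the first read seen for each (start site, UMI) pair is the one kept, and the per-site before/after counts are only a guess at what the BED output summarizes.

```python
from collections import defaultdict


def rmdup(reads):
    """Keep one read per (chrom, strand, start, UMI) combination.

    `reads` is an iterable of (name, chrom, strand, start, umi) tuples,
    a hypothetical stand-in for mapped BAM records.
    Returns the kept reads in input order, plus read counts per start
    site before and after duplicate removal.
    """
    seen = set()
    kept = []
    before = defaultdict(int)
    after = defaultdict(int)
    for read in reads:
        name, chrom, strand, start, umi = read
        site = (chrom, strand, start)
        before[site] += 1
        key = site + (umi,)
        if key not in seen:
            # First read with this UMI at this start site: keep it.
            seen.add(key)
            kept.append(read)
            after[site] += 1
    return kept, dict(before), dict(after)
```

Two reads that share a start site but carry different UMIs are both kept, since they come from different molecules.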



I've updated rmdup to account for mismatches among the UMI sequences observed at a start site. This lets the user merge very similar UMIs into fewer representative sequences:

umitools rmdup --mismatches 1 unprocessed.bam out.bam > before_after.bed
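
One way to picture the merging is a greedy pass over the UMIs at a single start site. The Python sketch below is an assumption about how such merging could work, not a description of what umitools does internally: UMIs are visited from most to least abundant, and each one is assigned to the first representative within the allowed Hamming distance. The function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_umis(umis, mismatches=1):
    """Greedily collapse similar UMIs observed at one start site.

    Returns a dict mapping each observed UMI to its representative.
    """
    counts = Counter(umis)
    # Most abundant first; ties broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda u: (-counts[u], u))
    representatives = []
    assignment = {}
    for umi in ordered:
        for rep in representatives:
            if hamming(umi, rep) <= mismatches:
                assignment[umi] = rep
                break
        else:
            # No representative is close enough: start a new group.
            representatives.append(umi)
            assignment[umi] = umi
    return assignment
```

With one allowed mismatch, AAAAAG seen once next to AAAAAC seen three times is folded into AAAAAC, while an unrelated UMI such as CCCCCA stays separate.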

Tags: sequencing, tool, umi
modified 3.9 years ago • written 5.3 years ago by Joe Brown

Does umitools support paired-end data? (PE is popular in NGS analysis.)

written 5.0 years ago by xfliwz

PE is popular? What are you trying to do? What's your UMI incorporation design?

modified 5.0 years ago • written 5.0 years ago by Joe Brown

Hello, in my PE reads, both 1.fq and 2.fq have UMIs.

1.fq: UMI1=============

2.fq: UMI2=============

To take advantage of the UMIs, I need to take both of them into account.

So, can umitools solve my problem?

modified 4.7 years ago • written 4.7 years ago by xfliwz

I hit an unexpected problem with this tool: the two reads of a pair end up with different names, which causes BWA-MEM to quit. What aligner do you use downstream of umitools that does not require paired reads to have the same name?

written 4.3 years ago by sowalsky

I could make this work on PE reads, but it's unclear how I would count the UMIs at a given start. Would you want to remove R1s independently of R2s?

If you were interested in sharing data with me I think we can get it worked out. If you've already solved it and made the code available somewhere, I'd love to check it out!

written 3.9 years ago by Joe Brown