Dealing with UMIs

Asked 8 months ago by Ramon

Hi everyone,

I'm looking for some advice regarding UMI processing.

I'm currently adapting an existing pipeline to process targeted panel data. I have access to the raw sequencing data (FASTQ files) and to the collapsed BAM file produced by the DRAGEN pipeline.

In your experience, what are the best methods for working from FASTQ files that contain molecular barcodes (Illumina UMIs)? How do you collapse reads into molecular barcode families? Are there any well-documented tools for this?

When starting from the collapsed BAM file, do you run your workflow unchanged? Or do you first remove the UMIs from the reads with UMI-tools and then continue the workflow?

Any tips are appreciated!

Tag: UMI

UMI-tools is an established package for working with UMIs. A detailed usage guide is linked from the repository: https://github.com/CGATOxford/UMI-tools

If your data already contain collapsed UMIs, that may limit what you can do with UMI-tools.
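On the FASTQ side, the UMI-tools workflow starts with an `extract` step that moves the UMI bases out of the read sequence and into the read name, so they survive alignment and can be used later by `dedup` or `group`. Below is a minimal Python sketch of that idea, assuming a fixed-length UMI at the 5' end of the read. The underscore separator mirrors what UMI-tools appends by default, but treat it as an assumption here; `FastqRecord` and `extract_umi` are hypothetical names, not part of UMI-tools.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    header: str  # header line without the leading '@'
    seq: str
    qual: str


def extract_umi(rec: FastqRecord, umi_len: int) -> FastqRecord:
    """Move the first `umi_len` bases of the read into the read ID.

    Assumes (hypothetically) that the UMI sits at the 5' end of the read.
    """
    umi = rec.seq[:umi_len]
    # Separate the read ID (first token) from any trailing comment.
    read_id, _, comment = rec.header.partition(" ")
    new_header = f"{read_id}_{umi}" + (f" {comment}" if comment else "")
    # Trim the UMI bases and their base qualities from the read itself.
    return FastqRecord(new_header, rec.seq[umi_len:], rec.qual[umi_len:])


rec = FastqRecord("read1 1:N:0:ACGT", "GATCTTGGCCAA", "IIIIHHHHGGGG")
print(extract_umi(rec, 4))
```

Keeping the UMI in the read name (rather than in the sequence) means the aligner never tries to map the random barcode bases.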

Reply 8 months ago by GenoMax

My main issue with UMI-tools is that, as far as I understand, the dedup and group commands work on BAM files, not on FASTQ files. I'll look at their documentation again.

Thanks!
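The reason `dedup` and `group` expect aligned reads is that UMIs are only compared among reads that map to the same position and strand. Below is a simplified Python sketch of the "directional" grouping rule described in the UMI-tools documentation: within one position, UMI `a` absorbs UMI `b` when they differ by one base and `count(a) >= 2 * count(b) - 1`. It assumes reads have already been reduced to hypothetical `(chrom, pos, strand, umi)` tuples; the function names are illustrative, not the UMI-tools API.

```python
from collections import Counter, defaultdict, deque


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def directional_groups(umi_counts: Counter) -> list[list[str]]:
    """Group the UMIs observed at ONE position with the directional rule."""
    # Visit UMIs from most to least abundant so true molecules seed groups.
    umis = sorted(umi_counts, key=lambda u: umi_counts[u], reverse=True)
    assigned: set[str] = set()
    groups = []
    for seed in umis:
        if seed in assigned:
            continue
        group, queue = [seed], deque([seed])
        assigned.add(seed)
        while queue:
            parent = queue.popleft()
            for child in umis:
                # The child is treated as an error copy of the parent.
                if (child not in assigned
                        and hamming(parent, child) == 1
                        and umi_counts[parent] >= 2 * umi_counts[child] - 1):
                    assigned.add(child)
                    group.append(child)
                    queue.append(child)
        groups.append(group)
    return groups


def dedup(reads):
    """Count unique molecules per (chrom, pos, strand) bucket.

    `reads` holds (chrom, pos, strand, umi) tuples, i.e. information that
    is only available after alignment.
    """
    buckets = defaultdict(Counter)
    for chrom, pos, strand, umi in reads:
        buckets[(chrom, pos, strand)][umi] += 1
    return {key: len(directional_groups(c)) for key, c in buckets.items()}


reads = [("chr1", 100, "+", "ACGT")] * 5 + [
    ("chr1", 100, "+", "ACGA"),  # one mismatch from ACGT: likely an error
    ("chr1", 100, "+", "TTTT"),  # distinct molecule at the same position
    ("chr1", 200, "+", "ACGT"),  # same UMI, different position
]
print(dedup(reads))
```

The same UMI at two different positions counts as two molecules, which is exactly the information a FASTQ file cannot provide.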

Reply 8 months ago by Ramon

In that case, you could try fastp instead; see: Use fastp to preprocess FASTQ data with unique molecular identifier (UMI) integrated

There is some advice from @Ian.Sudbery here: De-duplicate UMI at FASTQ level
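Without alignment coordinates, a crude FASTQ-level alternative is to treat reads with an identical UMI and an identical sequence as duplicates. Below is a minimal Python sketch, assuming the UMI has already been appended to the read ID as its last ':'-separated field, as fastp does in its UMI mode (an optional prefix such as `UMI_` is stripped). This is a swapped-in exact-match approach, not the method from the linked thread: unlike UMI-tools, it will not merge UMIs or reads that differ by a single sequencing error. The helper names are hypothetical.

```python
def umi_from_read_id(header: str, prefix: str = "UMI_") -> str:
    """Return the UMI stored as the last ':'-separated field of the read ID.

    Assumed layout: '@INSTR:RUN:FC:LANE:TILE:X:Y:UMI_ACGTAC 1:N:0:...'.
    """
    read_id = header.lstrip("@").split()[0]
    umi = read_id.rsplit(":", 1)[-1]
    return umi[len(prefix):] if prefix and umi.startswith(prefix) else umi


def dedup_exact(records):
    """Yield the first read seen for each (UMI, sequence) pair.

    `records` is an iterable of (header, seq, qual) tuples. Reads whose
    UMI or sequence differ by even one base are kept as distinct.
    """
    seen = set()
    for header, seq, qual in records:
        key = (umi_from_read_id(header), seq)
        if key not in seen:
            seen.add(key)
            yield header, seq, qual


records = [
    ("@A:1:FC:1:1:10:20:UMI_AACC 1:N:0:GG", "TTGACA", "IIIIII"),
    ("@A:1:FC:1:1:11:21:UMI_AACC 1:N:0:GG", "TTGACA", "IIIIHI"),  # duplicate
    ("@A:1:FC:1:1:12:22:UMI_GGTT 1:N:0:GG", "TTGACA", "IIIIII"),  # new UMI
]
print(len(list(dedup_exact(records))))
```

For paired-end data, the key would need to include the sequences of both mates.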

Reply 8 months ago by GenoMax

Cool, I didn't know about this tool! I'll dig into it!

Thanks!

Reply 8 months ago by Ramon

Is there any particular reason you want to work with the FASTQ files rather than aligning the reads first and then collapsing them?

Reply 8 months ago by i.sudbery

Not really! I want to give users of my pipeline the flexibility to start from either a BAM or a FASTQ file.

However, I believe that when starting from the BAM file of collapsed reads, UMIs shouldn't be an issue anymore. Am I right?

Reply 8 months ago by Ramon
