Question: BAM output of deduplication using UMI-tools
9 months ago, Ati (30) wrote:

I have paired-end RNA-seq data with a high duplication rate. My reads contain UMIs, so after aligning with STAR I ran umi_tools dedup with the --paired option. I expected the output BAM file to contain equal numbers of read1 and read2 (as reported by samtools flagstat).

I'm a bit confused by the results: the numbers of read1 and read2 are equal before running umi_tools dedup, but they differ afterwards. Could anyone please clarify this for me?

Thank you in advance!

modified 9 months ago • written 9 months ago by Ati (30)

You should not be de-duplicating RNA-seq data unless you have UMIs. It is not clear whether you actually have UMIs in your reads, even though you have referred to umi_tools. umi_tools is not something you use only after aligning with STAR: the UMIs have to be extracted before the alignment.

modified 9 months ago • written 9 months ago by genomax (91k)
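
For readers following the thread, the order of steps under discussion (extract UMIs, align, deduplicate, then count) can be sketched as below. This is only an illustration: the file names, the genome index directory, and the 8-nt UMI pattern on read 1 are assumptions that do not come from the original post, and the options should be checked against the installed versions of umi_tools, STAR and samtools.

```python
import shlex
from typing import List


def umi_workflow_commands(
    r1: str,
    r2: str,
    genome_dir: str,
    prefix: str = "sample_",
    bc_pattern: str = "NNNNNNNN",  # ASSUMPTION: 8-nt UMI at the 5' end of read 1
) -> List[List[str]]:
    """Return the command lines in the order they would be run."""
    r1_ext = prefix + "R1.extracted.fastq.gz"
    r2_ext = prefix + "R2.extracted.fastq.gz"
    # STAR names coordinate-sorted BAM output <prefix>Aligned.sortedByCoord.out.bam
    aligned = prefix + "Aligned.sortedByCoord.out.bam"
    dedup = prefix + "dedup.bam"
    return [
        # 1. Move the UMI from the read sequence into the read name (before alignment).
        ["umi_tools", "extract", "--bc-pattern=" + bc_pattern,
         "--stdin", r1, "--stdout", r1_ext,
         "--read2-in", r2, "--read2-out", r2_ext],
        # 2. Align the UMI-stripped reads.
        ["STAR", "--genomeDir", genome_dir,
         "--readFilesIn", r1_ext, r2_ext,
         "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", prefix],
        # 3. Index the sorted BAM, which umi_tools dedup requires.
        ["samtools", "index", aligned],
        # 4. Deduplicate on mapping position plus UMI, treating reads as pairs.
        ["umi_tools", "dedup", "--paired", "--stdin", aligned, "--stdout", dedup],
        # 5. Inspect the read1/read2 counts of the deduplicated file.
        ["samtools", "flagstat", dedup],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in umi_workflow_commands("a_R1.fastq.gz", "a_R2.fastq.gz", "star_index"):
        print(shlex.join(cmd))
```

The function only builds and prints the commands; each list could be passed to `subprocess.run(cmd, check=True)` once the paths and the UMI pattern match the actual library design.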

I have adjusted the question. What do you mean? The reads need to be aligned first before they can be deduplicated with UMI-tools!

written 9 months ago by Ati (30)

So you did extract the UMIs with umi_tools before doing the alignments? As to why you have different read1 and read2 numbers, that is likely because, for some pairs, only one of the two reads is mapping (see: A: Why number of #read1 and #read2 is different in samtools flagstat output?).

modified 9 months ago • written 9 months ago by genomax (91k)
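
To make the point about unequal read1 and read2 counts concrete, here is a minimal sketch that tallies mates from SAM FLAG values. It assumes the flags are supplied as integers (for example, parsed from the second column of `samtools view` output) and it is not an exact reimplementation of samtools flagstat; the function name and the counter keys are illustrative only.

```python
from collections import Counter
from typing import Iterable

# Bit values defined by the SAM specification.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
READ1, READ2 = 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def tally_mates(flags: Iterable[int]) -> Counter:
    """Count read1/read2 among primary, paired records."""
    counts: Counter = Counter()
    for flag in flags:
        # Skip secondary and supplementary alignments so each read counts once.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if not flag & PAIRED:
            continue
        if flag & READ1:
            counts["read1"] += 1
        if flag & READ2:
            counts["read2"] += 1
        # A mapped read whose mate did not map: if the unmapped mate is absent
        # from the file, this record has no partner to balance the counts.
        if not flag & UNMAPPED and flag & MATE_UNMAPPED:
            counts["mate_unmapped"] += 1
    return counts


# 99 and 147 form a properly paired read1/read2; 73 is a mapped read1 whose
# mate is unmapped and missing from this list; 355 is a secondary alignment.
example = tally_mates([99, 147, 73, 355])
print(dict(example))
```

In this toy input the read1 count exceeds the read2 count by exactly the number of records flagged with an unmapped mate, which is one pattern to look for in a deduplicated BAM.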