UMIs and the reason for a high duplicate rate
2.3 years ago
Sara ▴ 220

In the RNA-seq data we have generated (UMIs were used) we have a very high duplicate rate (after removing UMI and duplicate we would have only 10 % of the reads). Can you let me know what metrics I can collect to investigate the problem?

rna-seq • 1.4k views
• Can I ask why UMIs were used? Usually UMIs are used in situations where a large duplication rate is expected.

• What do you mean by "removing UMIs"? Normally one wouldn't remove UMIs; you would move them to the read name or a BAM UMI tag.

• What tool and what command have you used for deduplication?

• Are you sure you are running the deduplication in paired-end mode and not single-end mode?

If this is straightforward, traditional RNA-seq, then the usual reason for high UMI duplication is too little RNA going into the library prep.
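To illustrate the second point, here is a minimal Python sketch of what "moving the UMI to the read name" means. It assumes the UMI is the first `umi_len` bases of the read and appends it to the read ID after an underscore, which is the convention UMI-tools follows. The function name `move_umi_to_name` and the 8-base UMI length are hypothetical; in practice you would use a dedicated tool rather than this.

```python
def move_umi_to_name(record, umi_len=8):
    """Move the first umi_len bases of a FASTQ record into the read name.

    record is a (name, sequence, quality) tuple; name excludes the leading '@'.
    Assumption: the UMI sits at the 5' end of the read (check your kit).
    """
    name, seq, qual = record
    if len(seq) < umi_len:
        raise ValueError("read is shorter than the UMI length")
    umi = seq[:umi_len]
    # Keep any header comment (text after the first space) at the end.
    read_id, _, comment = name.partition(" ")
    new_name = f"{read_id}_{umi}" + (f" {comment}" if comment else "")
    # Trim the UMI bases and their qualities so they are not aligned.
    return new_name, seq[umi_len:], qual[umi_len:]


example = ("read1 1:N:0", "ACGTACGTTTGGCCAA", "IIIIIIIIFFFFFFFF")
print(move_umi_to_name(example))
```

The UMI is kept with the read this way, so the deduplication step can still use it after alignment.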

after removing UMI

Is the duplication truly at the UMI level? If so, it sounds like there may be an experimental issue (over-amplification?) with the samples.

Yes, at the UMI level we have a high duplication rate.

Please don't add answers unless you're answering the principal question. Use Add Comment or Add Reply instead.

In RNA-seq you expect duplication at the read level, since there can be many copies of the same RNA present in the sample. What is worrisome is duplication at the UMI level.

after removing UMI and duplicate we would have only 10 % of the reads

How much of the duplication is at the UMI level and how much at the read level? Perhaps the UMI-level contribution is small, in which case the data would be fine to use.
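One way to get at that split is sketched below in Python. It assumes you have already pulled one `(chrom, pos, strand, umi)` tuple per aligned read out of your BAM file (that extraction step is not shown), and the function name `duplication_breakdown` is hypothetical. It treats reads as duplicates only on an exact match, so it ignores UMI sequencing errors and mate positions, which real deduplication tools handle.

```python
def duplication_breakdown(reads):
    """Compare duplication by position alone against position plus UMI.

    reads: iterable of (chrom, pos, strand, umi) tuples, one per aligned read.
    """
    reads = list(reads)
    total = len(reads)
    if total == 0:
        raise ValueError("no reads given")
    # Distinct alignment positions, ignoring the UMI.
    positions = {(chrom, pos, strand) for chrom, pos, strand, _ in reads}
    # Distinct position + UMI combinations, i.e. putative original molecules.
    molecules = set(reads)
    return {
        "total_reads": total,
        "position_dup_rate": 1 - len(positions) / total,
        "umi_dup_rate": 1 - len(molecules) / total,
    }


reads = [
    ("chr1", 100, "+", "AAAA"),
    ("chr1", 100, "+", "AAAA"),  # same position, same UMI: PCR duplicate
    ("chr1", 100, "+", "CCCC"),  # same position, different UMI: separate molecule
    ("chr1", 100, "+", "GGGG"),
    ("chr2", 500, "-", "AAAA"),
]
for key, value in duplication_breakdown(reads).items():
    print(key, round(value, 3))
```

If `position_dup_rate` is high but `umi_dup_rate` is low, most of the apparent duplication comes from independent molecules of highly expressed transcripts rather than from PCR.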

2.3 years ago

If you have high UMI duplication, it means the lab people did too much PCR; but perhaps they had little choice, if there wasn't enough RNA at different steps. Talk to the people who prepped the sample, find out how many cycles of PCR they did, for starters.
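A metric worth bringing to that conversation is the number of reads per unique molecule (the UMI "family size"). Keeping only 10% of reads after deduplication corresponds to an average of about 10 reads per molecule. The Python sketch below is only an illustration: it assumes reads are given as `(chrom, pos, strand, umi)` tuples extracted from the BAM file beforehand, uses exact matching of position and UMI, and the function names are hypothetical.

```python
from collections import Counter


def umi_family_sizes(reads):
    """Return a histogram: family size -> number of molecules of that size.

    reads: iterable of (chrom, pos, strand, umi) tuples, one per aligned read.
    """
    reads_per_molecule = Counter(reads)
    return Counter(reads_per_molecule.values())


def mean_family_size(reads):
    """Average number of reads per unique (position, UMI) combination."""
    reads = list(reads)
    if not reads:
        raise ValueError("no reads given")
    return len(reads) / len(set(reads))


reads = [
    ("chr1", 100, "+", "AAAA"),
    ("chr1", 100, "+", "AAAA"),
    ("chr1", 100, "+", "CCCC"),
    ("chr1", 100, "+", "GGGG"),
    ("chr2", 500, "-", "AAAA"),
]
print(dict(umi_family_sizes(reads)))
print(mean_family_size(reads))
```

A distribution dominated by large families points towards few starting molecules being amplified heavily, which is consistent with low input or many PCR cycles.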