Question

duplicate reads in RNAseq datasets



3 months ago

Bioinfonext ▴ 480

Hi, I ran MultiQC on raw RNAseq datasets and it shows a high level of read duplication. Do I need to take any steps for these datasets before proceeding to read quantification? I am using Trimmomatic to remove low-quality reads and adapter sequences, but I am not sure whether I need to do anything else. The MultiQC report is attached. Many thanks.

RNAseq R • 587 views


score 1 · Answer 1 · 2025-08-01



3 months ago

GenoMax 154k

Is this total RNAseq or mRNAseq data? If it is total RNAseq, the duplicates may simply be rRNA reads.

Some duplication is expected in RNAseq, since there will be multiple copies of transcripts for some of the genes. If the amount of starting material was limited and the person making the libraries went a bit overboard with PCR cycles to generate enough material, that can lead to PCR duplicates. It would be difficult to identify that issue for certain unless the libraries included UMIs.
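To get a rough feel for where the duplication comes from, one could count identical read sequences directly. The following is a minimal sketch, not the procedure FastQC or MultiQC uses; the function names `duplication_summary` and `top_sequences` and the toy reads are made up for illustration.

```python
from collections import Counter


def duplication_summary(sequences):
    """Return (total reads, distinct sequences, duplicate fraction)."""
    counts = Counter(sequences)
    total = sum(counts.values())
    distinct = len(counts)
    # Fraction of reads that are extra copies of a sequence already seen.
    dup_fraction = 0.0 if total == 0 else 1 - distinct / total
    return total, distinct, dup_fraction


def top_sequences(sequences, n=3):
    """Return the n most frequent sequences with their counts."""
    return Counter(sequences).most_common(n)


# Toy input (hypothetical reads, not real data).
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "GGCC", "TTGA"]
summary = duplication_summary(reads)
most_common = top_sequences(reads, n=1)
print(summary, most_common)
```

If a handful of sequences account for most of the duplicates, searching them against rRNA or highly expressed transcripts is a quick sanity check. Without UMIs, this count cannot separate PCR duplicates from genuinely abundant transcripts.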

You may want to move forward with the analysis and see how things go.




Thanks for your response. This is total RNAseq data.

ADD REPLY • link 3 months ago by Bioinfonext ▴ 480



Just try aligning the reads and see what happens; you can then check which transcripts are most abundant. You might also consider aligning to rRNA sequences, tossing out the reads that align, and running QC again.
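The "toss out those reads" step could be sketched as below, assuming you already have the names of the reads that aligned to rRNA (for example, extracted from your aligner's output). The helper names `read_fastq` and `drop_reads`, the read IDs, and the inline FASTQ text are all hypothetical; for real data a dedicated tool will be much faster.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def drop_reads(handle, ids_to_drop):
    """Yield only the records whose read ID is not in ids_to_drop."""
    for header, seq, plus, qual in read_fastq(handle):
        # Read ID = header without the leading '@', up to the first whitespace.
        read_id = header[1:].split()[0]
        if read_id not in ids_to_drop:
            yield header, seq, plus, qual


# Toy FASTQ and a hypothetical set of rRNA-aligned read IDs.
fastq_text = "@r1\nACGT\n+\nIIII\n@r2 extra\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n"
rrna_ids = {"r2"}

kept = list(drop_reads(io.StringIO(fastq_text), rrna_ids))
kept_ids = [header[1:].split()[0] for header, _, _, _ in kept]
print(kept_ids)
```

For paired-end data, the same ID set would need to be applied to both mate files so the pairs stay in sync. Write the kept records to a new FASTQ and rerun the QC on it to see how much of the duplication was rRNA.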

ADD REPLY • link 3 months ago by dsull ★ 7.8k