Question: Duplication rate differs by up to 30% from that of FastQC for single-end reads
Asked 4 days ago by Matthias20 (Germany):

I found quite a large difference in the duplication rate between fastp and FastQC. For all of my ~40 single-end (SE) RNA-seq samples, the rate is around 10-30% lower in fastp than in FastQC. Is there an explanation for this, and which one should I trust more for RNA-seq data?

In this scatterplot, the duplication rates for both tools were calculated based on the raw (i.e. untrimmed) reads. [scatterplot image not included]

I originally posted this on GitHub.

Tags: fastqc, rna-seq, fastp

You would expect to see duplication in RNAseq of any kind. This is because there are multiple copies of RNA from many genes in your samples. Why are you concerned about this?

written 4 days ago by genomax91k

Did you even read my question? ;-) It's not about the duplication rate in general, but the huge difference between Fastqc and Fastp.

modified 3 days ago • written 3 days ago by Matthias20

FastQC does not look at your entire dataset when it checks read duplication (I don't know about fastp). For this module, it only uses sequences that first appear in the first 100,000 sequences in each file. While this is generally representative of the data, it looks like it may not be in your case (especially if fastp is looking at either the entire dataset or some other subset of it). Most of the QC results from FastQC are representative of the overall data and generally work well.
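
To illustrate how restricting the count to sequences first seen early in a file can diverge from a whole-file count, here is a minimal Python sketch. It is a simplified model of the behaviour described above, not the actual code of FastQC or fastp. The function names and the `limit` parameter are hypothetical, and the sketch assumes that tracked sequences keep being counted through the end of the input.

```python
from collections import Counter


def duplication_rate_full(seqs):
    """Percent of reads that are duplicates, counting every read."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return 100.0 * (total - len(counts)) / total if total else 0.0


def duplication_rate_tracked(seqs, limit=100_000):
    """Percent duplicates among sequences first seen in the first `limit` reads.

    Assumption for illustration: sequences that first appear after `limit`
    reads are ignored entirely, while tracked sequences are still counted
    through the end of the input.
    """
    counts = Counter()
    for i, seq in enumerate(seqs):
        if seq in counts:
            counts[seq] += 1      # already tracked: count every later copy
        elif i < limit:
            counts[seq] = 1       # new sequence seen early enough to track
    total = sum(counts.values())
    return 100.0 * (total - len(counts)) / total if total else 0.0
```

For the toy input `["A", "B", "C", "C", "C", "C"]`, the whole-file rate is 50%, whereas tracking with `limit=2` never sees `"C"` and reports 0%. Two tools that sample the data differently can therefore disagree on the same file.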

If you truly want to identify sequence duplicates in your data, I would recommend using clumpify.sh from the BBMap suite (see "A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files"). Use the option addcounts=t to get sequence duplication counts for each sequence type.
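
As a rough illustration of what whole-file, per-sequence duplicate counts look like, the following Python sketch tallies exact-duplicate sequences in a FASTQ file. It is not Clumpify and does not reproduce its output; the function name `count_sequences` is hypothetical, and the code assumes a plain four-line-per-record FASTQ layout.

```python
import gzip
from collections import Counter
from itertools import islice


def count_sequences(path):
    """Return a Counter mapping each read sequence to its number of copies.

    Assumes a four-line-per-record FASTQ file, optionally gzip-compressed.
    All distinct sequences are held in memory, so this suits modest files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            counts[line.strip()] += 1
    return counts
```

Calling `count_sequences("sample.fastq.gz").most_common(10)` (the file name is a placeholder) would list the ten most frequent sequences with their copy numbers.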

written 3 days ago by genomax91k