fastp JSON -> MultiQC and fastp-processed reads -> FastQC -> MultiQC show different levels of sequence duplication in my RNA-seq data
2.6 years ago

My paired-end RNA-seq data has nearly 60% sequence duplication. I am running my RNA-seq analysis in Galaxy and using fastp to remove duplicate reads. When I run MultiQC on the fastp JSON output, it shows that all duplication has been removed. But when I run FastQC on the processed reads and view the results in MultiQC, the same level of sequence duplication is still there. Can anyone help me deal with this?

[Figure: Sequence duplication % in raw reads]

[Figure: Sequence duplication level (%) for fastp-processed reads -> FastQC -> MultiQC]

[Figure: fastp JSON reports combined directly by MultiQC]

What is happening? Can anyone tell me? And can I proceed with the fastp-processed reads?

fastp duplication sequence fastqc • 2.2k views

I would not put a lot of confidence in the "duplication detection" by FastQC. FastQC estimates duplication only from sequences that first appear in the first 100K reads of each file (LINK), so if you are really interested in finding sequence duplication you will want to use a program like clumpify.sh, which works at the sequence level (see: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates.)
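To see how much the counting method matters, here is a minimal Python sketch that counts exact duplicates over all reads, once per file and once per read pair. It is not what FastQC, fastp, or clumpify.sh do internally; the function names (`read_sequences`, `duplication_percent`, `per_file_and_per_pair`) are made up for illustration. It also assumes, as a possible explanation only, that a pair-aware tool removes a pair only when both mates match another pair, while a per-file check looks at R1 and R2 separately.

```python
from collections import Counter
import gzip


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record holds the bases
                yield line.strip()


def duplication_percent(keys):
    """Percentage of records that are exact repeats of an earlier record."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


def per_file_and_per_pair(r1_seqs, r2_seqs):
    """Compare duplication seen per file with duplication seen per read pair."""
    r1, r2 = list(r1_seqs), list(r2_seqs)
    return {
        "R1": duplication_percent(r1),
        "R2": duplication_percent(r2),
        # A pair counts as a duplicate only if BOTH mates match (assumption).
        "pairs": duplication_percent(zip(r1, r2)),
    }


# Toy data: R1 repeats a lot, yet no complete pair is repeated.
toy_r1 = ["AAAA", "AAAA", "AAAA", "CCCC"]
toy_r2 = ["GGGG", "TTTT", "GGGA", "GGGG"]
print(per_file_and_per_pair(toy_r1, toy_r2))
```

On real data you could pass `read_sequences("sample_R1.fastq.gz")` and `read_sequences("sample_R2.fastq.gz")` (hypothetical file names) instead of the toy lists. Everything is held in memory, so this is only practical for small files or subsamples.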

"can I proceed with fastp processed reads"

Yes. In some experiments you expect some duplication to be present because that is normal (e.g. RNAseq where there can be multiple copies of transcripts).
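A rough way to see why duplication is expected in RNAseq: a short, highly expressed transcript only has so many possible read start positions. The toy simulation below is a sketch under simplifying assumptions (single-end reads, uniform start positions, no sequencing errors, hypothetical function name `simulated_duplication`), not a model of any real library.

```python
import random


def simulated_duplication(transcript_len, read_len, n_reads, seed=0):
    """Percent of simulated reads sharing a start position with an earlier read."""
    rng = random.Random(seed)
    n_starts = transcript_len - read_len + 1  # possible start positions
    starts = [rng.randrange(n_starts) for _ in range(n_reads)]
    distinct = len(set(starts))
    return 100.0 * (n_reads - distinct) / n_reads


# 10,000 reads of 100 bp from one 500 bp transcript: only 401 distinct reads
# are possible, so most reads must be duplicates without any PCR artifact.
print(simulated_duplication(transcript_len=500, read_len=100, n_reads=10_000))
```

With these numbers at least 9,599 of the 10,000 reads have to be duplicates, whatever the random seed, which is why high duplication in RNAseq is not by itself a sign of a bad library.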
