Duplication and deduplication in a FastQC report
2.4 years ago
tea.vuki ▴ 10

Hello,

I have read a lot of guides on analyzing duplication and deduplication, including this amazing explanation (Revisiting the FastQC read duplication report), which helped me a lot. However, I still have some technical questions (they might be very basic, but I am new to this, so I apologize): what do the numbers on the X and Y axes of the plot actually mean?

In my report I have 85988702 total sequences with a length of 76, and in Sequence Duplication Levels I got this result: sequences remaining after deduplication: 79.57%. Correct me if I am wrong, but I assume that, in simple terms, this means that initially 20.43% of my sequences were duplicates? I also have a peak in the blue line at >10 on the X axis. Does that mean that I have sequences with between 10 and 50 copies, or that I have 10 sequences with duplicates?

I hope that was clear enough. I will post a picture of my results so that you can see what I am asking. Basically, I need someone to explain in detail the meaning of the numbers on the X and Y axes. Thank you in advance!

fastqc • 954 views

This should help: https://sequencing.qcfail.com/articles/libraries-can-contain-technical-duplication/

Your intuition/reasoning is correct for all questions you have posed.
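To make the axes concrete, here is a minimal Python sketch of how such a plot can be built from a toy set of reads. It is not FastQC's own code: the function names, the toy reads, and the exact bin edges (for example, treating ">10" as 10 to 49 copies) are assumptions for illustration. The X axis is the duplication level (how many copies a sequence has), and the Y axis is a percentage: for the blue line, the share of all reads that belong to sequences at that level; for the red line, the share of distinct sequences at that level.

```python
from collections import Counter

# X-axis labels of the duplication plot and the lowest copy number in each bin.
# The exact edge handling is an assumption and may differ slightly in FastQC.
BIN_LABELS = ["1", "2", "3", "4", "5", "6", "7", "8", "9",
              ">10", ">50", ">100", ">500", ">1k", ">5k", ">10k"]
BIN_STARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9,
              10, 50, 100, 500, 1000, 5000, 10000]


def bin_label(copies):
    """Return the X-axis bin for a sequence seen `copies` times."""
    label = BIN_LABELS[0]
    for start, name in zip(BIN_STARTS, BIN_LABELS):
        if copies >= start:
            label = name
    return label


def duplication_profile(reads):
    """Compute blue-line and red-line percentages for a list of reads."""
    copies_per_sequence = Counter(reads)
    total_reads = len(reads)
    distinct_sequences = len(copies_per_sequence)

    reads_in_bin = Counter()      # numerator for the blue line
    sequences_in_bin = Counter()  # numerator for the red line
    for copies in copies_per_sequence.values():
        label = bin_label(copies)
        reads_in_bin[label] += copies
        sequences_in_bin[label] += 1

    blue_pct = {k: 100 * v / total_reads for k, v in reads_in_bin.items()}
    red_pct = {k: 100 * v / distinct_sequences for k, v in sequences_in_bin.items()}
    # Percentage of reads that would remain if every sequence were kept once.
    remaining_pct = 100 * distinct_sequences / total_reads
    return blue_pct, red_pct, remaining_pct


# Toy data: one sequence with 12 copies, one with 2 copies, six unique reads.
toy_reads = ["A"] * 12 + ["C"] * 2 + ["G", "T", "N", "AC", "AG", "AT"]
blue_pct, red_pct, remaining_pct = duplication_profile(toy_reads)
print(blue_pct, red_pct, remaining_pct)
```

In this toy set a single sequence with 12 copies puts a large share of all reads into the ">10" bin of the blue line, even though it is only one distinct sequence. So a blue peak at ">10" reflects reads from sequences with roughly 10 to 50 copies, not "10 sequences with duplicates".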

I assume in simple terms that this means that initially I had 20.43% of the sequences that were duplicates?

You still have those duplicates in your data (not just initially). If you were to deduplicate the data, you would lose about 20% of the reads. For RNA-seq data, deduplication is not warranted unless you have an independent means of identifying PCR duplicates (e.g. unique molecular indexes, UMIs).
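As a rough worked example, the numbers from the question translate into read counts as follows. This is only arithmetic on the reported values; the variable names are made up for illustration, and since FastQC estimates the percentage from a subset of the file, the counts should be treated as approximate.

```python
# Values taken from the question above.
total_sequences = 85_988_702
remaining_pct = 79.57  # "remaining after deduplication" reported by FastQC

# Approximate number of reads kept and lost if the data were deduplicated.
remaining_reads = round(total_sequences * remaining_pct / 100)
removed_reads = total_sequences - remaining_reads
duplicate_pct = round(100 - remaining_pct, 2)

print(f"kept: {remaining_reads:,}  removed: {removed_reads:,}  ({duplicate_pct}%)")
```

That works out to roughly 68.4 million reads kept and about 17.6 million removed.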


Thank you so much! I will check the link now.
