Hello everyone!
I have read the FastQC help documentation, but it does not go into enough detail.
Here is my understanding of this duplicate sequence plot from FastQC:
The title says "Percent of seqs remaining if deduplicated 14.11%". Does this mean that if I deduplicate my data, only 14.11% of the sequences will remain, i.e. the duplication level is very high?
From the red line, can I say that about 60% of the deduplicated sequences are at duplication level "1" and about 25% of the deduplicated sequences are at duplication level ">10"?
From the blue line, can I say that about 10% of the total sequences are at duplication level "1" and about 65% of the total sequences are at duplication level ">10"?
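To make my reading concrete, here is a small Python sketch of how I understand these numbers to be computed. It is a simplification on made-up toy reads, not FastQC's actual implementation (I believe FastQC only tracks a subset of the reads and bins the higher duplication levels); the function name `duplication_summary` and the example data are my own.

```python
from collections import Counter


def duplication_summary(reads):
    """Summarize exact-duplicate levels for a list of read sequences.

    Returns a tuple:
      - percent of reads remaining after deduplication
      - dict mapping duplication level -> (percent of deduplicated
        sequences, percent of total sequences) at that level
    """
    copies = Counter(reads)                  # sequence -> number of copies
    level_counts = Counter(copies.values())  # level -> number of distinct sequences
    total = len(reads)
    distinct = len(copies)

    remaining = 100.0 * distinct / total
    per_level = {}
    for level, n_distinct in sorted(level_counts.items()):
        # "Red line" in my reading: share of the distinct sequences at this level
        pct_dedup = 100.0 * n_distinct / distinct
        # "Blue line" in my reading: share of all reads at this level
        pct_total = 100.0 * n_distinct * level / total
        per_level[level] = (pct_dedup, pct_total)
    return remaining, per_level


# Toy data (hypothetical): one sequence seen 12 times, one seen 5 times,
# and three sequences seen once each.
reads = ["AAAA"] * 12 + ["CCCC"] * 5 + ["GGGG", "TTTT", "ACGT"]
remaining, per_level = duplication_summary(reads)

print(f"Percent remaining if deduplicated: {remaining:.2f}%")
for level, (pct_dedup, pct_total) in per_level.items():
    print(f"level {level}: {pct_dedup:.1f}% of deduplicated, {pct_total:.1f}% of total")
```

In this toy case most distinct sequences sit at level 1, yet most of the total reads come from the few highly duplicated sequences, which is the pattern I think I am seeing in my plot.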
Is this interpretation right?
Can I conclude from this plot that the libraries may contain technical duplicates? What further analysis should I do to confirm or rule this out?
Background can be found here
Thank you very much in advance!