What is Deduplication in the context of Sequence Duplication Levels?
8.5 years ago

I am reading through the FastQC documentation, and in describing the "Sequence Duplication Levels" plot that FastQC generates, it states:

... the red plot the sequences are de-duplicated and the proportions shown are the proportions of the deduplicated set which come from different duplication levels in the original data.

I understand that they are binning duplicate sequences, but I don't understand what "deduplication" means here. My naive guess is that it means removing the duplicates, but then the red line should be at 100% at x = 1, and that is clearly not the case.

Some explanation would be of great help. Thanks!

RNA-Seq fastqc • 11k views
8.5 years ago

The concepts are a little difficult to untangle. See:

Revisiting the FastQC read duplication report
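
To make the quoted sentence concrete, here is a minimal sketch of how the two lines can be computed from a list of reads. It is an illustration, not FastQC's actual code: the function name `duplication_levels` and the toy reads are made up, and it ignores the binning of high duplication levels and the subsampling that FastQC applies. The point is that after deduplication each distinct sequence is counted once, but it is still placed at the duplication level it had in the original data, so the red line is not forced to 100% at x = 1.

```python
from collections import Counter


def duplication_levels(reads):
    """Return (pct_of_total, pct_of_deduplicated), each keyed by duplication level.

    Hypothetical helper for illustration only; not part of FastQC.
    """
    seq_counts = Counter(reads)                   # sequence -> times seen
    total_reads = sum(seq_counts.values())        # size of the original set
    distinct = len(seq_counts)                    # size of the deduplicated set
    # duplication level -> number of distinct sequences seen that many times
    level_counts = Counter(seq_counts.values())

    # "Blue line": share of all reads that sit at each duplication level.
    pct_of_total = {
        level: 100.0 * level * n / total_reads
        for level, n in level_counts.items()
    }
    # "Red line": share of distinct sequences that sit at each level.
    pct_of_dedup = {
        level: 100.0 * n / distinct
        for level, n in level_counts.items()
    }
    return pct_of_total, pct_of_dedup


# Toy data: two sequences seen once, one seen twice, one seen six times.
reads = ["AAAA", "CCCC"] + ["GGGG"] * 2 + ["TTTT"] * 6
pct_of_total, pct_of_dedup = duplication_levels(reads)

for level in sorted(pct_of_total):
    print(level, pct_of_total[level], pct_of_dedup[level])
```

In this toy set there are 10 reads and 4 distinct sequences. Only 2 of the 4 distinct sequences occur exactly once, so the deduplicated ("red") value at level 1 is 50%, not 100%; the remaining distinct sequences sit at levels 2 and 6 because that is how often they occurred in the original data.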
