FastQC / Picard MarkDuplicates
22 months ago
Bioinf • 0

Hello, can someone explain the difference between how FastQC and Picard MarkDuplicates detect duplicates? I got different duplication rates for the same samples.

fastqc duplicate • 591 views
22 months ago
GenoMax 141k

See: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/8%20Duplicate%20Sequences.html

To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed, but this should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level. To cut down on the amount of information in the final plot any sequences with more than 10 duplicates are placed into grouped bins to give a clear impression of the overall duplication level without having to show each individual duplication value.

Because the duplication detection requires an exact sequence match over the whole length of the sequence, any reads over 75bp in length are truncated to 50bp for the purposes of this analysis. Even so, longer reads are more likely to contain sequencing errors which will artificially increase the observed diversity and will tend to underrepresent highly duplicated sequences.

So FastQC's duplicate detection only tracks sequences that first appear in the first 100,000 reads, and it compares (possibly truncated) read sequences. Its number should only be used for qualitative QC.

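Picard MarkDuplicates looks at the entire dataset, and it works on aligned reads (SAM/BAM), identifying duplicates by alignment position rather than by exact sequence match, so its rate should be the more accurate one.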
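To make the quoted FastQC description concrete, here is a rough Python sketch of that counting scheme. It is not FastQC's actual code: the function name `fastqc_style_duplication` and its parameters are made up for illustration, and it skips the extra statistical correction FastQC applies before reporting its final figure. It only shows why restricting tracking to sequences first seen early in the file, and truncating long reads, can give a different number than a tool that examines every read.

```python
from typing import Iterable


def fastqc_style_duplication(
    reads: Iterable[str],
    track_limit: int = 100_000,  # only sequences first seen within this many reads are tracked
    long_read: int = 75,         # reads longer than this are truncated
    truncate_to: int = 50,       # length long reads are truncated to
) -> float:
    """Return an approximate percentage of duplicate reads among tracked sequences.

    Hypothetical helper for illustration only; not part of FastQC or Picard.
    """
    counts: dict[str, int] = {}
    for i, seq in enumerate(reads):
        # Long reads are compared on a truncated prefix only.
        key = seq[:truncate_to] if len(seq) > long_read else seq
        if key in counts:
            # Already-tracked sequences are counted to the end of the file.
            counts[key] += 1
        elif i < track_limit:
            # New sequences are only added while inside the initial window.
            counts[key] = 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Reads beyond the first copy of each tracked sequence count as duplicates.
    return 100.0 * (1 - len(counts) / total)


reads = ["AAAA", "AAAA", "CCCC", "GGGG"]
print(fastqc_style_duplication(reads))                 # all sequences tracked: 25.0
print(fastqc_style_duplication(reads, track_limit=2))  # only "AAAA" tracked: 50.0
```

With the same four reads, the estimate changes from 25% to 50% depending only on how many sequences fall inside the tracking window, which is one reason the FastQC figure should not be expected to match Picard's.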
