Hello there,
I have a lot of MiSeq data that I am trying to analyze. Running FastQC on it, I found that many modules in the report fail, and I wanted to pick your brains to get a sense of what should be done in that case.
As you all know, FastQC generates a summary like this:
[PASS] Basic Statistics
[FAIL] Per base sequence quality
[PASS] Per sequence quality scores
[FAIL] Per base sequence content
[FAIL] Per base GC content
[WARNING] Per sequence GC content
[PASS] Per base N content
[WARNING] Sequence Length Distribution
[FAIL] Sequence Duplication Levels
[WARNING] Overrepresented sequences
[FAIL] Kmer Content
The issue is that I am analyzing targeted sequencing data, so I expect a lot of duplication. What I don't clearly see is whether I should take the FastQC result at face value, judged against the examples they publish on their website (what a good report and a bad report should look like). Because this is a deep sequencing experiment, I also expect the GC content to be skewed by the amount of duplication. Based on the summary above, do you think FASTQ post-processing such as clipping and trimming would correct the reads, or is the problem already at the level of the MiSeq machine (experimental contamination?)
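To make the duplication point concrete, here is a minimal sketch of how the share of duplicated reads could be tallied directly from a FASTQ file. It assumes plain, uncompressed 4-line FASTQ records and counts exact sequence matches only; the function names are made up for illustration, and this is not the method FastQC itself uses for its duplication module.

```python
from collections import Counter
import io


def read_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record (assumed layout)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def duplication_summary(handle, top=3):
    """Return (total reads, distinct sequences, duplicated fraction, most common)."""
    counts = Counter(read_sequences(handle))
    total = sum(counts.values())
    distinct = len(counts)
    # Fraction of reads that are extra copies of an already-seen sequence.
    dup_fraction = 1 - distinct / total if total else 0.0
    return total, distinct, dup_fraction, counts.most_common(top)


if __name__ == "__main__":
    # Toy data: three identical reads and one different read.
    toy = (
        "@r1\nACGT\n+\nIIII\n"
        "@r2\nACGT\n+\nIIII\n"
        "@r3\nACGT\n+\nIIII\n"
        "@r4\nTTGA\n+\nIIII\n"
    )
    total, distinct, frac, common = duplication_summary(io.StringIO(toy))
    print(total, distinct, frac, common)
```

On a real file one would pass `open("reads.fastq")` (a placeholder name) instead of the toy string. If a handful of amplicon sequences account for most of the reads, a high duplication level would be expected from the experiment design rather than a sign of a failed run.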
Rad