FastQC Per sequence GC content always fails after Trimmomatic?
22 months ago
DNAngel

Hi all,

Just a quick question on theory, I suppose. I ran Trimmomatic on my FASTQ files with the following commonly used parameters:

ILLUMINACLIP:adapters:TruSeq2-PE.fa:2:30:10
SLIDINGWINDOW:4:20
MINLEN:36


I then ran FastQC on the trimmed, paired dataset. Most quality checks look okay, except that Per sequence GC content goes from passing (untrimmed dataset) to failing. I think this is because trimming poor-quality bases leaves a range of read lengths, from 36 to 146 bases, and I believe the shorter reads in particular are driving the apparent increase in GC content.

My question is: is this correct? Is this why the GC content check fails, and can I safely ignore the warning? It makes sense to me that shorter sequences could have a high %GC and cause this check to fail, but I don't know whether this is actually something we expect.

Thank you!

EDIT:

Image showing the FastQC results for two samples. It looks like my trimming does not affect GC content at all.

https://i.ibb.co/FKDsWQT/fastqc-results.jpg

EDIT2:

I should mention this is whole-genome sequencing data, not RNA-seq data.

Is your organism expected to have a GC-rich genome? There is essentially no change after trimming.

I am not sure, actually; this is my first time working with a Hymenoptera species. Some quick research says there are GC-rich domains, but I think most eukaryotes have some regions of the genome that are more GC-rich than others, for different purposes.

If you don't have a concrete reason to think this observation is problematic, you can move ahead with the rest of your analysis.
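
If you want to test the read-length idea from the question directly, a rough sketch like the one below may help. It is not part of FastQC or Trimmomatic; it simply assumes a standard four-line-per-record FASTQ file (optionally gzipped) and reports the mean %GC of reads grouped into length bins. The function names and the 10-base bin width are illustrative choices, not anything taken from either tool.

```python
import gzip
from collections import defaultdict
from statistics import mean


def read_fastq_sequences(path):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_index, line in enumerate(handle):
            if line_index % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def gc_percent(seq):
    """Return the percentage of G and C bases in a sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mean_gc_by_length(sequences, bin_width=10):
    """Group reads into length bins and return {bin_start: mean %GC}."""
    bins = defaultdict(list)
    for seq in sequences:
        bin_start = (len(seq) // bin_width) * bin_width
        bins[bin_start].append(gc_percent(seq))
    return {start: mean(values) for start, values in sorted(bins.items())}
```

Calling `mean_gc_by_length(read_fastq_sequences("trimmed_1P.fastq.gz"))` (the file name here is only a placeholder) on a trimmed file would show whether the shortest bins really have a higher mean %GC than the longest ones. If the bins look similar, read length is probably not what drives the FastQC result.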

