Question

FastQC Per sequence GC content always fails after Trimmomatic?

4.4 years ago

DNAngel ▴ 250

Hi all,

Just a quick question on theory, I suppose. I ran Trimmomatic on my FASTQ files with the following commonly used parameters:

ILLUMINACLIP:adapters/TruSeq2-PE.fa:2:30:10
SLIDINGWINDOW:4:20
HEADCROP:5
MINLEN:36

I then ran FastQC on the trimmed, paired dataset. Most quality checks look okay, but Per sequence GC content goes from passing (untrimmed dataset) to failing. I think this is because trimming poor-quality bases leaves me with reads of varying lengths, from 36 to 146 bp, and I believe the shortest reads in particular are driving a sudden increase in GC content.

My question is: is this correct? Is this why the GC content check fails, and can I safely ignore the result? It makes sense to me that shorter sequences would have a higher %GC and cause this check to fail, but I don't know whether this is something we actually expect.
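One way to check the short-read hypothesis directly is to bin the trimmed reads by length and compare the mean GC fraction per bin. The following is a minimal sketch in plain Python, not part of FastQC or Trimmomatic; the function names and the 20 bp bin size are illustrative assumptions, and it assumes standard 4-line FASTQ records.

```python
from collections import defaultdict


def gc_fraction(seq):
    """Return the fraction of G/C among called (A/C/G/T) bases; 0.0 if none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def read_sequences(fastq_lines):
    """Yield the sequence line of each 4-line FASTQ record (assumed layout)."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:
            yield line.strip()


def mean_gc_by_length(fastq_lines, bin_size=20):
    """Map each length bin (its lower bound) to (read count, mean GC fraction)."""
    totals = defaultdict(lambda: [0, 0.0])
    for seq in read_sequences(fastq_lines):
        if not seq:
            continue
        lower = (len(seq) // bin_size) * bin_size
        totals[lower][0] += 1
        totals[lower][1] += gc_fraction(seq)
    return {lower: (n, gc_sum / n) for lower, (n, gc_sum) in sorted(totals.items())}
```

To use it on a gzipped file, open it with `gzip.open(path, "rt")` and pass the handle to `mean_gc_by_length`. If the shortest bins do not show a clearly higher mean GC than the longest ones, read length is unlikely to explain the failed check.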

Thank you!

EDIT:

Image showing the FastQC results for two samples. It looks like my trimming does not affect GC content at all.

https://i.ibb.co/FKDsWQT/fastqc-results.jpg

EDIT2:

I should mention this is whole-genome sequencing data, not RNA-seq data.

Is your organism expected to have a GC-rich genome? The GC content is essentially unchanged after trimming.
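The "essentially unchanged" observation can also be quantified rather than judged by eye from the plots. The sketch below builds a per-read GC histogram (one bin per rounded percent) for the untrimmed and trimmed reads and reports how far apart the two are. The function names are hypothetical, and the distance measure (total variation distance) is a choice made for this illustration, not the statistic FastQC itself uses.

```python
def gc_percent(seq):
    """Return GC percent among called (A/C/G/T) bases, or None if there are none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def gc_histogram(seqs):
    """Return 101 values: the fraction of reads at each rounded GC percent (0-100)."""
    counts = [0] * 101
    n_reads = 0
    for seq in seqs:
        pct = gc_percent(seq)
        if pct is None:
            continue
        counts[int(pct + 0.5)] += 1
        n_reads += 1
    if n_reads == 0:
        raise ValueError("no reads with called bases")
    return [count / n_reads for count in counts]


def histogram_shift(before, after):
    """Total variation distance between two histograms: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(before, after))
```

Calling `histogram_shift(gc_histogram(untrimmed_seqs), gc_histogram(trimmed_seqs))` gives a value near 0 when trimming has left the GC distribution alone, which would support moving the investigation away from the trimming step.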

4.4 years ago by GenoMax 152k

I am not sure, actually; this is my first time working with Hymenoptera species. Some quick research says there are GC-rich domains, but I think most eukaryotes have some regions of the genome that are more GC-rich than others, for different purposes.

4.4 years ago by DNAngel ▴ 250

If you don't have a concrete reason to think that this observation is problematic, then you can move ahead with the rest of your analysis.