Question: kmer content changes after trimming and removing adapters from reads
Posted 2.3 years ago by serpalma.v:

Dear community

I have a large set of FASTQ files from genomic DNA. I ran them through FastQC and found that the "Overrepresented sequences" and "Kmer content" modules failed. All other modules passed, apart from a warning in "Per tile sequence quality". The same pattern appeared in almost all of the FASTQ files (>1000 files).

The "overrepresented sequences" module pointed out the presence of TruSeq adapters and Illumina PCR Primer 1.
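For context, here is a rough sketch of the kind of check the "Overrepresented sequences" module performs: count identical reads and report any that make up more than 0.1% of all reads (warning) or more than 1% (failure), which are the thresholds FastQC documents. The function name `overrepresented` is hypothetical, and FastQC's real implementation differs in its details; this only illustrates the idea.

```python
from collections import Counter


def overrepresented(reads, warn_frac=0.001, fail_frac=0.01):
    """Simplified overrepresented-sequence check (not FastQC's actual code).

    Returns (sequence, count, fraction, status) tuples for every distinct
    read whose share of all reads exceeds warn_frac; status is "fail" when
    the share also exceeds fail_frac, otherwise "warn".
    """
    total = len(reads)
    hits = []
    # most_common() is sorted by count, so we can stop at the first miss.
    for seq, n in Counter(reads).most_common():
        frac = n / total
        if frac <= warn_frac:
            break
        hits.append((seq, n, frac, "fail" if frac > fail_frac else "warn"))
    return hits
```

An adapter dimer present in 3% of reads would be reported with status "fail", which is consistent with adapters showing up in this module before trimming.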

I ran them through Trimmomatic to remove the adapters. The "Overrepresented sequences" module now passes, but "Kmer content" still fails, this time with a different pattern. I also get a new warning from the "Per sequence GC content" module (please see linked figure).
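To make the GC plot easier to interpret: the "Per sequence GC content" module is built from the GC fraction of each individual read. Below is a minimal sketch of that per-read distribution; the helper names are hypothetical, N bases are excluded from the denominator by assumption, and FastQC's own binning and its comparison against a theoretical distribution are not reproduced.

```python
def gc_fraction(seq):
    """GC fraction of one read, counting only A/C/G/T (N is ignored).

    Returns None when the read has no A/C/G/T bases at all.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(reads):
    """Number of reads at each whole-percent GC value (index 0..100)."""
    hist = [0] * 101
    for read in reads:
        gc = gc_fraction(read)
        if gc is not None:
            hist[round(gc * 100)] += 1
    return hist
```

Because trimming leaves reads of different lengths, each read's GC value is estimated from fewer bases on average, so the shape of this histogram can change even when the underlying library has not.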

I have read that the "Kmer content" pattern seen before trimming (k-mers enriched at the beginning of the reads) could be due to fragmentation bias.
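What "k-mers found at the beginning of the reads" means can be illustrated by tabulating where each k-mer starts along the reads. This is a simplified sketch, not FastQC's statistical test: the function names are hypothetical, and k=7 is an assumption chosen to resemble FastQC's default k-mer size.

```python
from collections import defaultdict


def kmer_positions(reads, k=7):
    """Count every k-mer at every start position: {kmer: {position: count}}.

    k-mers containing N are skipped.
    """
    table = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:
                continue
            table[kmer][i] += 1
    return table


def start_enrichment(table, kmer, window=5):
    """Fraction of a k-mer's occurrences that start in the first `window` bases."""
    counts = table.get(kmer, {})
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for pos, n in counts.items() if pos < window) / total
```

A k-mer with a start enrichment near 1.0 in a random genomic library is the positional bias the module complains about; sequence-dependent cutting during library fragmentation is one way such a bias can arise.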

I used the adapter file that ships with Trimmomatic (TruSeq3-PE-2.fa).
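As a rough illustration of what adapter removal does to a read, here is a much simpler technique than Trimmomatic's ILLUMINACLIP (which uses seed matching, alignment scores and a palindrome mode): exact-match clipping of a 3' adapter. The function name is hypothetical, and `AGATCGGAAGAGC` is assumed here as the common start of the TruSeq adapters.

```python
def clip_adapter(read, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Exact-match 3' adapter clipping (simplified; not Trimmomatic's algorithm).

    First looks for the full adapter anywhere in the read. Otherwise looks
    for the longest read suffix (at least min_overlap bases) that equals a
    prefix of the adapter, which catches adapters running off the read end.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read
```

A small `min_overlap` also clips short suffixes that match the adapter purely by chance, so after clipping some reads are a few bases shorter than strictly necessary while very short adapter remnants can still be missed at larger thresholds.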

These are the options I used for Trimmomatic:

java -jar trimmomatic-0.38.jar PE -phred33 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
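The quality-trimming steps in that command run in the order given: LEADING:3 and TRAILING:3 drop bases below quality 3 from each end, SLIDINGWINDOW:4:15 scans 4-base windows from the 5' end and cuts when the mean quality falls below 15, and MINLEN:36 discards reads shorter than 36 bases. The sketch below is a simplified single-read version under those assumptions; `trim_read` is hypothetical, ILLUMINACLIP and paired-end handling are not modelled, and it cuts at the start of the first failing window, whereas Trimmomatic's own sliding-window cut point can differ slightly.

```python
def phred33(qual_string):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def trim_read(seq, qual, leading=3, trailing=3, window=4, window_q=15,
              minlen=36):
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN to one read.

    Returns the trimmed (seq, qual) pair, or None if the read is dropped.
    """
    q = phred33(qual)
    start, end = 0, len(q)
    # LEADING: remove 5' bases with quality below `leading`.
    while start < end and q[start] < leading:
        start += 1
    # TRAILING: remove 3' bases with quality below `trailing`.
    while end > start and q[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: cut at the first window whose mean quality is too low.
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < window_q:
            end = i
            break
    # MINLEN: drop reads that ended up too short.
    if end - start < minlen:
        return None
    return seq[start:end], qual[start:end]
```

Since every read is cut at a different point, the trimmed file contains reads of many lengths, which is one reason the FastQC profiles computed afterwards can look different from the untrimmed ones.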

I have two questions:

  • Are the "Kmer content" and "Per sequence GC content" profiles after trimming something to worry about?

  • What could be a possible reason for the change in "kmer content" after trimming?

Here you can find the FastQC reports before and after running Trimmomatic:

And here is a comparison of "kmer content" and "Per sequence GC content" before and after trimming:

Thank you very much in advance.


Failing the k-mer content and GC content modules in FastQC generally has no immediate adverse effect on your analysis. You should proceed with the downstream analysis and see what you get. In the latest FastQC versions the k-mer analysis module is turned off by default, since it causes more heartache than necessary.

Reply written 2.3 years ago by genomax.