Question: kmer content changes after trimming and removing adapters from reads
Asked 17 months ago by serpalma.v (Germany):

Dear community

I have a large set of FASTQ files from genomic DNA. I ran them through FastQC and found that the "Overrepresented sequences" and "Kmer content" modules failed. All other modules passed, apart from a warning in "Per tile sequence quality". This pattern was present in almost all of the FASTQ files (>1000 files).

The "overrepresented sequences" module pointed out the presence of TruSeq adapters and Illumina PCR Primer 1.

I ran the files through Trimmomatic to remove adapters. After trimming, the "Overrepresented sequences" module passed, but "Kmer content" failed again, this time with a different pattern. In addition, I now get a new warning for the "Per sequence GC content" module (please see the linked figure).

I have read that this pattern in "Kmer content" before trimming (kmers found at the beginning of the reads) could be due to fragmentation bias.

I used the adapter file provided with Trimmomatic (TruSeq3-PE-2.fa).

These are the flags I used for Trimmomatic:

java -jar trimmomatic-0.38.jar PE -phred33 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

I have two questions:

  • Are the "Kmer content" and "Per sequence GC content" profiles after trimming something to worry about?

  • What could be a possible reason for the change in "kmer content" after trimming?

Here you can find the FastQC reports before and after running Trimmomatic:

https://drive.google.com/open?id=1vLY0FsXxnzJYT7d4X1TWZy96cSXu3XGs

https://drive.google.com/open?id=1Tk0GCy_SEz8ZrP2Y_3f_XYs1cnN11ScU

And here is a comparison of "kmer content" and "Per sequence GC content" before and after trimming:

https://drive.google.com/open?id=1YT6zbmKU_3DYlrTX_BLkMBOpGnmqg1Z7

Thank you very much in advance

fastqc dnaseq trimmomatic • 874 views

Failing the "Kmer content" and "Per sequence GC content" modules in FastQC generally has no immediate adverse effect on your analysis. You should proceed with the downstream analysis and see what you get. In recent versions of FastQC the Kmer content module is turned off by default, since it causes more heartache than necessary.

Answered 17 months ago by genomax
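
If you want to look at the positional k-mer signal yourself rather than rely on the FastQC pass/fail flag, a rough check can be sketched in Python. This is a simplified illustration, not FastQC's implementation: FastQC applies its own statistical test on a subsample of reads, whereas the sketch below only computes an observed/expected ratio per position. The function name `positional_kmer_enrichment` and the parameters `k` and `min_count` are hypothetical and chosen for this example.

```python
from collections import Counter, defaultdict


def positional_kmer_enrichment(reads, k=7, min_count=5):
    """Map each frequent k-mer to (position, obs/exp) at its most enriched position.

    Assumption: the expected count of a k-mer at a position is its total count
    spread over positions in proportion to the number of k-mer windows
    observed at each position (so shorter, trimmed reads are accounted for).
    """
    per_position = defaultdict(Counter)  # position -> k-mer counts at that position
    totals = Counter()                   # k-mer -> count over all positions
    windows = Counter()                  # position -> number of windows counted

    for read in reads:
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if "N" in kmer:              # skip windows containing ambiguous bases
                continue
            per_position[pos][kmer] += 1
            totals[kmer] += 1
            windows[pos] += 1

    total_windows = sum(windows.values())
    result = {}
    for kmer, total in totals.items():
        if total < min_count:            # ignore rare k-mers
            continue
        best = None
        for pos, counts in per_position.items():
            expected = total * windows[pos] / total_windows
            ratio = counts[kmer] / expected
            if best is None or ratio > best[1]:
                best = (pos, ratio)
        result[kmer] = best
    return result
```

Called on a list of read sequences (for example the second line of every FASTQ record), it returns a dictionary such as `{"ACGT...": (0, ratio)}`, where a large ratio at position 0 means the k-mer is concentrated at the start of the reads. Because trimming shortens reads by different amounts, both the per-position coverage and the positions at which a k-mer appears can shift, which is one plausible reason the profile looks different before and after Trimmomatic.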