Question

kmer content changes after trimming and removing adapters from reads

5.8 years ago

serpalma.v ▴ 80

Dear community

I have a large set of FASTQ files from genomic DNA. I ran them through FastQC and found that the modules "Overrepresented sequences" and "Kmer content" failed. The other modules passed, apart from a warning in "Per tile sequence quality". This pattern was present in almost all of the FASTQ files (>1000 files).
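With this many files, it can help to tally the module statuses rather than open each report. The following is a minimal sketch, assuming each FastQC output contains a `summary.txt` with tab-separated lines of status, module name and file name; the function name and the sample names are hypothetical.

```python
from collections import Counter, defaultdict


def tally_fastqc_summaries(summary_texts):
    """Count PASS/WARN/FAIL per module over the contents of many summary.txt files."""
    counts = defaultdict(Counter)
    for text in summary_texts:
        for line in text.splitlines():
            fields = line.split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            status, module = fields[0], fields[1]
            counts[module][status] += 1
    return counts


# Hypothetical contents of two summary.txt files (sample names are made up).
example = [
    "PASS\tBasic Statistics\tsample_A.fastq.gz\nFAIL\tKmer Content\tsample_A.fastq.gz\n",
    "PASS\tBasic Statistics\tsample_B.fastq.gz\nFAIL\tKmer Content\tsample_B.fastq.gz\n",
]
counts = tally_fastqc_summaries(example)
for module, statuses in counts.items():
    print(module, dict(statuses))
```

In practice the strings would be read from the `summary.txt` of each report; how those files are located on disk depends on how FastQC was run.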

The "overrepresented sequences" module pointed out the presence of TruSeq adapters and Illumina PCR Primer 1.

I ran them through Trimmomatic to remove adapters. The "Overrepresented sequences" module then passed, but "Kmer content" failed again, only this time with a different pattern. Moreover, I got a new warning for the "Per sequence GC content" module (please see the linked figure).

I have read that this pattern in "Kmer content" before trimming (kmers found at the beginning of the reads) could be due to fragmentation bias.
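To make "kmers found at the beginning of the reads" concrete, here is a rough sketch that counts where a given k-mer starts within each read. It is not FastQC's actual statistic, only an illustration of positional enrichment; the function name and the toy reads are hypothetical.

```python
from collections import Counter


def kmer_position_profile(reads, kmer):
    """Return {start position: number of reads with `kmer` starting there}."""
    k = len(kmer)
    profile = Counter()
    for read in reads:
        for pos in range(len(read) - k + 1):
            if read[pos:pos + k] == kmer:
                profile[pos] += 1
    return dict(profile)


# Toy reads: "ACGT" is concentrated at the first position of the read.
reads = ["ACGTAAAA", "ACGTCCCC", "ACGTGGGG", "TTTTACGT"]
profile = kmer_position_profile(reads, "ACGT")
print(profile)
```

A k-mer whose counts pile up at one position, as "ACGT" does at position 0 here, is the kind of signal the "Kmer content" plot draws attention to.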

I worked with the adapter file provided by Trimmomatic (TruSeq3-PE-2.fa).

These are the flags I used for Trimmomatic:

java -jar trimmomatic-0.38.jar PE -phred33 ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
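The line above lists only the flags; a paired-end Trimmomatic run also takes two input files and four output files (paired and unpaired, for forward and reverse reads) before the trimming steps. A minimal sketch of how the full argument list could be assembled for many samples, assuming hypothetical file names and a hypothetical output naming scheme:

```python
def trimmomatic_pe_command(r1, r2, out_prefix,
                           jar="trimmomatic-0.38.jar",
                           adapters="TruSeq3-PE-2.fa"):
    """Build the argument list for one paired-end Trimmomatic run."""
    return [
        "java", "-jar", jar, "PE", "-phred33",
        r1, r2,                          # forward and reverse input reads
        f"{out_prefix}_1P.fastq.gz",     # forward reads, pair survived
        f"{out_prefix}_1U.fastq.gz",     # forward reads, mate was dropped
        f"{out_prefix}_2P.fastq.gz",     # reverse reads, pair survived
        f"{out_prefix}_2U.fastq.gz",     # reverse reads, mate was dropped
        f"ILLUMINACLIP:{adapters}:2:30:10",
        "LEADING:3", "TRAILING:3", "SLIDINGWINDOW:4:15", "MINLEN:36",
    ]


# File names below are placeholders, not the ones used in this post.
cmd = trimmomatic_pe_command("sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once per sample; building it as a list avoids shell quoting problems with unusual file names.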

I have two questions:

1. Are the "Kmer content" and "Per sequence GC content" profiles after trimming something to worry about?
2. What could be a possible reason for the change in "Kmer content" after trimming?

Here you can find the FastQC reports before and after running Trimmomatic:

https://drive.google.com/open?id=1vLY0FsXxnzJYT7d4X1TWZy96cSXu3XGs

https://drive.google.com/open?id=1Tk0GCy_SEz8ZrP2Y_3f_XYs1cnN11ScU

And here is a comparison of "kmer content" and "Per sequence GC content" before and after trimming:

https://drive.google.com/open?id=1YT6zbmKU_3DYlrTX_BLkMBOpGnmqg1Z7

Thank you very much in advance

DNAseq trimmomatic fastqc • 2.6k views

Failing the "Kmer content" and "Per sequence GC content" modules in FastQC generally has no immediate adverse effect on your analysis. You should proceed with the downstream analysis and see what you get. In recent versions of FastQC, the Kmer content module is turned off by default because it causes more heartache than it is worth.

5.8 years ago by GenoMax 141k