Should I remove Kmers identified in FastQC?
7.7 years ago
Randomguy ▴ 30

Hi, apologies for this basic question; I am new to NGS quality control. I checked my NGS data (Illumina HiSeq 2500, 2×100 bp) with FastQC after trimming Nextera adapters with BBDuk (BBTools package), using overlap-based trimming (ktrim=r k=25 mink=11 hdist=1 tpe tbo). The data fail three checks: Per base sequence content, Per sequence GC content, and Kmer Content. Should I remove the first ~15 bases of my reads? When I try this, Kmer Content fails again, but at the end of the reads. Should I then trim the ends as well? My goal is to assemble the reads with a de Bruijn graph assembler, so I am worried about over-represented k-mers.

Thanks for the help.

Illumina fastqc NGS • 3.4k views
7.7 years ago
GenoMax 136k

This post from Dr. Simon Andrews (author of FastQC) should be the last word on this question, which keeps coming up in one form or another.

Thanks a lot for this answer. Definitely the last word about it!

My plots for targeted resequencing with Illumina MiSeq look very similar. I guess Dr. Andrews' recommendation not to trim the first 10 or so bases applies to my case too? Correct me if I am wrong.

You should start looking at those first 10 bases only if you are having trouble with alignments; otherwise, do not trim.

The post link is not working. Could you just post the details?

I have asked the owners of the site to fix it. In the meantime, you can view the snapshot captured by the Internet Archive here: https://web.archive.org/web/20210730200456/https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

The QCFail site is back in operation.

7.7 years ago
5heikki 11k

During fragmentation, the Nextera kit's enzyme prefers to cut DNA at certain sites, which is probably the reason for the bias you observe. I would let it be.
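If you want to see for yourself that the bias is confined to the start of the reads, here is a minimal Python sketch that tabulates base frequencies at the first few positions. The function names (`per_base_composition`, `read_fastq_sequences`) and the 20-position window are my own choices for illustration, not part of FastQC or BBTools, and the reader assumes an uncompressed FASTQ with exactly four lines per record.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield sequence lines from an uncompressed FASTQ file.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def per_base_composition(reads, n_positions=20):
    """Return one dict per position with the fraction of A, C, G and T.

    Fractions are relative to all bases seen at that position, so any
    N calls lower the four reported values slightly.
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1
    table = []
    for c in counts:
        total = sum(c.values())
        table.append({b: (c[b] / total if total else 0.0) for b in "ACGT"})
    return table
```

Something like `per_base_composition(read_fastq_sequences("reads.fastq"))` (the file name is a placeholder) gives a table you can print or plot. If the fractions settle to roughly constant values after the first dozen or so positions, that is consistent with a fragmentation bias rather than contamination.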

Yup, and in NGS in general there are about a million reasons why you get over-representation of certain k-mers, etc. In my experience the FastQC thresholds are really based on what you tend to see with pretty basic whole-genome sequencing; lots of protocols introduce biases toward certain sequences appearing in reads. Unless you notice what looks like actual contamination from primers, index sequences, etc., I wouldn't worry about it much.
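For a quick check of whether a flagged k-mer is real contamination, a rough sketch like the one below counts how many reads contain the start of a known adapter or primer. The helper name `fraction_containing` and the 12-base seed length are assumptions for illustration, and the sequence shown is the commonly cited Nextera transposase adapter; please confirm it against your kit documentation. It only finds exact matches, so it will undercount reads with sequencing errors in the adapter.

```python
# Commonly cited Nextera transposase adapter sequence (assumption:
# verify against the documentation for your library prep kit).
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def fraction_containing(reads, contaminant, min_len=12):
    """Fraction of reads with an exact match to the first min_len bases
    of `contaminant`. Returns 0.0 when there are no reads."""
    seed = contaminant[:min_len].upper()
    total = 0
    hits = 0
    for read in reads:
        total += 1
        if seed in read.upper():
            hits += 1
    return hits / total if total else 0.0
```

If the fraction is close to zero after adapter trimming, the over-represented k-mers are more likely a protocol bias than leftover adapter.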

Thanks for the details!
