Question: Should I remove Kmers identified in FastQC?
Randomguy (3.1 years ago) wrote:

Hi, apologies for this basic question; I'm new to NGS quality control. I have been checking my NGS data (Illumina HiSeq 2500, 2×100 bp) with FastQC after trimming the Nextera adapters with BBDuk (BBTools package), using overlap-based trimming (ktrim=r k=25 mink=11 hdist=1 tpe tbo). The data fail the Per base sequence content, Per sequence GC content and Kmer Content checks. My question is: should I try removing the first bases (~15) of my reads? When I try this, Kmer Content fails again, but at the end of the reads. Should I then trim the end as well? Afterwards, my goal is to assemble the reads with a De Bruijn graph assembler, so I'm worried about over-represented k-mers.

Thanks for the help.

[Two screenshots were attached to the original post.]
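The Kmer Content warning is about k-mers whose counts pile up at particular read positions rather than being spread evenly along the reads. As a rough illustration only, here is a minimal Python sketch that counts k-mers separately for each start position and reports, for the most frequent ones, where they peak. It is not FastQC's actual statistic; the helper names `positional_kmer_counts` and `top_biased_kmers`, the toy reads and `k=7` are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def positional_kmer_counts(reads, k=7):
    """Count each k-mer separately for every start position in the reads."""
    by_kmer = defaultdict(Counter)  # k-mer -> Counter{start position: count}
    for read in reads:
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if "N" in kmer:  # skip k-mers containing ambiguous bases
                continue
            by_kmer[kmer][pos] += 1
    return by_kmer


def top_biased_kmers(reads, k=7, n=3):
    """Return (k-mer, total count, peak position, share of hits at peak) for the n most frequent k-mers."""
    by_kmer = positional_kmer_counts(reads, k)
    totals = {kmer: sum(c.values()) for kmer, c in by_kmer.items()}
    result = []
    for kmer in sorted(totals, key=totals.get, reverse=True)[:n]:
        peak_pos, peak_count = by_kmer[kmer].most_common(1)[0]
        result.append((kmer, totals[kmer], peak_pos, peak_count / totals[kmer]))
    return result


if __name__ == "__main__":
    # Toy reads that all start with the same motif, mimicking a start-of-read bias.
    reads = ["GGCTGACTAAGTCA", "GGCTGACTTTACGA", "GGCTGACTCCGATA"]
    for kmer, total, pos, share in top_biased_kmers(reads, k=7, n=2):
        print(kmer, total, pos, f"{share:.2f}")
```

A k-mer whose hits all land at one position (share close to 1.0) is the kind of positional enrichment the module flags, even when its overall count is modest.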

genomax (3.1 years ago) wrote:

This post from Dr. Simon Andrews (author of FastQC) should be the last word on this question, which keeps coming up in one form or another.


Thanks a lot for this answer. Definitely the last word about it!

(Reply by Randomguy, 3.1 years ago)

My plots for targeted resequencing on an Illumina MiSeq look very similar. I assume Dr. Andrews' recommendation not to trim the first ten or so bases applies to my case as well? Correct me if I am wrong.

(Reply by lamteva.vera, 2.4 years ago)

You should only start looking at those first 10 bases if you are having trouble with alignments; otherwise, do not trim.

(Reply by genomax, 2.4 years ago)
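If alignment problems do show up, head-trimming amounts to slicing the same number of bases off the sequence line and the quality line of every record. A minimal Python sketch, assuming plain uncompressed 4-line FASTQ records; the function name `trim_head` and the default of 10 bases are hypothetical, and in practice a dedicated trimming tool would normally do this.

```python
import io


def trim_head(fastq_in, fastq_out, n=10):
    """Remove the first n bases (and their qualities) from every read in a 4-line FASTQ stream."""
    while True:
        header = fastq_in.readline()
        if not header:
            break  # end of input
        seq = fastq_in.readline().rstrip("\n")
        plus = fastq_in.readline()
        qual = fastq_in.readline().rstrip("\n")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in record {header.strip()}")
        # Slice sequence and quality identically so they stay aligned base-for-base.
        fastq_out.write(header)
        fastq_out.write(seq[n:] + "\n")
        fastq_out.write(plus)
        fastq_out.write(qual[n:] + "\n")


if __name__ == "__main__":
    record = "@read1\nACGTACGTACGTAAAA\n+\nIIIIIIIIIIIIFFFF\n"
    out = io.StringIO()
    trim_head(io.StringIO(record), out, n=12)
    print(out.getvalue(), end="")
```

Trimming the quality string together with the sequence matters: a FASTQ record whose two lines differ in length is invalid for most downstream tools.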
5heikki (3.1 years ago) wrote:

During fragmentation, the Nextera kit's enzyme (a transposase) prefers to cut DNA at certain places, which is probably the reason for the bias you observe. I would leave it as is.
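A cutting preference like this shows up as uneven A/C/G/T proportions at the first read positions, which is what the Per base sequence content module plots. A minimal Python sketch of that per-position tally, assuming the reads are already loaded as plain strings; the function name `per_base_content` is hypothetical and the toy reads are made up for illustration.

```python
from collections import Counter


def per_base_content(reads):
    """Return, for each read position, the percentage of A, C, G and T observed there."""
    max_len = max(len(r) for r in reads)
    profile = []
    for pos in range(max_len):
        # Only called bases (A/C/G/T) count; N and missing positions are ignored.
        counts = Counter(r[pos] for r in reads if pos < len(r) and r[pos] in "ACGT")
        total = sum(counts.values())
        # Positions with no called bases get 0.0 for every nucleotide.
        profile.append({b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"})
    return profile


if __name__ == "__main__":
    reads = ["GACT", "GTCA", "GAGC", "GCTA"]
    for pos, row in enumerate(per_base_content(reads), start=1):
        print(pos, {b: round(p, 1) for b, p in row.items()})
```

In the toy data, position 1 is 100% G while later positions are mixed, the same shape (on a much smaller scale) as a fragmentation bias confined to the start of the reads.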


Yup, and in NGS in general there are about a million reasons why certain k-mers end up over-represented. In my experience, the FastQC thresholds are really based on what you tend to see with fairly basic whole-genome sequencing; many protocols introduce biases toward certain sequences appearing in reads. Unless you notice what looks like actual contamination from primers, index sequences, etc., I wouldn't worry about it much.

(Reply by Dan Gaston, 3.1 years ago)
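One quick way to tell a protocol bias from actual contamination is to count how many reads still contain a known primer or adapter sequence. A minimal Python sketch using exact substring matching only (real trimmers such as BBDuk also tolerate mismatches); the motif below, `CTGTCTCTTATACACATCT`, is the reverse complement of the Tn5 mosaic-end sequence used by Nextera and serves here as an assumed example, and the helper name `count_reads_with` and toy reads are hypothetical.

```python
def count_reads_with(reads, motif):
    """Count reads containing motif (or its reverse complement) as an exact substring."""
    complement = str.maketrans("ACGTN", "TGCAN")
    rev_comp = motif.translate(complement)[::-1]
    return sum(1 for r in reads if motif in r or rev_comp in r)


if __name__ == "__main__":
    # Assumed example motif: Nextera mosaic-end sequence as seen in adapter read-through.
    nextera_me = "CTGTCTCTTATACACATCT"
    reads = [
        "TTGACCAGTACGCTGTCTCTTATACACATCT",  # read running into the adapter
        "AGATGTGTATAAGAGACAGGGTTCAAC",      # contains the reverse complement of the motif
        "ACGTTGCAACGTTGCAACGT",             # clean read
    ]
    hits = count_reads_with(reads, nextera_me)
    print(f"{hits}/{len(reads)} reads contain the motif")
```

After adapter trimming, a count near zero suggests the remaining k-mer warnings reflect library bias rather than leftover adapter.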

Thanks for the details!

(Reply by Randomguy, 3.1 years ago)