Question

Should I remove Kmers identified in FastQC?

3

8.2 years ago

Randomguy ▴ 30

Hi, apologies for this basic question; I am new to NGS quality control. I have been checking my NGS data (Illumina HiSeq 2500, 2×100 bp) with FastQC after trimming Nextera adapters with BBDuk (BBTools package) using overlap trimming (ktrim=r k=25 mink=11 hdist=1 tpe tbo). The data fail the Per base sequence content, Per sequence GC content, and Kmer Content checks. My question is: should I try removing the first ~15 bases of my reads? When I do this, Kmer Content fails again, but at the end of the reads. Should I then trim the ends as well? My goal is to assemble the reads with a de Bruijn graph assembler, so I am worried about over-represented k-mers.

Thanks for the help.

Illumina fastqc NGS • 3.6k views

ADD COMMENT • link updated 5 months ago by GenoMax 142k • written 8.2 years ago by Randomguy ▴ 30

score 4 · Accepted Answer · 2016-03-10

4

8.2 years ago

GenoMax 142k

This post from Dr. Simon Andrews (author of FastQC) should be the last word on this question, which keeps coming up in one form or another.

ADD COMMENT • link 8.2 years ago by GenoMax 142k

0

Thanks a lot for this answer. Definitely the last word about it!

ADD REPLY • link 8.2 years ago by Randomguy ▴ 30

0

My plots for targeted re-sequencing with Illumina MiSeq look very similar. I assume Dr. Andrews' recommendation not to trim the first 10 or so bases applies to my case as well? Correct me if I am wrong.

ADD REPLY • link 7.4 years ago by lamteva.vera ▴ 220

1

You should start looking at those first 10 bases only if you are having trouble with alignments; otherwise, do not trim.

ADD REPLY • link 7.4 years ago by GenoMax 142k

0

The post link is not working. Could you just post the details?

ADD REPLY • link 5 months ago by xiaoleiusc ▴ 140

1

I have asked the site owners to fix the link. In the meantime, you can view the snapshot captured by the Internet Archive here: https://web.archive.org/web/20210730200456/https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/

ADD REPLY • link 5 months ago by GenoMax 142k

1

The QCFail site is back in operation.

ADD REPLY • link 5 months ago by GenoMax 142k

score 3 · Accepted Answer · 2016-03-10

3

8.2 years ago

5heikki 11k

During fragmentation, the Nextera kit enzyme prefers to cut DNA at certain sites, which is probably the reason for the bias you observe. I would leave it as is.
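If you want to look at the positional bias yourself rather than rely only on the FastQC plot, something along these lines might help. This is a minimal sketch and not part of the original thread: the function names `per_position_composition` and `read_fastq_sequences` are made up for illustration, and the reader assumes a plain, uncompressed FASTQ with exactly four lines per record. It only tabulates the fraction of each base at each read position, which is roughly the quantity behind the Per base sequence content plot.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def per_position_composition(reads, max_pos=None):
    """Return one dict per read position mapping base -> fraction of reads.

    `max_pos` (hypothetical parameter) limits the analysis to the first
    `max_pos` bases, e.g. 15 to inspect only the start of the reads.
    """
    counts = []
    for read in reads:
        seq = read.upper()
        if max_pos is not None:
            seq = seq[:max_pos]
        for i, base in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGTN" if c[b]})
    return fractions
```

With a real file you could call `per_position_composition(read_fastq_sequences(open("reads.fastq")), max_pos=15)` (the file name is a placeholder). A skew confined to the first several positions is consistent with a fragmentation preference rather than contamination.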

ADD COMMENT • link 8.2 years ago by 5heikki 11k

1

Yup, and in NGS in general there are about a million reasons why you get over-representation of certain k-mers, etc. In my experience, the FastQC thresholds are really based on what you tend to see with pretty basic whole genome sequencing; lots of protocols tend to introduce biases toward certain sequences appearing in reads. Unless you notice what looks like actual contamination from primers, index sequences, etc., I wouldn't worry about it much.
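One rough way to check for that kind of contamination is to count how many reads contain the start of a known adapter or primer sequence. The sketch below is an illustration only and not from the original thread: `fraction_with_adapter` and its `min_match` parameter are hypothetical names, and it uses exact substring matching, so it will miss adapters with sequencing errors or adapters truncated at the read end. The sequence in the usage note is the commonly cited Nextera transposase adapter; please verify it against your kit's documentation before relying on it.

```python
def fraction_with_adapter(reads, adapter, min_match=12):
    """Return the fraction of reads containing the first `min_match` bases
    of `adapter` as an exact substring (case-insensitive).

    Exact matching only: no mismatches and no partial matches at read ends.
    """
    probe = adapter[:min_match].upper()
    total = 0
    hits = 0
    for read in reads:
        total += 1
        if probe in read.upper():
            hits += 1
    return hits / total if total else 0.0
```

For example, `fraction_with_adapter(reads, "CTGTCTCTTATACACATCT")` would report how often the first 12 bases of that sequence still appear after trimming. A value near zero suggests the remaining k-mer warnings come from library bias rather than leftover adapter.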