Interpreting the k-mer plot from FastQC: how can I find out where k-mer enrichment comes from?

0

8.6 years ago

Niek De Klein ★ 2.6k

I have 8 PRO-seq samples that all have very similar k-mer plots (one of them: http://pasteboard.co/xuO4V7O.png). They show very high k-mer enrichment at the start of the reads.

The adapter used is TGGAATTCTCGGGTGCCAAGG, which has been removed using CutAdapt. I don't know how to proceed from here to find out where this k-mer enrichment comes from. Are there any tools available to dig deeper into this, or do you have any suggestions about where these enriched k-mers could come from?

I just realized that I have been reading the plot incorrectly. It seems that only the first 4 bases are enriched, not the complete k-mers.
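
One way to check whether only the first few bases are skewed is to tabulate the base composition per read position. The following is a minimal sketch, not something FastQC or any tool in this thread provides; the function name `base_composition` and the toy reads are made up for illustration, and real input would be the sequence lines of the trimmed FASTQ files.

```python
from collections import Counter


def base_composition(reads, n_positions=10):
    """Return, for each of the first n_positions cycles, the fraction of each base.

    `reads` is any iterable of sequence strings (hypothetical input; in
    practice these would be parsed from a FASTQ file).
    """
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        # Counter returns 0 for bases that were never seen at this position.
        fractions.append({b: c[b] / total for b in "ACGTN"} if total else {})
    return fractions


# Toy reads (invented): identical first 4 bases, mixed afterwards.
toy_reads = ["ACGTAAAA", "ACGTCCCC", "ACGTGGGG", "ACGTTTTT"]
for pos, frac in enumerate(base_composition(toy_reads, 6), start=1):
    print(pos, frac)
```

If the skew is confined to the first 4 positions, those positions will show one dominant base each while later positions look closer to the overall base composition of the library.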

fastqc next-gen k-mer • 3.5k views

updated 19 months ago by Ram 43k • written 8.6 years ago by Niek De Klein ★ 2.6k

1

Look at the frequency counts in the table under the k-mer plots. That will tell you whether the enriched k-mers are actually common in the data. Even low counts will come up in the plot; it is scaled to 100% but in fact can be ignored.
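
To put absolute numbers next to the plot, the k-mers at the start of the reads can also be counted directly. The sketch below is only an illustration under stated assumptions: it assumes an unwrapped 4-line-per-record FASTQ, and the helper names `read_fastq_sequences` and `kmer_start_fraction`, the choice of `k`, and the toy records are all hypothetical. It is not how FastQC computes its table.

```python
import io
from collections import Counter


def read_fastq_sequences(handle):
    """Yield the sequence line of each record (assumes 4 lines per record)."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def kmer_start_fraction(seqs, k=7, top=5):
    """Return (kmer, count, fraction of all reads) for the most common
    k-mers found at the very start of the reads."""
    counts = Counter()
    n_reads = 0
    for seq in seqs:
        n_reads += 1
        if len(seq) >= k:
            counts[seq[:k]] += 1
    if n_reads == 0:
        return []
    return [(kmer, n, n / n_reads) for kmer, n in counts.most_common(top)]


# Toy FASTQ content (invented) standing in for a real file handle.
toy_fastq = io.StringIO(
    "@r1\nACGTACGTAA\n+\nIIIIIIIIII\n"
    "@r2\nACGTACGTCC\n+\nIIIIIIIIII\n"
    "@r3\nTTTTACGTCC\n+\nIIIIIIIIII\n"
)
for kmer, n, frac in kmer_start_fraction(read_fastq_sequences(toy_fastq), k=4):
    print(kmer, n, f"{frac:.1%}")
```

For a real file, the `io.StringIO` object would be replaced by an open file handle (or `gzip.open(path, "rt")` for compressed input).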

updated 19 months ago by Ram 43k • written 8.6 years ago by Istvan Albert 100k

3

is scaled to 100% but in fact can be ignored.

This is exactly one of the reasons I made https://github.com/mdshw5/fastqp. Kmer plots are much more interpretable when you can see the absolute fractions and background distribution on the same graph.

updated 19 months ago by Ram 43k • written 8.6 years ago by Matt Shirley 10k

0

Looks nifty!

8.6 years ago by Brian Bushnell 20k

0

Looks more informative than FastQC; I'll try to use this. Thanks.

8.6 years ago by Niek De Klein ★ 2.6k

0

Most of them have very low p-values; some have ~200 when 30 is expected, while others have 1000+ when 40 is expected. The obs/exp lies around 30-40, with some at 50. I think this is high and shows that enrichment is common, but please correct me if I'm wrong.

updated 19 months ago by Ram 43k • written 8.6 years ago by Niek De Klein ★ 2.6k

1

It all depends on how many reads you start with - for 10 million reads, ten thousand is just 0.1 percent - hardly worth looking into.
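
The arithmetic behind that estimate, as a small sketch (the function name `kmer_fraction` is made up for illustration, and the numbers are the ones from the example above, not from the actual samples):

```python
def kmer_fraction(kmer_count, total_reads):
    """Fraction of the library accounted for by a k-mer count."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return kmer_count / total_reads


# Ten thousand occurrences in 10 million reads.
print(f"{kmer_fraction(10_000, 10_000_000):.1%}")  # 0.1%
```

Plugging in the observed count from the FastQC table and the total read count of the sample gives the fraction to judge against.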

8.6 years ago by Istvan Albert 100k

0

Is PRO-seq RNA-seq from the 3' end? Is no nucleic acid fragmentation involved?

8.6 years ago by 5heikki 11k

0

No nucleic acid fragmentation. It is from the 5' end, but FastQC was run after removing adapters, taking the reverse complement, and removing rRNA binding reads and reads mapping to repeat regions.

8.6 years ago by Niek De Klein ★ 2.6k
