Interpreting k-mer plot of FastQC, how can I find out where k-mer enrichment comes from?
8.6 years ago
Niek De Klein ★ 2.6k

I have 8 PRO-seq samples which all have very similar k-mer plots (one of them: http://pasteboard.co/xuO4V7O.png). At the start of the reads they have very high enrichment.

The adapter used is TGGAATTCTCGGGTGCCAAGG, which has been removed using Cutadapt. I don't know where to go from here to find out where this k-mer enrichment comes from. Are there any tools available to dig deeper into this, or do you have any suggestions as to where these enriched k-mers could come from?


I just realized that I have been reading the plot incorrectly. It seems only the first 4 bases are enriched, not the complete k-mers.
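One way to check this directly is to tally the first few bases of every read and see which prefixes dominate. Below is a minimal sketch, assuming a standard 4-line-per-record FASTQ; the function names `leading_kmer_counts` and `top_fractions` and the demo reads are made up for illustration and are not part of FastQC or any other tool.

```python
from collections import Counter
from itertools import islice


def leading_kmer_counts(fastq_lines, k=4):
    """Count the first k bases of every read in an iterable of FASTQ lines.

    Assumes a 4-line-per-record FASTQ (header, sequence, '+', qualities).
    """
    counts = Counter()
    # The sequence is the 2nd line of each 4-line record.
    for seq in islice(fastq_lines, 1, None, 4):
        seq = seq.strip()
        if len(seq) >= k:  # skip reads shorter than k after trimming
            counts[seq[:k]] += 1
    return counts


def top_fractions(counts, n=5):
    """Return the n most common prefixes as (kmer, count, fraction of counted reads)."""
    total = sum(counts.values())
    return [(kmer, c, c / total) for kmer, c in counts.most_common(n)]


# Tiny made-up example; real input would be an open FASTQ file handle.
demo = [
    "@r1", "ACGTTTGA", "+", "IIIIIIII",
    "@r2", "ACGTCCGA", "+", "IIIIIIII",
    "@r3", "GGCATTGA", "+", "IIIIIIII",
]
counts = leading_kmer_counts(demo)
for kmer, count, fraction in top_fractions(counts):
    print(kmer, count, f"{fraction:.1%}")
```

For a real file, the same function can be fed an open handle, for example one from `gzip.open(path, "rt")` for a gzipped FASTQ. If a handful of prefixes account for most reads, the enrichment is real; if the top prefixes are each a tiny fraction, the plot is probably exaggerating it.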

fastqc next-gen k-mer • 3.5k views

Look at the frequency counts in the table under the k-mer plots. That will tell you whether the enriched k-mers are actually common in the data. Even low counts will show up in the plot; it is scaled to 100% but in fact can be ignored.


"... is scaled to 100% but in fact can be ignored."

This is exactly one of the reasons I made https://github.com/mdshw5/fastqp. K-mer plots are much more interpretable when you can see the absolute fractions and the background distribution on the same graph.


Looks nifty!


Looks more informative than FastQC, I'll try it. Thanks.


Most of them have very low p-values. Some have ~200 when 30 is expected, but others have 1000+ when 40 is expected. The obs/exp ratio lies around 30-40, with some at 50. I think this is high and shows that the enrichment is common, but please correct me if I'm wrong.


It all depends on how many reads you start with: for 10 million reads, ten thousand is just 0.1 percent, hardly worth looking into.
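To put that arithmetic in one place, here is a small sketch that turns a k-mer count from the FastQC table into a fraction of the library. The helper name `kmer_read_fraction` is made up for illustration, and it assumes each occurrence of the k-mer comes from a different read, so the result is an upper bound on the share of reads affected.

```python
def kmer_read_fraction(kmer_count, total_reads):
    """Fraction of the library a k-mer count represents.

    Assumes at most one occurrence per read, so this is an upper bound
    on the share of reads that carry the k-mer.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return kmer_count / total_reads


# The numbers from the reply above: 10,000 hits in 10 million reads.
fraction = kmer_read_fraction(10_000, 10_000_000)
print(f"{fraction:.2%}")  # prints 0.10%
```

Plugging in the counts from the table together with the total read count reported by FastQC should show quickly whether an "enriched" k-mer covers a meaningful share of the data.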


Is PRO-seq RNA-seq from the 3' end? No nucleic acid fragmentation involved?


No nucleic acid fragmentation. It is from the 5' end, but FastQC was run after removing adapters, taking the reverse complement, and removing rRNA binding reads and reads mapping to repeat regions.
