Question: Interpreting the k-mer plot of FastQC: how can I find out where k-mer enrichment comes from?
Niek De Klein (Netherlands) wrote, 3.6 years ago:

I have 8 PRO-seq samples, which all have very similar k-mer plots (one of them: http://pasteboard.co/xuO4V7O.png). All of them show very high enrichment at the start of the reads.

The adapter used is TGGAATTCTCGGGTGCCAAGG, and it was removed with Cutadapt. I don't know how to proceed from here to find out where this k-mer enrichment comes from. Are there any tools available to dig deeper into this, or do you have any suggestions as to where these enriched k-mers could come from?
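One way to dig in without extra tools is to count k-mers at each of the first few read positions yourself and check whether the most common ones are pieces of the adapter or of its reverse complement (relevant here because the reads were reverse-complemented before QC). The Python sketch below is a minimal illustration, assuming an uncompressed FASTQ file with standard 4-line records; the function names are hypothetical and this is not how FastQC itself computes its k-mer statistics.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Complement table for DNA; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line of each record in an uncompressed, 4-line-per-record FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record holds the bases
                yield line.strip().upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts_by_position(reads: Iterable[str], k: int, max_pos: int) -> Dict[int, Counter]:
    """Count each k-mer by its start position, for start positions 0..max_pos-1."""
    counts: Dict[int, Counter] = {pos: Counter() for pos in range(max_pos)}
    for seq in reads:
        # Reads too short to hold a full k-mer at a position contribute nothing there.
        for pos in range(min(max_pos, len(seq) - k + 1)):
            counts[pos][seq[pos:pos + k]] += 1
    return counts


def adapter_hits(kmer: str, adapter: str) -> List[str]:
    """Say whether a k-mer is a substring of the adapter and/or its reverse complement."""
    hits = []
    if kmer in adapter:
        hits.append("adapter")
    if kmer in reverse_complement(adapter):
        hits.append("reverse complement")
    return hits
```

For example, `counts = kmer_counts_by_position(read_fastq_sequences("sample.fastq"), k=7, max_pos=10)` (the file name and k=7 are arbitrary choices) followed by `counts[0].most_common(5)` lists the commonest 7-mers at the very start of the reads, and `adapter_hits(kmer, "TGGAATTCTCGGGTGCCAAGG")` shows whether each one could be adapter-derived.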

 

------------------------

I just realized that I have been reading the plot incorrectly. It seems that only the first 4 bases are enriched, not the complete k-mers.
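If only the first few bases are biased, tabulating the base composition at each of the first positions should show a strong skew at positions 0 to 3 that flattens out afterwards (FastQC's "Per base sequence content" module shows the same kind of information). A minimal Python sketch, assuming the reads are already available as plain sequence strings; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def base_composition(reads: Iterable[str], n_positions: int) -> List[Dict[str, float]]:
    """Fraction of A/C/G/T at each of positions 0..n_positions-1.

    Reads shorter than a given position simply do not contribute to it.
    """
    counts = [Counter() for _ in range(n_positions)]
    for seq in reads:
        for pos, base in enumerate(seq[:n_positions]):
            counts[pos][base] += 1
    result = []
    for counter in counts:
        total = sum(counter.values())
        # Non-ACGT bases (e.g. N) count toward the total but are not reported.
        result.append({b: counter[b] / total for b in "ACGT"} if total else {})
    return result
```

Printing `base_composition(reads, 10)` for a sample and comparing positions 0 to 3 with positions 4 to 9 is a quick way to confirm the observation above.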

Tags: fastqc, next-gen, k-mer

Look at the frequency counts in the table under the k-mer plots. That will tell you whether the enrichments are actually common in the data. Even low counts will come up in the plot, as it is scaled to 100%, but in fact they can be ignored.
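To see why a handful of reads can still produce a tall peak, the Python sketch below contrasts a per-position series rescaled so its maximum is 100 (mimicking the scaling described above; whether FastQC scales exactly this way is an assumption) with the same counts expressed as a percentage of all reads. The counts and read total are made-up illustration values.

```python
from typing import List


def scale_to_max(counts: List[int]) -> List[float]:
    """Rescale a per-position series so its largest value becomes 100."""
    peak = max(counts)
    return [100.0 * c / peak if peak else 0.0 for c in counts]


def percent_of_all_reads(counts: List[int], total_reads: int) -> List[float]:
    """Express each per-position count as a percentage of all reads."""
    return [100.0 * c / total_reads for c in counts]


# Made-up example: a rare k-mer and a common one with the same positional shape.
rare = [20, 2, 1, 0]
common = [200_000, 20_000, 10_000, 0]
total_reads = 10_000_000

print(scale_to_max(rare))    # [100.0, 10.0, 5.0, 0.0]
print(scale_to_max(common))  # [100.0, 10.0, 5.0, 0.0]
print(percent_of_all_reads(rare, total_reads)[0], percent_of_all_reads(common, total_reads)[0])
```

Both k-mers look identical once rescaled; only the absolute percentages (a tiny fraction of a percent versus 2%) reveal which one matters.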

written 3.6 years ago by Istvan Albert

"...is scaled to 100% but in fact can be ignored."

This is exactly one of the reasons I made https://github.com/mdshw5/fastqp. K-mer plots are much more interpretable when you can see the absolute fractions and the background distribution on the same graph.

written 3.6 years ago by Matt Shirley

Looks nifty!

written 3.6 years ago by Brian Bushnell

It looks more informative than FastQC; I'll try it. Thanks.

written 3.6 years ago by Niek De Klein

Most of them have very low p-values: some have ~200 counts where 30 are expected, while others have 1000+ where 40 are expected. The obs/exp ratio lies around 30-40, with some around 50. I think this is high and shows that the enrichment is common, but please correct me if I'm wrong.
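An observed/expected ratio only says how uneven a k-mer's positional distribution is, not how many reads carry it. The Python sketch below computes a per-position ratio under a simplifying assumption: the expected count at each position is the k-mer's total count spread evenly over all positions. This illustrates the idea and is not FastQC's exact statistical model; the function names are hypothetical.

```python
from typing import List, Tuple


def obs_exp_by_position(counts: List[int]) -> List[float]:
    """Observed/expected ratio per position under a uniform-background assumption.

    counts[i] is how often one k-mer starts at position i. The expected count
    at every position is the k-mer's total count divided evenly over all
    positions (a simplifying assumption, not FastQC's exact model).
    """
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    expected = total / len(counts)
    return [c / expected for c in counts]


def max_obs_exp(counts: List[int]) -> Tuple[int, float]:
    """Return (position, ratio) of the largest observed/expected ratio."""
    ratios = obs_exp_by_position(counts)
    best = max(range(len(ratios)), key=ratios.__getitem__)
    return best, ratios[best]
```

Because the ratio is scale-free, the same value comes out whether the counts are tens or tens of thousands, so it should always be read together with the absolute counts.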
 

written 3.6 years ago by Niek De Klein

It all depends on how many reads you start with: for 10 million reads, ten thousand is just 0.1 percent, hardly worth looking into.
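The same arithmetic can be turned into a quick filter that only flags k-mers accounting for a meaningful share of the reads. A minimal Python sketch; the function names and the 1% default cut-off are hypothetical illustration choices, not an established threshold.

```python
def percent_of_reads(count: int, total_reads: int) -> float:
    """Percentage of all reads accounted for by a k-mer count."""
    return 100.0 * count / total_reads


def worth_investigating(count: int, total_reads: int, min_percent: float = 1.0) -> bool:
    """Flag a k-mer only if it reaches min_percent of all reads.

    The 1% default is an arbitrary illustrative cut-off, not a standard threshold.
    """
    return percent_of_reads(count, total_reads) >= min_percent


print(percent_of_reads(10_000, 10_000_000))     # 0.1
print(worth_investigating(10_000, 10_000_000))  # False
```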

written 3.6 years ago by Istvan Albert

Is PRO-seq RNA-seq from the 3' end? Is there no nucleic acid fragmentation involved?

written 3.6 years ago by 5heikki

There is no nucleic acid fragmentation. It is from the 5' end, but FastQC is run after removing adapters, taking the reverse complement, and removing rRNA-binding reads and reads that map to repeat regions.

written 3.6 years ago by Niek De Klein