How To Interpret The Kmer Enrichment Plot Of A Fastqc Output
10.2 years ago
pmuench ▴ 140

I preprocessed my FASTQ dataset with cutadapt to remove 3' adapters. Because I had problems aligning it, I took a look at the dataset with FastQC. I am really confused, because the FastQC output for my raw dataset (before cutadapt) looks like this:

• Is it normal that the adapters don't start at the first base on average? In the FastQC output it seems that the adapter starts after the third base.
• To me it looks like there is a 5' adapter too (or how else can the k-mers at positions > 20 be explained?)
• What about the k-mer AAAAA? Is this a sequencing error or contamination?

Thanks!

fastqc illumina next-gen
10.2 years ago

Fascinating plot.

This is clearly a small RNA sequencing experiment. That pattern ATGCCGTCT you are seeing is the middle of the Illumina small RNA kit v1.5 adapter:

ATCTCGTATGCCGTCTTCTGCTTG


which is followed by a fake polyA tail (either designed to work with the RNA-seq kit, or Bustard no-calls reported as As).
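To make the "middle of the adapter" claim easy to check, here is a minimal Python sketch that locates the enriched k-mer inside the adapter sequence quoted above. The function name `locate_kmer` is just an illustrative helper, not part of FastQC or any library.

```python
# Adapter and k-mer exactly as quoted in this answer.
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"
KMER = "ATGCCGTCT"


def locate_kmer(kmer, adapter):
    """Return the 0-based offset of kmer inside adapter, or -1 if absent."""
    return adapter.find(kmer)


offset = locate_kmer(KMER, ADAPTER)
# The k-mer sits at 0-based offset 7 of the 24-base adapter.
print(offset, len(ADAPTER))  # prints: 7 24
```

An offset well inside the adapter (rather than 0) is what you would expect if FastQC is reporting only the most enriched sub-k-mers of a longer contaminating sequence.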

Not sure why you are seeing it at the beginning of the sequence like that. Perhaps something special was done there, such as:

barcode-ATGCCGTCT-sequence-ATCTCGTATGCCGTCTTCTGCTTG-fakePolyA


in which case you should trim carefully.
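As a rough illustration of what 3' trimming does for the adapter-plus-polyA part of that layout, here is a small Python sketch. It is an exact-match toy, not how cutadapt works internally (cutadapt tolerates mismatches); the function name `trim_3prime` and the `min_overlap=5` default are my own assumptions, and it does not attempt to handle the speculative 5' barcode part.

```python
ADAPTER = "ATCTCGTATGCCGTCTTCTGCTTG"  # 3' adapter quoted in this answer


def trim_3prime(seq, adapter=ADAPTER, min_overlap=5):
    """Cut a read at the 3' adapter using exact matching only.

    If the full adapter occurs, everything from its start onward is removed,
    which also drops any polyA run that follows it. Otherwise, a partial
    adapter (a prefix of the adapter at the very end of the read) is removed
    when it is at least min_overlap bases long.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Look for the longest adapter prefix that matches the read's 3' end.
    max_k = min(len(adapter) - 1, len(seq))
    for k in range(max_k, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


# Hypothetical read: 10-base insert, full adapter, then a polyA run.
print(trim_3prime("ACGTACGTAC" + ADAPTER + "AAAAAA"))  # prints: ACGTACGTAC
```

Cutting at the adapter start, rather than stripping As first, avoids removing As that genuinely belong to the insert.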

Could you please explain the fakePolyA issue in more detail? I just learned that I have contamination in my Illumina RNA-seq dataset, which looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA (with a polyA of variable length). Thanks a lot!

You should ask someone more familiar with those TruSeq small RNA kits, but I don't see how that polyA is biological if it occurs after the 3' adapter, in purified DNA no less. I could also buy the Bustard explanation.

10.2 years ago
Gabriel R. ★ 2.9k
• Is it normal that the adapters don't start at the first base on average? In the FastQC output it seems that the adapter starts after the third base.

I am not sure which adapter you are referring to. The one next to the 5' end? Yes, it should start at the beginning, unless there was an issue in priming it.

• To me it looks like there is a 5' adapter too (or how else can the k-mers at positions > 20 be explained?)

That's weird. Did you sequence small RNA or something? Did you do a gel cut that was very short?

• What about the k-mer AAAAA? Is this a sequencing error or contamination?

When Bustard meets bases with no intensity, it produces an 'A' with quality 0.
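One way to tell such no-call As from real ones is to look at the qualities under the trailing A run. A minimal Python sketch follows; the helper name `trailing_no_call_As` is hypothetical, and the quality offset is an assumption you must set for your data (33 for Sanger/Illumina 1.8+, 64 for older Illumina pipelines).

```python
def trailing_no_call_As(seq, qual, offset=33, max_q=0):
    """Count 3'-terminal 'A' bases whose Phred quality is <= max_q.

    seq and qual are the sequence and quality lines of one FASTQ record.
    offset is the ASCII offset of the quality encoding (an assumption:
    check which encoding your file actually uses).
    """
    count = 0
    for base, q_char in zip(reversed(seq), reversed(qual)):
        if base == "A" and ord(q_char) - offset <= max_q:
            count += 1
        else:
            break
    return count


# Toy record in Phred+33: '!' encodes Q0, 'I' encodes Q40.
print(trailing_no_call_As("ACGTAAAA", "IIII!!!!"))  # prints: 4
print(trailing_no_call_As("ACGTAAAA", "IIIIIIII"))  # prints: 0
```

If most reads carrying the AAAAA k-mer show a long run of zero-quality As at the end, the no-call explanation fits better than contamination.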

Thank you for the answer! This data is from an RNA-seq experiment. As far as I know, there is one adapter in the dataset (which I see in the first 5 peaks on the image). But my main question is: what are the peaks after base 20? Is this a second adapter on the other end, which I have to cut with cutadapt separately?

I am not sure. Do you recognize the sequence in your adapter sequence?

Yes, the k-mer composition looks like the adapter sequence. But I expected the adapter to be ligated only to the 3' end. I am not sure whether I can conclude from this figure that the same adapter is also ligated to the 5' end.
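One way to check this directly, rather than reading it off the plot, is to tally where an adapter k-mer starts in the raw reads. The sketch below is a plain Python illustration, not FastQC's own method; `kmer_start_positions` is a hypothetical helper and the two reads are made up. Positions are 0-based, whereas FastQC plots are 1-based.

```python
from collections import Counter


def kmer_start_positions(reads, kmer):
    """Count, per 0-based read position, how often kmer starts there."""
    counts = Counter()
    for read in reads:
        start = read.find(kmer)
        while start != -1:
            counts[start] += 1
            # Continue one base further so overlapping hits are counted too.
            start = read.find(kmer, start + 1)
    return counts


# Made-up reads: the k-mer starts at position 0 in one and 3 in the other.
reads = ["ATGCCGTCTAAAA", "CCCATGCCGTCT"]
print(sorted(kmer_start_positions(reads, "ATGCCGTCT").items()))
# prints: [(0, 1), (3, 1)]
```

Two separate clusters of start positions (one near the read start, one after base 20) in the same reads would point to the adapter sequence occurring twice per read, while a single broad spread would suggest one 3' adapter behind inserts of varying length.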

8.7 years ago
rse ▴ 100

