I preprocessed my FASTQ dataset with cutadapt to remove 3' adapters. Because I had problems aligning the reads, I took a look at the dataset with FastQC. I am really confused, because the FastQC output for my raw dataset (before cutadapt) looks like this:
- Is it normal that, on average, the adapter doesn't start at the first base? In the FastQC output it seems that the adapter starts after the third base.
- To me it looks as if there is a 5' adapter too (otherwise, how can the k-mers at positions > 20 be explained?)
- What about the k-mer AAAAA? Is this a sequencing error or contamination?
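One way to check where the adapter really begins in each read, independent of the FastQC plot, is to tabulate the position of its first occurrence across reads. This is only a minimal sketch: the function name `adapter_start_histogram` and the example reads are made up for illustration, and it uses exact matching on a short adapter prefix, so reads with sequencing errors inside the adapter would be missed.

```python
from collections import Counter


def adapter_start_histogram(reads, adapter_prefix):
    """Count, per 0-based position, how many reads have their first
    exact match of adapter_prefix starting at that position."""
    counts = Counter()
    for seq in reads:
        pos = seq.find(adapter_prefix)  # -1 if the prefix is absent
        if pos != -1:
            counts[pos] += 1
    return counts


# Hypothetical reads, not taken from the dataset discussed here.
reads = [
    "ACGTGATCGGAAGAGC",  # adapter prefix starts at position 4
    "TTTGATCGGAAGAGCA",  # adapter prefix starts at position 3
    "GATCGGAAGAGCACAC",  # adapter prefix starts at position 0
    "ACGTACGTACGTACGT",  # no adapter
]

hist = adapter_start_histogram(reads, "GATCGGAAGAGC")
for pos in sorted(hist):
    print(pos, hist[pos])
```

If most counts pile up a few bases into the read rather than at position 0, that would be consistent with short inserts sitting in front of the adapter.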
Could you please explain the fakePolyA issue in more detail? I just learned that I have contamination in my Illumina RNA-Seq dataset which looks like this: GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAA (with a polyA stretch of variable length). Thanks a lot!
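To see how often this pattern occurs and how long the polyA stretch is, something along these lines could be used. It is a rough sketch under two assumptions: that the adapter part of the contaminant ends at `...CTTCTGCTTG` with everything after it being polyA, and that the adapter appears without mismatches. The function name `split_adapter_polya` is hypothetical.

```python
import re

# Adapter portion of the contaminant quoted above, assuming it ends
# right before the run of A's.
ADAPTER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGTATCTCGTATGCCGTCTTCTGCTTG"

# Exact adapter match followed by zero or more A's.
ADAPTER_POLYA = re.compile(re.escape(ADAPTER) + r"(A*)")


def split_adapter_polya(read):
    """Return (insert, polya_length).

    insert is the part of the read before the adapter;
    polya_length is the number of A's directly after the adapter,
    or None if the adapter is not found (the read is returned whole).
    """
    match = ADAPTER_POLYA.search(read)
    if match is None:
        return read, None
    return read[:match.start()], len(match.group(1))


# Hypothetical read: 8-base insert + adapter + 12 A's.
example = "ACGTACGT" + ADAPTER + "A" * 12
insert, polya_length = split_adapter_polya(example)
print(insert, polya_length)
```

For actual trimming, cutadapt with the adapter given as a 3' adapter should already remove the adapter together with everything after it, including the polyA; the sketch is only meant for inspecting the reads.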
You should ask someone more familiar with those TruSeq small RNA kits, but I don't see how that polyA is biological if it occurs after the 3' adapter, in purified DNA no less. I could also buy the Bustard explanation.