I have Illumina-sequenced FASTQ files. Virtually every read (although not all) starts with the triplet "TAA". I assumed these were adapters. However, when I run Trimmomatic with:
ILLUMINACLIP:Trimmomatic-0.39/adapters/TruSeq2-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36
These triplets still remain. Can someone please advise what they are and how they should be dealt with?
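To put a number on "virtually every read", it may help to tabulate the first three bases of each read before and after trimming. Below is a minimal sketch, assuming standard 4-line FASTQ records (plain or `.gz`). The helper names `prefix_counts` and `open_fastq` are hypothetical, and the file path is passed on the command line.

```python
import gzip
import sys
from collections import Counter
from typing import Iterable, TextIO


def prefix_counts(fastq_lines: Iterable[str], k: int = 3) -> Counter:
    """Count the first k bases of every read.

    Assumes standard 4-line FASTQ records: header, sequence, '+', quality.
    """
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence line of each 4-line record
            counts[line.strip()[:k]] += 1
    return counts


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, handling gzip compression by extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python prefix_counts.py reads_R1.fastq.gz
    with open_fastq(sys.argv[1]) as handle:
        counts = prefix_counts(handle)
    total = sum(counts.values())
    # Print the five most common leading triplets with their share of reads.
    for prefix, n in counts.most_common(5):
        print(f"{prefix}\t{n}\t{n / total:.1%}")
```

Running it on the raw and the Trimmomatic output files would show whether the "TAA" fraction changes at all after ILLUMINACLIP.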
The sequencing report does not mention which adapters were used, but it states: "As for the sequencing of GBS library, the sequenced reads of 144 bp at either end are adapter-free, which could be directly subjected to quality control for low quality reads filtration. The retaining sequences in 144 bp length (namely clean data) are qualified for mapping with the reference genome". So I am puzzled why this motif is so prevalent.