I am new to the field. I am trying to analyze single-end 100 bp FASTQ files with ~70 million reads per sample. I want to determine whether adapter sequences are present and, if so, how to remove them. I ran FastQC on the files, and the reports show that each file has an "overrepresented sequence" identified as an "Illumina index adapter".
I have the following questions:
1. Does sample1 look like it has already been trimmed, or does it require adapter trimming?
2. If further trimming is recommended, what would be the best sequence/adapter option to use with cutadapt/Trim Galore? [See below for my thoughts so far]
3. Based on the FastQC report, do I need to worry about the presence of any other adapter sequences besides the index adapter?
My thoughts on question 2: The Illumina index adapter sequences appear to have the following format:
These are the adapter sequences found in my FastQC report for sample 1:
I am thinking of using the options below with cutadapt/Trim Galore to remove the adapter(s):
trim_galore sample1.fastq.gz -a GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG -q 20 --length 20 --fastqc
However, it seems that Trimmomatic, for instance, only uses the initial part of the index adapter (the sequence up to the Ns, and nothing after them): https://github.com/timflutre/trimmomatic/blob/master/adapters/TruSeq3-SE.fa
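To get a rough idea of how many reads actually carry the adapter, I could count the reads that contain only the part of the adapter before the index Ns. Below is a minimal sketch, assuming gzip-compressed FASTQ files with standard four-line records. The function name count_adapter_reads, the default sample of 1,000,000 reads, and the 13-base prefix AGATCGGAAGAGC (taken from the start of the second adapter in my command above) are my own choices for illustration, not anything prescribed by FastQC or Trim Galore.

```shell
# Hypothetical helper: count reads containing an adapter prefix.
# $1 = gzipped FASTQ file, $2 = adapter prefix, $3 = number of reads to sample (default 1,000,000)
count_adapter_reads() {
  gzip -dc "$1" \
    | head -n "$(( ${3:-1000000} * 4 ))" \
    | awk 'NR % 4 == 2' \
    | grep -c "$2"
}

# Example call (assumes sample1.fastq.gz is in the current directory):
# count_adapter_reads sample1.fastq.gz AGATCGGAAGAGC
```

The awk step keeps only the sequence line of each four-line record, so the count reflects reads rather than header or quality lines. Since this only finds exact matches, it would miss adapters with sequencing errors or adapters cut off at the 3' end of the read, so I would treat the number as a lower bound.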
Many thanks in advance for your time and replies.