I need some help with QC of NGS data.
I have raw NGS data (more than 100 samples) in FASTQ format, and I trimmed the adapter sequences using Trimmomatic v0.39. The adapter sequences used for trimming were:
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
After that, I ran FastQC to check that the adapter sequences had been removed. However, when I checked for the index sequences, I found that they were still present in the reads.
For example, below is the description line of one read from the forward read file (Read 1) of one sample:
As shown in the FASTQ description line, the dual index TCCGGAGA+GGCTCTGA (8 bp each) was used. These index sequences are still present in the trimmed reads. I checked this using:
grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep GGCTCTGA
grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep TCCGGAGA
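The grep check above can also be written so that it looks only at the sequence line of each record (the second of every four lines) and reports how many reads contain each index. The following is a minimal Python sketch, not a definitive tool. The function names `count_index_hits` and `expected_chance_fraction` are hypothetical, and the chance estimate assumes uniform base composition and independent positions, which real genomes do not satisfy exactly.

```python
from typing import Dict, Iterable, Tuple


def count_index_hits(
    fastq_lines: Iterable[str], indexes: Iterable[str]
) -> Tuple[int, Dict[str, int]]:
    """Return (total reads, reads containing each index).

    Assumes a standard 4-line-per-record FASTQ; only the sequence
    line (2nd line of each record) is searched.
    """
    indexes = [idx.upper() for idx in indexes]
    hits = {idx: 0 for idx in indexes}
    total = 0
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 != 1:  # skip header, '+' and quality lines
            continue
        total += 1
        sequence = line.strip().upper()
        for idx in indexes:
            if idx in sequence:
                hits[idx] += 1
    return total, hits


def expected_chance_fraction(read_length: int, k: int = 8) -> float:
    """Rough fraction of reads expected to contain one given k-mer by chance.

    Simplifying assumption: each base is equally likely and positions
    are independent.
    """
    positions = max(read_length - k + 1, 0)
    return 1.0 - (1.0 - 0.25 ** k) ** positions
```

As a usage sketch, calling `count_index_hits(open("2B_TruSeqCD_BHCHTCDRXX_R1.fastq"), ["TCCGGAGA", "GGCTCTGA"])` gives the total read count and the per-index hit counts, which can be compared with `expected_chance_fraction(read_length)`. Under the simplification above, and taking 150 bp purely as an illustrative read length, a given 8-mer would be expected in roughly 0.2% of reads by chance alone.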
As these index sequences are not of biological origin, how can I get rid of them? Should I trim them off, or can I proceed with alignment as is?
Your expert advice (with or without supporting articles/documentation) will be appreciated.