Hello peers,
I need some help regarding QC of NGS Data.
I have raw NGS data (more than 100 samples) in FASTQ format, and I trimmed the adapter sequences using Trimmomatic v0.39. The adapter sequences used for trimming were:
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
I then ran FastQC to check that the adapter sequences had been removed. However, when I checked for the index sequences, I found they were still present in the reads.
For example, here is the description line of one read from the forward read file (Read1) of one sample:
@F00740:29:HCHTCDRXX:1:1101:10574:1016 1:N:0:TCCGGAGA+GGCTCTGA
As described in the FASTQ description line, an 8 bp dual index (TCCGGAGA+GGCTCTGA) was used. These index sequences are still present in the trimmed reads. I checked this using:
grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep GGCTCTGA
Output:
AND
grep "^[^@]" 2B_TruSeqCD_BHCHTCDRXX_R1.fastq| grep TCCGGAGA
Output:
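One caveat with the grep commands above: "^[^@]" only drops lines that start with @, so the "+" separator lines and most quality lines are searched as well. A stricter check, as a minimal sketch, would look only at the sequence line (line 2 of each 4-line record). This assumes standard 4-line FASTQ records with no wrapped sequences, and count_index_hits is just a hypothetical helper name:

```shell
# Count reads whose sequence line contains a given motif.
# Assumes 4-line FASTQ records; count_index_hits is a hypothetical helper.
# Usage: count_index_hits <fastq_file> <motif>
count_index_hits() {
    awk -v m="$2" '
        # Line 2 of every 4-line record is the sequence line
        NR % 4 == 2 {
            total++
            # index() does a plain substring search (no regex)
            if (index($0, m) > 0) hits++
        }
        END { printf "%d of %d reads contain %s\n", hits, total, m }
    ' "$1"
}
```

It could be called as count_index_hits 2B_TruSeqCD_BHCHTCDRXX_R1.fastq TCCGGAGA, and again with GGCTCTGA, to see what fraction of reads contain each index sequence.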
As these index sequences are not of biological origin, how can I get rid of them? Or can I still proceed to alignment? Should I trim these sequences off?
Your expert advice (with or without supporting articles/documentation) will be appreciated.
Thank you!