I'm very new to RNA-seq analysis and am currently stuck on the trimming step. The libraries I am trying to analyze were built using the NEXTFLEX Rapid Directional RNA-seq kit with their Unique Dual Index (UDI) Barcodes (https://perkinelmer-appliedgenomics.com/wp-content/uploads/2022/02/NOVA-51292X-NEXTFLEX-RNA-Seq-2-0-UDI-Barcodes-V22-02-new.pdf). They were submitted for 2x111 paired-end sequencing.
I'm trying to use cutadapt for paired-end trimming but am struggling to understand what exactly I need to trim. Each barcode has a unique 8 bp index that corresponds to the P5/P7 regions. Is this what I am supposed to trim off? So in cutadapt: -a XXXXXXXX (P5 index) -A XXXXXXXX (P7 index)
Or do I have to trim off the entire UDI barcode (which seems rather long, since my reads are only 111 bp), like this? -a AATGATACGGCGACCACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCT -A GATCGGAAGAGCACACGTCTGAACTCCAGTCACXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG
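To get a feel for the sizes involved, here is a minimal Python sketch that splits the two templates above around their XXXXXXXX slot, fills in placeholder indexes, and reports the construct lengths and how far each index sits from the insert. It assumes the usual Illumina layout, in which the P5-side construct joins the insert at its 3' end (...CCGATCT) and the P7-side construct, as written above, starts right at the insert (GATCGG...). Under that assumption, a read that runs off the end of a short insert meets the insert-proximal adapter bases first, and the reverse complement of the P5 side is what read 2 would run into. The names `build_adapters` and `reverse_complement` are hypothetical helpers, not part of cutadapt.

```python
from __future__ import annotations

# Adapter segments copied from the cutadapt arguments above; the 8-bp index
# (XXXXXXXX in the template) sits between the two segments of each side.
P5_UPSTREAM = "AATGATACGGCGACCACCGAGATCTACAC"
P5_DOWNSTREAM = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # assumed to join the insert
P7_UPSTREAM = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"    # assumed to join the insert
P7_DOWNSTREAM = "ATCTCGTATGCCGTCTTCTGCTTG"

# N maps to N so placeholder indexes survive reverse-complementing.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_adapters(i5: str, i7: str) -> tuple[str, str]:
    """Return the full P5-side and P7-side constructs with the indexes filled in."""
    if len(i5) != 8 or len(i7) != 8:
        raise ValueError("expected 8-bp indexes")
    return P5_UPSTREAM + i5 + P5_DOWNSTREAM, P7_UPSTREAM + i7 + P7_DOWNSTREAM


if __name__ == "__main__":
    # Placeholder indexes (hypothetical); substitute the real ones for each sample.
    p5, p7 = build_adapters("NNNNNNNN", "NNNNNNNN")
    print("P5 construct:", len(p5), "bp")  # 70 bp
    print("P7 construct:", len(p7), "bp")  # 65 bp
    # Adapter bases between the insert and the index on each side.
    print("P5 insert-to-index gap:", len(P5_DOWNSTREAM))  # 33
    print("P7 insert-to-index gap:", len(P7_UPSTREAM))    # 33
    # P5 side read starting from the insert end (reverse complement).
    print(reverse_complement(p5)[: len(P5_DOWNSTREAM)])
```

If the layout assumption holds, the index on either side is only reached after 33 bp of constant adapter sequence, which is why the index alone and the full construct behave so differently as trimming targets.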
The other thing that confuses me is that when I check the quality of the FASTQ files with FastQC, it always reports an overrepresented TruSeq adapter sequence, even though this adapter was not used during library prep.
If anyone has experience trimming with these barcodes or has any insights, that would be awesome!
Do you happen to know if the middle part of this UDI barcode (GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCTACATATCTCG) is also part of the TruSeq adapter? This is what FastQC reports as overrepresented, but is trimming this sequence from read 1 and read 2 sufficient?