Hi all,
I have just started working with RRBS data, and I am having trouble understanding the data and how to trim adapter sequences. This is probably a very naive question, but any help would be appreciated. I am trying to trim paired-end RRBS data using Cutadapt. (I am aware of Trim Galore, but I am using Cutadapt directly, mainly because it is already installed on the cluster I am working on, and since Trim Galore uses Cutadapt, I should get the same results.)
If I want to trim the Illumina adapter sequence (AGATCGGAGAGC) from the 3' end of my paired-end data using Cutadapt, I first add two Ns to the adapter (NNAGATCGGAGAGC) to remove the bases added during end repair. So my understanding is that we are supposed to remove the additional bases and the adapter only from the 3' end. What happens to the adapters or additional bases at the 5' end? This paper, doi: 10.1186/s12864-016-2494-8, briefly mentions:
Removal of additional bases for pair-end sequencing can be tricky as it can affect subsequent RRBS read alignment. For example, removing two additional bases from the beginning of the read 2 (complementary reads to the original forward and reverse strands) would remove CGG tag that is used to search for indexed CCGG motif in the reference genome causing the reads to remain unaligned in RRBSMAP.
For my paired-end reads, how should I trim the adapters using Cutadapt? Should I not provide a reverse complement for Read 2 at all, or should I do it as below? Please advise.
cutadapt -a NNAGATCGGAGAGC -A GCTCTCCGATCTNN -o outputR1 -p outputR2 inputR1.fastq inputR2.fastq
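For reference, the -A sequence in the command above is simply the reverse complement of the -a sequence. A minimal sketch of how that can be checked in the shell is below; revcomp is a hypothetical helper name of my own, not part of Cutadapt, and it assumes the standard rev and tr utilities are available. Whether the reverse complement is actually the right thing to pass to -A is exactly what I am unsure about.

```shell
# Hypothetical helper (not part of Cutadapt): reverse-complement a DNA sequence.
# Assumes the standard `rev` and `tr` utilities; N is left unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Reverse complement of the adapter passed to -a above.
revcomp NNAGATCGGAGAGC   # prints GCTCTCCGATCTNN
```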
Thanks!