Question: Cutadapt for paired end RRBS data
0
gravatar for anikb
18 months ago by
anikb10
anikb10 wrote:

Hi all,

I have just started working with RRBS data, and I am having issue understanding the data and need to understand how to trim adapter sequences. This is probably a very naive question, but any help would be appreciated. I am trying to trim paired-end RRBS data using Cutadapt. (I am aware of Trim_galore, but I am using Cutadapt mainly because Cutadapt is already installed on the cluster that I am working on, and since Trim galore uses Cutadapt, I should get the same results).

If I want to trim the Illumina adapter sequence(AGATCGGAGAGC) from the 3' end of my Paired end data using cuadapt, at first I add 2 Ns to the Adapter (NNAGATCGGAGAGC) to remove bases added during end repair. So, my understanding is we are supposed to remove the additional bases and adapter only from 3'end. What happens to the adapters or additional bases at the 5'end? This paper, doi: 10.1186/s12864-016-2494-8, briefly mentions:

Removal of additional bases for pair-end sequencing can be tricky as it can affect subsequent RRBS read alignment. For example, removing two additional bases from the beginning of the read 2 (complementary reads to the original forward and reverse strands) would remove CGG tag that is used to search for indexed CCGG motif in the reference genome causing the reads to remain unaligned in RRBSMAP.

For my paired end reads, how should I trim the adapters using cutadapt? Do I not provide a reverse complement for my Read 2 at all ? or do it like below? Please advise.

cutadapt -a NNAGATCGGAGAGC -A GCTCTCCGATCTNN -o outputR1 -p outputR2 inputR1.fastq inputR2.fastq

Thanks!

cutadapt rrbs trimming • 1.2k views
ADD COMMENTlink modified 18 months ago by Devon Ryan93k • written 18 months ago by anikb10
2
gravatar for Devon Ryan
18 months ago by
Devon Ryan93k
Freiburg, Germany
Devon Ryan93k wrote:

My suggestion is to not bother removing the repaired bases and to instead simply ignore them during methylation extraction. The procedure is then essentially:

  1. cutadapt -a AGATCGGAGAGC ... (N.B., cutadapt can use multiple threads!)
  2. Align (I usually use bwa-meth from Brent Pedersen)
  3. Look at the methylation bias (e.g., MethylDackel mbias ...)
  4. Extract ignoring bases showing bias (e.g., MethylDackel extract --OT 2,98,2,98 ...)

Then you're maximizing the quality of the alignments but also not including problematic bases (or having your cake and eating it too, if you prefer).

ADD COMMENTlink written 18 months ago by Devon Ryan93k
  1. Hmm, ok. Thank you for your answer! So I will not mess with the repaired bases at this point then and take care of it after creating the M-bias plot. I might have to use Bismark, since that's what is available on our cluster, and it has both the features that you mentioned above(3,4) . Is bwa-meth better for alignment than Bismark, in your experience?

  2. Also, I just want to explicitly mention what I understood for the trimming part and confirm it. With Cutadapt, I should provide the reverse complement of adapter for the R2 file as well, right?

    cutadapt -a AGATCGGAGAGC -A GCTCTCCGATCT -o oR1 -p oR2 inR1.fq inR2.fq

ADD REPLYlink written 18 months ago by anikb10
1
  1. I've typically gotten better and faster results with bwa-meth, yes.
  2. You can use cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC, you don't need to reverse complement the adapter for R2 since the read will have the opposite orientation to begin with.
ADD REPLYlink written 18 months ago by Devon Ryan93k

Thanks a bunch, that really helped!

ADD REPLYlink written 18 months ago by anikb10

Hi Devon, Just a follow up on the last question. I trimmed the files and ran bismark with default options as below:

  1. Genome preparation: bismark_genome_preparation --verbose hg38_gencode
  2. Alignment: bismark reference_genome -1 Read_1.fq -2 Read_2.fq

The M-bias plots look like these (below). In the M-bias plot examples (Google), the CpG plot line looks like a straight line across the length of the read except at ends. Mine are very uneven across the read length. Would you know why that is?

M-bias plot for Read 1

M-bias plot for Read 2

ADD REPLYlink modified 18 months ago • written 18 months ago by anikb10

Yikes! What was your alignment rate? I usually only see M-bias plots like this with targeted amplicon-seq combined with bisulfite treatment (i.e., really really reduced representation BS-seq). Do you have a high degree of duplication? That's the most likely cause of something like this. If that's the case, then mark duplicates and recreate the M-bias plots while ignoring marked duplicates (I presume there's an option for that in bismark, but if not just use MethylDackel).

ADD REPLYlink written 18 months ago by Devon Ryan93k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1569 users visited in the last hour