Question: Trimming Illumina universal adapters using cutadapt proving insufficient
jleehan wrote, 3 months ago:

TL;DR: My paired-end RNA-seq reads have high Illumina universal adapter content. Trimming with the universal adapter sequence, and then with its reverse complement, did not completely remove the adapter content, and the reverse complement was only really effective on the R2 reads.

I am trying to trim adapter sequences from my paired-end RNA-seq data using cutadapt and I am not having a lot of success. When I ran my raw .fastq files through FastQC, it revealed that my sequences had upwards of 35% adapter content in the latter portions of the 150 bp reads.

[Figures: FastQC adapter content for the untrimmed R1 and R2 reads]

Since the data showed significant universal adapter presence, I decided to use cutadapt to trim these sequences with the universal adapter sequence that Illumina provides. I used the following lines of code in bash:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # Strip the R1 suffix so SAMPLE holds the prefix shared by both mates
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq//")
#  echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
  cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -q 30 -m 60 -o "${SAMPLE}R1_trm.fastq" -p "${SAMPLE}R2_trm.fastq" "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done

According to the cutadapt summary, this run trimmed some adapter sequence, but not much.

[Figure: cutadapt summary for the first trimming run (trim1)]

Looking at the adapter content graph from FastQC I could not even see a visible difference:

[Figures: FastQC adapter content after the first trimming run (trim1), R1 and R2]

Seeing how little this helped, I thought that trimming the reverse complement instead might work. I ran the same loop again, this time with the reverse complement of the universal adapter sequence:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # Strip the R1 suffix so SAMPLE holds the prefix shared by both mates
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq//")
#  echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
  cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -q 30 -m 60 -o "${SAMPLE}R1_trm2.fastq" -p "${SAMPLE}R2_trm2.fastq" "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done
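
For anyone checking the sequence used in this second loop: a reverse complement can be derived on the command line instead of by hand. This is a minimal sketch, assuming `rev` and `tr` are available (they are on most Linux systems); the variable names are only illustrative.

```shell
# Universal adapter sequence from the first cutadapt run
ADAPTER=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

# Reverse the string, then swap each base for its complement
ADAPTER_RC=$(printf '%s\n' "$ADAPTER" | rev | tr 'ACGT' 'TGCA')

echo "$ADAPTER_RC"
```

The printed sequence should be the one passed to `-a` and `-A` in the second loop.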

This time, it appeared to do something noticeable, but the effect was significantly more pronounced in the R2 reads than the R1 reads.

[Figures: cutadapt summary for the second trimming run (trim2), and FastQC adapter content after it for R1 and R2]

It was nice to see some progress, but I am not sure how to proceed from here. I still have >30% adapter content in my R1 reads and >10% in my R2 reads, which from past experience I would not consider sufficiently trimmed. I believe I saw elsewhere in the forum that trimming with a shorter portion of the adapter sequence can be more effective, but before spending the 14 hours it takes me to run this, I thought I would ask the forum, since y'all actually know what you're doing.
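
Since a full run takes about 14 hours, one way to iterate faster is to try adapter settings on a small subset first. This is a minimal sketch, assuming uncompressed four-line FASTQ records with R1 and R2 in the same order. The file names and subset size are placeholders, and the tiny input is generated only so the example runs on its own.

```shell
# Build a tiny stand-in FASTQ pair (8 reads each) so the sketch is self-contained;
# in practice these would be the real raw R1/R2 files.
WORKDIR=$(mktemp -d)
for mate in R1 R2; do
  i=1
  while [ "$i" -le 8 ]; do
    printf '@read%s/%s\nACGTACGTAC\n+\nIIIIIIIIII\n' "$i" "$mate"
    i=$((i + 1))
  done > "$WORKDIR/sample_${mate}_001.fastq"
done

# Keep the first N read pairs; each FASTQ record is 4 lines
N_READS=5
for mate in R1 R2; do
  head -n $((N_READS * 4)) "$WORKDIR/sample_${mate}_001.fastq" \
    > "$WORKDIR/subset_${mate}.fastq"
done

# Both subsets should have the same number of lines
wc -l "$WORKDIR"/subset_R1.fastq "$WORKDIR"/subset_R2.fastq
```

Pointing the cutadapt loop at subset files like these should let each adapter variant be checked in FastQC far sooner than a full run.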


I would suggest you try bbduk or trimmomatic. In addition to removing adapters by matching their sequence, both can remove adapters by examining the overlap between R1 and R2, so they are more sensitive for paired-end data.

Reply written 3 months ago by h.mon
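
As a rough illustration of the overlap-based approach suggested in this reply, here is a sketch of a BBDuk call. The parameter values follow the adapter-trimming example in the BBTools user guide. The file names and the `adapters.fa` path are placeholders, and the command is only printed so it can be checked before running.

```shell
# Placeholder sample prefix; adapters.fa is assumed to be the adapter FASTA shipped with BBTools
SAMPLE=sample_

# ktrim=r trims adapter matches and everything to their right (3' adapters);
# tbo also trims adapters detected from the R1/R2 overlap;
# tpe trims both reads of a pair to the same length.
BBDUK_CMD="bbduk.sh in1=${SAMPLE}R1_001.fastq in2=${SAMPLE}R2_001.fastq \
out1=${SAMPLE}R1_bbduk.fastq out2=${SAMPLE}R2_bbduk.fastq \
ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo"

# Print the command for inspection; once it looks right it can be run with: eval "$BBDUK_CMD"
echo "$BBDUK_CMD"
```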

Thanks for the advice! I'll give those tools a try.

Reply written 3 months ago by jleehan
jleehan wrote, 3 months ago:

Turns out it was the sequence length thing. I tried trimmomatic and got the same result, so I tried changing the sequence again. FastQC checks for specific sequences when determining adapter content, and those sequences differ in length from the ones Illumina lists in the adapter documentation currently available on their website. I looked up the sequence that FastQC refers to, which is "AGATCGGAAGAG": just the first twelve bases of the reverse-complement sequence from my second attempt. I used that exact sequence for both the R1 and R2 reads and it completely removed all of my adapter content.
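
One possible reading of why the short sequence worked where the long one did not (an assumption, not something confirmed in this thread): the cutadapt documentation lists different TruSeq read-through sequences for R1 and R2, and they agree only at the start. The sketch below uses awk to find their common prefix and checks whether FastQC's 12-base sequence falls inside it. The R1 sequence is taken from the cutadapt documentation rather than from this post.

```shell
# TruSeq read-through sequences as listed in the cutadapt documentation (assumed here)
R1_ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
R2_ADAPTER=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
# 12-base sequence that FastQC checks for, as quoted in the post
FASTQC_SEQ=AGATCGGAAGAG

# Walk both sequences base by base and stop at the first mismatch
COMMON=$(awk -v a="$R1_ADAPTER" -v b="$R2_ADAPTER" 'BEGIN {
  n = (length(a) < length(b)) ? length(a) : length(b)
  i = 1
  while (i <= n && substr(a, i, 1) == substr(b, i, 1)) i++
  print substr(a, 1, i - 1)
}')

echo "common prefix: $COMMON (${#COMMON} bases)"

# Report whether the short FastQC sequence lies within the shared prefix
case "$COMMON" in
  "$FASTQC_SEQ"*) echo "FastQC sequence is shared by the R1 and R2 adapters" ;;
  *)              echo "FastQC sequence is not shared by both adapters" ;;
esac
```

If this reading is right, a long R2-specific sequence passed to `-a` would agree with the adapter in R1 reads only over the shared prefix, which would be consistent with the weaker effect seen on R1.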
