Trimming Illumina universal adapters using cutadapt proving insufficient

4.8 years ago

jleehan ▴ 120

TL;DR: I have high Illumina universal adapter content in my paired-end RNA-seq reads. Trimming with the universal adapter sequence, and then with its reverse complement, did not completely remove the adapter content, and the reverse complement was effective mainly for the R2 reads.

I am trying to trim adapter sequences from my paired-end RNA-seq data using cutadapt, and I am not having a lot of success. When I ran my raw .fastq files through FastQC, it showed upwards of 35% adapter content in the latter portions of the 150 bp reads.

untrimmed R1 adapter content

untrimmed R2 adapter content

Since the data showed significant universal adapter presence, I decided to use cutadapt to trim these sequences with the universal adapter sequence that Illumina provides. I used the following lines of code in bash:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # strip the R1 suffix to get the shared sample prefix (path included)
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq//")
#  echo ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
  cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
    -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT \
    -q 30 -m 60 \
    -o "${SAMPLE}R1_trm.fastq" -p "${SAMPLE}R2_trm.fastq" \
    "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done

Judging from the cutadapt summary, it did something, but not much.

cutadapt trim1 summary

In the FastQC adapter content plots, I could not even see a difference:

trim1 R1

trim1 R2

Seeing how little this helped, I thought that trimming the reverse complement instead might work. So I did the same thing, but modified the code to trim the reverse complement of the universal adapter sequence. Here is that code as well:

for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
do
  # strip the R1 suffix to get the shared sample prefix (path included)
  SAMPLE=$(echo "${i}" | sed "s/R1_001\.fastq//")
#  echo ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
  # same options as before, now with the reverse complement of the universal adapter
  cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
    -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
    -q 30 -m 60 \
    -o "${SAMPLE}R1_trm2.fastq" -p "${SAMPLE}R2_trm2.fastq" \
    "${SAMPLE}R1_001.fastq" "${SAMPLE}R2_001.fastq"
  echo "${SAMPLE} trimmed"
done
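The reverse complement used in this second loop does not have to be written out by hand. A minimal POSIX sh/awk sketch; `revcomp` is a hypothetical helper for illustration, not part of cutadapt or FastQC:

```sh
#!/bin/sh
# revcomp: print the reverse complement of a DNA sequence (A/C/G/T, either case).
# Hypothetical helper; uses only awk and tr.
revcomp() {
  printf '%s\n' "$1" |
    awk '{
      out = ""
      # walk the string from the last character back to the first
      for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
      print out
    }' |
    tr 'ACGTacgt' 'TGCAtgca'   # complement each base
}

# Universal adapter from the first loop; prints the sequence used in the second loop
revcomp AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
```

Running it prints AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT, which is exactly the sequence passed to -a and -A above.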

This time there was a noticeable effect, but it was much more pronounced in the R2 reads than in the R1 reads.

trim2 summary

trim2 R1

trim2 R2
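One likely explanation for R2 responding much better than R1 (my reading, assuming a standard TruSeq-style library, not something confirmed in this thread): the reverse complement of the universal adapter begins with the full TruSeq Read 2 adapter, AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT, whereas Illumina lists AGATCGGAAGAGCACACGTCTGAACTCCAGTCA as the Read 1 adapter. This sketch measures how many leading bases each of those shares with the sequence used in the second loop; `common_prefix_len` is a hypothetical helper.

```sh
#!/bin/sh
# common_prefix_len A B: print how many leading bases two sequences share.
# Hypothetical helper for illustration only.
common_prefix_len() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = 0
    # stop at the first mismatch or at the end of the shorter sequence
    while (n < length(a) && n < length(b) && substr(a, n + 1, 1) == substr(b, n + 1, 1)) n++
    print n
  }'
}

# Reverse complement of the universal adapter (second loop)
rc_universal=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
# TruSeq adapters as listed by Illumina (assumes a TruSeq-style library)
truseq_r1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
truseq_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

common_prefix_len "$rc_universal" "$truseq_r1"   # → 13
common_prefix_len "$rc_universal" "$truseq_r2"   # → 33 (the whole Read 2 adapter)
```

If R1 really shares only 13 bases with the long sequence, cutadapt's default maximum error rate of 0.1 would probably let it match R1 reads only where roughly the first dozen adapter bases sit at the very 3' end, which would fit the partial effect seen on R1.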

It was nice to see some progress, but to be honest, I have no idea how to proceed from here. I still have >30% adapter content in my R1 reads and >10% in my R2 reads, which from my past experience is definitely not sufficiently trimmed. I believe I saw elsewhere on the forum that it may be more effective to trim with a shorter portion of the adapter sequence, but before spending the 14 hours it takes me to run this, I thought I would ask the forum, since y'all actually know what you're doing.
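On the 14-hour run time: a cheaper way to compare candidate adapter sequences first is to check a subsample of reads. A rough sketch, assuming uncompressed FASTQ with four-line records; `count_adapter_reads` is a hypothetical helper, and the fraction it reports is only a coarse proxy for FastQC's per-position adapter content plot.

```sh
#!/bin/sh
# count_adapter_reads FASTQ ADAPTER [N]
# Scan the first N reads (default 100000) of an uncompressed 4-line FASTQ
# and print "<reads containing ADAPTER> <reads scanned>".
count_adapter_reads() {
  fq=$1 adapter=$2 n=${3:-100000}
  head -n "$((n * 4))" "$fq" |
    awk -v ad="$adapter" '
      NR % 4 == 2 {              # line 2 of each record is the sequence
        total++
        if (index($0, ad) > 0) hits++
      }
      END { printf "%d %d\n", hits, total }'
}
```

For example, `count_adapter_reads sample_R1_trm2.fastq AGATCGGAAGAG 200000` (file name is a placeholder) would report how many of the first 200,000 trimmed R1 reads still contain the 12-base sequence, in seconds rather than hours.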

cutadapt adapters trimming RNA-Seq

I would suggest you try BBDuk or Trimmomatic: in addition to removing adapters by matching their sequence, both can detect adapters from the overlap between R1 and R2, which makes them more sensitive for paired-end data.
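The overlap idea can be illustrated with a toy example. When the insert is shorter than the read length, both reads run into adapter, and the first L bases of R1 equal the reverse complement of the first L bases of R2, where L is the insert length. This sketch finds the largest such L by exact matching only; BBDuk and Trimmomatic use their own error-tolerant implementations, and `insert_len_by_overlap` is a hypothetical helper.

```sh
#!/bin/sh
# insert_len_by_overlap R1_SEQ R2_SEQ [MIN]
# Toy, exact-match version of overlap-based adapter detection:
# print the largest L >= MIN (default 8) such that the first L bases of R1
# equal the reverse complement of the first L bases of R2, or 0 if none.
insert_len_by_overlap() {
  awk -v r1="$1" -v r2="$2" -v min="${3:-8}" '
    function revcomp(s,   i, c, j, out) {
      out = ""
      for (i = length(s); i > 0; i--) {
        c = substr(s, i, 1)
        j = index("ACGT", c)
        out = out (j ? substr("TGCA", j, 1) : c)   # complement A/C/G/T
      }
      return out
    }
    BEGIN {
      min = min + 0
      max = length(r1)
      if (length(r2) < max) max = length(r2)
      # try the longest candidate insert first
      for (L = max; L >= min; L--)
        if (substr(r1, 1, L) == revcomp(substr(r2, 1, L))) { print L; exit }
      print 0
    }'
}
```

For instance, with the insert ACGTTGCAAGGT followed by adapter in both reads, `insert_len_by_overlap ACGTTGCAAGGTAGATCGGAAG ACCTTGCAACGTAGATCGGAAG` prints 12, so R1 would be cut back to its first 12 bases without the adapter sequence ever being supplied.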

Reply by h.mon, 4.8 years ago

Thanks for the advice! I'll give those tools a try.

Reply by jleehan, 4.8 years ago

3.7 years ago

Jiacheng ▴ 60

I'd recommend Atria for detecting and trimming the adapter sequences. It is a newly published trimmer with exceptional precision and speed. If you do not know which adapter sequences to use, Atria can detect adapters when adapter content is higher than 0.04%. (If it is below 0.04%, there is no need for adapter trimming.)

For example, to detect the adapters in R1 and R2:

atria --detect-adapter -r read1.fastq -R read2.fastq

To trim the adapters:

atria -r read1.fastq -R read2.fastq -a ADAPTER_READ1 -A ADAPTER_READ2

Accepted Answer · 2020-09-27

Turns out the adapter length was the issue. I tried Trimmomatic and got the same result, so I tried changing the sequence again. FastQC references specific sequences when determining adapter content, and those sequences differ in length from the ones Illumina currently lists in the adapter documentation on its website. I looked up the sequence FastQC specifically refers to, "AGATCGGAAGAG", which is just the first twelve bases of the reverse-complement sequence I used in my second attempt. I used that exact sequence for both the R1 and R2 reads, and it completely removed all of my adapter content.
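Applied to the loop from the question, this fix amounts to passing the 12-base sequence to both -a and -A. A sketch that keeps the question's other cutadapt options (-q 30 -m 60); the `_trm3` output suffix, the `trim_all`/`run` helper names, and the DRY_RUN switch (which only prints the commands) are illustrative additions, not part of the original post.

```sh
#!/bin/sh
# Trim every *R1_001.fastq / *R2_001.fastq pair in a directory with cutadapt,
# using the 12-base prefix AGATCGGAAGAG for both reads.
ADAPTER=AGATCGGAAGAG

# run: execute a command, or just print it when DRY_RUN is non-empty
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# trim_all DIR
trim_all() {
  for r1 in "$1"/*R1_001.fastq; do
    [ -e "$r1" ] || continue            # glob matched nothing
    sample=${r1%R1_001.fastq}           # strip the suffix, keep the path prefix
    run cutadapt -a "$ADAPTER" -A "$ADAPTER" -q 30 -m 60 \
      -o "${sample}R1_trm3.fastq" -p "${sample}R2_trm3.fastq" \
      "${sample}R1_001.fastq" "${sample}R2_001.fastq"
    echo "${sample} trimmed" >&2        # progress goes to stderr
  done
}
```

For example, `DRY_RUN=1 trim_all /path/to/rawdata` lists the commands for review, and `trim_all /path/to/rawdata` runs them.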