6 months ago
jleehan ▴ 40

TL;DR: My paired-end RNA-seq reads have high Illumina universal adapter content. Trimming with the universal adapter sequence and with its reverse complement did not completely remove it, and was only effective for the R2 reads.

I am trying to trim adapter sequences from my paired-end RNA-seq data using cutadapt and I am not having a lot of success. When I ran my raw .fastq files through FastQC, it revealed that my sequences had upwards of 35% adapter content in the latter portions of the 150 bp reads.
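One rough way to cross-check a FastQC adapter figure from the command line is to count the reads that contain a short adapter k-mer. The sketch below is only an approximation under stated assumptions: it uses the 12-mer AGATCGGAAGAG (the sequence identified later in this thread as the one FastQC looks for), it matches exactly with no tolerance for sequencing errors, and the helper name `adapter_fraction` and the toy reads are made up for illustration.

```shell
#!/bin/sh
# Report how many reads in a FASTQ file contain an exact adapter k-mer.
# adapter_fraction is a hypothetical helper, not part of FastQC or cutadapt.
adapter_fraction() {
    kmer=$1
    fastq=$2
    # In a 4-line-per-record FASTQ, sequence lines are lines 2, 6, 10, ...
    awk -v k="$kmer" 'NR % 4 == 2 { n++; if (index($0, k)) hits++ }
        END { if (n) printf "%d/%d reads (%.1f%%)\n", hits, n, 100 * hits / n }' "$fastq"
}

# Toy input (made-up reads) so the sketch runs without real data.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
@read1
ACGTACGTACGTAGATCGGAAGAGCGTCGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
ACGTACGTACGTACGTACGTACGTACGTAC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF
adapter_fraction AGATCGGAAGAG "$tmp"   # prints: 1/2 reads (50.0%)
rm -f "$tmp"
```

Running the same function on the raw and trimmed files gives a quick before/after number per file, although it will not match the FastQC plot exactly because FastQC reports a cumulative percentage by read position.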

Since the data showed significant universal adapter presence, I decided to use cutadapt to trim these sequences with the universal adapter sequence that Illumina provides. I used the following lines of code in bash:

    for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
    do
        SAMPLE=$(echo ${i} | sed "s/R1_001\.fastq//")
        # echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
        cutadapt -a AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -A AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -q 30 -m 60 -o ${SAMPLE}R1_trm.fastq -p ${SAMPLE}R2_trm.fastq ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
        echo ${SAMPLE} trimmed
    done

Looking at the summary of this data, it appeared to do something but not much. Looking at the adapter content graph from FastQC I could not even see a visible difference.

Seeing how little this helped, I thought, maybe if I try to trim the reverse complement instead, that will help. So I did the same thing, but modifying the code to trim the reverse complement of the universal adapter sequence. I'm going to include that code as well:

    for i in /blue/nicholson/jleehan/20200921_SMMasn_RNAseq/rawdata/*R1_001.fastq
    do
        SAMPLE=$(echo ${i} | sed "s/R1_001\.fastq//")
        # echo ${SAMPLE}R1.fastq ${SAMPLE}R2.fastq
        cutadapt -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -q 30 -m 60 -o ${SAMPLE}R1_trm2.fastq -p ${SAMPLE}R2_trm2.fastq ${SAMPLE}R1_001.fastq ${SAMPLE}R2_001.fastq
        echo ${SAMPLE} trimmed
    done
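As a sanity check on the two sequences used above, the reverse complement can be computed on the command line rather than by hand. This is a minimal sketch: `revcomp` is a hypothetical helper name, and it assumes an uppercase sequence containing only A, C, G and T and that the `rev` utility is available.

```shell
#!/bin/sh
# Reverse complement of an uppercase DNA string (A, C, G, T only).
# revcomp is a hypothetical helper defined here for illustration.
revcomp() {
    printf '%s\n' "$1" | rev | tr ACGT TGCA
}

universal=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
revcomp "$universal"
# prints AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
```

The printed sequence matches the one passed to `-a` and `-A` in the second loop, so the reverse complement itself was built correctly.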


This time, it appeared to do something noticeable, but the effect was significantly more pronounced in the R2 reads than the R1 reads.

It was nice to see some progress being made, but to be honest, I have no idea how to proceed from here. I still have >30% adapter content in my R1 reads and >10% in my R2 reads, which is definitely not what I would consider sufficiently trimmed based on my past experience. I believe I saw elsewhere in the forum that it may be more effective to trim smaller portions of the adapter sequence, but before I went and wasted the 14 hours that it takes for me to run this, I thought I would ask the forum since y'all actually know what you're doing.


I would suggest you try bbduk or trimmomatic. In addition to removing adapters by matching their sequence, both can remove adapters by examining the overlap between R1 and R2, so they are more sensitive for paired-end data.
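For reference, a BBDuk call along these lines might look like the sketch below. It only prints the command rather than running it. The parameters (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) follow the adapter-trimming example in the BBDuk guide, where `tbo` enables trimming by read overlap and `tpe` trims both reads of a pair to the same length. The helper name `bbduk_cmd`, the `_bbduk.fastq` output names, the sample prefix and the `adapters.fa` path are all assumptions for illustration.

```shell
#!/bin/sh
# Print a BBDuk paired-end adapter-trimming command for one sample prefix.
# Nothing is executed here; bbduk_cmd is a hypothetical helper.
bbduk_cmd() {
    sample=$1                      # e.g. rawdata/sampleA_ (hypothetical prefix)
    adapters=${2:-adapters.fa}     # assumed path to an adapter FASTA file
    echo bbduk.sh \
        "in1=${sample}R1_001.fastq" "in2=${sample}R2_001.fastq" \
        "out1=${sample}R1_bbduk.fastq" "out2=${sample}R2_bbduk.fastq" \
        "ref=${adapters}" ktrim=r k=23 mink=11 hdist=1 tpe tbo
}

bbduk_cmd rawdata/sampleA_
```

Printing first makes it easy to check the file names for one sample before committing to a long run over all of them.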


Thanks for the advice! I'll give those tools a try.

6 months ago
jleehan ▴ 40

Turns out it was the sequence length thing. I tried Trimmomatic and got the same result, so I tried changing the sequence again. FastQC references specific sequences when determining adapter content, and those sequences do not match the length of the sequences that Illumina lists in the adapter documentation currently available on their website. I looked up the sequence that FastQC specifically refers to, which is "AGATCGGAAGAG". That is just the first twelve bases of the reverse-complement sequence from my second attempt. I used that exact sequence for both R1 and R2 reads and it completely removed all of my adapter content.
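A sketch of what the final command could look like, keeping the `-q 30 -m 60` settings and file naming from the question and swapping in the 12-mer. It defaults to a dry run that prints the cutadapt call instead of executing it. The helper name `trim_pair`, the `DRY_RUN` switch and the example path are assumptions for illustration, not the exact script that was run. The result is consistent with the TruSeq read-through sequences given in the cutadapt documentation, which differ between R1 and R2 but share the prefix AGATCGGAAGAGC, so a short shared prefix can match both reads where the full 58-base sequence cannot.

```shell
#!/bin/sh
# Short adapter prefix reported in the answer above.
ADAPTER=AGATCGGAAGAG

# Build the cutadapt call for one R1 file; print it unless DRY_RUN=0.
# trim_pair and DRY_RUN are hypothetical names used for this sketch.
trim_pair() {
    r1=$1
    sample=${r1%R1_001.fastq}      # strip the R1 suffix to get the sample prefix
    set -- cutadapt -a "$ADAPTER" -A "$ADAPTER" -q 30 -m 60 \
        -o "${sample}R1_trm.fastq" -p "${sample}R2_trm.fastq" \
        "${sample}R1_001.fastq" "${sample}R2_001.fastq"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"                  # dry run: show the command only
    else
        "$@"                       # real run: requires cutadapt on PATH
    fi
}

# Hypothetical example path; in practice this would be called in a loop
# over the *R1_001.fastq files.
trim_pair rawdata/sampleA_R1_001.fastq
```

Using `${r1%R1_001.fastq}` instead of `echo | sed` gives the same prefix without spawning extra processes, and quoting the variables keeps paths with spaces intact.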