Using Cutadapt to trim adapters from paired-end small RNA sequence data
7 days ago

Hi everyone.

I am using cutadapt to trim adapter sequences from my small RNA sequencing reads, but I am struggling to trim adapters from the second of the paired libraries. Sequencing was done on an Illumina NovaSeq 6000 (100 bp PE), after library prep using the SMARTer smRNA-Seq Kit: https://www.takarabio.com/documents/User%20Manual/SMARTer%20smRNA/SMARTer%20smRNA-Seq%20Kit%20for%20Illumina%20User%20Manual.pdf

The SMARTer smRNA-Seq Kit Manual suggests using the following code to trim the first 3 bases of each sequence as well as the 3' adapter which follows the artificial polyA tail:

cutadapt -m 15 -u 3 -a AAAAAAAAAA input.fastq > output.fastq

I modified this code slightly according to my own needs using the cutadapt user guide as follows:

cutadapt  -u 3 -a AAAAAAAAAA -o outputfile_1.fastq.gz  inputfile_1.fastq.gz

This code works perfectly for my "forward" (Read_1) libraries for each sample: FastQC shows all adapters were removed, and the trimmed sequence lengths correspond to small RNA sizes.

Unfortunately the SMARTer smRNA-Seq Kit Manual does not explain how to treat the "reverse" (Read_2) libraries.

I am pretty inexperienced with bioinformatics, but my reasoning was that the Read_2 libraries would be treated as the reverse complement, i.e. I would need to cut off the 3 bases at the end of each read rather than the start, and trim off everything 5' of the polyT corresponding to the artificial polyA tail. I tried the following code to trim adapter sequences from the Read_2 libraries:

cutadapt  -u -3 -g TTTTTTTTTT -o outputfile_2.fastq.gz  inputfile_2.fastq.gz

The resulting FastQC output showed that my reads still contained Illumina adapter sequences, and most reads were ~81-87 nt long. (The same thing happened when I tried using cutadapt to trim the corresponding paired libraries in a single command.) The only way I can get rid of the adapter sequences is by using the reverse complement functionality in cutadapt:

cutadapt  -u 3 -a AAAAAAAAAA --rc -o outputfile_2.fastq.gz  inputfile_2.fastq.gz

The FastQC output then shows no remaining adapters, but most sequences are 81-92 nt long, indicating that the small RNAs were not correctly trimmed, since the output is not made up of RNAs < 30 nt.
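
A quick way to double-check the length distribution outside of FastQC is to tabulate read lengths directly from the trimmed FASTQ file. The sketch below assumes a gzipped FASTQ with four lines per record; the function name fastq_length_table is just a placeholder and is not part of cutadapt or FastQC.

```shell
# Print "length<TAB>count" for every read length in a gzipped FASTQ file.
# Assumes the standard 4-line record layout, where line 2 of each record
# is the sequence.
fastq_length_table() {
  gzip -dc "$1" \
    | awk 'NR % 4 == 2 { n[length($0)]++ }
           END { for (len in n) print len "\t" n[len] }' \
    | sort -n
}

# Example call (file name taken from the commands above):
#   fastq_length_table outputfile_2.fastq.gz
```

If the trimming worked as intended, most of the counts would be expected at lengths typical of small RNAs rather than at 81-92 nt.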

I've been searching online for solutions, but it seems like most people use single read rather than paired-end sequencing for small RNAs, so I'm not finding an applicable solution.

Does anyone perhaps know how I should be trimming adapters from my Read_2 libraries so that I can use the paired small RNA sequences downstream in my analyses?

Any help would be SO appreciated.

trimming adapters cutadapt sRNA-seq smallRNA • 215 views

You can simply ignore read 2. It is not adding any information, since your small RNAs are short and were completely sequenced by read 1. Follow the protocol described in the manual you linked.

You can't really use read 2 as stand-alone data, since if you counted those reads as well you would be double counting. As an option, you could merge read 1 and read 2 to get a single representation of each library fragment.
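
To make the point about read 2 concrete, here is a toy sketch (not taken from the kit manual) that builds one hypothetical fragment and prints it as each read would see it. The sequences, the GGG stand-in for the 3 leading bases, and the helper name revcomp are all made up for illustration; it assumes read 2 is simply the reverse complement of the same fragment that read 1 sequences.

```shell
# Toy illustration only: every sequence below is hypothetical.

# Reverse-complement an uppercase DNA string given as the first argument.
revcomp() {
  printf '%s\n' "$1" \
    | awk '{ out = ""
             for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
             print out }' \
    | tr 'ACGT' 'TGCA'
}

linker="GGG"                     # stand-in for the 3 bases removed by -u 3
insert="TGAGGTAGTAGGTTGTATAGTT"  # hypothetical 22-nt small RNA
polya="AAAAAAAAAAAAAAA"          # artificial poly(A) tail (length is arbitrary here)

fragment="${linker}${insert}${polya}"
read1_view="$fragment"                 # read 1: 3 bases, insert, poly(A)
read2_view="$(revcomp "$fragment")"    # read 2: poly(T), insert (rev. comp.), 3 bases

echo "Read 1 view: $read1_view"
echo "Read 2 view: $read2_view"
```

In this toy layout both reads carry the same insert, which is why counting them separately would count each small RNA twice. Read 2 starts with the poly(T), and the 3 extra bases sit directly after the insert rather than at the end of the read. With 100 bp reads and an insert this short, read 2 would be expected to run on past those 3 bases into the adapter on the other side, which would be consistent with `-u -3 -g TTTTTTTTTT` leaving reads of roughly 81-87 nt that still contain adapter sequence.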


Thank you! This was going to be my next option - to simply use the Read_1 libraries for all samples.

Can I ask about the second option you suggested? How would you merge the paired libraries in a way that allows adapters to be trimmed from both libraries?
