cutadapt: how to trim adapter dimers?
11 months ago
Ana • 0

Hi, I have to analyse some small RNA-seq data that was generated on an Illumina NovaSeq 6000 (SE50 sequencing strategy). As I understand it, this is single-end sequencing and all reads will be 50 bp long.

My goal now is to trim the adapter sequences using cutadapt. I'm struggling to understand the best procedure for our sequencing type. I was thinking of trimming only the 3' adapter sequence, since we are not expecting to see 5' adapter sequences. So I was wondering which cutadapt option is best for trimming this adapter: "-a" or "-b". My concern here is the possibility of adapter dimers, where the 3' adapter sequence is expected to be located at the 5' end of the read. If I use the "-b" option on these adapter dimers, then as I understand it, cutadapt will treat the adapter as a 5' adapter instead of a 3' adapter and thus will NOT trim the sequence that follows the adapter. Is this correct? Could you please tell me what you would recommend?

Thank you in advance

cutadapt small-RNAseq adapters • 786 views

Generally, once a trimming program finds the core adapter sequence, it removes that sequence and everything 3' of it. Adapter dimers should be handled by that same behavior.

You have tagged this smallRNAseq. Many kits ligate a specific adapter to the small RNA. Unless that adapter is present in the read (it would be at the 3' end), you may not have a real small RNA.

If you are willing to try a different program, I recommend bbduk.sh or fastp.

11 months ago
Jesse ▴ 740

My concern here is the possibility of adapter dimers, where the 3' adapter sequence is expected to be located at the 5' end of the read. If I use the "-b" option on these adapter dimers, then as I understand it, cutadapt will treat the adapter as a 5' adapter instead of a 3' adapter and thus will NOT trim the sequence that follows the adapter. Is this correct?

Let's find out! The documentation for the "5' or 3' adapters" feature says:

The decision which part of the read to remove is made as follows: If there is at least one base before the found adapter, then the adapter is considered to be a 3' adapter and the adapter itself and everything following it is removed. Otherwise, the adapter is considered to be a 5' adapter and it is removed from the read, but the sequence after it remains.

So it sounds like it'll only be one end or the other, like you said. Checking with ATCCCGGATGTT as the adapter on one end, the other end, or both ends of a random 50 nt sequence, that is the behavior I see:

$ cat input.fa 
>seq1 adapter at 3'
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGGATCCCGGATGTT
>seq2 adapter at 5'
ATCCCGGATGTTTCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG
>seq3 adapter at both
ATCCCGGATGTTTCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGGATCCCGGATGTT
$ cutadapt --quiet -b ATCCCGGATGTT input.fa -o - 
>seq1 (3' adapter removed)
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG
>seq2 (5' adapter removed)
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG
>seq3 (only 5' adapter removed)
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGGATCCCGGATGTT
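
To make the quoted decision rule concrete, here is a minimal Python sketch of it. This is a toy model, not cutadapt's implementation: it assumes exact, full-length adapter matches and simply takes the leftmost one, whereas cutadapt itself does error-tolerant alignment and also accepts partial matches at the read ends. The function name trim_anywhere is made up for illustration. On the three test reads above it gives the same sequences as the -b run.

```python
ADAPTER = "ATCCCGGATGTT"
INSERT = "TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG"  # the 50 nt test sequence


def trim_anywhere(read: str, adapter: str) -> str:
    """Toy version of the documented -b rule (exact matches only; hypothetical helper)."""
    pos = read.find(adapter)  # leftmost exact occurrence, -1 if absent
    if pos == -1:
        return read  # no adapter found: leave the read unchanged
    if pos > 0:
        # At least one base precedes the adapter: treat it as a 3' adapter
        # and remove the adapter plus everything after it.
        return read[:pos]
    # The adapter starts at the first base: treat it as a 5' adapter
    # and remove only the adapter, keeping what follows.
    return read[len(adapter):]


reads = {
    "seq1": INSERT + ADAPTER,            # adapter at 3'
    "seq2": ADAPTER + INSERT,            # adapter at 5'
    "seq3": ADAPTER + INSERT + ADAPTER,  # adapter at both ends
}
for name, read in reads.items():
    print(f">{name}\n{trim_anywhere(read, ADAPTER)}")
```

The seq3 case shows the concern from the question: once the leading copy is classified as a 5' adapter, the copy at the 3' end is left in place.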

Like GenoMax said, wouldn't the most straightforward approach be to let the program remove the adapter (wherever it's found) and everything 3' of it? For that you just need cutadapt's regular -a flag for a typical 3' adapter:

$ cutadapt --quiet -a ATCCCGGATGTT input.fa -o - 
>seq1
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG
>seq2

>seq3

Coupled with a length filter, you could remove the resulting empty sequences:

$ cutadapt --quiet -m 1 -a ATCCCGGATGTT input.fa -o - 
>seq1
TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG

(See the filtering documentation for more options, like --discard-untrimmed.)
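
As a rough illustration of how -a combines with the filters, here is a small Python sketch. It is a simplified model under stated assumptions, not cutadapt's code: matching is exact and full-length only, the min_length and discard_untrimmed parameters only mimic the idea behind -m and --discard-untrimmed, and the function name trim_and_filter and the adapter-free read seq4 are made up for the example.

```python
ADAPTER = "ATCCCGGATGTT"
INSERT = "TCGTGAGGCGGCACAAATTGCGCGAGGCAAGAGTATTAGAAGCCTACAGG"  # the 50 nt test sequence


def trim_and_filter(records, adapter, min_length=1, discard_untrimmed=False):
    """Toy 3' trimming plus filtering (exact matches only; hypothetical helper).

    records is a list of (name, sequence) pairs. The adapter and everything
    after it is removed, then reads shorter than min_length are dropped.
    With discard_untrimmed=True, reads without any adapter are dropped too.
    """
    kept = []
    for name, seq in records:
        pos = seq.find(adapter)  # leftmost exact occurrence, -1 if absent
        if pos == -1:
            if discard_untrimmed:
                continue  # no adapter seen: skip this read
            trimmed = seq
        else:
            trimmed = seq[:pos]  # drop the adapter and everything 3' of it
        if len(trimmed) >= min_length:
            kept.append((name, trimmed))
    return kept


records = [
    ("seq1", INSERT + ADAPTER),            # adapter at 3'
    ("seq2", ADAPTER + INSERT),            # adapter at 5'
    ("seq3", ADAPTER + INSERT + ADAPTER),  # adapter at both ends
    ("seq4", INSERT),                      # hypothetical read with no adapter
]
for name, seq in trim_and_filter(records, ADAPTER, discard_untrimmed=True):
    print(f">{name}\n{seq}")
```

In this model seq2 and seq3 are trimmed to empty reads and removed by the length filter, while seq4 survives unless untrimmed reads are discarded. For 50 bp reads of small RNA, where the 3' adapter is expected in nearly every genuine read, dropping untrimmed reads may be a reasonable extra filter.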
