Question

Correct way to remove Nextera adapters from ITS sequences


10 months ago

mattze731 ▴ 20

Hi everyone,

I know this question has come up a few times, but the existing answers didn't help me solve my issue. I have FASTQ files from ITS amplicon-based metagenomic sequencing (ITS1/2, 300 bp), and FastQC reports Nextera transposase adapters in all of them. The adapter contamination starts relatively early in the reads, for example at 50 bp or 100 bp.

I used trimmomatic 0.39 to remove this contamination without any further quality trimming. My settings are:

java -jar trimmomatic-0.39.jar PE {R1_file} {R2_file} {R1_paired} {R1_unpaired} {R2_paired} {R2_unpaired} ILLUMINACLIP:NexteraPE-PE.fa:5:10:5
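For readers unsure what the three numbers in the ILLUMINACLIP step mean, here is a minimal Python sketch that splits the step into named fields, following the layout given in the Trimmomatic manual (adapter FASTA, seed mismatches, palindrome clip threshold, simple clip threshold, then the optional minAdapterLength and keepBothReads). The class and function names are made up for illustration, and the parser assumes the FASTA path contains no colon.

```python
from typing import NamedTuple, Optional


class IlluminaClip(NamedTuple):
    """Named view of an ILLUMINACLIP step (illustrative helper, not part of Trimmomatic)."""
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: Optional[int] = None
    keep_both_reads: Optional[bool] = None


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split 'ILLUMINACLIP:<fasta>:<seed>:<palindrome>:<simple>[:<minLen>[:<keepBoth>]]'."""
    name, *fields = step.split(":")
    if name != "ILLUMINACLIP" or not 4 <= len(fields) <= 6:
        raise ValueError(f"not a recognisable ILLUMINACLIP step: {step!r}")
    # The two trailing fields are optional in the Trimmomatic step syntax.
    min_len = int(fields[4]) if len(fields) > 4 else None
    keep_both = fields[5].lower() == "true" if len(fields) > 5 else None
    return IlluminaClip(fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
                        min_len, keep_both)


if __name__ == "__main__":
    # The step used in the command above.
    print(parse_illuminaclip("ILLUMINACLIP:NexteraPE-PE.fa:5:10:5"))
```

With the settings above this gives seed mismatches = 5, palindrome clip threshold = 10 and simple clip threshold = 5. For comparison, the example in the Trimmomatic manual uses 2:30:10, so the thresholds here are considerably more permissive.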

My next step would be to follow the dada2 ITS pipeline.

However, trimmomatic removes a lot of reads, usually between 40% and 100% per file. dada2 won't even work on my files because some paired R2 files are missing, as all reads were dropped. I see a similar issue when using cutadapt.
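As a stopgap while debugging, samples whose paired output is missing or empty can be identified before handing the files to dada2. The sketch below is only an illustration: it assumes plain four-line FASTQ records (optionally gzipped), and the function names and file names are hypothetical.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path) -> int:
    """Count reads in a FASTQ file, assuming four lines per record.

    A missing file is treated as containing zero reads.
    """
    path = Path(path)
    if not path.exists():
        return 0
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def split_usable_pairs(pairs):
    """Separate samples with reads in both mates from samples that lost a mate.

    `pairs` maps a sample name to an (R1 path, R2 path) tuple.
    """
    usable, skipped = {}, []
    for sample, (r1, r2) in pairs.items():
        if count_fastq_reads(r1) > 0 and count_fastq_reads(r2) > 0:
            usable[sample] = (r1, r2)
        else:
            skipped.append(sample)
    return usable, skipped


if __name__ == "__main__":
    # Hypothetical file names; replace with the actual paired outputs.
    pairs = {"sampleA": ("sampleA_R1_paired.fastq", "sampleA_R2_paired.fastq")}
    usable, skipped = split_usable_pairs(pairs)
    print("usable:", sorted(usable), "skipped:", skipped)
```

This only works around the symptom; it does not explain why the reads are being dropped.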

Can anyone explain what went wrong here and how I can fix this? Thank you.

adapters ITS trimmomatic • 1.4k views

ADD COMMENT • link updated 9 months ago by GenoMax 141k • written 10 months ago by mattze731 ▴ 20


Some of these programs have a minimum size threshold for a read to be retained. You could play around with that to keep all trimmed reads 50nt or longer. You could also try a program that allows you to manually input the adaptor sequence (Cutadapt does this).

ADD REPLY • link 10 months ago by Trivas ★ 1.7k


Trimmomatic allows custom adapter sequences, but I don't see how this would change the outcome. Similarly, Trimmomatic lets you set a minimum length for reads to keep with the setting "MINLEN:50". I didn't use it previously, so no minimum length filter should have been active. With MINLEN:50, I lose just as many reads as without it:

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 522798 Both Surviving: 90331 (17.28%) Forward Only Surviving: 301450 (57.66%) Reverse Only Surviving: 694 (0.13%) Dropped: 130323 (24.93%)

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 285721 Both Surviving: 105796 (37.03%) Forward Only Surviving: 168809 (59.08%) Reverse Only Surviving: 117 (0.04%) Dropped: 10999 (3.85%)

etc etc
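To compare many samples at once, the summary that Trimmomatic prints can be parsed into numbers. This is a rough sketch, assuming the log wording matches the output shown above; the regular expression and function name are illustrative, not part of Trimmomatic.

```python
import re

# Matches the paired-end summary printed by Trimmomatic, as shown in the logs above.
SUMMARY_RE = re.compile(
    r"Input Read Pairs:\s+(?P<input>\d+)\s+"
    r"Both Surviving:\s+(?P<both>\d+)\s+\([\d.]+%\)\s+"
    r"Forward Only Surviving:\s+(?P<forward_only>\d+)\s+\([\d.]+%\)\s+"
    r"Reverse Only Surviving:\s+(?P<reverse_only>\d+)\s+\([\d.]+%\)\s+"
    r"Dropped:\s+(?P<dropped>\d+)\s+\([\d.]+%\)"
)


def parse_trimmomatic_summary(log_text: str) -> dict:
    """Return the read-pair counts from a Trimmomatic PE log as integers."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        raise ValueError("no Trimmomatic paired-end summary found")
    return {key: int(value) for key, value in match.groupdict().items()}


if __name__ == "__main__":
    log = (
        "Input Read Pairs: 522798 Both Surviving: 90331 (17.28%) "
        "Forward Only Surviving: 301450 (57.66%) "
        "Reverse Only Surviving: 694 (0.13%) Dropped: 130323 (24.93%)"
    )
    counts = parse_trimmomatic_summary(log)
    print(counts)
    print("fraction of pairs with both mates surviving:",
          counts["both"] / counts["input"])
```

In both logs above, most of the lost pairs fall under "Forward Only Surviving", meaning it is mainly the R2 mate that is being discarded.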

ADD REPLY • link 10 months ago by mattze731 ▴ 20

score 1 · Accepted Answer · 2023-07-03

Hi all,

I wanted to give an update in case anyone else finds this helpful. I looked again at the original "NexteraPE-PE.fa", which contains:

>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

I asked ChatGPT for the sequence for Nextera transposase adapters and it returned:

5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG 3' (for Read 1)
5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG 3' (for Read 2)

These correspond to Trans1 and Trans2. I then removed the PrefixNX entries and kept only Trans1 and Trans2 and their reverse complements:

>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
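When editing an adapter file by hand like this, it is easy to introduce a typo, so a quick check that each "_rc" entry really is the reverse complement of its partner can be useful. The following is a small sketch using only the Python standard library; the helper names are made up for illustration, and the "_rc" suffix convention is taken from the file above.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

ADAPTER_FASTA = """\
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
"""


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {name: sequence}; multi-line sequences are joined."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def mismatched_rc_entries(records: dict) -> list:
    """List '<name>_rc' entries that are not the reverse complement of '<name>'."""
    return [
        name + "_rc"
        for name, seq in records.items()
        if name + "_rc" in records and records[name + "_rc"] != reverse_complement(seq)
    ]


if __name__ == "__main__":
    print("mismatched entries:", mismatched_rc_entries(parse_fasta(ADAPTER_FASTA)))
```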

I now have close to 100% read survival; only about a dozen reads per sample are discarded. FastQC confirms that there is no more adapter contamination. I wonder whether adapter removal was really necessary, given that so few reads were discarded?