Correct way to remove Nextera adapters from ITS sequences
10 months ago
mattze731 ▴ 20

Hi everyone,

I think this question has come up a few times, but the existing answers didn't help me solve my issue. I have FastQ files from ITS amplicon-based metagenomic sequencing (ITS1/2) (300 bp), and FastQC reports that they all contain Nextera transposase adapters. The adapter contamination starts relatively early in the reads, for example at 50 bp or 100 bp.

I used trimmomatic 0.39 to remove this contamination without any further quality trimming. My settings are:

java -jar trimmomatic-0.39.jar PE {R1_file} {R2_file} {R1_paired} {R1_unpaired} {R2_paired} {R2_unpaired} ILLUMINACLIP:NexteraPE-PE.fa:5:10:5
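
For readers unfamiliar with the ILLUMINACLIP syntax: the colon-separated values after the adapter file are, in order, the seed mismatches, the palindrome clip threshold, and the simple clip threshold. Below is a minimal Python sketch that labels the values used above. The helper name `parse_illuminaclip` is made up for illustration and is not part of Trimmomatic.

```python
def parse_illuminaclip(step):
    """Split an ILLUMINACLIP step string into named fields.

    Hypothetical helper for illustration only; not part of Trimmomatic.
    Only the first three numeric fields are read.
    """
    name, fasta, seed, palindrome, simple = step.split(":")[:5]
    if name != "ILLUMINACLIP":
        raise ValueError("not an ILLUMINACLIP step: " + step)
    return {
        "adapter_fasta": fasta,
        "seed_mismatches": int(seed),
        "palindrome_clip_threshold": int(palindrome),
        "simple_clip_threshold": int(simple),
    }


settings = parse_illuminaclip("ILLUMINACLIP:NexteraPE-PE.fa:5:10:5")
for key, value in settings.items():
    print(key, value)
```

For comparison, the example in the Trimmomatic manual uses 2:30:10, so the 5:10:5 used here is considerably more permissive. Whether that contributes to the read loss is not established in this thread.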

My next step would be to follow the dada2 ITS pipeline.

However, Trimmomatic removes a lot of reads, usually between 40% and 100% per file. dada2 won't even run on my files because some paired R2 files are missing, as all of their reads were dropped. I see a similar issue when using cutadapt.
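
One way to see which samples would break a paired-end pipeline is to count the reads in each trimmed file before handing them on. This is a rough Python sketch, assuming standard 4-line FASTQ records. The function names are hypothetical and nothing here is part of dada2 or Trimmomatic.

```python
import gzip
import os


def count_fastq_reads(path):
    """Return the number of reads in a FASTQ file (plain or .gz).

    A missing file counts as 0 reads. Assumes 4-line FASTQ records.
    """
    if not os.path.exists(path):
        return 0
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def usable_pairs(pairs):
    """Keep only (R1, R2) path pairs where both files contain reads."""
    return [
        (r1, r2)
        for r1, r2 in pairs
        if count_fastq_reads(r1) > 0 and count_fastq_reads(r2) > 0
    ]
```

Passing a list of `(R1_paired, R2_paired)` paths to `usable_pairs` returns only the samples that still have reads in both files. Samples that drop out are the ones worth inspecting first.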

Can anyone explain what went wrong here and how I can fix this? Thank you.

adapters ITS trimmomatic • 1.4k views

Some of these programs have a minimum size threshold for a read to be retained. You could play around with that to keep all trimmed reads 50nt or longer. You could also try a program that allows you to manually input the adaptor sequence (Cutadapt does this).


Trimmomatic allows custom adapter sequences, but I don't see how this would change the outcome. Similarly, Trimmomatic lets you set a minimum length for reads to keep using the setting "MINLEN:50". I didn't use it previously, so it should not have been active. With MINLEN:50, I lose just as many reads as without it:

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 522798 Both Surviving: 90331 (17.28%) Forward Only Surviving: 301450 (57.66%) Reverse Only Surviving: 694 (0.13%) Dropped: 130323 (24.93%)

ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Quality encoding detected as phred33
Input Read Pairs: 285721 Both Surviving: 105796 (37.03%) Forward Only Surviving: 168809 (59.08%) Reverse Only Surviving: 117 (0.04%) Dropped: 10999 (3.85%)
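
In both logs, most of the lost pairs fall under "Forward Only Surviving", meaning the forward read was kept while its reverse mate was discarded. Below is a small Python sketch that pulls the counts out of a summary line like the ones above and reports the fraction of pairs that lost their reverse read. The regular expression assumes exactly the summary format shown here, and the function names are hypothetical.

```python
import re

# Matches the single-line paired-end summary shown in the logs above.
SUMMARY = re.compile(
    r"Input Read Pairs: (\d+) "
    r"Both Surviving: (\d+) \([\d.]+%\) "
    r"Forward Only Surviving: (\d+) \([\d.]+%\) "
    r"Reverse Only Surviving: (\d+) \([\d.]+%\) "
    r"Dropped: (\d+) \([\d.]+%\)"
)

KEYS = ("input_pairs", "both", "forward_only", "reverse_only", "dropped")


def parse_summary(line):
    """Return the five read-pair counts from a summary line as a dict of ints."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError("no paired-end summary found in line")
    return dict(zip(KEYS, map(int, match.groups())))


def reverse_read_loss(stats):
    """Fraction of input pairs whose reverse read did not survive."""
    return (stats["forward_only"] + stats["dropped"]) / stats["input_pairs"]


log_line = (
    "Input Read Pairs: 522798 Both Surviving: 90331 (17.28%) "
    "Forward Only Surviving: 301450 (57.66%) "
    "Reverse Only Surviving: 694 (0.13%) Dropped: 130323 (24.93%)"
)
stats = parse_summary(log_line)
print(stats)
print(round(reverse_read_loss(stats), 4))
```

The Trimmomatic documentation notes that, after palindrome mode detects read-through, the reverse read is dropped by default unless the optional keepBothReads parameter is set to True. That would be consistent with this pattern, but it is not confirmed as the cause in this thread.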

etc etc

9 months ago
mattze731 ▴ 20

Hi all,

I wanted to give an update in case anyone else finds this helpful. I looked again at the original "NexteraPE-PE.fa", which contains:

>PrefixNX/1
AGATGTGTATAAGAGACAG
>PrefixNX/2
AGATGTGTATAAGAGACAG
>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

I asked ChatGPT for the sequence for Nextera transposase adapters and it returned:

5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG 3' (for Read 1)
5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG 3' (for Read 2)

These correspond to Trans1 and Trans2. I then removed the PrefixNX entries and kept only Trans1, Trans2, and their reverse complements:

>Trans1
TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
>Trans1_rc
CTGTCTCTTATACACATCTGACGCTGCCGACGA
>Trans2
GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
>Trans2_rc
CTGTCTCTTATACACATCTCCGAGCCCACGAGAC
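
As a sanity check on a hand-edited adapter file like this one, the `_rc` entries can be verified against their forward sequences. This is a minimal Python sketch. The helper names and the output file name `NexteraPE-custom.fa` are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# The four entries of the reduced adapter file shown above.
adapters = {
    "Trans1": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "Trans1_rc": "CTGTCTCTTATACACATCTGACGCTGCCGACGA",
    "Trans2": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
    "Trans2_rc": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
}

# Each *_rc entry should be the reverse complement of its partner.
for name in ("Trans1", "Trans2"):
    assert reverse_complement(adapters[name]) == adapters[name + "_rc"]


def write_fasta(records, path):
    """Write a {name: sequence} dict as a simple two-line-per-record FASTA file."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(">" + name + "\n" + seq + "\n")


# Hypothetical output name; uncomment to write the file.
# write_fasta(adapters, "NexteraPE-custom.fa")
```

According to the Trimmomatic manual, entries whose names start with "Prefix" and end in "/1" and "/2" are the ones used for palindrome mode, so a file without them presumably runs in simple mode only.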

I now have close to 100% survival of reads, with only a dozen or so reads discarded per sample. FastQC confirms that there is no more adapter contamination. I wonder if adapter removal was really necessary, given that so few reads were discarded?


I wonder if adapter removal was really necessary, given that there were so few reads discarded?

Probably not. But reads may certainly have been trimmed. They will only be completely removed when there are either no inserts or really short inserts.
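
To illustrate the difference between a read being trimmed and a read being removed, here is a toy Python sketch. It uses exact string matching, which is a simplification and not how Trimmomatic or cutadapt actually align adapters. The insert sequence is made up.

```python
# First 19 bases shared by Trans1_rc and Trans2_rc in the adapter file.
ADAPTER = "CTGTCTCTTATACACATCT"


def trim_adapter(read, adapter=ADAPTER):
    """Cut the read at the first exact adapter match (toy model, no mismatches)."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


insert = "ACGTTGCAAGGCTTAACCGGATCA"  # made-up insert sequence

examples = {
    "insert longer than read (no adapter)": insert,
    "short insert followed by adapter": insert + ADAPTER,
    "no insert (adapter only)": ADAPTER,
}

for label, read in examples.items():
    trimmed = trim_adapter(read)
    status = "removed" if len(trimmed) == 0 else "kept"
    print(label, len(read), "->", len(trimmed), status)
```

Only the adapter-only read ends up with length zero. The read with a short insert is shortened but kept, so a low count of discarded reads does not by itself mean little trimming happened.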

Most modern aligners will soft-clip bases that do not match, so trimming is essential only when you are doing de novo assembly work. That said, trimming the data provides peace of mind that no extraneous sequence is left in your data for downstream manipulations.
