Strange Trimmomatic behavior
6 weeks ago
evoclive • 0

Hello

I have Illumina-sequenced FASTQ files. Almost every read (although not all) starts with the triplet "TAA". I assumed these were adapters. However, when I run Trimmomatic with:

ILLUMINACLIP:Trimmomatic-0.39/adapters/TruSeq2-PE.fa:2:30:10:2:True LEADING:3 TRAILING:3 MINLEN:36


These triplets still remain. Can someone please advise what they are and how they should be dealt with?

Thanks

C

EDIT:

I see no mention of what adapters were used but the report doc states: "As for the sequencing of GBS library, the sequenced reads of 144 bp at either end are adapter-free, which could be directly subjected to quality control for low quality reads filtration. The retaining sequences in 144 bp length (namely clean data) are qualified for mapping with the reference genome". So I am puzzled why these motifs are so prevalent.

Trimmomatic • 580 views

Do you mean 'start' in the sense that the triplet is present at the 5' end of the reads?

If so, it would be really strange for these to be adapters, as adapters typically do not occur at the 5' end of a read (given how the sequencing protocol works, it is practically impossible to see them there).

So I suspect something else might be going on. Can you provide numbers on this? How many reads have this ...

Also, are you sure you need the TruSeq2 adapter set? If I'm not mistaken, that was the adapter set for Illumina GAII sequencers, so unless your data come from (the old) GAII sequencers, it is more likely you'll need the TruSeq3 set (though that makes no difference for the 5' end issue).

Hi, I calculate they're present in around 97% of reads. Yes, at the 5' end. I tried with TruSeq3 and they're still there. Thanks
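
For anyone wanting to reproduce a number like this, here is a minimal sketch of one way to count it. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names and the toy reads are made up for illustration and are not from the original data.

```python
from collections import Counter
from itertools import islice


def read_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) from FASTQ-formatted lines."""
    for seq in islice(lines, 1, None, 4):
        yield seq.strip()


def prefix_fraction(seqs, prefix="TAA"):
    """Return the fraction of sequences that start with `prefix`."""
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        hits += seq.startswith(prefix)
    return hits / total if total else 0.0


def top_prefixes(seqs, k=3, n=5):
    """Return the n most common leading k-mers with their counts."""
    return Counter(seq[:k] for seq in seqs).most_common(n)


# Toy input (hypothetical reads, only to show the mechanics).
toy_fastq = [
    "@r1", "TAAGCT", "+", "IIIIII",
    "@r2", "TAACCG", "+", "IIIIII",
    "@r3", "GGGTAA", "+", "IIIIII",
    "@r4", "TAATTT", "+", "IIIIII",
]
seqs = list(read_sequences(toy_fastq))
print(prefix_fraction(seqs))
print(top_prefixes(seqs))
```

On a real file the same functions could be fed an open file handle instead of the toy list (for gzipped input, `gzip.open(path, "rt")` from the standard library). Looking at the most common leading 3-mers, rather than only TAA, also shows what the remaining reads start with.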

Yes, because adapter trimming will (normally) not remove anything from the 5' end of a read.

Do you have any idea how the fragmentation was done? Was a random fragmentation protocol used, or was there some other manipulation involved?

I will try to find out - thanks for your input

6 weeks ago

There are many different adapter sequences, depending on the library preparation protocol and version. Here is an overview of different adapters used by Illumina. I would first check what your protocol was and what the actual adapters are, then make sure the adapter files contain the correct adapter sequences, editing the files manually if needed. If there is a common motif, it could also be part of an indexing sequence, with the adapter sequence only partially removed.

Hi thanks - it's paired-end, is that what you mean by protocol? I see no mention of what adapters were used but the report doc states: "As for the sequencing of GBS library, the sequenced reads of 144 bp at either end are adapter-free, which could be directly subjected to quality control for low quality reads filtration. The retaining sequences in 144 bp length (namely clean data) are qualified for mapping with the reference genome". So I am puzzled why these motifs are so prevalent.

6 weeks ago

It's GBS. Look at the cutters.

How did you derive this?

But indeed, that would make perfect sense and explain the issue here (TAA is likely the remainder of the restriction site).

From the OP reply to Michael Dondrup

ah, yeah, indeed (stupid to have missed that)

Ah OK so there is no need to remove them?

can you first confirm it is GBS data we're talking about here?

In my opinion, that is not necessary. If you are not comfortable with that triplet, you can remove it with any read trimmer, such as cutadapt.

Btw, please post whether ApeKI/MseI cutters were used in your GBS experiment.
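
To illustrate why the cutters matter, here is a rough sketch that derives the bases a read would be expected to start with from an enzyme's cut site. The two sites in the table are the commonly cited recognition sequences for MseI and ApeKI ("^" marks the top-strand cut, W means A or T); please verify them against the enzyme supplier's documentation. The helper name is hypothetical, and the sketch assumes the read begins directly at the cut site, with no in-line barcode in front of it.

```python
from itertools import product

# Commonly cited recognition sites; '^' marks the top-strand cut position.
# Assumption: check these against the supplier's documentation before relying on them.
CUT_SITES = {"MseI": "T^TAA", "ApeKI": "G^CWGC"}

# Minimal IUPAC table, covering only the codes used above plus N.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expected_read_starts(site):
    """Expand the part of the site downstream of the cut into concrete sequences."""
    remnant = site.split("^", 1)[1]
    options = [IUPAC[base] for base in remnant]
    return sorted("".join(combo) for combo in product(*options))


for enzyme, site in CUT_SITES.items():
    print(enzyme, expected_read_starts(site))
```

Under these assumptions, an MseI-cut fragment starts with TAA, which would match the motif described in the question, while ApeKI would give reads starting with CAGC or CTGC.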

Sharing that opinion!

They are true sequences from the genome (they are not added artificially), so they will align to the genome as well.

As indicated by cpad0112, you can remove them. Even with Trimmomatic you could bluntly clip off the first 3 nucleotides of every read, bearing in mind that this also removes the leading triplet from reads that do not start with TAA.
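
The difference between the two options can be sketched as follows. This is only an illustration with hypothetical function names, not what either tool does internally: blunt clipping corresponds to Trimmomatic's HEADCROP step (e.g. HEADCROP:3), while motif-aware trimming is closer to an anchored 5' adapter in cutadapt (e.g. -g ^TAA).

```python
def headcrop(seq, qual, n=3):
    """Remove the first n bases from every read, whatever they are."""
    return seq[n:], qual[n:]


def trim_leading_motif(seq, qual, motif="TAA"):
    """Remove the motif only when the read actually starts with it."""
    if seq.startswith(motif):
        return seq[len(motif):], qual[len(motif):]
    return seq, qual


# Toy reads (made up): one starts with TAA, one does not.
reads = [("TAAGCT", "IIIIII"), ("GGGTAA", "ABCDEF")]
for seq, qual in reads:
    print(seq, headcrop(seq, qual), trim_leading_motif(seq, qual))
```

The quality string is trimmed together with the sequence so the two stay the same length. With blunt clipping every read ends up 3 bases shorter; with motif-aware trimming the reads that lack the motif keep their full length.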