Question: Trimming Adapters
6.7 years ago by
United States
newDNASeqer680 wrote:

I am trying to do variant calling using exome-sequencing data produced by a HiSeq 2000. I think I need to trim the adapters first, before doing the BWA alignment. I have found the cutadapt program and think it is a good fit. However, before I use cutadapt to process a large amount of data, I would like to confirm the settings with this community:

I am not exactly sure which adapters were used, but from an Illumina tech document, I found the following sequences, which are common to all their adapters:


adapter 2 (Reverse 5'). GTAATAACCGGTT

cutadapt -a adapter1 -a adapter2 -m 25 input.fastq.gz > output_trim.fastq.gz

Is cutadapt generally recommended for trimming adapters? And are the adapter sequences I use here too long? Thanks.

adaptor hiseq • 27k views
modified 4.4 years ago by Shicheng Guo8.1k • written 6.7 years ago by newDNASeqer680
4.4 years ago by
Shicheng Guo8.1k wrote:
  1. Check for the adaptor (if you know its sequence)

    gunzip -c T21.5.read1.fq.gz | grep AGATCGGAAGAG
  2. Check for adaptors with FastQC (if you do not know the sequence, FastQC can recognize common adaptors for you)

    fastqc T21.5.read1.fq.gz
  3. Trim the adaptors with trim_galore

    trim_galore *.fastq.gz
  4. Then do the alignment with BWA, Bowtie, or any other aligner.
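
The check in step 1 can also be turned into a number. The following is only a rough sketch in Python, not part of any of the tools mentioned: it assumes plain 4-line FASTQ records, looks only at the sequence lines, and reports how many reads contain the adapter prefix. The function names and the demo reads are made up for illustration.

```python
import gzip
import os
import tempfile


def count_adapter_reads(fastq_gz_path, adapter_prefix="AGATCGGAAGAG", max_reads=100000):
    """Return (reads_checked, reads_containing_prefix) for a gzipped FASTQ file."""
    checked = 0
    hits = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumes plain 4-line FASTQ records: the sequence is line 2 of each record.
            if line_number % 4 != 1:
                continue
            checked += 1
            if adapter_prefix in line:
                hits += 1
            if checked >= max_reads:
                break
    return checked, hits


def write_demo_fastq(path, sequences):
    """Write a tiny gzipped FASTQ file with made-up reads (demo data only)."""
    with gzip.open(path, "wt") as handle:
        for index, seq in enumerate(sequences, start=1):
            handle.write(f"@read{index}\n{seq}\n+\n{'I' * len(seq)}\n")


if __name__ == "__main__":
    demo_reads = ["ACGTAGATCGGAAGAGCAC", "ACGTACGTACGT", "TTTTGGGG"]
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.fq.gz")
        write_demo_fastq(demo_path, demo_reads)
        checked, hits = count_adapter_reads(demo_path)
        print(f"{hits} of {checked} reads contain the adapter prefix")
```

Sampling only the first `max_reads` reads keeps the check fast on large files; this is exact substring matching, so reads with sequencing errors inside the adapter are not counted.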

By the way:


because there is a process to add 'A' to the end of the fragment. therefore.


will be right. I think.

TruSeq Universal Adapter


RT is:

modified 4 months ago by RamRS26k • written 4.4 years ago by Shicheng Guo8.1k
BBDuk is also very good at trimming adapters. It is part of BBMap. I have found that it works very well for PE Illumina sequence data; it even has the common adapters built in.
written 4.4 years ago by bioguy24190

Hi, I have questions about adaptor trimming. 1. I ran FastQC on my sample and the results showed Illumina Universal Adaptor contamination. Should I trim the Illumina Universal Adaptor, or find the adaptor sequences based on the authors' library preparation method? 2. The Illumina Universal Adaptor sequence is AGATCGGAAGAG. Why does the tutorial suggest trimming AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC instead?

written 2.6 years ago by sophialovechan50

The full adapter sequence is much longer than AGATCGGAAGAG, and even longer than the second example. Adapter recognition works by matching from the start of the adapter sequence, so both specifications will work in nearly the same way.
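
To make the point about matching from the start concrete, here is a minimal sketch in Python. It is a toy model, not how cutadapt or Trim Galore is implemented: it uses exact matching only (real trimmers also tolerate mismatches), and the function name, the `min_overlap` parameter, and the insert sequence are made up for illustration.

```python
SHORT_ADAPTER = "AGATCGGAAGAG"
LONG_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut `read` at the leftmost position where the rest of the read
    matches the start of `adapter` exactly (toy model, no mismatches)."""
    for start in range(len(read)):
        tail = read[start:]
        # Compare as many bases as both the tail and the adapter provide.
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # the remaining tail is too short to call an adapter
        if tail[:overlap] == adapter[:overlap]:
            return read[:start]
    return read


if __name__ == "__main__":
    insert = "ACGTTGCAACGTTGCAACGT"  # made-up genomic fragment
    read = insert + LONG_ADAPTER[:20]  # read runs 20 bases into the adapter
    print(trim_3prime_adapter(read, SHORT_ADAPTER))
    print(trim_3prime_adapter(read, LONG_ADAPTER))
```

In this example both adapter specifications cut the read at the same position, because the match is anchored at the first base of the adapter in both cases. The two can still differ in practice, for example when the bases after the shared prefix do not match the longer sequence.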

written 2.6 years ago by Istvan Albert ♦♦ 83k
6.7 years ago by
Cambridge, UK
Jelena Aleksic910 wrote:

Cutadapt is great, and it's what most people use (with or without TrimGalore). However, not all Illumina adapters are the same: for an sRNA-seq experiment, for example, the sequences would be different from those. They should work in most cases though. What I ended up doing is running a script on my raw sequence files to make sure that the adapters I'm trimming are actually there, and then using cutadapt to trim them.
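
A check of that kind could look roughly like the sketch below. This is not the script referred to above, only an illustration in Python. The candidate list is an assumption: it uses the three short adapter prefixes that, to my knowledge, Trim Galore checks during auto-detection, and the demo reads are made up.

```python
from collections import Counter

# Assumed candidate panel (short adapter prefixes); extend it for your own library prep.
CANDIDATE_ADAPTERS = {
    "Illumina universal": "AGATCGGAAGAGC",
    "Illumina small RNA 3'": "TGGAATTCTCGG",
    "Nextera": "CTGTCTCTTATA",
}


def screen_adapters(reads, candidates=CANDIDATE_ADAPTERS):
    """Count, for each candidate adapter, how many reads contain it exactly."""
    counts = Counter({name: 0 for name in candidates})
    for read in reads:
        for name, prefix in candidates.items():
            if prefix in read:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    demo_reads = [
        "ACGTACGTAGATCGGAAGAGCACAC",
        "GGGTTTAGATCGGAAGAGCGTCGT",
        "ACGTACGTCTGTCTCTTATACACATCT",
        "ACGTACGTACGTACGTACGT",
    ]
    for name, hits in screen_adapters(demo_reads).most_common():
        print(f"{name}: {hits} of {len(demo_reads)} reads")
```

Running this on a sample of raw reads shows which candidate, if any, is actually present before anything is trimmed.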

I also wrote a blog post about it, in case it's of interest.

written 6.7 years ago by Jelena Aleksic910

Thanks for the link. These problems often bite unexpectedly and are very annoying to track down: seemingly no one knows which adapters were used, and people keep punting the question around.

written 6.7 years ago by Istvan Albert ♦♦ 83k

Can I try to remove every Illumina adapter (maybe 100 adapters in total) when the adapter is unknown? I mean in theory; it may not be realistic in practice.

modified 4 months ago by RamRS26k • written 4.4 years ago by Shicheng Guo8.1k
6.7 years ago by
Mikael Huss4.7k wrote:

cutadapt is fine. I have recently moved to Trim Galore, which is really just a wrapper around cutadapt that simplifies the handling of paired-end reads and some other things. By default, Trim Galore looks for a 13-mer from the Illumina standard adapter, AGATCGGAAGAGC, which is found in your adapter 1 sequence (starting from position 2 in that sequence; I am not sure why the G is not included).

written 6.7 years ago by Mikael Huss4.7k

Yup, I have also found Trim Galore easy to use, and for paired-end data it also takes care of orphan reads (read pairs where one read is discarded because it fails the QC step). Aligners like BWA require the forward and reverse reads to appear in the same order in the two FASTQ files.
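
The idea of keeping the two files in sync can be sketched as follows. This is only an illustration of the general principle in Python, not Trim Galore's actual code: the function name and the demo pairs are made up, and the length cutoff of 25 simply mirrors the `-m 25` setting from the question.

```python
def filter_pairs(pairs, min_length=25):
    """Split trimmed read pairs into synchronized mates and orphaned reads.

    `pairs` is an iterable of ((name1, seq1), (name2, seq2)) tuples.
    A pair is kept only if both mates pass the length cutoff, so the
    two output lists always have matching order and length.
    """
    kept_r1, kept_r2, orphans = [], [], []
    for (name1, seq1), (name2, seq2) in pairs:
        ok1 = len(seq1) >= min_length
        ok2 = len(seq2) >= min_length
        if ok1 and ok2:
            kept_r1.append((name1, seq1))
            kept_r2.append((name2, seq2))
        elif ok1:
            orphans.append((name1, seq1))  # mate 2 was too short
        elif ok2:
            orphans.append((name2, seq2))  # mate 1 was too short
    return kept_r1, kept_r2, orphans


if __name__ == "__main__":
    demo_pairs = [
        (("frag1/1", "A" * 30), ("frag1/2", "C" * 30)),
        (("frag2/1", "A" * 10), ("frag2/2", "C" * 30)),
        (("frag3/1", "A" * 40), ("frag3/2", "C" * 5)),
    ]
    r1, r2, single = filter_pairs(demo_pairs)
    print([name for name, _ in r1], [name for name, _ in r2])
    print([name for name, _ in single])
```

Filtering each file on its own would instead drop `frag2/1` from one file and `frag3/2` from the other, leaving the mates out of step.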

modified 6.7 years ago • written 6.7 years ago by Ashutosh Pandey12k

Thanks for the reply. So you use the "--paired" option for your trim_galore run? Do you recommend it when using BWA later? PS: I have paired-end reads.

written 6.7 years ago by newDNASeqer680

Yes, I use --paired and yes, I recommend it for BWA (although really I just recommend it in general, including for BWA)

written 6.7 years ago by Mikael Huss4.7k
6.7 years ago by
vijay1.5k wrote:

Cutadapt is fine. You can also try the Fastx or NGSQC toolkits. Fastx can handle paired-end data as well. As Jelena rightly pointed out, the type and length of the adapters depend on the kind of work you are performing. All of these tools can trim off the adapter sequences effectively.

written 6.7 years ago by vijay1.5k
6.4 years ago by
optimuscoprime150 wrote:

You could also try:

It uses FastQC to detect adaptors and primers, and then cuts them with cutadapt (well, in parallel using several cutadapts)

written 6.4 years ago by optimuscoprime150

This tool needs documentation that describes what it actually does; right now the description is overly generic.

written 6.4 years ago by Istvan Albert ♦♦ 83k

You are quite right. I have added some more technical info to the bottom of the readme file. Are there any other particular things that you would like to know?

written 6.4 years ago by optimuscoprime150

That looks much better.

Other observations: I would move the licensing to the end, since it is really not that important, and put what the tool does first. This is what people look for; when I go to a tool, I want to know right away what it does:

We developed a tool to automatically detect which adaptors and primers are present in a FASTQ file and remove those sequences from the file, as well as detecting the quality score encoding type used and removing low quality sequences.


Next, the section on how the tool works.

Then installation, usage, and license.
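
As an aside on the quality score encoding detection mentioned in the description quoted above: the usual idea is to look at the range of characters in the quality lines. The sketch below is a common heuristic written in Python, not the tool's actual method. The function name and thresholds are assumptions: it treats any character below ';' (ASCII 59) as proof of a +33 offset, and any character above 'J' (ASCII 74) in a file with no low characters as evidence of a +64 offset.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from FASTQ quality lines.

    Returns None when the observed characters fit both encodings
    or when there is nothing to inspect.
    """
    codes = [ord(ch) for quality in quality_strings for ch in quality]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur in the +64 encodings.
        return 33
    if highest > 74:
        # Assumes +33 data tops out at 'J' (Q41), as in recent Illumina output.
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#5", "IIIIII"]))
    print(guess_phred_offset(["hhhhffB"]))
    print(guess_phred_offset(["IIIIJJ"]))
```

A result of `None` means more reads need to be sampled before deciding.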

modified 6.4 years ago • written 6.4 years ago by Istvan Albert ♦♦ 83k

thanks, I've moved the licensing info down a little

written 6.4 years ago by optimuscoprime150

I have seen in this tutorial from ARK-Genomics that FastQC might get the adapter contaminants wrong. Have you considered these cases and handled them?

written 5.5 years ago by eva10