Question

Trimming Adapters

4

10.8 years ago

newDNASeqer ▴ 760

I am trying to do variant calling using exome-sequencing data produced by a HiSeq 2000. I think I first need to trim the adapters before doing the BWA alignment. I have found the cutadapt program and think it looks good. However, before I use cutadapt to process a large amount of data, I would like to confirm the settings with this community:

I am not exactly sure which adapters were used, but in an Illumina tech document I found the following sequences, which are common to all their adapters:

adapter 1 (Forward 5'). GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

adapter 2 (Reverse 5'). GTAATAACCGGTT

cutadapt -a adapter1 -a adapter2 -m 25 input.fastq.gz > output_trim.fastq.gz

Is cutadapt generally recommended for trimming adapters? And are the adapter sequences I use here too long? Thanks.

adaptor hiseq • 34k views

ADD COMMENT • link updated 8.4 years ago by Shicheng Guo ★ 9.4k • written 10.8 years ago by newDNASeqer ▴ 760

Ram · Answer 1 · 2015-11-15

8

8.4 years ago

Shicheng Guo ★ 9.4k

Check for the adaptor (if you know it):

gunzip -c T21.5.read1.fq.gz | grep AGATCGGAAGAG

Check for adaptors with FastQC (if you do not know them, FastQC can recognize common ones for you):
```
fastqc T21.5.read1.fq.gz
```
Trim the adaptors with trim_galore:
```
trim_galore *.fastq.gz
```
That's it. Then do the alignment with BWA, Bowtie, or any other aligner.

By the way, you list adapter 1 as:

Adapter 1 (Forward 5'). GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Library preparation includes a step that adds an 'A' to the end of each fragment, so I think the sequence to trim should start with that A:

AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

TruSeq Universal Adapter

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

Its reverse complement is:

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
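
The relationship between the two TruSeq sequences above can be checked from the command line. This is a minimal sketch, assuming the second sequence is meant to be the reverse complement of the first and that the `rev` and `tr` utilities are available; the function name `revcomp` is made up for illustration.

```sh
# Reverse-complement a DNA sequence given as the first argument.
# 'rev' reverses the string, 'tr' swaps each base for its complement.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# TruSeq Universal Adapter as quoted in this answer
universal=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
revcomp "$universal"
```

The printed sequence starts with AGATCGGAAGAG, which is why that short sequence shows up at the 3' end of reads that run through into the adapter.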

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Shicheng Guo ★ 9.4k

0

BBDuk is also very good at trimming adapters. It is part of BBMap. I have found that it works very well for PE Illumina sequence data, and it even has the common adapters built in.

ADD REPLY • link 8.4 years ago by bioguy24 ▴ 230

0

Could you please provide the code you are using? I'm trying to do that, but it seems that I have to trim both left and right adaptors (ktrim=r ktrim=l). Otherwise, adaptors are still present in the second read.

ADD REPLY • link 3.2 years ago by boczniak767 ▴ 850

0

Hi, I have questions about adaptor trimming. 1. I ran FastQC on my sample and the results showed Illumina Universal Adaptor contamination. Should I trim the Illumina Universal Adaptor, or find the adaptor sequences based on the authors' library preparation method? 2. The Illumina Universal Adaptor sequence is AGATCGGAAGAG. Why does the tutorial suggest trimming AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC instead?

ADD REPLY • link 6.7 years ago by sophialovechan ▴ 80

1

The full adapter sequence is much longer than AGATCGGAAGAG, and even longer than the second example. Adapter recognition works by matching the start of the adapter sequence, so both specifications will work in close to the same way.
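
To make the point about matching the start of the adapter concrete, here is a small sketch that checks whether the short sequence is a prefix of the longer ones quoted in this thread. The helper name `is_prefix` is hypothetical, and the check only illustrates the idea; it is not how any particular trimmer is implemented.

```sh
# Succeeds if the first argument is a prefix of the second.
is_prefix() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

short=AGATCGGAAGAG
# Longer sequences quoted elsewhere in this thread
for long in \
    AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
    AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
do
    if is_prefix "$short" "$long"; then
        echo "$short is a prefix of $long"
    else
        echo "$short is NOT a prefix of $long"
    fi
done
```

Both longer sequences begin with the same 12 bases, so a trimmer given only the short sequence will find the same adapter start in either case.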

ADD REPLY • link 6.7 years ago by Istvan Albert 100k

Ram · Answer 2 · 2013-07-22

7

10.8 years ago

Jelena Aleksic ▴ 920

Cutadapt is great, and it's what most people use (with or without TrimGalore). However, not all Illumina adapters are necessarily the same - e.g. for an sRNA-seq experiment, the sequences would be different from those ones. They should work in most cases though. What I ended up doing is running a script on my raw sequence files to make sure that the adapters I'm trimming are actually there, and then using cutadapt to trim them.

I also wrote a blog post about it, in case it's of interest.
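
A check of the kind described above, confirming that an adapter is actually present before trimming it, could look roughly like the following. This is only a sketch and not the script from the blog post: the function name `count_adapter_reads` is made up, it assumes standard 4-line FASTQ records on standard input, and it looks for exact matches only.

```sh
# Count how many reads contain the given adapter (or adapter prefix).
# Usage: count_adapter_reads ADAPTER < reads.fastq
count_adapter_reads() {
    awk -v adapter="$1" '
        NR % 4 == 2 {                       # sequence line of each record
            total++
            if (index($0, adapter) > 0) hits++
        }
        END { printf "%d of %d reads contain %s\n", hits, total, adapter }
    '
}

# Tiny inline example with two made-up reads
printf '%s\n' \
    '@read1' 'ACGTACGTAGATCGGAAGAGCACA' '+' 'IIIIIIIIIIIIIIIIIIIIIIII' \
    '@read2' 'ACGTACGTACGTACGTACGTACGT' '+' 'IIIIIIIIIIIIIIIIIIIIIIII' |
    count_adapter_reads AGATCGGAAGAG
```

On real data the input would come from something like `gunzip -c reads.fastq.gz | count_adapter_reads AGATCGGAAGAG`, where `reads.fastq.gz` is a placeholder file name.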

ADD COMMENT • link 10.8 years ago by Jelena Aleksic ▴ 920

2

Thanks for the link. These are problems that often bite one unexpectedly and are very annoying to track down: seemingly no one knows what has been put on, and they keep punting the question around.

ADD REPLY • link 10.7 years ago by Istvan Albert 100k

1

When the adapter is unknown, can I try to remove every Illumina adapter (maybe 100 adapters in total)? I mean in theory. It may not be realistic to do in practice.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.4 years ago by Shicheng Guo ★ 9.4k

score 3 · Answer 3 · 2013-07-22

10.8 years ago

Mikael Huss 4.8k

cutadapt is fine. I have recently moved to Trim Galore, which is really just a wrapper around cutadapt which simplifies handling of paired-end reads and some other things. By default, Trim Galore looks for a 13-mer from the Illumina standard: AGATCGGAAGAGC, which is found in your adapter 1 sequence (starting from position 2 in that sequence; I am not sure why the G is not included).

ADD COMMENT • link 10.8 years ago by Mikael Huss 4.8k

0

Yup, I have also found Trim Galore easy to use. For paired-end data it also takes care of orphan reads (read pairs where one read gets discarded because it can't pass the QC step). Aligners like BWA require the forward and reverse reads to appear in the same order in the two FASTQ files.
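
The requirement that both files list reads in the same order can be spot-checked after trimming. Below is a rough sketch, assuming uncompressed 4-line FASTQ files whose read names either match exactly or differ only by a trailing /1 or /2; the function name `pairs_in_sync` is invented for this example. It keeps the names from the first file in memory, so it is meant for small files or a subset taken with `head`.

```sh
# Succeeds only if both FASTQ files have the same number of records
# and the read names match record by record.
pairs_in_sync() {
    awk '
        FNR % 4 == 1 {                     # header line of each record
            name = $1                      # ignore any comment after the first space
            sub(/\/[12]$/, "", name)       # ignore an old-style /1 or /2 suffix
            if (NR == FNR) { first[FNR] = name; n1++ }
            else { n2++; if (first[FNR] != name) mismatch = 1 }
        }
        END { if (mismatch || n1 != n2) exit 1 }
    ' "$1" "$2"
}

# Tiny made-up example files
tmp=$(mktemp -d)
printf '%s\n' '@read1/1' 'ACGT' '+' 'IIII' '@read2/1' 'ACGT' '+' 'IIII' > "$tmp/r1.fastq"
printf '%s\n' '@read1/2' 'TTTT' '+' 'IIII' '@read2/2' 'GGGG' '+' 'IIII' > "$tmp/r2.fastq"
if pairs_in_sync "$tmp/r1.fastq" "$tmp/r2.fastq"; then
    echo "pairs are in sync"
else
    echo "pairs are out of sync"
fi
rm -r "$tmp"
```

If one mate of a pair is dropped from only one file, every later record shifts by one and the check fails, which is the situation the orphan-read handling is meant to prevent.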

ADD REPLY • link 10.8 years ago by Ashutosh Pandey 12k

0

Thanks for the reply. So you use the "--paired" option for your trim_galore run? Do you recommend it when using BWA later? PS: I have paired-end reads.

ADD REPLY • link 10.8 years ago by newDNASeqer ▴ 760

0

Yes, I use --paired and yes, I recommend it for BWA (although really I just recommend it in general, including for BWA)

ADD REPLY • link 10.8 years ago by Mikael Huss 4.8k

score 1 · Answer 4 · 2013-07-23

Cutadapt is fine. You can also try the FASTX or NGS QC toolkits. FASTX lets you handle paired-end data as well. As Jelena rightly pointed out, the type and length of the adapters depend on the kind of work you are doing. All of these tools can trim off the adapter sequences effectively.

score 1 · Answer 5 · 2013-11-08

10.5 years ago

optimuscoprime ▴ 140

You could also try:

https://github.com/optimuscoprime/autoadapt

It uses FastQC to detect adaptors and primers, and then cuts them with cutadapt (well, in parallel using several cutadapts)

ADD COMMENT • link 10.5 years ago by optimuscoprime ▴ 140

0

This tool needs documentation that describes what it actually does; right now the description is overly generic.

ADD REPLY • link 10.5 years ago by Istvan Albert 100k

0

you are quite right, I have added some more technical info to the bottom of the readme file. are there any other particular things that you would like to know?

ADD REPLY • link 10.5 years ago by optimuscoprime ▴ 140

1

that looks much better,

other observations, I would move the licensing to the end, it is really not that important, and move what the tool does first, this is what people look for, when I go to a tool I want to know what the tool does right away:

We developed a tool to automatically detect which adaptors and primers are present in a FASTQ file and remove those sequences from the file, as well as detecting the quality score encoding type used and removing low quality sequences.

...

now the section on how the tools works

now the installation usage and license

ADD REPLY • link 10.5 years ago by Istvan Albert 100k

0

thanks, I've moved the licensing info down a little

ADD REPLY • link 10.5 years ago by optimuscoprime ▴ 140

0

I have seen in this tutorial from ARK-Genomics that FastQC might get the adapter contaminants wrong. Have you considered these cases and handled them?

ADD REPLY • link 9.6 years ago by eva ▴ 20