Question: Trimming Adapters
newDNASeqer (United States) wrote, 5.8 years ago:

I am trying to do variant calling on exome-sequencing data produced by a HiSeq 2000. I think I need to trim the adapters before doing the BWA alignment. I found the cutadapt program and it looks good. However, before I use cutadapt to process a large amount of data, I would like to confirm the settings with this community:

I am not exactly sure which adapters were used, but in an Illumina tech document I found the following sequences, which are common to all their adapters:

adapter 1 (forward, 5'): GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

adapter 2 (reverse, 5'): GTAATAACCGGTT

cutadapt -a GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -a GTAATAACCGGTT -m 25 -o output_trim.fastq.gz input.fastq.gz
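To make the effect of `-a` and `-m 25` concrete, here is a rough Python model of what that command asks for: cut each read where a 3' adapter starts (including an adapter that runs off the end of the read), then drop reads shorter than 25 bases. This is a simplified sketch, not cutadapt's algorithm: it uses exact matching only, whereas cutadapt also tolerates errors (`-e`) and scores alignments. The function names are hypothetical, and `min_overlap=3` just mirrors cutadapt's default `-O` value.

```python
# Simplified model of `cutadapt -a ADAPTER -m 25`, for illustration only.
# Assumption: exact matching; cutadapt itself also tolerates errors (-e)
# and picks the best-scoring adapter match.


def trim_3prime(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut `seq` where a 3' adapter starts.

    A full adapter anywhere in the read is removed together with everything
    after it; otherwise a partial adapter (an adapter prefix of at least
    `min_overlap` bases) hanging off the 3' end of the read is removed.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest possible partial overlap first.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return seq


def trim_reads(reads, adapters, min_length=25):
    """Trim each read, then drop reads shorter than `min_length` (like -m 25).

    With several adapters, this sketch keeps the earliest cut (an assumption
    made for simplicity).
    """
    kept = []
    for seq in reads:
        trimmed = min((trim_3prime(seq, a) for a in adapters), key=len)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept
```

Note that with a 3-base minimum overlap, reads that merely end in the first few adapter bases by chance also lose those bases; that is the usual trade-off of partial-overlap trimming.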

Is cutadapt generally recommended for trimming adapters? And are the adapter sequences I use here too long? Thanks.

Shicheng Guo wrote, 3.5 years ago:

1. Check for the adapter (assuming you know it):

gunzip -c T21.5.read1.fq.gz | grep AGATCGGAAGAG

2. Check for the adapter with FastQC (if you do not know it, FastQC can recognize common adapters for you):

fastqc T21.5.read1.fq.gz

3. Trim the adapters with Trim Galore:

trim_galore *.fastq.gz

4. Then do the alignment with BWA, Bowtie, or any other aligner.
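Step 1 can also be done in Python when you want a count of affected reads rather than a screen full of matching lines. This is a minimal sketch that assumes plain 4-line FASTQ records (no wrapped sequence lines); it only inspects the sequence line of each record, so header and quality lines are never counted. Both function names are hypothetical.

```python
import gzip
from typing import Iterable, Tuple


def count_adapter_reads(lines: Iterable[str], adapter: str = "AGATCGGAAGAG") -> Tuple[int, int]:
    """Return (reads containing `adapter`, total reads) for FASTQ lines.

    Assumes 4-line records: line index 1, 5, 9, ... is the sequence line.
    """
    hits = total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of the current record
            total += 1
            if adapter in line:
                hits += 1
    return hits, total


def count_adapter_reads_gz(path: str, adapter: str = "AGATCGGAAGAG") -> Tuple[int, int]:
    """Same count, streaming a gzipped FASTQ file in text mode."""
    with gzip.open(path, "rt") as fh:
        return count_adapter_reads(fh, adapter)
```

For example, `count_adapter_reads_gz("T21.5.read1.fq.gz")` would return the number of reads containing AGATCGGAAGAG and the total number of reads in that file.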

By the way, regarding adapter 1 (forward, 5'): GATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Because library preparation adds an 'A' to the 3' end of each fragment (A-tailing), I think AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC is the right sequence.

TruSeq Universal Adapter 

AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

Its reverse complement is:

AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT 
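The relationship between the two sequences above can be checked with a small reverse-complement helper (a generic sketch; the names are illustrative, nothing here is Illumina-specific). Note that the reverse complement starts with AGATCGGAAGAG, the 12-mer that most trimmers look for.

```python
# Complement table for uppercase DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


universal = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Prints the second sequence above, starting with AGATCGGAAGAG.
print(reverse_complement(universal))
```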

BBDuk is also very good at trimming adapters; it is part of BBMap. I have found that it works very well for paired-end Illumina data, and it even has the common adapter sequences built in.
(reply by bioguy24, 3.5 years ago)
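For intuition about the k-mer matching BBDuk is known for, here is a toy version of right-trimming: find the first k-mer in the read that also occurs in the adapter, and cut the read there. This illustrates the idea only and is not BBDuk's implementation. The real tool adds options such as `mink` (shorter k-mers at read ends), `hdist` (mismatches), and reverse-complement matching, none of which are modelled here. The function name and `k=11` are arbitrary choices for the sketch.

```python
def kmer_trim_right(seq: str, adapter: str, k: int = 11) -> str:
    """Cut `seq` at the first position whose k-mer also occurs in `adapter`.

    Toy illustration: exact k-mers only, so an adapter fragment shorter
    than k at the very end of the read is not detected.
    """
    # All k-mers of the adapter, for constant-time membership tests.
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in adapter_kmers:
            return seq[:i]
    return seq
```

The k-mer set makes each lookup cheap, which is why this style of matching scales well to many reference adapters at once.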

Hi, I have questions about adapter trimming. 1. I ran FastQC on my sample, and the results showed Illumina Universal Adapter contamination. Should I trim the Illumina Universal Adapter, or find the adapter sequences based on the authors' library preparation method? 2. The Illumina Universal Adapter sequence is AGATCGGAAGAG. Why does the tutorial suggest trimming AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC instead?

(reply by sophialovechan, 21 months ago)

The full adapter sequence is much longer than AGATCGGAAGAG, and even longer than the second example. Adapter recognition works by matching the start of the adapter sequence, so both specifications will work in nearly the same way.

(reply by Istvan Albert, 21 months ago)
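One way to see why both specifications behave almost the same: the short sequence is a prefix of the long one, so a prefix-based 3' matcher finds the same cut position with either. The sketch below uses exact matching only (an assumption; real trimmers allow some mismatches), and `cut_position` is a hypothetical helper, not part of any trimmer's API.

```python
def cut_position(read: str, adapter: str, min_overlap: int = 3) -> int:
    """Index where a 3' adapter starts in `read` (len(read) if none found)."""
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # Partial adapter hanging off the 3' end: longest overlap first.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return len(read)


short_adapter = "AGATCGGAAGAG"
long_adapter = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
insert = "ACGT" * 10  # made-up 40 bp insert

# n = how many adapter bases the read runs into after the insert.
for n in (5, 12, 20, len(long_adapter)):
    read = insert + long_adapter[:n]
    print(n, cut_position(read, short_adapter), cut_position(read, long_adapter))
```

Both columns give the same cut for every overlap length. The difference is at the margin: a read containing AGATCGGAAGAG followed by unrelated bases is cut with the short specification but not with the long one.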
Jelena Aleksic (Cambridge, UK) wrote, 5.8 years ago:

Cutadapt is great, and it's what most people use (with or without Trim Galore). However, not all Illumina adapters are the same; for an sRNA-seq experiment, for example, the sequences would differ from these. They should work in most cases, though. What I ended up doing was running a script on my raw sequence files to make sure that the adapters I was trimming were actually there, and then using cutadapt to trim them.

I also wrote a blog post about it, in case it's of interest.
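A minimal sketch of the kind of pre-check described above: before trimming, measure what fraction of reads contain the start of each candidate adapter. This is not Jelena's actual script; the adapter labels, the 12-base probe length, and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def adapter_report(reads: Iterable[str], adapters: Dict[str, str],
                   probe_len: int = 12) -> Dict[str, float]:
    """Fraction of reads containing the first `probe_len` bases of each adapter.

    `adapters` maps a label of your choice to an adapter sequence.
    """
    probes = {name: seq[:probe_len] for name, seq in adapters.items()}
    hits = Counter()
    total = 0
    for read in reads:
        total += 1
        for name, probe in probes.items():
            if probe in read:
                hits[name] += 1
    return {name: (hits[name] / total if total else 0.0) for name in probes}
```

A candidate whose fraction is close to zero on a decent sample of reads is probably not in your library, which is a useful warning before trimming a large dataset with it.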

Thanks for the link. These are problems that often bite unexpectedly and are very annoying to track down: seemingly no one knows what adapters were put on, and the question keeps getting punted around.

(reply by Istvan Albert, 5.8 years ago)

Could I try to remove every Illumina adapter (maybe 100 adapters in total) when the adapter is unknown? I mean in theory; it may not be realistic in practice.

(reply by Shicheng Guo, 3.5 years ago)

Mikael Huss (Stockholm) wrote, 5.8 years ago:

cutadapt is fine. I have recently moved to Trim Galore, which is really just a wrapper around cutadapt that simplifies the handling of paired-end reads, among other things. By default, Trim Galore looks for a 13-mer from the Illumina standard adapter, AGATCGGAAGAGC, which overlaps your adapter 1 sequence: positions 2 to 13 of the 13-mer (GATCGGAAGAGC) match the start of adapter 1 (I am not sure why the leading A is not included in your sequence).
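The overlap between Trim Galore's default 13-mer and adapter 1 from the question can be checked directly with plain string operations (the variable names are illustrative):

```python
trim_galore_default = "AGATCGGAAGAGC"               # Trim Galore's default Illumina 13-mer
adapter1 = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"     # adapter 1 from the question

# Everything after the 13-mer's leading 'A' is the start of adapter 1.
assert trim_galore_default[1:] == adapter1[:12]

print(adapter1.find(trim_galore_default))          # -1: the full 13-mer is absent
print(("A" + adapter1).find(trim_galore_default))  # 0: present once the 'A' is added
```

This is consistent with the A-tailing explanation given earlier in the thread: prepending the 'A' turns adapter 1 into a sequence that starts with the standard 13-mer.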

Yup, I have also found Trim Galore easy to use, and for paired-end data it also takes care of orphan reads (pairs in which one read is discarded because it fails the QC step). Aligners like BWA require the forward and reverse reads to be in the same order in the fastq1 and fastq2 files.

(reply by Ashutosh Pandey, 5.8 years ago)
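To illustrate the ordering requirement, here is a minimal sketch of pair-aware length filtering: a pair is kept only if both mates pass, so R1 and R2 stay in lockstep, and the surviving mate of a failed pair is set aside as an orphan. Reads are plain sequence strings here (an assumption; real FASTQ records also carry names and qualities), the function name is hypothetical, and the 25-base cutoff only echoes `-m 25` from the question. Trim Galore's own defaults and its `--retain_unpaired` output differ in detail.

```python
from typing import List, Tuple


def filter_pairs(read1s: List[str], read2s: List[str],
                 min_length: int = 25) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept R1, kept R2, orphans), keeping R1/R2 in the same order."""
    if len(read1s) != len(read2s):
        raise ValueError("R1 and R2 must contain the same number of reads")
    kept1, kept2, orphans = [], [], []
    for r1, r2 in zip(read1s, read2s):
        ok1, ok2 = len(r1) >= min_length, len(r2) >= min_length
        if ok1 and ok2:
            # Both mates pass: append to both outputs at the same index.
            kept1.append(r1)
            kept2.append(r2)
        elif ok1:
            orphans.append(r1)   # mate 2 failed
        elif ok2:
            orphans.append(r2)   # mate 1 failed
    return kept1, kept2, orphans
```

Because a pair is only ever appended to both outputs together, the i-th read of the first output is always the mate of the i-th read of the second, which is what BWA expects.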

Thanks for the reply. So you use the --paired option for your trim_galore run? Do you recommend it if I use BWA later? PS: I have paired-end reads.

(reply by newDNASeqer, 5.8 years ago)

Yes, I use --paired, and yes, I recommend it for BWA (really, I recommend it in general, BWA included).

(reply by Mikael Huss, 5.8 years ago)

vijay (Chennai) wrote, 5.8 years ago:

Cutadapt is fine. You can also try the Fastx or NGSQC toolkits; Fastx can handle paired-end data as well. As Jelena rightly pointed out, the type and length of the adapters depend on the kind of experiment you are doing. All of these tools can trim adapter sequences effectively.

optimuscoprime wrote, 5.5 years ago:

You could also try:

https://github.com/optimuscoprime/autoadapt

It uses FastQC to detect adapters and primers, and then removes them with cutadapt (in fact, with several cutadapt processes running in parallel).

This tool needs documentation that describes what it actually does; right now the description is overly generic.

(reply by Istvan Albert, 5.5 years ago)

You are quite right. I have added some more technical information to the bottom of the README file. Are there any other particular things you would like to know?

(reply by optimuscoprime, 5.5 years ago)

That looks much better.

Other observations: I would move the licensing to the end, since it is really not that important, and put what the tool does first. That is what people look for; when I go to a tool, I want to know right away what it does:

We developed a tool to automatically detect which adaptors and primers are present in a FASTQ file and remove those sequences from the file, as well as detecting the quality score encoding type used and removing low quality sequences.

...

Then the section on how the tool works.

Then installation, usage, and license.

(reply by Istvan Albert, 5.5 years ago)

Thanks, I've moved the licensing info down a little.

(reply by optimuscoprime, 5.5 years ago)

I have seen in this tutorial from ARK-Genomics that FastQC might get the adapter contaminants wrong. Have you considered these cases and handled them?

(reply by eva, 4.7 years ago)