How To Best Deal With Adapter Contamination (Illumina)?
7
13.4 years ago
Gaffa ▴ 500

I've got non-trivial amounts of adapter contamination in a paired-end >100 bp read Illumina run (i.e. the machine reading technical adapters/primers rather than biological sequence). How would I best go about identifying such contaminated reads?

I know the sequence of the adapters used, but because of sequencing errors you can't simply do a straightforward regular expression pattern match. The adapter sequences are about 75 bp and seem to always begin at the very 5' end of the affected reads (though I can't be 100% sure that this always holds), and the remaining 3' parts of the reads seem to be nonsense low-complexity sequence with lots of homopolymers.

adaptor next-gen sequencing • 20k views
6
13.4 years ago
brentp 24k

Brad Chapman has a nice post on doing this here. It checks for an exact match first and then does a global sequence alignment allowing a specified number of mismatches.
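
In case you want to roll your own along the same lines, here is a rough Python sketch of that idea: check for an exact prefix match first, then fall back to a mismatch-tolerant comparison of the adapter against the 5' end of the read. The adapter sequence and mismatch threshold below are placeholders, not anything from Brad's post, and the simple mismatch count only handles substitutions; a real global alignment (as in his script) would also catch indels.

    # Sketch: clip a 5' adapter, exact match first, then allow a few mismatches.
    ADAPTER = "GATCGGAAGAGC"   # placeholder - use your actual adapter sequence
    MAX_MISMATCHES = 3         # placeholder threshold

    def mismatches(a, b):
        """Count mismatching positions between two equal-length strings."""
        return sum(1 for x, y in zip(a, b) if x != y)

    def clip_adapter(read, adapter=ADAPTER, max_mm=MAX_MISMATCHES):
        """Return the read with a 5' adapter removed, or the read unchanged."""
        prefix = read[:len(adapter)]
        if prefix == adapter:                     # cheap exact check first
            return read[len(adapter):]
        if len(prefix) == len(adapter) and mismatches(prefix, adapter) <= max_mm:
            return read[len(adapter):]            # tolerate a few sequencing errors
        return read                               # no adapter found

    # one mismatch against the adapter -> the prefix gets clipped
    print(clip_adapter("GATCGGAACAGCTTTTTTTTTTACGT"))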

I believe that vectorstrip in the EMBOSS toolkit can also do this.

If you prefer BioPerl, here is one more possible solution.

3
13.4 years ago
Bio_X2Y ★ 4.4k

I've never used this tool, but it claims to address this issue: http://code.google.com/p/cutadapt/

0

Thanks - I have now used this program with great success.

3
13.2 years ago
Ketil 4.1k

Another option might be the FASTX-Toolkit, available from http://hannonlab.cshl.edu/fastx_toolkit. Specifically, fastx_clipper should do what you want (although I haven't tested this myself).

3
12.8 years ago
Weronika ▴ 300

I've been very happy with cutadapt - it has a lot of useful options, like filtering the resulting reads by length, and a straightforward way of controlling the number of mismatches allowed. It's still being developed (or at least the author checks the issue tracker regularly). I hope it'll work for paired-end reads someday!

I also tested fastx_clipper in the FASTX-Toolkit - it works well; my only quibble is that there is no direct way to require an exact match or to control the number of mismatches allowed. The other tools in that toolkit are very useful as well.

If you're writing your own pipeline in Python, you could consider the HTSeq package - it has a huge number of useful functions, such as FASTQ parsing and quality decoding, and trim_right_end/trim_left_end functions for adapter stripping. I haven't used those particular functions, but I've been happy with HTSeq in general.
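
For what it's worth, here is a minimal, untested sketch of what that could look like with HTSeq; the adapter sequence, the error tolerance and the output handling are my assumptions, so check the HTSeq documentation for the exact signatures in the version you have:

    import HTSeq   # assumes the HTSeq package is installed

    adapter = HTSeq.Sequence(b"GATCGGAAGAGC", "adapter")   # placeholder adapter (bytes in recent HTSeq)
    out = open("trimmed.fastq", "w")                        # placeholder file names

    for read in HTSeq.FastqReader("reads.fastq"):
        # strip the adapter from the 5' end, allowing a proportion of mismatches
        trimmed = read.trim_left_end(adapter, mismatch_prop=0.1)
        if len(trimmed.seq) >= 20:                          # drop reads that were mostly adapter
            trimmed.write_to_fastq_file(out)

    out.close()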

If you're using R, the Biostrings package has a trimLRPatterns function. I haven't tried it.

0
13.4 years ago

For our software we also ended up using an alignment approach. You can read more about it in the manual section at http://www.clcbio.com/index.php?id=1330&manual=Adapter_trimming.html (clicking the individual subsections will take you to the details).

Cheers

Roald

*** Disclaimer: I work at CLC bio ***

0
13.4 years ago
Bach ▴ 550

To search for adaptor contamination, I routinely use SSAHA2 and parse the (pretty simple) result files. Some of my users have proposed SMALT (the successor to SSAHA2), as they think it is more sensitive, but I still need to confirm that.

Both tools are available from the Sanger Centre.

0
12.6 years ago
Rna-Seq • 0

All these tools, such as cutadapt, are useful for single-end reads. There are not many tools for trimming adapters from paired-end reads. If trimming removes some reads entirely, the two paired-end files fall out of sync, because the remaining reads no longer line up by position.

0

That's true, but it's fairly easy to re-sync paired FASTQ files by going through them and removing the reads that don't have a matching pair in the other file.
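
For anyone who wants to do this by hand, here is a bare-bones Python sketch of that re-syncing step. It assumes the read IDs match between the two files once any /1 or /2 suffix is stripped, loads all IDs into memory, and uses placeholder file names, so treat it as an illustration rather than a production tool:

    def read_fastq(path):
        """Yield (read_id, full_record) tuples from a FASTQ file."""
        with open(path) as fh:
            while True:
                header = fh.readline()
                if not header:
                    break
                seq, plus, qual = fh.readline(), fh.readline(), fh.readline()
                read_id = header[1:].split()[0]
                if read_id.endswith("/1") or read_id.endswith("/2"):
                    read_id = read_id[:-2]            # drop the mate suffix
                yield read_id, header + seq + plus + qual

    def resync(in1, in2, out1, out2):
        """Keep only reads whose mate survived trimming in the other file."""
        shared = {rid for rid, _ in read_fastq(in1)} & {rid for rid, _ in read_fastq(in2)}
        for infile, outfile in ((in1, out1), (in2, out2)):
            with open(outfile, "w") as out:
                for rid, record in read_fastq(infile):
                    if rid in shared:
                        out.write(record)
        # both inputs keep their original ordering, so the surviving pairs
        # come out in the same order in the two output files

    resync("trimmed_1.fastq", "trimmed_2.fastq", "synced_1.fastq", "synced_2.fastq")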
