Question: How can I deal with adapter contamination in next-gen sequencing reads?
3.7 years ago by
gbdias80 wrote:

Hey guys,

After browsing similar questions and trying the "friendly" tools available, I concluded that adapter removal is not trivial at all for non-expert users, at least not for some datasets. So I have a few questions. If you could help me with any of them, it would be really nice.

  • How do I know what adapters are present in my reads? (The FastQC report shows several hits with Illumina Multiplexing PCR primer 2.0.1, but clipping its sequence doesn't clean all reads, and the reports keep showing this contamination.) Shouldn't I know the adapter just by knowing the library prep kit used?
  • Why don't all reads have adapters?
  • If I use Cutadapt with the first 13 bp of the Illumina universal adapter (AGATCGGAAGAGC), over half of my dataset is lost in clipping (20 Gb to 9 Gb). Also, FastQC still shows adapter contamination. Can I trust this clipping?
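
One way to look at the first two questions directly is to scan the reads for the adapter prefix and record where it starts. The following is only a minimal sketch: it uses exact string matching (real trimmers such as Cutadapt also allow mismatches and partial matches at the 3' end), and the function names and the file name in the usage note are hypothetical.

```python
import gzip
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()


def adapter_positions(reads, adapter_prefix="AGATCGGAAGAGC"):
    """Return (fraction of reads containing the prefix, Counter of start positions).

    Exact matching only, so this underestimates contamination when reads
    carry sequencing errors or only a few adapter bases at the 3' end.
    """
    positions = Counter()
    total = 0
    for seq in reads:
        total += 1
        idx = seq.find(adapter_prefix)
        if idx != -1:
            positions[idx] += 1
    hits = sum(positions.values())
    fraction = hits / total if total else 0.0
    return fraction, positions
```

For example, `adapter_positions(read_fastq_sequences("sample_R1.fastq.gz"))` would report how many reads contain the prefix and at which offsets. Hits at position 0 suggest adapter dimers; hits further in suggest inserts shorter than the read length.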

I am using AdapterRemoval. It identifies adapters on its own. Also add a quality filter; it's worth it.

Why don't all reads have adapters? Because clipping them is part of the instrument software, which runs before you get your FASTQ files.

You can also run prinseq before and after adapter removal. Looking at the sequence lengths, you should be left with only one peak. The duplication section also gives insight into any adapters that may still be present.
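
The before/after length comparison can also be done without prinseq. This is a rough sketch, not prinseq's own report: it just tabulates read lengths so the two distributions can be compared. The function names are hypothetical.

```python
from collections import Counter


def length_histogram(sequences):
    """Count how many reads have each length."""
    return Counter(len(seq) for seq in sequences)


def format_histogram(hist):
    """Return one 'length<TAB>count' line per observed length, shortest first."""
    return "\n".join(f"{length}\t{count}" for length, count in sorted(hist.items()))
```

Running `length_histogram` on the sequences of the raw and the trimmed FASTQ files and printing both with `format_histogram` shows how many reads were shortened and by how much.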

Comment written 3.7 years ago by stolarek.ir580
3.7 years ago by
5heikki8.4k wrote:

Try Trim Galore. It's really a wrapper for Cutadapt and FastQC, but IMO it does the job very nicely.


I've just used trim_galore (arguments below) to trim adapter sequences from FASTQ files from an Illumina HiSeq 4000 run with a TruSeq prep. It seemed to work well: I ran it in the default mode to auto-detect and remove adapters, and to remove any bases with a Phred score < 5. However, my FastQC reports for some files show that Illumina Single End PCR primer or TruSeq Adapter, Index 7, remain in certain samples (0.15 % and 0.53 %, respectively).

Do I have to run cutadapt again and feed it these specific sequences to remove? I have many samples and searching through each report for specific adapters to remove in a second cutadapt run is not ideal.

Was I not stringent enough in trimming?

Do I need to get rid of the remaining contaminants to perform differential gene expression analysis? 

trim_galore --paired -q 5 -o /output/path/ --fastqc_args "--outdir /fastqc/output/path/" sample_R1.fastq.gz sample_R2.fastq.gz


Comment written 3.6 years ago by robvanner0
3.7 years ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

I suggest you try BBDuk.  It's both more sensitive and more specific than other adapter trimmers, as it can trim by overlap detection in addition to sequence matching, to remove even 1bp of adapter at the very end.  It comes with all of the standard Illumina adapter sequences in /resources/adapters.fa

Usage: bbduk.sh in1=r1.fq in2=r2.fq out=trimmed#.fq ref=adapters.fa tbo tpe k=23 mink=11 hdist=1 ktrim=r ftm=5
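
To illustrate the idea behind trimming by overlap detection, here is a simplified sketch. It is not BBDuk's actual algorithm, and the function names and thresholds are assumptions chosen for illustration. When the insert is shorter than the read length, the first `insert` bases of read 1 equal the reverse complement of the first `insert` bases of read 2, and everything after that point in both reads is adapter.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_by_overlap(r1, r2, min_insert=20, max_mismatch_rate=0.05):
    """Return (r1_trimmed, r2_trimmed, insert_size), insert_size is None if no overlap.

    Tries every insert size shorter than the read length, longest first,
    and accepts the first one where read 1 and the reverse complement of
    read 2 agree within the allowed mismatch rate.
    """
    limit = min(len(r1), len(r2))
    for insert in range(limit - 1, min_insert - 1, -1):
        frag1 = r1[:insert]
        frag2 = reverse_complement(r2[:insert])
        if hamming(frag1, frag2) <= max_mismatch_rate * insert:
            return frag1, r2[:insert], insert
    return r1, r2, None
```

Because this relies only on the two reads agreeing with each other, it needs no adapter sequence at all and can, in principle, detect even a single trailing adapter base, which sequence matching alone cannot do reliably.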

If you run BBMerge (also included) like this: bbmerge.sh in1=r1.fq in2=r2.fq ihist=ihist.txt reads=1m xloose

...then you will see the insert size distribution of your reads.  Reads with insert sizes less than the read length contain adapter sequence.  So, that will show you the amount of data you should expect to lose via adapter-trimming, not including adapter-dimers, which will be totally eliminated but don't show up on an insert size plot.
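
As a rough sketch of that estimate, the following turns an insert size histogram into the expected fraction of pairs and bases affected by adapter. It assumes the histogram file has `#`-prefixed header lines followed by tab-separated `insert_size<TAB>count` rows, and that both reads have the same fixed length; the function names are hypothetical. It only covers pairs that appear in the histogram, so adapter dimers and unmerged pairs are not counted.

```python
def parse_ihist(lines):
    """Parse 'insert_size<TAB>count' rows, skipping '#' header lines and blanks."""
    hist = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, count = line.split("\t")[:2]
        hist[int(size)] = int(count)
    return hist


def expected_adapter_loss(insert_hist, read_length):
    """Return (fraction of pairs with adapter, fraction of bases that are adapter).

    A pair contains adapter when its insert size is below the read length;
    each of its two reads then carries (read_length - insert) adapter bases.
    """
    total_pairs = sum(insert_hist.values())
    if total_pairs == 0:
        return 0.0, 0.0
    pairs_with_adapter = 0
    adapter_bases = 0
    for insert, count in insert_hist.items():
        if insert < read_length:
            pairs_with_adapter += count
            adapter_bases += 2 * (read_length - insert) * count
    total_bases = 2 * read_length * total_pairs
    return pairs_with_adapter / total_pairs, adapter_bases / total_bases
```

With the file from the command above, this could be called as `expected_adapter_loss(parse_ihist(open("ihist.txt")), read_length)`, where `read_length` is your own run's read length.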
