Question: Why is the proportion of reads with adapters in NGS low?
9 days ago, bobbyle0210 wrote:

Dear all,

I am currently processing an NGS dataset, using cutadapt to remove 3' and 5' adapter sequences from the reads. Normally, as far as I have noticed, the percentage of trimmed reads (reads from which adapters were removed) reaches 95 - 99 %. But in this dataset, some of my samples contain only 20 - 25 % reads with adapters.

What could be the reasons for this phenomenon? I have carefully checked the cutadapt command and everything is fine. Thanks!

rna-seq ngs • 95 views

8 days ago, ATpoint commented:

Is this smallRNA-seq?
8 days ago, Istvan Albert (University Park, USA) wrote:

The "normal" expectation for NGS data is that no reads have adapters in them. The explanation for not having adapters is that the DNA fragment is longer than the read length.

The "typical" adapter read-through is an "error" that happens when the instrument gets a DNA fragment that is too short.

The exception is a specialized library where the DNA under study is inherently short (like microRNAs), or one where you are ligating your own adapters. In those cases all reads should have adapters in them.
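To make the fragment-length argument concrete, here is a minimal Python sketch. The function names and the list of insert sizes are made up for illustration, and the model is deliberately simplified: it assumes the read starts at the first base of the insert and runs into the adapter as soon as the insert ends.

```python
def adapter_bases(insert_len: int, read_len: int) -> int:
    """Bases at the 3' end of a read that come from the adapter.

    Simplified model: the read covers the insert from its first base,
    so anything beyond the insert length is adapter read-through.
    """
    return max(0, read_len - insert_len)


def fraction_with_adapter(insert_lengths, read_len: int) -> float:
    """Fraction of fragments that are shorter than the read length."""
    if not insert_lengths:
        return 0.0
    hits = sum(1 for n in insert_lengths if adapter_bases(n, read_len) > 0)
    return hits / len(insert_lengths)


# Hypothetical insert sizes (bp), not taken from any real library.
inserts = [350, 420, 90, 300, 510, 22, 280, 130]

print(adapter_bases(200, 250))              # 50 adapter bases in a 250 bp read
print(fraction_with_adapter(inserts, 150))  # 3 of 8 inserts are < 150 bp -> 0.375
```

In this toy model, the reported "reads with adapters" percentage is simply the share of fragments shorter than the read length, which is why it depends on the library and not only on the trimming command.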

8 days ago, ATpoint wrote:

A properly made Illumina library should have low to no adapter contamination. The idea of sequencing is that you spend the resources of your run on sequencing the actual "Sequence of Interest". That can be fragmented genomic DNA in WGS, cDNA in RNA-seq or protein-bound DNA in ChIP-seq. Typical library preparation methods produce sequencing libraries whose insert sizes are roughly normally distributed. You choose your sequencing regime accordingly. If you have 200bp fragments it makes no sense to sequence 2x250bp, as 1) the two reads of a pair will overlap and therefore be redundant and 2) the last 50bp or so of each read will pick up adapter sequence, which is of no use.
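As a rough illustration of choosing the read length to match the insert size distribution, the following Python sketch assumes the insert sizes are exactly normal, which real libraries only approximate. The mean of 300 bp and standard deviation of 50 bp are hypothetical values, and both function names are invented for this example.

```python
from statistics import NormalDist


def expected_adapter_fraction(mean_insert: float, sd_insert: float,
                              read_len: int) -> float:
    """P(insert < read_len) under an assumed normal insert-size model."""
    return NormalDist(mu=mean_insert, sigma=sd_insert).cdf(read_len)


def mate_overlap(insert_len: int, read_len: int) -> int:
    """Insert bases covered by both mates of a paired-end read."""
    covered = min(insert_len, read_len)  # insert bases seen by each mate
    return max(0, 2 * covered - insert_len)


# Hypothetical library: mean insert 300 bp, standard deviation 50 bp.
for read_len in (50, 100, 150, 250):
    frac = expected_adapter_fraction(300, 50, read_len)
    print(f"{read_len:>3} bp reads: {frac:.2%} of fragments shorter than the read")

# The 200 bp fragment sequenced 2x250 bp: both mates cover the whole insert.
print(mate_overlap(200, 250))  # 200
```

Under these assumptions, 150 bp reads would run into adapter in well under 1% of fragments, while 250 bp reads would do so in roughly 16% of them.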

I have in fact never seen a library with 95-99% adapter content, which, as I said, is an indication of either a library prep problem or a poor choice of sequencing run type. Even 20-25% is a lot in my experience. You sometimes see that in assays with a very uneven insert size distribution, like ATAC-seq, in combination with longer reads such as 2x150bp, where it would probably be normal to get such a percentage. Still, in line with what Istvan Albert said, you should aim for no adapter content to make optimal use of your sequencing resources and avoid contamination with "foreign" nucleotides as much as possible.

Edit: The only library type that I know of that might suffer from high adapter content is smallRNA-seq. Do you have that? smallRNAs should be shorter than even the typical shortest Illumina reads, which are ~50bp, so there one could indeed pick up adapters in almost every read (I have never analyzed smallRNA-seq so far though, so I am just guessing here).
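The small RNA case can be sketched in Python as follows. This is a toy illustration, not how cutadapt works: cutadapt uses error-tolerant alignment, whereas this uses exact matching only. The adapter string is assumed to be the Illumina small RNA 3' adapter and the insert is assumed to be a let-7-like 22 nt sequence; both should be checked against your own kit and data. The function names are hypothetical.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"      # assumed small RNA 3' adapter
SMALL_RNA = "TGAGGTAGTAGGTTGTATAGTT"   # assumed 22 nt miRNA-like insert


def simulate_read(insert: str, adapter: str, read_len: int) -> str:
    """Toy read: insert followed by adapter, cut at read_len.

    Real reads continue past the adapter; that part is ignored here.
    """
    return (insert + adapter)[:read_len]


def find_adapter(read: str, adapter: str, min_overlap: int = 8) -> int:
    """Start of the adapter in the read, or -1 if not found.

    Exact matching only. Accepts a full adapter anywhere in the read,
    or an adapter prefix of at least min_overlap bases at the 3' end.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:]
        n = min(len(tail), len(adapter))
        if tail[:n] == adapter[:n]:
            return start
    return -1


short_read = simulate_read(SMALL_RNA, ADAPTER, 50)
long_read = simulate_read("ACGT" * 50, ADAPTER, 50)  # 200 bp insert

print(find_adapter(short_read, ADAPTER))  # 22: adapter starts right after the insert
print(find_adapter(long_read, ADAPTER))   # -1: the read ends inside the insert
```

With a 22 nt insert, every 50 bp read reaches the adapter, which is consistent with near-100% trimmed reads in small RNA libraries and far lower percentages in libraries with longer inserts.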
