Why is the proportion of reads with adapters in NGS low?
4.3 years ago
bobbyle0210 ▴ 10

Dear all,


Currently, I am processing an NGS dataset using cutadapt to remove 3p and 5p adapter sequences from reads. Normally, as far as I have noticed, the percentage of trimmed reads (reads with removed adapters) can reach 95-99%. But for this dataset, some of my samples contain only 20-25% reads with adapters.


What could be the reasons for this phenomenon? I have carefully checked the cutadapt command and everything is fine. Thanks!

RNA-Seq ngs • 2.9k views

Is this smallRNA-seq?

4.3 years ago

The "normal" expectation for NGS data is that no reads have adapters in them. The explanation for not having adapters is that the DNA fragment is longer than the read length.

The "typical" adapter read-through is an "error" that happens when the instrument gets a DNA fragment that is too short.

The exception is a specialized library where the DNA under study is inherently short (like microRNAs), or one where you are ligating your own adapters; in those cases all reads should have adapters in them.
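To make the read-through point concrete, here is a minimal Python sketch. The function name and the example insert and read lengths are made up for illustration and do not come from the poster's data; it simply counts how many sequencing cycles run past the end of the insert and into the 3' adapter.

```python
def adapter_bases_in_read(insert_len: int, read_len: int) -> int:
    """Number of read cycles that extend past the insert into the adapter.

    Zero means the fragment is at least as long as the read, so no adapter
    is sequenced. This ignores sequencing errors and partial adapter matches.
    """
    return max(0, read_len - insert_len)


# Hypothetical examples (lengths chosen only for illustration):
examples = [
    ("microRNA-like insert, 50 bp read", 22, 50),
    ("300 bp fragment, 150 bp read", 300, 150),
]
for label, insert_len, read_len in examples:
    n = adapter_bases_in_read(insert_len, read_len)
    print(f"{label}: {n} adapter bases in the read")
```

Under these assumptions a short insert always produces a read that ends in adapter, while a fragment longer than the read never does, which is why the expected adapter rate depends so strongly on the library type.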

4.3 years ago
ATpoint 82k

A properly made Illumina library should have low to no adapter contamination. The idea of sequencing is that you spend the resources of your run on sequencing the actual "Sequence of Interest". That can be fragmented genomic DNA in WGS, cDNA in RNA-seq or protein-bound DNA in ChIP-seq. Typical library preparation methods produce libraries whose insert sizes are roughly normally distributed. You choose your sequencing regime accordingly. If you have 200bp fragments it makes no sense to sequence 2x250bp, as 1) the ends of the reads will overlap and therefore be redundant and 2) the last ~50bp of each read will pick up adapter sequence, which is of no use.
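As a rough back-of-the-envelope sketch of this reasoning: if insert sizes are modeled as normally distributed, the expected share of reads that reach the adapter is the probability that an insert is shorter than the read length. The function name, the mean, and the standard deviation below are hypothetical values for illustration; real libraries are only approximately normal, and trimming tools also report partial or chance matches.

```python
import math


def fraction_with_adapter(mean_insert: float, sd_insert: float, read_len: int) -> float:
    """P(insert length < read length) under a normal insert-size model.

    Uses the normal CDF written with math.erf. This is an idealized estimate,
    not a prediction of what a trimming tool will report.
    """
    z = (read_len - mean_insert) / (sd_insert * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical library: mean insert 200 bp, sd 30 bp (assumed values).
for read_len in (50, 150, 250):
    frac = fraction_with_adapter(200, 30, read_len)
    print(f"read length {read_len} bp: ~{frac:.1%} of reads reach the adapter")
```

With these assumed numbers, short reads almost never reach the adapter, while reads longer than the mean insert do so most of the time, which matches the 200bp versus 2x250bp example.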

In fact, I have never seen a library with 95-99% adapter content, which, as I said, indicates either a library prep problem or a poor choice of sequencing run type. Even 20-25% is a lot in my experience. You sometimes see that in assays with a very uneven insert size distribution, like ATAC-seq, in combination with longer reads such as 2x150bp, where such a percentage would probably be normal. Still, in line with what Istvan Albert said, you should aim for no adapter content to optimally use your sequencing resources and avoid contamination with "foreign" nucleotides as much as possible.

Edit: The only library type that I know might suffer from high adapter content is smallRNA-seq. Do you have that? smallRNAs should be shorter than even the shortest typical Illumina reads, which are ~50bp, so there one could indeed pick up adapters in almost every read (I have never analyzed smallRNA-seq so far though, so just guessing here).
