Question

When do you strictly need to know the adapter sequence to trim NGS data?

1

Entering edit mode

6.0 years ago

MAPK ★ 2.1k

Hi All, I have a question regarding trimming of adapters in NGS data. I have previously analyzed RNAseq data using brtim (https://www.sciencedirect.com/science/article/pii/S0888754311001339) without using adapter sequence. I am now analyzing smallRNAseq data, and I am using adapter sequence as adapter.fa using BBduk.sh from BBmap tools. Could you please clarify under what circumstances one would need to provide/know adapter sequence and when it is not necessary?

adapter NGS trimming • 4.2k views

ADD COMMENT • link updated 6.0 years ago by Gjain 5.8k • written 6.0 years ago by MAPK ★ 2.1k

score 3 · Accepted Answer · 2018-07-30

One case is where there is a kit/prep specific adapter that is being used. This may require special handling of the downstream data based on the instructions included in the kit.

If you had paired end reads with enough pairs having short inserts then you can detect them by doing this: bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa

score 3 · Accepted Answer · 2018-07-30

3

Entering edit mode

6.0 years ago

Gjain 5.8k

This is a good question for the beginners who are starting with the NGS analysis.

In common short read sequencing, the DNA insert (original molecule to be sequenced) is downstream from the read primer, meaning that the 5' adapters will not appear in the sequenced read. But, if the fragment is shorter than the number of bases sequenced, one will sequence into the 3' adapter. To make it clear: In Illumina sequencing, adapter sequences will only occur at the 3' end of the read and only if the DNA insert is shorter than the number of sequencing cycles (see picture below)!

This is true in case of smallrna sequencing like microRNAs and piRNAs which are between 16-33 nt long. That means even with 50bp single end sequencing, 3' adapter sequences will occur. This is generally not the case with RNAseq where the gene/transcript fragments are way longer than the sequencing length and adapter sequences do not accur.

enter image description here

For details: https://www.ecseq.com/support/ngs/trimming-adapter-sequences-is-it-necessary

Also, if you are not sure which adapters are used for the sequencing, you can use an adapter predicting software like DNApi that will predict the most common adapter sequence which can be used by the tools like cutadapt or STAR before the mapping. enter image description here Paper: Tsuji, J., & Weng, Z. (2016). DNApi: a de novo adapter prediction algorithm for small RNA sequencing data. PloS one, 11(10), e0164228.

link: https://github.com/jnktsj/DNApi

I hope this helps.

ADD COMMENT • link 6.0 years ago by Gjain 5.8k

0

Entering edit mode

Thank you so much for this fantastic answer. I do have a few more questions related to this: 1. Reads in smallRna seq have adapters in 3 prime region only so, is it always right side trimming then? 2. Thus trimmed reads when aligned to the reference genome and saved as a bam file, why do we see both sense and antisense strands. This bam file shows both strands when viewed using IGV. Why? Shouldn't they all be 5' to 3' or unidirectionally aligned?

ADD REPLY • link 6.0 years ago by MAPK ★ 2.1k

2

Entering edit mode

While the adapters may only be on 3'-end one may be asked to remove additional base pairs from 5'end. e.g. This NEXTflex smallRNA kit requires the following.
1. Clip the 3' adapter sequence (TGGAATTCTCGGGTGCCAAGG).
2. Trim the first and last 4 bases from the adapter-clipped reads

They also ask you

We do not recommend enabling the adapter trimming option or an adapter sequencing into Illumina Experiment Manager, as it may lead to unwanted masking of reads.

So always follow instructions included in kit (if one was used) or make yourself familiar with the read structure that would be expected.

Genes are present on both strands of DNA.

ADD REPLY • link 6.0 years ago by GenoMax 144k

0

Entering edit mode

Thanks for explaining.

ADD REPLY • link 6.0 years ago by MAPK ★ 2.1k