Question: When do you strictly need to know the adapter sequence to trim NGS data?
MAPK wrote:

Hi all, I have a question about adapter trimming in NGS data. I previously analyzed RNA-seq data using Btrim (https://www.sciencedirect.com/science/article/pii/S0888754311001339) without supplying an adapter sequence. I am now analyzing small RNA-seq data, and I am supplying the adapter sequence as adapter.fa to bbduk.sh from the BBMap tools. Could you please clarify under what circumstances one needs to know and provide the adapter sequence, and when it is not necessary?

adapter ngs trimming • 1.1k views
modified 13 months ago by Gjain • written 13 months ago by MAPK
genomax wrote:

One case is when a kit- or prep-specific adapter is used. This may require special handling of the data downstream, following the instructions included with the kit.

If you have paired-end reads and enough pairs have short inserts, you can detect the adapters with BBMerge: bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa
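To illustrate why short inserts expose the adapters: when the insert is shorter than the read length, R1 and the reverse complement of R2 agree over the insert, and whatever follows in each read is adapter. The Python sketch below only illustrates that idea with exact matching. It is not how BBMerge is implemented, and the function names, the example insert, the example adapter strings and the `min_insert` threshold are all assumptions made for the illustration.

```python
# Illustration only: exact-match overlap detection, not BBMerge's algorithm.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def detect_adapters(r1: str, r2: str, min_insert: int = 12):
    """Return (adapter seen in R1, adapter seen in R2), or None if no overlap is found.

    Tries insert lengths from the read length downwards; for the true insert
    length n, the first n bases of R1 equal the reverse complement of the
    first n bases of R2. Anything after position n is adapter.
    """
    for n in range(min(len(r1), len(r2)), min_insert - 1, -1):
        if r1[:n] == revcomp(r2[:n]):
            return r1[n:], r2[n:]
    return None


# Hypothetical example: a 20 nt insert sequenced with 30 cycles per read.
INSERT = "ACGTTGCAAGGCTTAACCGT"
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # example adapter for read 1
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # example adapter for read 2
READ_LEN = 30

r1 = (INSERT + ADAPTER_R1)[:READ_LEN]
r2 = (revcomp(INSERT) + ADAPTER_R2)[:READ_LEN]

print(detect_adapters(r1, r2))
```

With only 10 adapter bases visible per read in this example, a single pair recovers just the start of each adapter; a real tool aggregates over many pairs and tolerates sequencing errors.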

modified 13 months ago • written 13 months ago by genomax
Gjain wrote:

This is a good question for beginners who are starting out with NGS analysis.

In common short read sequencing, the DNA insert (original molecule to be sequenced) is downstream from the read primer, meaning that the 5' adapters will not appear in the sequenced read. But, if the fragment is shorter than the number of bases sequenced, one will sequence into the 3' adapter. To make it clear: In Illumina sequencing, adapter sequences will only occur at the 3' end of the read and only if the DNA insert is shorter than the number of sequencing cycles (see picture below)!

This is the case for small RNA sequencing of molecules such as microRNAs and piRNAs, which are between 16 and 33 nt long. That means that even with 50 bp single-end sequencing, 3' adapter sequences will occur. This is generally not the case with RNA-seq, where the gene/transcript fragments are much longer than the read length and adapter sequences do not occur.
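The read-through rule above can be sketched in a few lines of Python. This is a simplified model, not a simulator: it assumes the read is just the insert followed by the 3' adapter, cut off at the number of cycles, and it ignores whatever lies beyond the adapter. The function names and the 22 nt example insert are assumptions; the adapter string is the NEXTflex 3' adapter quoted later in this thread.

```python
# Simplified model: read = insert + 3' adapter, truncated at the cycle count.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"        # 3' adapter quoted in this thread
SMALL_RNA = "TAGCTTATCAGACTGATGTTGA"     # hypothetical 22 nt small RNA insert


def sequenced_read(insert: str, adapter: str, cycles: int) -> str:
    """Bases reported by the sequencer under the simplified model."""
    return (insert + adapter)[:cycles]


def adapter_bases(insert_len: int, adapter_len: int, cycles: int) -> int:
    """Number of adapter bases that end up in the read."""
    return min(adapter_len, max(0, cycles - insert_len))


# Small RNA, 50 cycles: the insert is shorter than the read, so adapter appears.
print(adapter_bases(len(SMALL_RNA), len(ADAPTER), 50))   # 21

# A 200 nt RNA-seq fragment, 50 cycles: the read ends inside the insert.
print(adapter_bases(200, len(ADAPTER), 50))              # 0
```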

[image]

For details: https://www.ecseq.com/support/ngs/trimming-adapter-sequences-is-it-necessary

Also, if you are not sure which adapters were used for sequencing, you can use adapter-prediction software such as DNApi. It predicts the adapter sequence from the reads, and that sequence can then be passed to tools like cutadapt or STAR before mapping.

Paper: Tsuji, J., & Weng, Z. (2016). DNApi: a de novo adapter prediction algorithm for small RNA sequencing data. PLoS ONE, 11(10), e0164228.

link: https://github.com/jnktsj/DNApi

I hope this helps.

modified 13 months ago • written 13 months ago by Gjain

Thank you so much for this fantastic answer. I do have a few more questions related to this:

    1. Reads in small RNA-seq have adapters at the 3' end only, so is it always right-side trimming?
    2. When the trimmed reads are aligned to the reference genome and saved as a BAM file, why do we see both sense and antisense strands? The BAM file shows both strands when viewed in IGV. Shouldn't the reads all be 5' to 3', i.e. aligned in one direction?

modified 13 months ago • written 13 months ago by MAPK
  • While the adapters may only be at the 3' end, you may be asked to remove additional bases from the 5' end as well. For example, the NEXTflex small RNA kit requires the following:

    1. Clip the 3' adapter sequence (TGGAATTCTCGGGTGCCAAGG).
    2. Trim the first and last 4 bases from the adapter-clipped reads

They also state:

We do not recommend enabling the adapter trimming option or an adapter sequencing into Illumina Experiment Manager, as it may lead to unwanted masking of reads.

So always follow the instructions included with the kit (if one was used), or familiarize yourself with the read structure you should expect.

  • Genes are present on both strands of DNA, so reads derived from genes on the minus strand align to the reverse strand.
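For the NEXTflex instructions quoted above (clip the 3' adapter, then trim the first and last 4 bases of the clipped read), a minimal Python sketch might look like the following. It uses exact matching only, so it is not a replacement for a real trimmer such as cutadapt or BBDuk; the function names, the `min_match` threshold and the example read are assumptions made for illustration.

```python
# Sketch of the two kit steps with exact matching only (no error tolerance).
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # 3' adapter quoted in the kit instructions above


def clip_3p_adapter(read: str, adapter: str = ADAPTER, min_match: int = 8) -> str:
    """Step 1: remove the 3' adapter and everything after it."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The read may end partway through the adapter: look for an adapter prefix
    # of at least `min_match` bases at the very end of the read.
    for n in range(min(len(adapter), len(read)) - 1, min_match - 1, -1):
        if read.endswith(adapter[:n]):
            return read[: len(read) - n]
    return read


def trim_ends(read: str, n: int = 4) -> str:
    """Step 2: drop the first and last `n` bases of the adapter-clipped read."""
    return read[n : len(read) - n] if len(read) > 2 * n else ""


def process(read: str) -> str:
    return trim_ends(clip_3p_adapter(read))


# Hypothetical 50 nt read: 4 extra bases + 22 nt small RNA + 4 extra bases + adapter.
SMALL_RNA = "TAGCTTATCAGACTGATGTTGA"
read = ("ACGT" + SMALL_RNA + "TTCA" + ADAPTER)[:50]

print(process(read))
```

Note the order: the 4 bases at the 3' end can only be trimmed after the adapter has been clipped, otherwise the wrong bases are removed.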
modified 13 months ago • written 13 months ago by genomax

Thanks for explaining.

written 13 months ago by MAPK