Hi All,
I have a question about adapter trimming of small RNA-seq data. The library for this dataset was prepared using the NEBNext Multiplex Small RNA Sample Prep Set for Illumina (E7300S/L: https://www.neb.com/-/media/catalog/datacards-or-manuals/manuale7300.pdf). I used bbduk.sh from BBTools (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) with the following command:
bbduk.sh -Xmx1g in=Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq out=/media/owner/7ef86942-96a5-48a7-a325-6c5e1aec7408/trimmed_files/bbmap_trimmed/clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq ref=NEB-SE_5_and_3_Prime.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
The adapter file NEB-SE_5_and_3_Prime.fa contains both 5' and 3' adapters:
>NEB_sRNA_read_1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>NEB_sRNA_read_2
AGATCGGAA
The problem is with the trimmed file. The first adapter is no longer found in it:
cat clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq | head -n 20000 | grep AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
owner@owner-HP-Z840-Workstation[bbmap_trimmed]
but it is still showing the second adapter:
owner@owner-HP-Z840-Workstation[bbmap_trimmed] cat clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq | head -n 1000 | grep AGATCGGAA
TTTCTCTGAGCACTCCTTAGTACAAGATCGGAAGAGCACACGTCGAACTC
AAATGTTCTGAGGACTGGTTCTAGATCGGAAGAGCACCGTCTGAACTCCA
GATGGGCCCCGGGTTCGATTCCCGGCGAACGCACCAGATCGGAAGAGCCA
TTGGACGTGTTATTTTCAGACAAGATCGGAAGAAGCACACGTCTGAACTC
Can someone please help me understand whether I need to remove both of these adapters before downstream/expression analysis? I have been using btrim to trim adapters from RNA-seq data (in that case I never had to provide an adapter file), but this is the first time I am doing it with bbmap (and also with trimmomatic) for small RNA-seq data. For small RNA-seq data, do we normally trim both the 5' and 3' adapters and put both adapter sequences in the adapter file used for trimming? Thank you in advance for your help.
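A side note on the grep check used above: grep on the raw FASTQ scans header and quality lines as well as sequences, and `head -n 1000` only covers the first 250 reads. A minimal sketch of a per-read count is below. It assumes standard 4-line FASTQ records, and `count_reads_with` is a hypothetical helper name, not part of BBTools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a BBTools command): count reads whose
# sequence line contains a given motif.
# Assumption: the FASTQ uses 4 lines per record, so the sequence is
# always line 2 of each record (NR % 4 == 2).
count_reads_with() {
  local motif=$1 fastq=$2
  awk -v m="$motif" '
    NR % 4 == 2 && index($0, m) { n++ }   # look at sequence lines only
    END { print n + 0 }                   # print 0 when nothing matched
  ' "$fastq"
}

# Example call on the trimmed file from this thread (uncomment to run):
# count_reads_with AGATCGGAA clean_Ago2_SsHV2L_1_CATGGC_L003_R1_001.fastq
```

Running it on the file before and after trimming gives two numbers that can be compared directly, instead of eyeballing a handful of grep hits.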
It is possible that you may need to do two rounds of trimming. Are there specific directions for this kit as far as the bioinformatic analysis is concerned? I have used a BioO kit which required removal of the adapters followed by a hard trim of a certain number of base pairs from one of the ends leaving ~22-25 bp final miRNA read.
If there are no specific data handling directions, it would help if you draw the structure of the fragment that is generated after the adapters are ligated. It would orient you as to what to expect in the actual sequence and the steps (more than one may be needed) to remove the adapters/extraneous sequence.
Another thing to try is to reduce the value of k to a smaller number (k=5) and redo the trimming (remove mink). It should get the remaining pieces at the ends of the reads. This will require more RAM, so allocate 10g to be safe.
Thank you so much for your help. I changed the k parameter to k=9 and that removed both adapters. I think the thread "why remove adapter just from 3' of the reads", together with your answer, somewhat explains why trimming only the 3' adapter should be fine.
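For anyone wondering why a smaller k made the difference, here is a rough sketch. It only counts exact k-mer matches between the first leftover read posted above and the NEB_sRNA_read_1 adapter, so it is a simplification and not how BBDuk works internally: hdist, mink, tpe and tbo are not modeled, and `shared_kmers` is a made-up helper name. That read appears to be missing one base (a T) relative to the adapter. As far as I understand, hdist is a Hamming distance, so it tolerates substitutions but not that kind of indel.

```shell
#!/usr/bin/env bash
# Illustration only: exact k-mer matching, NOT BBDuk's actual algorithm.
# shared_kmers is a hypothetical helper. It prints how many k-mers of
# the adapter occur verbatim somewhere in the read.
shared_kmers() {
  local read=$1 adapter=$2 k=$3 n=0 i
  for ((i = 0; i + k <= ${#adapter}; i++)); do
    # ${adapter:i:k} is the k-mer starting at offset i of the adapter
    if [[ $read == *"${adapter:i:k}"* ]]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Adapter and read copied from the posts above.
adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
read1=TTTCTCTGAGCACTCCTTAGTACAAGATCGGAAGAGCACACGTCGAACTC

for k in 23 9; do
  echo "k=$k shared adapter k-mers: $(shared_kmers "$read1" "$adapter" "$k")"
done
```

The longest stretch this read shares exactly with the adapter is shorter than 23 bases, so no 23-mer can match, while several 9-mers do. The trade-off is that very short k-mers are also more likely to match genuine small RNA sequence by chance, so it is worth checking the read length distribution after trimming.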