Question: How do I completely get rid of adapter sequence
MAPK (United States) wrote, 7 months ago:

I have single-end small RNA-seq data and need to trim the adapter (adapter sequence: AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC). After trimming the adapter with the bbduk command shown below, I checked the reads in my data and some still carry adapter overhangs.

The adapter overhangs can be seen below:

cat bbduk_trimmed_small_RNA_001.fastq | grep TCGCAGGGAAATCATCTGATTA

TCGCAGGGAAATCATCTGATTAGATCGGAAGA
TCGCAGGGAAATCATCTGATTAGATCGGAA
TCGCAGGGAAATCATCTGATTAAGATCGGAAGAA
TTCGCAGGGAAATCATCTGATTAGAA

or can be seen here: https://postimg.cc/image/knuvtyv0d/

This trimming was done using bbduk.sh (BBTools: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/). The NEB-SE.fa file contains the adapter sequence given above.

bbduk.sh -Xmx1g in=small_RNA_001.fastq out=bbduk_trimmed_small_RNA_001.fastq ref=NEB-SE.fa ktrim=r k=13 mink=6 minlength=18 hdist=0

I could reduce the k-mer size (e.g. k=4), but that would risk trimming false positives. How can I completely get rid of these adapter sequences from my data without trimming false positives?
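One way to quantify the problem before changing any trimming parameters is to count how many reads still end in a prefix of the adapter. The following is only a rough diagnostic sketch, not part of BBTools: the function name count_adapter_tails and the minimum-overlap argument are made up for illustration, and it only detects exact matches at the very 3' end of the read (no mismatches, no trailing extra bases).

```shell
# Hypothetical helper (not a BBTools command): count reads whose 3' end is
# an exact prefix of the adapter, with at least MIN_OVERLAP matching bases.
# Usage: count_adapter_tails FASTQ ADAPTER MIN_OVERLAP
count_adapter_tails() {
  awk -v ad="$2" -v min="$3" '
    NR % 4 == 2 {                      # sequence line of each 4-line FASTQ record
      total++
      # the longest overlap worth testing is bounded by both lengths
      max = length(ad) < length($0) ? length(ad) : length($0)
      for (k = max; k >= min; k--) {
        # compare the last k bases of the read with the first k of the adapter
        if (substr($0, length($0) - k + 1) == substr(ad, 1, k)) { hits++; break }
      }
    }
    END { printf "%d of %d reads end in an adapter prefix of >= %d bases\n", hits, total, min }
  ' "$1"
}
```

For example, `count_adapter_tails bbduk_trimmed_small_RNA_001.fastq AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC 3` reports the share of trimmed reads that still look adapter-tailed. A very small minimum overlap will also count reads that end in those bases by chance, which is the same false-positive trade-off as lowering k.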

adapter smallrnaseq trimming • 306 views
genomax (United States) wrote, 7 months ago:

You can't have it both ways. You can either trim strictly and risk losing a few reads, or trim leniently and keep a few reads that still carry a bit of adapter. Your aligner should deal with any extraneous bases during alignment. Have you tried aligning this data to see what fraction aligns? With small RNA you are looking for a specific length (~21-25 bp). If your reads fail that length criterion after trimming, you may want to discard them, since they may not be what you are looking for.
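To check the length criterion in practice, something along these lines could be used. This is a minimal sketch with plain awk; the function names length_hist and filter_by_length are hypothetical, and it assumes a standard 4-line-per-record FASTQ (no wrapped sequence lines).

```shell
# Hypothetical helpers for inspecting and filtering read lengths in a
# 4-line-per-record FASTQ file.

# Print "length count" pairs, sorted by length.
# Usage: length_hist FASTQ
length_hist() {
  awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' "$1" | sort -n
}

# Keep only records whose sequence length lies within [MIN, MAX].
# Usage: filter_by_length FASTQ MIN MAX > filtered.fastq
filter_by_length() {
  awk -v lo="$2" -v hi="$3" '
    { rec[NR % 4] = $0 }               # buffer the current record line by line
    NR % 4 == 0 {                      # quality line read: record is complete
      len = length(rec[2])
      if (len >= lo && len <= hi)
        print rec[1] "\n" rec[2] "\n" rec[3] "\n" rec[0]
    }
  ' "$1"
}
```

Running `length_hist` on the trimmed file shows whether the reads cluster around the expected size; `filter_by_length trimmed.fastq 21 25` would then keep only the reads in the ~21-25 bp window mentioned above (the file name and the exact bounds are placeholders).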

BTW: I am curious as to what the original reads in the example you included look like (or is that the full original sequence)?


Yes, that's the full original sequence in the example. I had separated my data into aligned and unaligned fastq files, and I randomly selected that read from the aligned fastq file to look for it in the unaligned fastq dataset. It turns out the unaligned dataset also has this read, but with adapter overhangs.

(reply by MAPK, 7 months ago)

I did align the dataset using bowtie with the -v 1 option, which allows for one mismatch. Is that what you were referring to when you said "Your aligner should deal with any extraneous bases while it does the alignment."?

(reply by MAPK, 7 months ago)

It would appear that bowtie is not able to soft-clip the adapter sequences.

Since you have BBTools installed, can you align your data with bbmap.sh using ambig=all vslow perfectmode maxsites=1000? @Brian recommended these parameters with this note: "It should be very fast in that mode (despite the vslow flag). Vslow mainly removes masking of low-complexity repetitive kmers, which is not usually a problem but can be with extremely short sequences like microRNAs."
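Put together, the call might look roughly like the sketch below. The wrapper function map_small_rna and the three file arguments are placeholders for illustration; only the four mapping parameters come from the recommendation above, and ref=, in= and out= are the usual bbmap.sh input/output options.

```shell
# Hypothetical wrapper around bbmap.sh with the parameters suggested above.
# Usage: map_small_rna REFERENCE_FASTA TRIMMED_FASTQ OUTPUT_SAM
map_small_rna() {
  # ref/in/out are placeholder paths supplied by the caller;
  # ambig=all vslow perfectmode maxsites=1000 are the recommended settings
  bbmap.sh ref="$1" in="$2" out="$3" ambig=all vslow perfectmode maxsites=1000
}
```

For example, `map_small_rna reference.fa bbduk_trimmed_small_RNA_001.fastq mapped.sam`, where reference.fa and mapped.sam are made-up names. Comparing the mapped fraction with the bowtie result should show how much the leftover adapter bases matter.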

(reply by genomax, 7 months ago)