Question: Problems With Analysis Of Small Rnaseq Data - Adapter Trimming
gravatar for Bharat Iyengar
6.8 years ago by
Bombay, India
Bharat Iyengar270 wrote:

I have always faced a problem while analyzing small RNAseq data, at the step of adapter trimming.

I use fastx_clipper to clip adapters. The problem is that you can't be really sure of very small alignments between adapter and reads because they may not really be originating from the adapters, which means that you should specify a lower limit of alignment for clipping. I usually set it as 5 (intuitively).

But if a small piece of sequence really came from the adapter then it will remain and there is no way to clip it without any doubt.

The real problem comes during aligning the reads to the reference sequence. Aligners like bowtie (which i prefer to use), generally have user defined argument for number of allowed mismatches, which generally shouldn't exceed 4 both for performance and stringency.

Subsequently, you might lose a really valuable read.

An example: A certain unknown miRNA (annotated as one in mirbase) is 28 nt long. If I clip adapters with minimum alignment of 5 then I end up having 2-3 nucleotides of adapter in one of the highly expressed miRNA in that tissue, which then goes undetected because of mismatch in alignment.


To avoid this problem I sometimes trim the reads to around 25nt (for miRNA profiling). This creates a new problem: You can't really distinguish whether the read came from a pre-miRNA (a longer RNA) or from mature-miRNA (smaller RNA that arises by processing of pre-miRNA)

If I already have some idea of what I am going to get after analysis, I can tweak it somehow (for e.g by grep-ing the first 2/3/4 letters of the adapter sequence in the miRNA database to find out the chance of random occurrence of adapter like sequences). But if I am looking for something new it wont help.

Does anyone have an experience or an idea about how to solve this problem?

rnaseq • 6.1k views
ADD COMMENTlink modified 6.8 years ago by Jeremy Leipzig18k • written 6.8 years ago by Bharat Iyengar270

One question: How long are the unclipped reads you use? The ones I used till now where something ~36nt. But with that length you would expect an alignment of ~10nt and not 5, or less.

ADD REPLYlink written 6.8 years ago by David Langenberger9.2k

Well are right.. i didnt think about it.. my mistake..

but isnt illumina small RNAseq adapter : TCGTATGCCGTCTTCTGCTTGT.... I sometimes end up getting reads which have some extra nucleotides in the end which are neither from pre-miRNA nor from adapter. Some don't even align back to the genome.

For example this read "AGGCAGTGTAGTTAGCTGATTGCAT" (post clipping) doesn't align to genome. Removing last 2 bases aligns it to mmu-mir-34c-5p. "AT" is not coming from the adapter. If i remove the last three bases this is the top miRNA. If i dont then it drops to 20th. The phred scores in that region are decent.

ADD REPLYlink written 6.8 years ago by Bharat Iyengar270

Perhaps these nucleotides you want to cut are actually correct?!? MicroRNA 3′ nontemplated nucleotide additions are actually quite common and can be found in all short RNAseq experiments. I would recommend you to allow more errors for the mapping and analyse what you get, instead of changing your data (and nature) by hand. :) (read e.g.

ADD REPLYlink written 6.8 years ago by David Langenberger9.2k

thats what i finally did.. i use this algorithm called mirdeep.. i just modified it a little so that it runs bowtie in the n-alignment mode.. considering the biology of miRNAs i allow no mismatches in first 10-15 nt but allow 2-3 mismatches (corresponding to phred score 80-120) in the remaining region.. is this strategy okay ??

ADD REPLYlink written 6.8 years ago by Bharat Iyengar270
gravatar for Sean Davis
6.8 years ago by
Sean Davis25k
National Institutes of Health, Bethesda, MD
Sean Davis25k wrote:

One approach I have seen used is to:

  1. Do minimal trimming
  2. Discard reads shorter than a threshold (perhaps 20bp)
  3. Align with relatively high stringency
  4. Trim one base from each unaligned read
  5. Repeat steps 2-4 until no unaligned reads remain
ADD COMMENTlink written 6.8 years ago by Sean Davis25k
gravatar for Jeremy Leipzig
6.8 years ago by
Philadelphia, PA
Jeremy Leipzig18k wrote:

Novoalign is really good for miRNAs because you can do full needleman-wunsch global alignment and it will handle the adapter trimming as part of the alignment process. The decision of whether to trim a particular base or not is informed by the alignment itself but you're not relying on a local alignment, which would be unwise.

ADD COMMENTlink modified 6.8 years ago • written 6.8 years ago by Jeremy Leipzig18k
gravatar for Martin A Hansen
6.8 years ago by
Martin A Hansen3.0k
Martin A Hansen3.0k wrote:

This exactly the reason for find_adaptor in Biopieces

ADD COMMENTlink written 6.8 years ago by Martin A Hansen3.0k

thanks.. this is what i was looking for i wonder why i nobody told me about that before..

ADD REPLYlink written 6.8 years ago by Bharat Iyengar270
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 720 users visited in the last hour