Question: low mapping rate for miRNA data after trimming
3.4 years ago by
cyn.liu wrote:

Hi all,

This is my first time analysing miRNA data. I have some miRNA data (species: Bos taurus, single-end, read length 75 bp). I double-checked with the sequencing facility, and they said I should trim the 3' adapter of the Illumina HiSeq 2000 miRNA protocol. I tried to trim the adapter using this command:

trim_galore -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --stringency 6 001.fastq

Then I mapped 001_trimmed.fastq using hisat2 and got only a 44.23% overall alignment rate. I checked the read length distribution after trimming, and it peaks at 44 bp.

I have no idea why my mapping rate is so low. Could anyone please help me with this?

Many thanks~


Isn't 44 bp a bit long for miRNAs? I would expect them to be more in the 20-29 nt range. It sounds like there is still something else in your reads that isn't RNA. Does your protocol use some adaptor other than the sequencing adaptor? Also, you should make sure to throw out any read that doesn't contain the adaptor: with 75 bp reads, every read that contains a miRNA should also run into the adaptor.
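To make the "throw out anything without the adaptor" idea concrete, here is a minimal Python sketch. It is not from the original thread. It uses the adaptor sequence from the question's command, matches only an exact 12-nt seed (real trimmers tolerate mismatches and partial adaptors at the read end), and the 18-30 nt size window and all function names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Adaptor taken from the trim_galore command in the question.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records or at end of file
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line
        qual = next(it, "").rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def trim_require_adapter(
    records: Iterable[Record],
    adapter: str = ADAPTER,
    seed_len: int = 12,   # assumption: exact match on the first 12 nt of the adaptor
    min_len: int = 18,    # assumption: illustrative size window for miRNA-like inserts
    max_len: int = 30,
) -> Iterator[Record]:
    """Keep only reads containing the adaptor seed, cut there, and size-filter the insert."""
    seed = adapter[:seed_len]
    for header, seq, qual in records:
        pos = seq.find(seed)
        if pos == -1:
            continue  # no adaptor found: discard the read
        if min_len <= pos <= max_len:
            yield header, seq[:pos], qual[:pos]


def length_histogram(records: Iterable[Record]) -> Counter:
    """Count trimmed reads per insert length."""
    return Counter(len(seq) for _, seq, _ in records)
```

Passing an open file handle for 001.fastq to read_fastq, then the result through trim_require_adapter and length_histogram, gives the insert-size distribution of adaptor-containing reads only. If very few reads survive, the adaptor sequence itself is worth questioning.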

written 3.4 years ago by i.sudbery

The Illumina prep cannot enrich for miRNAs per se, it simply targets the 20-40bp range (roughly) in the gel, and you get what you get. miRNAs will be in there, but are not guaranteed to be the dominant species.

written 3.4 years ago by apa@stowers

It's been a while since I last did miRNA analysis, but when I did, the size distribution was pretty tight around 22 nt. Maybe we didn't use the Illumina prep. This was the size distribution on the last analysis I did.

In fact, I highly recommend the SequenceImp miRNA processing pipeline. Its suite of QC plots would be very helpful in solving problems like this.

written 3.4 years ago by i.sudbery
3.4 years ago by
Kansas City
apa@stowers wrote:

There are many, many possible reasons why mapping is low. Kit / reagent / amplification / library construction issues, contamination, sample degradation, too-stringent aligner settings, etc, etc.

You may want to run something like FastQC on the trimmed fastq files, which could give a lot more insight into what is going on. You could also take the top 100 most-highly-occurring sequences in the fastq and just look at them, to see if something is obviously wrong.
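For the "top 100 most-highly-occurring sequences" check, a minimal Python sketch could look like the following. It is not from the original thread, the function name is made up for illustration, and it assumes a plain uncompressed FASTQ with exactly four lines per record.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_sequences(fastq_lines: Iterable[str], n: int = 100) -> List[Tuple[str, int]]:
    """Return the n most frequent read sequences with their counts."""
    counts: Counter = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of every 4-line record is the sequence
            counts[line.strip()] += 1
    return counts.most_common(n)
```

Calling top_sequences on an open handle for 001_trimmed.fastq and printing the (sequence, count) pairs gives a list that can be eyeballed or pasted into BLAST. Leftover adaptor, rRNA fragments, or a single dominant contaminant tend to show up immediately.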

Or, blast some of the top sequences to NCBI nt. It is also possible that you have somehow enriched for material from some genomic compartment which did not make it into the reference assembly (usually something repetitive) like ribosomes, histones, immunoglobulins, etc. But, this is less common with small-RNA preps.

Alternatively, the Illumina small-RNA library size range also enriches for piRNAs, which are most often from transposons. Depending on your aligner settings, a piRNA-rich library could have many reads which align too many times to the genome and are getting discarded.

3.4 years ago by
liuwei wrote:

You can use fastx_toolkit to filter out low-quality reads and trim the adaptor, and then map to the genome. For small RNA, I suggest mapping with bowtie, because it is fast and accurate for short reads. miRNAs are ~22 nt long, so a possible reason is that the adaptor hasn't been trimmed. You should make sure that the adaptor has been trimmed (maybe the adaptor sequence isn't right); otherwise the reads can't map to the genome. Hope that helps!
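One way to check whether the adaptor sequence is right is to count how many raw reads contain each candidate adaptor. The Python sketch below is not from the original thread. The first candidate is the sequence used in the question. The second, the Illumina small RNA 3' adaptor, is an assumption added here as a plausible alternative to test, and the function name and 12-nt seed length are illustrative choices.

```python
from typing import Dict, Iterable

# Candidate adaptors to test. Only the first one appears in the thread.
# The second is an assumed alternative: the Illumina small RNA 3' adaptor.
CANDIDATES: Dict[str, str] = {
    "adapter_from_question": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "illumina_small_rna_3p": "TGGAATTCTCGGGTGCCAAGG",
}


def adapter_fractions(
    fastq_lines: Iterable[str],
    candidates: Dict[str, str] = CANDIDATES,
    seed_len: int = 12,  # assumption: exact match on the first 12 nt of each adaptor
) -> Dict[str, float]:
    """Return, per candidate adaptor, the fraction of reads containing its seed."""
    seeds = {name: seq[:seed_len] for name, seq in candidates.items()}
    hits = {name: 0 for name in seeds}
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # only look at sequence lines of 4-line FASTQ records
            continue
        total += 1
        for name, seed in seeds.items():
            if seed in line:
                hits[name] += 1
    return {name: (hits[name] / total if total else 0.0) for name in hits}
```

Run it on the untrimmed 001.fastq. In a small-RNA library read at 75 bp, the correct adaptor should appear in the large majority of reads, so a candidate with a fraction near zero is probably not the one used in the library prep.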
