Question

Low mapping rate for miRNA data after trimming

7.5 years ago

cyn.liu • 0

Hi all,

This is my first time analysing miRNA data. I have some miRNA data from Bos taurus: single-end, 75 bp reads. I double-checked with the sequencing facility, and they said I should trim the 3' adapter from the Illumina HiSeq 2000 miRNA protocol. I tried to trim the adapter using this command:

trim_galore -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC --stringency 6 001.fastq

Then I mapped 001_trimmed.fastq using hisat2 and got only a 44.23% overall alignment rate. I checked the read length distribution after trimming, and it peaks at 44 bp.

I have no idea why the mapping rate is so low. Could anyone please help me with this?

Many thanks~

miRNA cutadapt • 3.4k views

ADD COMMENT • link updated 7.5 years ago by liuwei • 0 • written 7.5 years ago by cyn.liu • 0

Isn't 44 bp a bit long for miRNAs? I would expect them to be more in the 20-29 range. It sounds like there is still something else in your reads that isn't RNA. Does your protocol have some adaptor other than the sequencing adaptor? Also, you should make sure to throw out anything that doesn't contain the adaptor. With 75 bp reads, every read that contains an miRNA should also contain the adaptor.
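To make the "throw out anything without the adaptor" idea concrete, here is a minimal Python sketch. It is only an illustration: it uses exact string matching (real trimmers such as cutadapt tolerate mismatches), the function names are made up for this example, and the 18-30 nt length window is an assumption you should adjust to your own data.

```python
def trim_3prime_adapter(seq, adapter, min_overlap=6):
    """Return the part of the read upstream of the adapter,
    or None if no adapter is found (exact matching only)."""
    # Case 1: the complete adapter occurs somewhere in the read.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]
    # Case 2: the read ends inside the adapter, so only an adapter
    # prefix is present at the 3' end. Try the longest overlap first.
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:len(seq) - k]
    return None


def filter_reads(records, adapter, min_len=18, max_len=30):
    """Keep only reads that contained the adapter and whose insert
    falls inside the length window (window is an assumed default)."""
    kept = []
    for name, seq, qual in records:
        insert = trim_3prime_adapter(seq, adapter)
        if insert is None:
            continue  # no adapter found: discard the read
        if min_len <= len(insert) <= max_len:
            kept.append((name, insert, qual[:len(insert)]))
    return kept
```

In practice you would more likely let the trimmer do this; cutadapt, for example, has a `--discard-untrimmed` option plus minimum and maximum length filters. The sketch is only meant to show the logic.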

ADD REPLY • link 7.5 years ago by i.sudbery 20k

The Illumina prep cannot enrich for miRNAs per se, it simply targets the 20-40bp range (roughly) in the gel, and you get what you get. miRNAs will be in there, but are not guaranteed to be the dominant species.

ADD REPLY • link 7.5 years ago by apa@stowers ▴ 600

It's been a while since I last did miRNA analysis, but when I did, the size distribution was pretty tight around 22 nt. Maybe we didn't use the Illumina prep. This was the size distribution on the last analysis I did.

In fact, I highly recommend the SequenceImp miRNA processing pipeline. Its suite of QC plots would be very helpful in solving problems like this.

ADD REPLY • link 7.5 years ago by i.sudbery 20k

score 1 · Answer 1 · 2017-01-10

There are many, many possible reasons why mapping is low. Kit / reagent / amplification / library construction issues, contamination, sample degradation, too-stringent aligner settings, etc, etc.

You may want to run something like FastQC on the trimmed fastq files, which could give a lot more insight into what is going on. You could also take the top 100 most-highly-occurring sequences in the fastq and just look at them, to see if something is obviously wrong.
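If it helps, pulling out the most frequent sequences can be sketched in a few lines of Python. This assumes a plain, uncompressed FASTQ with exactly four lines per record; the function name `top_sequences` is just something made up for this example.

```python
from collections import Counter


def top_sequences(fastq_lines, n=100):
    """Count identical read sequences and return the n most common
    as (sequence, count) pairs. Assumes 4 lines per FASTQ record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[line.strip()] += 1
    return counts.most_common(n)


# Example usage (file name taken from the question):
# with open("001_trimmed.fastq") as fh:
#     for seq, count in top_sequences(fh):
#         print(count, seq)
```

If one or two sequences dominate the list, those are the ones worth checking by eye or with BLAST first.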

Or, blast some of the top sequences to NCBI nt. It is also possible that you have somehow enriched for material from some genomic compartment which did not make it into the reference assembly (usually something repetitive) like ribosomes, histones, immunoglobulins, etc. But, this is less common with small-RNA preps.

Alternatively, the Illumina small-RNA library size range also enriches for piRNAs, which are most often from transposons. Depending on your aligner settings, a piRNA-rich library could have many reads which align too many times to the genome and are getting discarded.

score 0 · Answer 2 · 2017-01-11

You can use fastx_toolkit to filter out low-quality reads and trim the adaptor, and then map to the genome. For small RNA, I suggest mapping with bowtie, because it is fast and accurate for short reads. miRNAs are ~22 nt long, so one possible reason is that the adaptor hasn't been trimmed. You should make sure the adaptor has actually been trimmed (maybe the adaptor sequence isn't right); otherwise the reads can't map to the genome. Hope that helps!
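One rough way to check whether trimming actually worked is to look at the read length distribution and count how many trimmed reads still contain the start of the adaptor. The Python sketch below assumes you have already loaded the read sequences into a list; the function name and the 12-base probe length are arbitrary choices for illustration, and the match is exact, so sequencing errors inside the adaptor will be missed.

```python
from collections import Counter


def length_and_adapter_report(seqs, adapter, probe_len=12):
    """Return (length histogram, fraction of reads that still contain
    the first probe_len bases of the adapter)."""
    probe = adapter[:probe_len]
    lengths = Counter()
    with_adapter = 0
    total = 0
    for s in seqs:
        total += 1
        lengths[len(s)] += 1
        if probe in s:  # exact match only
            with_adapter += 1
    fraction = with_adapter / total if total else 0.0
    # Sort by read length so the histogram is easy to scan.
    return dict(sorted(lengths.items())), fraction
```

A histogram peaking well above ~22 nt, or a noticeable fraction of reads still carrying the adaptor start, would both suggest that the adaptor sequence or the trimming settings need another look.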