Question

Best/right way to quantify small RNA transcripts

3

6.2 years ago

Chirag Nepal ★ 2.4k

Hi there,

I want to convert small RNA read counts into reads per million (RPM). What is the correct/best way to do this?

Small RNAs map to multiple locations in the genome. Here is a plot of the sequence read lengths, which are variable.

https://www.dropbox.com/s/10dn098492zst9i/Length.png?dl=0

I mapped with bowtie2. Here are the mapping statistics for one of the libraries:

2601833 reads; of these: 2601833 (100.00%) were unpaired; of these:

**352919 (13.56%) aligned 0 times
362282 (13.92%) aligned exactly 1 time
1886632 (72.51%) aligned >1 times**

86.44% overall alignment rate

Total_number_mapped_reads= 362282 + 1886632 = 2248914

The problem is that 1886632 reads map to more than one location, so the total number of mapped locations is higher than 2248914. If every location of the multi-mapped reads is counted, **Total_number_mapped_reads is 5432552**.

So, what number should I use for Total_number_mapped_reads to compute reads per million?
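
To make the difference concrete, here is a minimal sketch that computes RPM with each candidate denominator, using the counts from the bowtie2 summary above. The feature count of 1000 and the names `rpm`, `MAPPED_READS` and `MAPPED_LOCATIONS` are hypothetical and only for illustration.

```python
# Counts copied from the bowtie2 summary in this post.
UNIQUE_READS = 362_282            # aligned exactly 1 time
MULTI_READS = 1_886_632           # aligned >1 times
MAPPED_READS = UNIQUE_READS + MULTI_READS   # each read counted once
MAPPED_LOCATIONS = 5_432_552      # each alignment counted once (from the post)


def rpm(count, library_size):
    """Scale a raw count to reads per million of the chosen library size."""
    return count * 1_000_000 / library_size


# Hypothetical raw count for one small RNA feature (assumption, not real data).
feature_count = 1000

rpm_by_reads = rpm(feature_count, MAPPED_READS)
rpm_by_locations = rpm(feature_count, MAPPED_LOCATIONS)

print(f"RPM using mapped reads:     {rpm_by_reads:.2f}")
print(f"RPM using mapped locations: {rpm_by_locations:.2f}")
```

The same raw count gives an RPM roughly 2.4 times larger with the read-based denominator than with the location-based one, so whichever is chosen should match how the raw counts themselves were tallied (per read or per alignment).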

thanks !!

small RNA-seq RPM multi-mapping • 6.4k views

ADD COMMENT • link updated 11 months ago by Sergio • 0 • written 6.2 years ago by Chirag Nepal ★ 2.4k

score 4 · Answer 1 · 2018-02-21

4

6.2 years ago

igor 13k

Most RNA-seq tools aren't really designed for very short sequences that you get from small RNA-seq. I would suggest using a miRNA-specific tool, such as:

miRge - https://baraslab.github.io/miRge/
miRDeep2 - https://github.com/rajewsky-lab/mirdeep2
miRExpress - http://mirexpress.mbc.nctu.edu.tw/

You can check some previous posts for miRNA-seq processing:

ADD COMMENT • link 6.2 years ago by igor 13k

Thanks Igor !!

I updated the post above to clarify further.

This is not typical miRNA-seq data, which is enriched for 18-30 nt RNAs. Mapping with bowtie2 works fine; I get around 85-90% of reads mapped. My only concern is the normalization when converting read counts into reads per million, because Total_number_mapped_reads differs depending on whether it is the number of mapped reads or the number of mapped locations.

ADD REPLY • link 6.2 years ago by Chirag Nepal ★ 2.4k

I don't know if you can say the mapping is fine, since the overwhelming majority of reads are multi-mapped. Although these are not miRNAs, it's the same challenge (the fragments are too short to be confidently mapped).

ADD REPLY • link 6.2 years ago by igor 13k

Hi Chirag,

I wonder what conclusion you reached after these years. I'm thinking of using featureCounts with multi-mapping reads included (the featureCounts -M parameter) after aligning the reads with Bowtie2, but I'm not sure how to do the RPM normalization for the counts.
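
One option, sketched here as an assumption and not as an established answer from this thread, is fractional counting: a read with n alignments contributes 1/n to each location, so the counts sum to the number of mapped reads and that number can serve as the RPM denominator. featureCounts offers a `--fraction` option for use together with `-M`; the Python below only illustrates the arithmetic on a hypothetical toy input, and all names in it are made up for the example.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Give each read a total weight of 1, split evenly over its alignments.

    alignments: dict mapping read ID -> list of feature names, one per alignment.
    """
    counts = defaultdict(float)
    for features in alignments.values():
        if not features:          # unmapped read, contributes nothing
            continue
        weight = 1.0 / len(features)
        for feature in features:
            counts[feature] += weight
    return dict(counts)


def to_rpm(counts, n_mapped_reads):
    """Scale fractional counts to reads per million mapped reads."""
    scale = 1_000_000 / n_mapped_reads
    return {feature: count * scale for feature, count in counts.items()}


# Hypothetical toy data: three mapped reads with 1, 2 and 4 alignments.
toy_alignments = {
    "read1": ["tRNA_A"],
    "read2": ["tRNA_A", "tRNA_B"],
    "read3": ["tRNA_B", "tRNA_B", "tRNA_C", "tRNA_C"],
}

counts = fractional_counts(toy_alignments)
n_mapped = sum(1 for feats in toy_alignments.values() if feats)
rpm_values = to_rpm(counts, n_mapped)

print(counts)
print(rpm_values)
```

With this weighting the RPM values across all features add up to one million, regardless of how many locations each read maps to.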

ADD REPLY • link 11 months ago by Sergio • 0