How to calculate RPM (reads per million mapped reads)?
7.2 years ago
BehMah ▴ 50

Hi, I have mapped my RNA-seq data using TopHat and then TopHat-Fusion to identify circRNAs, and I am now looking for an R/Perl/Python script to calculate RPM (circRNA reads per million mapped reads). The number of mapped reads should be the mean of the TopHat and TopHat-Fusion mapped reads.

I have the identified circRNAs (BED file) for each sample. Sorry, I am new to bioinformatics, and your help is appreciated. :)

RNA-Seq • 7.5k views

What is your goal? To compare circRNAs between samples? RNA vs. circRNA in the same sample? Something else?


I am doing differential expression analysis of circRNAs across some samples and am trying to normalise the circRNA read counts to the mapped reads from TopHat and TopHat-Fusion.


Are you comparing the circRNAs to their native form?


Hi, there is a simple Perl script I wrote for calculating RPKM. You can adapt the code for RPM by removing the 'transcript_length' variable ($len_col=$ARGV[2]).

Also remove it from the final calculation.

Replace,

$array_rpkm[$i]=((1000000000*$array[$i])/($libarray[$i]*$array[$len_col-1]));

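With (note that the scaling factor also drops from 10^9 to 10^6, because the factor of 10^3 that converts transcript length from bases to kilobases is no longer needed)

$array_rpkm[$i]=((1000000*$array[$i])/($libarray[$i]));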

Usage with test data after editing:

perl rpkm_script_beta.pl sample_count_test.count 2:9 > sample_count_test.rpm
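
For reference, the two normalisations differ only in the length term and the scaling constant. Here is a minimal Python sketch, assuming the usual definitions (RPKM = 10^9 x reads / (library size x transcript length in bp); RPM = 10^6 x reads / library size). The function names and the example numbers are illustrative and are not taken from the Perl script:

```python
def rpkm(read_count, library_size, transcript_length_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * read_count / (library_size * transcript_length_bp)


def rpm(read_count, library_size):
    """Reads per million mapped reads (no transcript-length correction)."""
    return 1e6 * read_count / library_size


if __name__ == "__main__":
    # Hypothetical values: 15 reads, 20 million mapped reads, 1,500 bp transcript.
    print("RPM :", rpm(15, 20_000_000))
    print("RPKM:", rpkm(15, 20_000_000, 1500))
```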

Hi EagleEye,

Thank you sooooo much for your help, I will run it and see how it will go.

Since my TopHat and TopHat-Fusion outputs and the output of the circ_finder tool (which gives the circRNA list) are separate files, I assume I should put the mean number of mapped reads from TopHat and TopHat-Fusion in $libarray[$i]? Thanks :)


If you can post a few lines of your data, I will be able to tell.


I have the alignment information for both TopHat and TopHat-Fusion, so I can get the mean number of mapped reads for each sample. I also have the circRNA read number for each sample (see below). I need this RPM: circRNA reads (in the circRNA file) per million of the mean mapped reads (from TopHat and TopHat-Fusion), for 40 samples, for expression analysis.

The circRNA file (below) is a BED file (chromosome, genome coordinates, circRNA name, read number, host gene) that I got by running the TopHat-Fusion output file through a circRNA finder script:

chrm  start    end      circ_name       read_num  host_gene
8     2134780  2159644  circular_RNA_1  15        DHR
5     2134780  2293345  circular_RNA_2  30        CSH
12    2821949  2829687  circular_RNA_3  29        ZFY
6     4924929  4925500  circular_RNA_4  21        PCD
11    6863844  6911166  circular_RNA_5  10        TBL
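
Given a table like the one above, the per-sample calculation could be sketched in Python as follows. This assumes whitespace-separated columns in the order shown, with the circRNA name in the fourth column and the read count in the fifth, and takes the library size as the mean of the TopHat and TopHat-Fusion mapped-read counts. The function names and the mapped-read totals in the example are hypothetical placeholders, not values from any real sample:

```python
def mean_mapped_reads(tophat_mapped, tophat_fusion_mapped):
    """Library size taken as the mean of the two mapped-read counts."""
    return (tophat_mapped + tophat_fusion_mapped) / 2


def circ_rpm(bed_lines, library_size, name_col=3, count_col=4):
    """Return a list of (circ_name, RPM) for each data row.

    Rows whose count column is missing or not an integer (blank lines,
    a header row) are skipped.
    """
    results = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) <= count_col or not fields[count_col].isdigit():
            continue
        reads = int(fields[count_col])
        results.append((fields[name_col], 1e6 * reads / library_size))
    return results


if __name__ == "__main__":
    rows = [
        "chrm start end circ_name read_num host_gene",
        "8 2134780 2159644 circular_RNA_1 15 DHR",
        "5 2134780 2293345 circular_RNA_2 30 CSH",
    ]
    # Hypothetical mapped-read totals for one sample.
    library = mean_mapped_reads(16_000_000, 14_000_000)
    for name, value in circ_rpm(rows, library):
        print(name, value)
```

For a real sample, an open file handle can be passed in place of the list, since the function only iterates over lines; repeating this per sample with that sample's own mean library size gives one RPM column per sample.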
