How to calculate RPM (reads per million mapped reads)?
4.2 years ago
BehMah

Hi, I have mapped my RNA-seq data with Tophat and then Tophat-Fusion to identify circRNAs. I am now looking for an R/Perl/Python script to calculate RPM (circRNA reads per million mapped reads), where the number of mapped reads should be the mean of the Tophat and Tophat-Fusion mapped reads.
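The calculation being asked for is RPM = circRNA reads × 10^6 / mapped reads, with the mapped-read count taken as the mean of the Tophat and Tophat-Fusion totals. A minimal Python sketch of that formula follows; the function name `rpm` and the example read counts are hypothetical and do not come from any tool's output.

```python
def rpm(circ_reads: int, tophat_mapped: int, fusion_mapped: int) -> float:
    """Reads per million: circRNA reads scaled by the mean mapped-read count."""
    # Library size as described in the question: mean of the two aligners' mapped reads
    library_size = (tophat_mapped + fusion_mapped) / 2
    return circ_reads * 1_000_000 / library_size


if __name__ == "__main__":
    # Hypothetical sample: 15 circRNA reads, 20M mapped by Tophat, 22M by Tophat-Fusion
    print(round(rpm(15, 20_000_000, 22_000_000), 4))
```

With these made-up numbers the library size is 21 million, so the result is 15 / 21 ≈ 0.7143 RPM.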

I have the identified circRNAs (as a BED file) for each sample. Sorry, I am new to bioinformatics; your help is appreciated. :)


What is your goal? Comparing circRNAs between samples? Linear RNA vs. circRNA in the same sample? Something else?


I am doing differential expression analysis of circRNAs across samples and am trying to normalise the circRNA counts to the number of mapped reads from Tophat and Tophat-Fusion.


Are you comparing the circRNAs to their native form?


Hi, I wrote a simple Perl script for calculating RPKM. You can adapt it for RPM by removing the 'transcript_length' variable ($len_col=$ARGV[2]).

Also remove it from the final calculation.

Replace:

$array_rpkm[$i]=((1000000000*$array[$i])/($libarray[$i]*$array[$len_col-1]));


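With the line below. Note that the constant must also change from 10^9 to 10^6: RPKM's 10^9 is 10^6 (per million reads) times 10^3 (per kilobase of transcript), and the per-kilobase factor no longer applies once the length is removed.

$array_rpkm[$i]=((1000000*$array[$i])/($libarray[$i]));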


Usage with test data after editing:

perl rpkm_script_beta.pl sample_count_test.count 2:9 > sample_count_test.rpm
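For comparison, the arithmetic of the edited Perl line can be sketched in Python. This is not EagleEye's script. It assumes a tab-delimited count table with a header row, count columns given as a 1-based inclusive range (mirroring the `2:9` argument), and library sizes (for example, the mean Tophat/Tophat-Fusion mapped reads per sample) passed in explicitly, since how the Perl script fills `$libarray` is not shown in this thread. The function name `rpm_table` and the example table are hypothetical.

```python
import csv
import io


def rpm_table(count_lines, first_col, last_col, library_sizes):
    """Convert a tab-delimited count table to RPM.

    count_lines         -- iterable of text lines; the first line is a header
    first_col, last_col -- 1-based inclusive range of count columns (like "2:9")
    library_sizes       -- one mapped-read total per count column, in column order
    """
    reader = csv.reader(count_lines, delimiter="\t")
    header = next(reader)
    start, end = first_col - 1, last_col  # 1-based inclusive -> Python slice bounds
    if len(library_sizes) != end - start:
        raise ValueError("need exactly one library size per count column")
    rows = [header]
    for row in reader:
        scaled = [
            f"{int(count) * 1_000_000 / lib:.4f}"  # RPM = count * 10^6 / library size
            for count, lib in zip(row[start:end], library_sizes)
        ]
        # Keep any non-count columns before and after the range unchanged
        rows.append(row[:start] + scaled + row[end:])
    return rows


if __name__ == "__main__":
    # Hypothetical two-sample table and library sizes
    table = io.StringIO("id\ts1\ts2\ncirc1\t15\t30\n")
    for row in rpm_table(table, 2, 3, [21_000_000, 10_000_000]):
        print("\t".join(row))
```

Passing the library sizes in explicitly keeps the normalisation choice (here, the mean of the two aligners' mapped reads) visible instead of hidden inside the script.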


Hi EagleEye,

Thank you so much for your help! I will run it and see how it goes.

Since my Tophat and Tophat-Fusion outputs and the output of the circ_finder tool (which gives the circRNA list) are in different files, I assume I should put the mean of the Tophat and Tophat-Fusion mapped reads into $libarray[$i]? Thanks :)


If you can post the first few lines of your data, I will be able to tell.


I have the alignment information for both Tophat and Tophat-Fusion, so I can get the mean number of mapped reads for each sample. I also have the circRNA read counts for each sample (see below). What I need is RPM: circRNA reads (from the circRNA file) per million of the mean mapped reads (from Tophat and Tophat-Fusion), for 40 samples, for expression analysis.

The circRNA file below is a BED-like file (chromosome, genomic coordinates, circRNA name, read count, host gene) that I got by running the Tophat-Fusion output through a circRNA finder script:

chrm  start    end      circ_name       read_num  host_gene
8     2134780  2159644  circular_RNA_1  15        DHR
5     2134780  2293345  circular_RNA_2  30        CSH
12    2821949  2829687  circular_RNA_3  29        ZFY
6     4924929  4925500  circular_RNA_4  21        PCD
11    6863844  6911166  circular_RNA_5  10        TBL
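A minimal sketch of the per-sample calculation on a circRNA file like the one above. It assumes whitespace-separated columns in the order shown (read count in the fifth column, host gene in the sixth) and a single header line. The function name `circ_rpm` and the mapped-read totals in the example are hypothetical.

```python
def circ_rpm(lines, mean_mapped_reads):
    """Yield (circ_name, host_gene, RPM) for each circRNA row.

    Assumes whitespace-separated columns:
    chrm start end circ_name read_num host_gene, with one header line first.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header line
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        circ_name, reads, host = fields[3], int(fields[4]), fields[5]
        # RPM = circRNA reads * 10^6 / mean of Tophat and Tophat-Fusion mapped reads
        yield circ_name, host, reads * 1_000_000 / mean_mapped_reads


if __name__ == "__main__":
    sample = [
        "chrm start end circ_name read_num host_gene",
        "8 2134780 2159644 circular_RNA_1 15 DHR",
        "5 2134780 2293345 circular_RNA_2 30 CSH",
    ]
    # Hypothetical library size: mean of 20M (Tophat) and 22M (Tophat-Fusion) mapped reads
    mean_mapped = (20_000_000 + 22_000_000) / 2
    for name, host, value in circ_rpm(sample, mean_mapped):
        print(f"{name}\t{host}\t{value:.4f}")
```

For 40 samples, the same function would be called once per sample file, each time with that sample's own mean mapped-read count.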