Question: How to calculate RPM (reads per million mapped reads)?
3.9 years ago, BehMah wrote:

Hi, I have mapped my RNA-seq data using TopHat and then TopHat-Fusion to identify circRNAs, and I am now looking for an R/Perl/Python script to calculate RPM (circRNA reads per million mapped reads). The mapped-read count should be the mean of the TopHat and TopHat-Fusion mapped reads.

I have the identified circRNAs (BED file) for each sample. Sorry, I am new to bioinformatics, and your help is appreciated. :)

modified 12 months ago by Biostar • written 3.9 years ago by BehMah

What is your goal? To compare circRNAs between samples? RNA vs. circRNA in the same sample? Something else?

written 3.9 years ago by Asaf

I am doing a differential expression analysis of circRNAs in some samples and am trying to normalise the circRNA counts to the mapped reads from TopHat and TopHat-Fusion.

written 3.9 years ago by BehMah

Are you comparing the circRNAs to their native form?

written 3.9 years ago by Asaf

Hi, there is a simple Perl script I wrote for calculating RPKM. You can adapt it for RPM by removing the 'transcript_length' variable ($len_col=$ARGV[2]).

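Also remove it from the final calculation, and change the scaling factor from 10^9 to 10^6: the extra factor of 1,000 in RPKM only converts the transcript length to kilobases, so keeping it would give reads per billion rather than reads per million.

Replace

$array_rpkm[$i]=((1000000000*$array[$i])/($libarray[$i]*$array[$len_col-1]));

with

$array_rpkm[$i]=((1000000*$array[$i])/($libarray[$i]));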

Usage with test data after editing:

perl rpkm_script_beta.pl sample_count_test.count 2:9 > sample_count_test.rpm
modified 3.9 years ago • written 3.9 years ago by EagleEye
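
For readers who would rather work in Python, the RPM calculation itself can be sketched in a few lines. This is only an illustration of the formula (count scaled by one million over the number of mapped reads), not EagleEye's script; the function name `rpm` and the example numbers are hypothetical.

```python
def rpm(read_count, mapped_reads):
    """Reads per million mapped reads: count * 1e6 / library size."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return read_count * 1_000_000 / mapped_reads


# Hypothetical numbers: 15 circRNA reads in a library of 20 million mapped reads.
example = rpm(15, 20_000_000)
print(example)
```

Unlike RPKM, there is no division by transcript length, which is why the multiplier is 10^6 rather than 10^9.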

Hi EagleEye,

Thank you so much for your help. I will run it and see how it goes.

Since my TopHat and TopHat-Fusion outputs and the output of the circ_finder tool (which gives the circRNA list) are separate files, I assume I should put the mean number of TopHat and TopHat-Fusion mapped reads in $libarray[$i]? Thanks :)

written 3.9 years ago by BehMah
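
One possible way to build the per-sample library size discussed here is to average the two mapped-read totals before using the result as the denominator. The sketch below assumes the mapped-read counts have already been read off the TopHat and TopHat-Fusion alignment summaries by hand or by another script; the function name, sample names, and numbers are all hypothetical.

```python
def mean_mapped_reads(tophat_mapped, fusion_mapped):
    """Average the mapped-read totals of the two aligners for each sample."""
    mismatched = set(tophat_mapped) ^ set(fusion_mapped)
    if mismatched:
        raise KeyError(f"samples missing from one aligner: {sorted(mismatched)}")
    return {
        sample: (tophat_mapped[sample] + fusion_mapped[sample]) / 2
        for sample in tophat_mapped
    }


# Hypothetical mapped-read totals per sample.
tophat = {"sample_1": 18_000_000, "sample_2": 25_000_000}
fusion = {"sample_1": 22_000_000, "sample_2": 23_000_000}

library_sizes = mean_mapped_reads(tophat, fusion)
print(library_sizes)
```

Each value in `library_sizes` would then play the role of `$libarray[$i]` for the corresponding sample.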

If you can post a few lines of your data, I will be able to tell.

modified 3.9 years ago • written 3.9 years ago by EagleEye

I have the alignment information for both TopHat and TopHat-Fusion, so I can get the mean number of mapped reads for each sample. I also have the circRNA read numbers for each sample (see below). I need this RPM: circRNA reads (from the circRNA file) per million of the mean mapped reads (from TopHat and TopHat-Fusion), for 40 samples, for expression analysis.

The circRNA file (below) is a BED file (chromosome, genome coordinates, circRNA name, read number, host gene) that I got by running the TopHat-Fusion output file through a circRNA finder script:

chrm  start    end      circ_name       read_num  host_gene
8     2134780  2159644  circular_RNA_1  15        DHR
5     2134780  2293345  circular_RNA_2  30        CSH
12    2821949  2829687  circular_RNA_3  29        ZFY
6     4924929  4925500  circular_RNA_4  21        PCD
11    6863844  6911166  circular_RNA_5  10        TBL

modified 3.9 years ago • written 3.9 years ago by BehMah
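
Given a file laid out like the one posted above, a per-circRNA RPM could be computed roughly as follows. This is a sketch under several assumptions: columns are whitespace-separated in the order chromosome, start, end, circRNA name, read number, host gene; a header row, if present, is skipped because its second field is not a number; and the mean mapped-read count of 20 million is a made-up value, not one from the thread. The function name `circ_rpm` is hypothetical.

```python
def circ_rpm(bed_text, mean_mapped_reads):
    """Return {circ_name: RPM} for whitespace-separated BED-like rows."""
    result = {}
    for line in bed_text.strip().splitlines():
        fields = line.split()
        # Skip blank lines, short lines, and a header row (non-numeric start).
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        name, reads = fields[3], int(fields[4])
        result[name] = reads * 1_000_000 / mean_mapped_reads
    return result


# Rows copied from the post; the library size below is hypothetical.
bed_rows = """
8 2134780 2159644 circular_RNA_1 15 DHR
5 2134780 2293345 circular_RNA_2 30 CSH
12 2821949 2829687 circular_RNA_3 29 ZFY
6 4924929 4925500 circular_RNA_4 21 PCD
11 6863844 6911166 circular_RNA_5 10 TBL
"""

rpm_by_circ = circ_rpm(bed_rows, mean_mapped_reads=20_000_000)
for circ_name, value in rpm_by_circ.items():
    print(circ_name, value)
```

For 40 samples, the same function could be called once per sample file with that sample's own mean mapped-read count.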