Question: from mean per-base read coverage to RPKM using the Brawand data
0
lin.pei26 • 100 wrote:
Hi:
I am trying to calculating the RPKM from mean per-base read coverage which was provided by the paper Brawand et al, 2011.
but it seems strange to me that my resulting 'RPKM' value is rather small (each sample has a median value less than 1)
Below is my code[2].
rz2rpkm <- function(expr) { readcount <- expr # expr is a matrix of mean per-base read coverage with gene in rows and sample in columns readcount <- readcount[,6:ncol(readcount)] read.length <- 75 exoniclength <- expr[,"ExonicLength"] for(j in colnames(expr)[6:ncol(expr)]) { readcount[,j] <- (expr[,j]*exoniclength)/read.length } M <- apply(readcount,2,sum) RPKM <- readcount for(j in colnames(readcount)) { RPKM[,j] <- (readcount[,j]/(exoniclength*M[j]))*10^9 } return(RPKM) }
Here I sincerely asking for your help for this problem.
I have no experience with RNA-seq data before.
Many many thanks!
Best,
Can you post a few example values that seem off and the exoniclengths used to derive them? RPKM values can be quite low.
BTW, you can get rid of the for loops and increase performance. The first loop is simply:
The second one can be handled in a similar way
.
Hello lin.pei26!
It appears that your post has been cross-posted to another site: SEQanswers.
This is typically not recommended as it runs the risk of annoying people in both communities.
Thank you Devon
below is the top lines from MOUSE data[1]
I use this matrix as the variable expr
results looks like[2]
link to the Brawand data: http://www.nature.com/nature/journal/v478/n7369/extref/nature10532-s5.zip
[1]
"GeneID" "Chr" "Begin" "End" "Strand" "ExonicLength" "Mouse_Brain_Female" "Mouse_Brain_Male1" "Mouse_Brain_Male2" "Mouse_Cerebellum_Female" "Mouse_Cerebellum_Male1" "Mouse_Cerebellum_Male2" "Mouse_Heart_Female" "Mouse_Heart_Male1" "Mouse_Heart_Male2" "Mouse_Kidney_Female" "Mouse_Kidney_Male1" "Mouse_Kidney_Male2" "Mouse_Liver_Female" "Mouse_Liver_Male1" "Mouse_Liver_Male2" "Mouse_Testis_Male1" "Mouse_Testis_Male2"
"ENSMUSG00000030105" "6" 108733076 108773542 1 2730 72.2380952380952 185.524542124542 36.0362637362637 67.6091575091575 29.5714285714286 53.307326007326 31.956043956044 23.4047619047619 27.1358974358974 32.2355311355311 27.9516483516484 28.0758241758242 20.9168498168498 12.4223443223443 18.1003663003663 28.3615384615385 36.5879120879121
"ENSMUSG00000074445" "3" 92054654 92093169 1 722 0 0 0 0.105263157894737 0 0 0 0 0 0.840720221606648 0.421052631578947 1.57756232686981 0 0 0 0 0
"ENSMUSG00000065586" "6" 30119446 30119551 -1 106 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"ENSMUSG00000065904" "18" 35102298 35102367 -1 70 0.185714285714286 2.18571428571429 0 3.21428571428571 0.328571428571429 1.14285714285714 3.38571428571429 0.0285714285714286 0 0.757142857142857 0 0.7 1.22857142857143 0 0 0 0
"ENSMUSG00000020495" "11" 86891234 86900276 -1 3218 4.8325046612803 15.2973896830329 2.7799875699192 6.60907395898073 1.98104412678682 4.95121193287756 4.10223741454319 2.73213175885643 3.43349906774394 3.46239900559354 3.96643878185208 4.55220633934121 4.47700435052828 2.64170292106899 4.36513362336855 4.3141702921069 6.85922933499068
"ENSMUSG00000085092" "14" 69981561 70162426 -1 3368 31.2678147268409 36.4394299287411 9.57749406175772 36.7541567695962 17.750593824228 13.8067102137767 40.5237529691211 25.0736342042755 28.6576603325416 31.8931116389549 63.8717339667458 26.8788598574822 10.9421021377672 8.63984560570071 9.30255344418052 34.2105106888361 21.5483966745843
"ENSMUSG00000058979" "6" 120459512 120481337 -1 1928 16.8656639004149 41.2095435684647 8.45331950207469 18.4590248962656 10.1841286307054 13.5679460580913 12.1696058091286 10.5238589211618 10.7536307053942 7.74636929460581 13.7214730290456 9.21317427385892 8.28112033195021 8.58869294605809 8.94709543568465 12.6483402489627 11.2422199170124
"ENSMUSG00000045903" "19" 4984355 5038654 -1 3294 8.40255009107468 31.5170006071645 2.15725561627201 4.35033394049788 1.79204614450516 0.835458409228901 0.32301153612629 0.0230722525804493 0.206739526411658 0.0919854280510018 0.0230722525804493 0.0692167577413479 0 0 0 0.505464480874317 0.852459016393443
"ENSMUSG00000038045" "17" 79284237 79306239 -1 1008 0 0.0843253968253968 0 0 0.0406746031746032 0.0753968253968254 0 0.0753968253968254 0.0753968253968254 0.150793650793651 0.0753968253968254 0.108134920634921 0.0694444444444444 0 0 0.461309523809524 0.189484126984127
"ENSMUSG00000049536" "X" 133242521 133246017 1 2803 7.6578665715305 17.9450588655012 3.90082054941134 12.3389225829468 4.31109525508384 7.39778808419551 3.76489475561898 1.78808419550482 2.04566535854442 2.24687834463075 1.19158044951837 1.67998572957545 0.35176596503746 0.106671423474848 0.10845522654299 1.42133428469497 2.73135925793792
"ENSMUSG00000065644" "2" 19856472 19856580 1 109 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"ENSMUSG00000027333" "2" 131317232 131351658 1 3585 13.8334728033473 36.1715481171548 10.1977684797768 37.7629009762901 12.0253835425384 21.0259414225941 13.6387726638773 5.08870292887029 6.85076708507671 2.4976290097629 1.321059972106 2.31938633193863 1.58940027894003 0.955927475592748 0.530543933054393 3.43291492329149 5.48647140864714
[2]
> RPKM2 <- rz2rpkm(expr2)
> RPKM2 <- as.matrix(RPKM2)
> head(RPKM2[,1:5])
Mouse_Brain_Female Mouse_Brain_Male1 Mouse_Brain_Male2 Mouse_Cerebellum_Female Mouse_Cerebellum_Male1
ENSMUSG00000030105 71.9478984 67.2613894 68.701174 53.98640240 53.5640773
ENSMUSG00000074445 0.0000000 0.0000000 0.000000 0.08405339 0.0000000
ENSMUSG00000065586 0.0000000 0.0000000 0.000000 0.00000000 0.0000000
ENSMUSG00000065904 0.1849682 0.7924244 0.000000 2.56663044 0.5951564
ENSMUSG00000020495 4.8130914 5.5460246 5.299895 5.27739347 3.5883556
ENSMUSG00000085092 31.1422048 13.2110106 18.258971 29.34846063 32.1524602
> apply(RPKM2,2,median)
Mouse_Brain_Female Mouse_Brain_Male1 Mouse_Brain_Male2 Mouse_Cerebellum_Female Mouse_Cerebellum_Male1 Mouse_Cerebellum_Male2
0.45542732 0.46554147 0.43339427 0.42292904 0.41544624 0.39474157
Mouse_Heart_Female Mouse_Heart_Male1 Mouse_Heart_Male2 Mouse_Kidney_Female Mouse_Kidney_Male1 Mouse_Kidney_Male2
0.17738306 0.11659419 0.17509136 0.30791267 0.26455986 0.32425262
Mouse_Liver_Female Mouse_Liver_Male1 Mouse_Liver_Male2 Mouse_Testis_Male1 Mouse_Testis_Male2
0.08091724 0.09642508 0.08004370 0.62388267 0.57606359
> apply(expr2[,6:ncol(expr2)],2,median)
Mouse_Brain_Female Mouse_Brain_Male1 Mouse_Brain_Male2 Mouse_Cerebellum_Female Mouse_Cerebellum_Male1 Mouse_Cerebellum_Male2
0.45726426 1.28408540 0.22733105 0.52964960 0.22935780 0.34670551
Mouse_Heart_Female Mouse_Heart_Male1 Mouse_Heart_Male2 Mouse_Kidney_Female Mouse_Kidney_Male1 Mouse_Kidney_Male2
0.20987654 0.09183673 0.15756080 0.25945017 0.22200957 0.29439830
Mouse_Liver_Female Mouse_Liver_Male1 Mouse_Liver_Male2 Mouse_Testis_Male1 Mouse_Testis_Male2
0.08786127 0.06836162 0.08113917 0.49350649 0.56578171
Those values look about right. Most genes aren't well expressed generally, which will skew the RPKM low.
Careful, I think your read counts are not raw counts, or why are they fractional numbers in [1]?
Those are mean per-base coverage (op is multiplying by length to get average read counts), not raw counts. Having said that, the raw data is available, so it'd be easy enough for lin.pei26 to just reprocess a single sample and check how correct these calculations are.