Why is my TPM value not what I expect?
4.7 years ago
star ▴ 350

I would like to normalize my data using the TPM method, as explained at https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/:

TPM is very similar to RPKM and FPKM. The only difference is the order of operations. Here’s how you calculate TPM:

1. Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).
2. Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.
3. Divide the RPK values by the “per million” scaling factor. This gives you TPM.

I used the code below, but I do not know why the output is not correct.

CODE:

    RPK <- data.matrix(Data[-1] / Data$Length.Kbp)
    TPM <- t(t(RPK) * 1e6 / colSums(RPK))

Data:

                    Length.Kbp FB_1 FB_2 FB_3
    1:15040-15500         0.46    0    4    0
    1:108570-109500       0.93    1    5    0
    1:248240-249110       0.87    2    1    1

RPK:

                        FB_1     FB_2     FB_3
    1:15040-15500          0 8.695652        0
    1:108570-109500 1.075269 5.376344        0
    1:248240-249110 2.298851 1.149425 1.149425

TPM:

                          FB_1      FB_2      FB_3
    1:15040-15500            0 2577162.0         0
    1:108570-109500   70641.81  353209.1         0
    1:248240-249110 2000000.00 1000000.0 1000000.0

However, the FB_2 value in the first row should be: 8.695652 * 1000000 / 15.221422 = 571277.2

R RPKM TPM edgeR normalizing • 1.7k views

Did you try storing colSums2(RPK) in a vector and verifying a few values in it to ensure you're dividing by the right value? There is something odd about the third row - it seems to be exactly 1e6 x original_counts. Also, your datasets don't conform to the code. If RPK <- data.matrix(Data / Data$Length.Kbp) is exactly what was run, then RPK would also have a column titled Length.Kbp with all values = 1. Did you remove that column?

Thanks for your reply! Yes, I have removed it and edited the code now.

In my code I just used a transpose:

TPM <- t(t(RPK)*1e6 / colSums(RPK))

and it seems to work, but I don't know what exactly happens when transposing twice.
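
On what the two transposes do: R recycles a vector down the columns of a matrix, so dividing the region-by-sample matrix directly by colSums(RPK) pairs row i with the total of sample i, which is the wrong total. Transposing first puts the samples in rows, so the recycling lines each sample up with its own total, and the second transpose restores the original layout. A minimal sketch using the RPK values posted above; the names per_sample, naive and TPM_sweep are made up for illustration, and the untransposed version appears to reproduce the incorrect TPM table in the post:

```r
# Posted RPK values: rows are regions, columns are samples.
RPK <- matrix(
  c(0,        8.695652, 0,
    1.075269, 5.376344, 0,
    2.298851, 1.149425, 1.149425),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("1:15040-15500", "1:108570-109500", "1:248240-249110"),
    c("FB_1", "FB_2", "FB_3")
  )
)

per_sample <- colSums(RPK)  # one total per sample (column)

# No transpose: per_sample is recycled down each column, so row i is
# divided by per_sample[i], a sample total that has nothing to do with row i.
naive <- RPK * 1e6 / per_sample

# Double transpose: in t(RPK) the samples are rows, so recycling pairs
# sample j with per_sample[j]; the outer t() restores regions-by-samples.
TPM <- t(t(RPK) * 1e6 / per_sample)

# sweep() expresses the same column-wise division without transposing.
TPM_sweep <- sweep(RPK, 2, per_sample, "/") * 1e6

print(round(naive, 1))
print(round(TPM, 1))
```

With a square matrix like this one the untransposed division runs without any warning, which is why the mistake is easy to miss.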

Are you sure you should be using colSums and not rowSums? You're dividing transposed-RPK by per-sample RPK sums, not per-region RPK sums. Try using rowSums instead.

I want to divide each RPK value by the per-sample RPK sum, based on the explanation below:

1) Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).

2) Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.

3) Divide the RPK values by the “per million” scaling factor. This gives you TPM.
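
The three steps can be sketched in R as follows. This assumes the Data layout shown in the question (a Length.Kbp column followed by one count column per sample); counts and scaling are illustrative names, and sweep() is used in place of the double transpose purely for readability:

```r
# Data as posted: region length in kilobases plus raw counts per sample.
Data <- data.frame(
  Length.Kbp = c(0.46, 0.93, 0.87),
  FB_1 = c(0, 1, 2),
  FB_2 = c(4, 5, 1),
  FB_3 = c(0, 0, 1),
  row.names = c("1:15040-15500", "1:108570-109500", "1:248240-249110")
)

# Step 1: reads per kilobase. The length vector has one entry per row and
# is recycled down each column, so every row is divided by its own length.
counts <- as.matrix(Data[, -1])
RPK <- counts / Data$Length.Kbp

# Step 2: per-sample "per million" scaling factor (one value per column).
scaling <- colSums(RPK) / 1e6

# Step 3: divide each sample's column by that sample's scaling factor.
TPM <- sweep(RPK, 2, scaling, "/")

print(round(TPM, 1))
print(colSums(TPM))  # each sample should total one million
```

Checking that every column of TPM sums to one million is a quick way to confirm that the denominator was taken per sample.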

Please read those three statements and interpret them to get to the denominator you need to use. I can help you with specific questions, but I will not read English and translate it to reproducible code for you - you should be able to do that on your own.

I have edited your post and updated the TPM object with the formula above. Going forward, please give us the exact code you use - it is impossible to help you when you withhold critical information.