Background: I'm working on a project that deals with gene expression in various parts of the body. I'm looking at over 400 RPKM values for genes expressed in certain tissues (data come from the GTEx Portal). I'm looking at 6 different tissues (Caudate, Cortex, Nucleus Accumbens, Putamen, Spinal Cord, Tibial Nerve, Skeletal Muscle), with Skin: Non-sun exposed RPKM expression as a general control expression value (so 7 total different tissues).
Goal: I want to express numerically how each gene's expression differs across the 6 variable tissues. So far I have done this by comparing each tissue's value to the same gene's expression in Skin (which acts as the control). For example, take three genes with arbitrary RPKM values: APP, PSEN1, and APOE.
APP: Caudate = 191, Cortex = 143, Skin = 99.7
PSEN1: Caudate = 5.9, Cortex = 4.7, Skin = 9.9
APOE: Caudate = 910.1, Cortex = 879.9, Skin = 174.5
I want to express the difference in expression between Caudate and Cortex, while also comparing those differences across the various genes. In other words, I need to illustrate the difference in expression between Caudate and Cortex while at the same time expressing the difference between APP and PSEN1. Once again, I compared each gene's RPKM to the Skin RPKM for that gene, but I'm not sure that's the best way.
Obstacles: My first issue is that the data values across the genes are vastly different. For example, APOE expression in the Caudate is in the 900s, while PSEN1 expression in the Caudate is below ten. How should I go about normalizing these data so that they're all on a similar scale (or at least close enough to be compared accurately)?
The second issue is that I need to compare differences in expression between tissues for a given gene, and then compare those differences against the corresponding differences for other genes.
I would like to accomplish these tasks and express the results in a single scatter plot. Is this possible?
My Attempt: My idea was to take the percent difference between each tissue value and the skin value. In Excel, I used the percent difference formula (the absolute value of the change in value, divided by the average of the two numbers, all multiplied by 100) for each gene in each tissue against that gene's skin expression. This put all genes on a common scale (percent instead of RPKM) and also gave some sense of how the differences between tissues compare from gene to gene. However, the results are capped at a maximum of 200% and a minimum of -200%. I feel that a cap of 200% does not fully reflect the true differences.
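For anyone who wants to reproduce the calculation outside Excel, here is a minimal Python sketch of the percent difference formula described above, applied to the three example genes. The function names and the `signed` option are my own illustration, not part of the spreadsheet; the signed variant (dropping the absolute value) is only an assumption about how negative values could arise.

```python
# Example RPKM values copied from the post above.
rpkm = {
    "APP":   {"Caudate": 191.0, "Cortex": 143.0, "Skin": 99.7},
    "PSEN1": {"Caudate": 5.9,   "Cortex": 4.7,   "Skin": 9.9},
    "APOE":  {"Caudate": 910.1, "Cortex": 879.9, "Skin": 174.5},
}


def percent_difference(tissue, control, signed=False):
    """Change in value divided by the mean of the two values, times 100.

    With signed=False the absolute change is used (the formula in the post).
    With signed=True the sign is kept, so values below the control are negative.
    """
    mean = (tissue + control) / 2.0
    if mean == 0:
        raise ValueError("percent difference is undefined when both values are 0")
    change = tissue - control
    if not signed:
        change = abs(change)
    return change / mean * 100.0


def percent_difference_table(data, control="Skin", signed=False):
    """Percent difference of every non-control tissue against the control, per gene."""
    table = {}
    for gene, values in data.items():
        table[gene] = {
            tissue: percent_difference(value, values[control], signed=signed)
            for tissue, value in values.items()
            if tissue != control
        }
    return table


if __name__ == "__main__":
    for gene, row in percent_difference_table(rpkm, signed=True).items():
        for tissue, pct in row.items():
            print(f"{gene:6s} {tissue:8s} {pct:8.2f}%")
```

The 200% cap follows directly from the formula: if one of the two values is 0, the change equals twice the mean, so the ratio is exactly 2 no matter how large the other value is.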
Conclusion: So I'm reaching out to the bioinformatics community to see if anyone else has any suggestions on how I can manipulate the data and express them in a single scatterplot (or some other graph, it doesn't matter). I considered using percent error instead of difference; would this work?
Thanks in advance.