How can I determine how much each factor contributes to the overall PCA result, for example expressed as a percentage?
I have a gene expression matrix from six groups (a, b, c, d, e, f), each with at least two replicates. The samples in the six groups come from different donors (for example, the replicates in group a come from donor A and donor B).
Here are my example data and the related sample information:
expr <- structure(list(a_1 = c(0.777826793724671, 0.723980296636, 0.242330621229485,
0.74905050941743, 0.293395987479016, 0.508663943270221, 0.568264301633462,
0.71544666425325, 0.22279090131633, 0.162438383558765), a_2 = c(0.840473337331787,
0.750769911101088, 0.824924289714545, 0.451419109478593, 0.67439135722816,
0.968571343459189, 0.657587153837085, 0.449371922994033, 0.460420181043446,
0.506347043439746), b_1 = c(0.606637348420918, 0.0831037482712418,
0.948381775757298, 0.938334624515846, 0.00419102935120463, 0.188286359421909,
0.208198433509097, 0.0177543577738106, 0.718145203776658, 0.0831736491527408
), b_2 = c(0.321765134111047, 0.211961556924507, 0.587257345672697,
0.144032216630876, 0.972890468547121, 0.855327832512558, 0.57097671367228,
0.577104192459956, 0.0781774870119989, 0.114624827634543), c_1 = c(0.391017276560888,
0.590633708517998, 0.795098746661097, 0.728984325425699, 0.493055917322636,
0.369274253724143, 0.773150491062552, 0.975961270509288, 0.813110067509115,
0.0556579001713544), c_2 = c(0.667865458643064, 0.632330810651183,
0.694971590070054, 0.530965579906479, 0.965042072581127, 0.593090373789892,
0.572259798645973, 0.175909371348098, 0.888658297248185, 0.813629044219851
), d_1 = c(0.241499250289053, 0.00761426473036408, 0.461811906658113,
0.109057808294892, 0.866662635235116, 0.454646666068584, 0.744061368750408,
0.863238325342536, 0.640526893315837, 0.868945126188919), d_2 = c(0.319046971853822,
0.91230500722304, 0.501486229710281, 0.364735695533454, 0.575103351147845,
0.466834801947698, 0.793754573678598, 0.851761769503355, 0.967630770988762,
0.154894143110141), e_1 = c(0.816843639593571, 0.803759014001116,
0.960935505107045, 0.574637013487518, 0.173312981147319, 0.0567971160635352,
0.35941729741171, 0.427865072153509, 0.325001205084845, 0.553443755256012
), e_2 = c(0.951808236306533, 0.734539293451235, 0.637795145623386,
0.0906503377482295, 0.307018371066079, 0.837945698061958, 0.575052802218124,
0.149990632198751, 0.740633937064558, 0.614213866414502), e_3 = c(0.655502210138366,
0.303118638927117, 0.754946870729327, 0.973303767153993, 0.15387549251318,
0.727580216247588, 0.133633797988296, 0.990649084327742, 0.508409713627771,
0.291543716564775), f_1 = c(0.722936338512227, 0.43016006750986,
0.668916463851929, 0.597232702188194, 0.566613202914596, 0.492413811851293,
0.841789987403899, 0.991420056903735, 0.654314963845536, 0.361741523491219
), f_2 = c(0.445054198848084, 0.561434917617589, 0.00869911885820329,
0.193016097648069, 0.625879230443388, 0.440140291117132, 0.192036809166893,
0.825253788847476, 0.706002304796129, 0.560118380701169), f_3 = c(0.481232576072216,
0.388952540932223, 0.547279508085921, 0.0684368198271841, 0.211692467099056,
0.0816763381008059, 0.179641566239297, 0.111117320600897, 0.471861350117251,
0.0512471403926611), f_4 = c(0.618208972737193, 0.993882849579677,
0.0283353771083057, 0.699978570453823, 0.377288895659149, 0.486861442681402,
0.55804020841606, 0.609155245823786, 0.911775157321244, 0.616870308062062
), f_5 = c(0.662106775445864, 0.825681922957301, 0.838785516098142,
0.812102147843689, 0.16824291436933, 0.873179373564199, 0.230873316759244,
0.852764319395646, 0.354741029441357, 0.673453942872584), f_6 = c(0.696460561361164,
0.63435076456517, 0.488687434000894, 0.974564600270241, 0.0511445237789303,
0.850382218603045, 0.52776403632015, 0.878997334977612, 0.265999732539058,
0.199336896184832), f_7 = c(0.155626990366727, 0.609529266366735,
0.405262058833614, 0.832081991713494, 0.759743764763698, 0.116353970719501,
0.945955528412014, 0.744843143504113, 0.824364016996697, 0.0772311592008919
)), class = "data.frame", row.names = c(NA, -10L))
info <- structure(list(Group = c("a", "a", "b", "b", "c", "c", "d", "d",
"e", "e", "e", "f", "f", "f", "f", "f", "f", "f"), Rep = c("a_1",
"a_2", "b_1", "b_2", "c_1", "c_2", "d_1", "d_2", "e_1", "e_2",
"e_3", "f_1", "f_2", "f_3", "f_4", "f_5", "f_6", "f_7"), Donor = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "A", "B", "I",
"J", "C", "L", "J")), class = "data.frame", row.names = c(NA,
-18L))
Based on this data, I produced a PCA plot.
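Before splitting PC1 and PC2 into donor and group parts, it helps to know what share of the total variance each component carries. Below is a minimal sketch in base R, assuming the PCA was run with samples (the columns of the data frame) as observations and genes centred but not scaled. The random `expr` matrix is only a stand-in for the data posted above, and `info` is the sample table transcribed from the question.

```r
# Sample table transcribed from the question (Group and Donor per column)
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
# Replicate names a_1, a_2, b_1, ... (running index within each group)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in for the 10-gene x 18-sample data (random values; use your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

# PCA with samples as rows (prcomp expects observations in rows)
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Percentage of the total variance captured by each component
pve <- 100 * pca$sdev^2 / sum(pca$sdev^2)
round(pve, 1)
summary(pca)$importance   # same information as proportions, plus cumulative
```

If the original plot used scaled genes, set `scale. = TRUE` so the percentages refer to the same PCA.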
I would then like to know whether I can calculate how much the donor effect (A to L) and the group effect (a to f) contribute to the PCA result, especially to PC1 and PC2 (Dim 1 and Dim 2).
For example, I would like to be able to say that 30% of the PC1 variance is due to the donor effect and the remaining 70% is explained by the group effect.
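One way to get numbers of this kind is to treat each PC score as a response and regress it on the two factors. The sequential (type I) sums of squares from `anova()` then split that PC's variance into a Group part, a Donor part and a residual. This is a sketch in base R on stand-in data with a centred PCA, not the method from any particular answer. Because several donors (A, B, C, I, J) appear in more than one group, Group and Donor are partly confounded, so the split depends on which term is entered first; here Group goes first, so Donor only gets credit for what Group cannot explain. With 18 samples the full model also leaves only 3 residual degrees of freedom for this design, so the percentages are descriptive rather than precise.

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- data.frame(pca$x[, 1:2], info)   # PC1, PC2 plus sample info

# Split one PC's sum of squares into Group, Donor (after Group) and residual
partition_pc <- function(pc, data) {
  fit <- lm(as.formula(paste(pc, "~ Group + Donor")), data = data)
  tab <- anova(fit)                        # sequential (type I) sums of squares
  ss  <- setNames(tab[["Sum Sq"]], rownames(tab))
  100 * ss / sum(ss)                       # percentages of that PC's variance
}

res <- sapply(c("PC1", "PC2"), partition_pc, data = scores)
round(res, 1)
```

Swapping the order to `~ Donor + Group` shows how much of the result is shared between the two factors.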
In my data I only want to focus on the group effect, i.e. the gene expression differences that distinguish the groups, but a donor effect is also present.
Are there better methods for this kind of calculation?
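A common alternative is to skip the individual PCs and partition the variation of the whole expression matrix, which is what PERMANOVA- or redundancy-style analyses do. The sketch below computes the sums-of-squares version in base R by fitting all genes at once with a multi-response `lm()`. With Euclidean distances this should correspond to the R2 column of `vegan::adonis2(dist(t(expr)) ~ Group + Donor, data = info, by = "terms")`, which additionally gives permutation p-values. The stand-in data and the Group-before-Donor ordering are assumptions.

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

# Samples x genes, each gene centred; total sum of squares of the matrix
X <- scale(t(expr), center = TRUE, scale = FALSE)
ss_total <- sum(X^2)

# Residual sums of squares of nested multi-response models (all genes at once)
rss_group <- sum(resid(lm(X ~ Group, data = info))^2)
rss_both  <- sum(resid(lm(X ~ Group + Donor, data = info))^2)

frac <- c(Group    = (ss_total - rss_group) / ss_total,
          Donor    = (rss_group - rss_both) / ss_total,   # Donor after Group
          Residual = rss_both / ss_total)
round(100 * frac, 1)
```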
I hope you can give me some advice on this question.
Thanks in advance.
Thanks a lot. This is almost what I need, but not quite.
From your script I can see how much each sample contributes to Dim 1 (PC1).
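For reference, the contribution of a sample to a component is usually defined as its squared score divided by the sum of squared scores on that component, expressed as a percentage. This is what FactoMineR's `PCA()` reports as `ind$contrib` for equally weighted samples, which may be what the earlier script used (an assumption, since the script is not shown here). A base R sketch on stand-in data:

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Contribution (%) of each sample to each PC: squared score / column total
contrib <- 100 * sweep(pca$x^2, 2, colSums(pca$x^2), "/")
round(contrib[, "PC1"], 1)                           # per-sample, PC1
head(sort(contrib[, "PC1"], decreasing = TRUE), 3)   # largest contributors
```

This identifies which samples drive PC1, not which factor does, so on its own it cannot separate donor from group.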
But I also want to know how much the donor affects PC1 or PC2.
There are 18 samples (columns) from 12 donors, divided into 6 groups.
Samples from the same donor that belong to different groups can have similar expression patterns.
The 18 samples can be separated by group in the PCA, but the differences among the 6 groups are not very distinct because of the donor effect.
So my question is how large the group effect and the donor effect are, respectively.
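Because donors are shared between groups, part of each PC's variance can be explained equally well by either factor. A variance-partitioning approach (the same idea as `vegan::varpart`, computed here directly from R² values in base R) separates the variance explained only by Group, only by Donor, and jointly by both, instead of assigning the shared part to whichever factor is entered first. The random `expr` matrix below is a stand-in for the posted data, and a centred PCA is assumed.

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# R-squared of a linear model for one PC score vector y
r2 <- function(y, rhs) {
  summary(lm(as.formula(paste("y ~", rhs)), data = info))$r.squared
}

# Unique and shared fractions; Shared can come out negative, a known
# quirk of this kind of partitioning, so read it with care
partition_unique <- function(y) {
  g  <- r2(y, "Group")
  d  <- r2(y, "Donor")
  gd <- r2(y, "Group + Donor")
  c(Group_only = gd - d, Donor_only = gd - g,
    Shared = g + d - gd, Residual = 1 - gd)
}

res <- sapply(as.data.frame(pca$x[, 1:2]), partition_unique)
round(100 * res, 1)
```

A large Shared fraction means the data cannot tell the two effects apart for that component, which is expected when donors recur across groups.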
Your script calculates how much each sample contributes to the PCA, which is great, but it is not essential for my core question.
Thanks.
Update: I asked ChatGPT about this question, and it suggested a suitable method, which I am trying now.
I will post here if I have any new questions.
Thanks.
OK, then you should add the donor information to the PCA analysis. First create a separate data frame for the donors, then use `factor()` to add the donor information to the PCA results. Try ChatGPT, and write here to let me know how it goes!
OK, I will respond to you here as soon as possible.
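The suggestion above can be sketched like this: take the PC scores, join the sample table by replicate name, store Donor as a factor, and colour and shape the points by Group and Donor so that any donor clustering becomes visible. Base graphics, the stand-in data and the column names `Rep`, `Group` and `Donor` are assumptions.

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = c("A","B","C","D","E","F","G","H","I","J","K",
            "A","B","I","J","C","L","J")
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# PC scores with the replicate name as a key column
scores <- data.frame(Rep = rownames(pca$x), pca$x[, 1:2])

# Attach Group and Donor, and make Donor an explicit factor
scores <- merge(scores, info, by = "Rep")
scores$Donor <- factor(scores$Donor)

# Colour = group, plotting symbol = donor
plot(scores$PC1, scores$PC2, col = as.integer(scores$Group),
     pch = as.integer(scores$Donor), xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$Group),
       col = seq_along(levels(scores$Group)), pch = 16,
       title = "Group", cex = 0.8)
```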
You can have a look at my answer; I have posted it below.
Great teamwork! Why can't you do the calculation for the other components too? Can't you extract the information from the other components and add it to your dataset?
Yes, I can. But only two factors affect the PCA distribution of my data: group and donor. The rest is treated as residual.
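Extending the per-component split to every PC, as asked above, can be done by running the same Group + Donor regression on each PC and weighting each PC's split by its share of the total variance. When all components are kept, the weighted totals should reproduce the Group/Donor/residual split of the whole expression matrix, with everything the two factors cannot explain left as residual. A base R sketch on stand-in data, with Group entered before Donor (an assumption that matters because the factors are partly confounded):

```r
# Sample table transcribed from the question
info <- data.frame(
  Group = factor(c("a","a","b","b","c","c","d","d","e","e","e",
                   "f","f","f","f","f","f","f")),
  Donor = factor(c("A","B","C","D","E","F","G","H","I","J","K",
                   "A","B","I","J","C","L","J"))
)
info$Rep <- paste(info$Group,
                  ave(seq_along(info$Group), info$Group, FUN = seq_along),
                  sep = "_")

# Stand-in expression data (random values; replace with your own)
set.seed(1)
expr <- matrix(runif(10 * 18), nrow = 10, dimnames = list(NULL, info$Rep))

pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Sum of squares carried by each PC (proportional to its variance)
ss_pc <- colSums(pca$x^2)

# For every PC: fraction of its SS due to Group, Donor (after Group), residual
per_pc <- sapply(colnames(pca$x), function(pc) {
  y   <- pca$x[, pc]
  tab <- anova(lm(y ~ Group + Donor, data = info))
  tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
})
rownames(per_pc) <- c("Group", "Donor", "Residual")

# Weight each PC by its share of the total SS and sum over components
overall <- drop(per_pc %*% (ss_pc / sum(ss_pc)))
round(100 * per_pc[, 1:2], 1)   # PC1 and PC2 individually
round(100 * overall, 1)         # all components combined
```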
I think the method is suitable for my data. However, the factor contributions I obtained were not consistent with my hypothesis, which is disappointing.
What a pity...