Question: How to plot correlation graphs with R^2?
17 months ago by
Wuschel270
HUJI
Wuschel270 wrote:

I have a proteomics data matrix in which a different number of peptides was detected for each protein (the number of detectable peptides varies by protein).

Q1. How can I plot correlation graphs for each protein (gene) to compare how its peptides behave? E.g., for protein A I have peptides a1-a3, and I want to compare a1 vs a2, a1 vs a3, and a2 vs a3.

Sample data

structure(list(Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"), Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"), Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199, 0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801), Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982, 0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278), Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584, 0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602), Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525, 1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923), Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.000820463, 1.03124071, 0.995897846, 0.289542364, 0.578721727, 0.672592766, 1.168944588), Sample6 = c(1.063124715, 0.623917522, 0.613196611, 0.990921045, 1.014340981, 0.965631141, 0.316793011, 1.02220535, 1.182063616, 1.41196421), Sample7 = c(1.335677026, 0.628621656, 0.411171453, 1.050563412, 1.290233552, 1.1603839, 0.445372411, 1.077192698, 0.726669337, 1.09453338), Sample8 = c(1.139360562, 0.404024829, 0.263714711, 0.899959209, 1.356913804, 1.246338203, 0.426568548, 1.104988267, 0.964924824, 1.083654341), Sample9 = c(1.38146599, 0.582817437, 0.783698738, 1.118948066, 1.010795866, 1.277086848, 0.434025911, 1.238871048, 1.201184368, 1.476478831), Sample10 = c(1.111486801, 0.60513273, 0.460680037, 1.385702246, 1.448873253, 1.364329784, 0.375032044, 1.382750002, 0.741842319, 1.035657705)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list( cols = list(Protein = structure(list(), class = c("collector_character", "collector")), Peptide = structure(list(), class = c("collector_character", "collector")), Sample1 = structure(list(), class = c("collector_double", "collector")), Sample2 = structure(list(), class = c("collector_double", "collector")), Sample3 = 
structure(list(), class = c("collector_double", "collector")), Sample4 = structure(list(), class = c("collector_double", "collector")), Sample5 = structure(list(), class = c("collector_double", "collector")), Sample6 = structure(list(), class = c("collector_double", "collector")), Sample7 = structure(list(), class = c("collector_double", "collector")), Sample8 = structure(list(), class = c("collector_double", "collector")), Sample9 = structure(list(), class = c("collector_double", "collector")), Sample10 = structure(list(), class = c("collector_double", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

Expected kind of graph: [image: 776bE]

Since the number of peptides varies for each protein, how can I compare each pair of peptides and save the faceted graphs as individual plots, so that I can select only the graphs I need?

Q2. What is another possible way to present this correlation?

gene R genome • 474 views
17 months ago by
pbpanigrahi190
pbpanigrahi190 wrote:

I will try to answer Q2.

A simple way is to generate a correlation matrix for each protein:

# Load libraries
library(dplyr)
library(ggplot2)
# Assume the data is stored in a variable named `data`
# Collect the pairwise correlation values of each protein in a list
corlist <- list()
for (x in unique(data$Protein)) {
    tempind <- which(data$Protein == x)
    # A correlation needs at least two peptides
    if (length(tempind) > 1) {
        # Transpose so that peptides become columns, then keep each unique pair once
        tempval <- cor(t(data[tempind, c(-1, -2)])) %>% .[upper.tri(.)]
        # data.frame() keeps corval numeric; cbind() would turn it into character
        corlist[[x]] <- data.frame(protein = x, corval = tempval)
    }
}
# Stack the per-protein results into one data frame
cormat <- do.call(rbind, corlist)
ggplot(cormat, aes(x = protein, y = corval, col = protein)) + geom_point()

You will need to polish the ggplot yourself.

What the code does: since the number of peptides varies, the code calculates the pairwise correlation between all peptides of each protein and stores the values in a single data frame. Protein A has 4 peptides, hence 6 unique pairs and 6 rows; B has a single peptide, so it is skipped; C has 2 peptides, so it gives one correlation value; and so on. All values are then shown on a single plot.
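The pairing described here can be made explicit by labelling each correlation with the peptide pair it belongs to. The following is only a sketch in base R: `pair_correlations()` is a hypothetical helper (not from any package), and the data frame is a subset of the sample data from the question (Sample1 to Sample4). Note that `cor()` returns Pearson's r, so it is squared to obtain R^2.

```r
# Subset of the sample data from the question (Sample1 to Sample4 only).
data <- data.frame(
  Protein = c("A", "A", "A", "A", "B", "C", "C", "D", "D", "D"),
  Peptide = c("a1", "a2", "a3", "a4", "b1", "c1", "c2", "d1", "d2", "d3"),
  Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.850270313, 0.492175199,
              0.269651338, 0.393004954, 0.157966662, 1.681672581, 0.298308801),
  Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.358172308, 0.142129982,
              0.158636283, 0.243500648, 0.095019037, 0.667928805, 0.572162278),
  Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.497873323, 0.141136584,
              0.250330266, 0.249783164, 0.107188279, 0.173623439, 0.242298602),
  Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.985900966, 0.891632525,
              1.016533723, 0.292048735, 0.776351689, 0.800070173, 1.161882923),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per peptide pair within each protein.
pair_correlations <- function(df) {
  sample_cols <- setdiff(names(df), c("Protein", "Peptide"))
  per_protein <- lapply(split(df, df$Protein), function(d) {
    if (nrow(d) < 2) return(NULL)            # a single peptide has no pair
    m <- as.matrix(d[, sample_cols])         # rows = peptides, columns = samples
    idx <- combn(nrow(d), 2)                 # each column indexes one peptide pair
    r <- apply(idx, 2, function(p) cor(m[p[1], ], m[p[2], ]))
    data.frame(protein = d$Protein[1],
               pair = paste(d$Peptide[idx[1, ]], d$Peptide[idx[2, ]], sep = " vs "),
               r = r,
               r2 = r^2,                     # R^2 is the squared Pearson r
               stringsAsFactors = FALSE)
  })
  # Drop proteins that were skipped, then stack the rest
  do.call(rbind, Filter(Negate(is.null), per_protein))
}

pairs_tbl <- pair_correlations(data)
print(table(pairs_tbl$protein))
```

The table should show 6 pairs for A, 1 for C and 3 for D, with B absent. Computing each pair directly from `combn()` avoids having to match labels to the column-wise order in which `upper.tri()` returns values.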

Output: https://ibb.co/mSErEo

I hope there are alternative ways to do this.
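One possible alternative, closer to Q1, is to draw one scatter plot per peptide pair and save each to its own file, so that only the required graphs need to be picked out. This is a rough sketch using base graphics rather than ggplot2: `save_pair_plots()` and the file naming scheme are assumptions made for illustration, and the data frame is a subset of the sample data (protein A peptides a1 to a3 and protein B, Sample1 to Sample5).

```r
# Subset of the sample data from the question.
data <- data.frame(
  Protein = c("A", "A", "A", "B"),
  Peptide = c("a1", "a2", "a3", "b1"),
  Sample1 = c(0.275755732, 0.683048798, 1.244604878, 0.492175199),
  Sample2 = c(0.408992244, 0.172488244, 1.749247694, 0.142129982),
  Sample3 = c(0.112265765, 0.377174168, 2.430040623, 0.141136584),
  Sample4 = c(0.87688073, 0.841826338, 0.831376575, 0.891632525),
  Sample5 = c(1.034093889, 0.304305772, 0.616445765, 1.03124071),
  stringsAsFactors = FALSE
)

# Hypothetical helper: writes one PDF per peptide pair and returns the paths.
save_pair_plots <- function(df, outdir) {
  sample_cols <- setdiff(names(df), c("Protein", "Peptide"))
  files <- character(0)
  for (prot in unique(df$Protein)) {
    d <- df[df$Protein == prot, ]
    if (nrow(d) < 2) next                    # nothing to compare
    m <- as.matrix(d[, sample_cols])         # rows = peptides, columns = samples
    idx <- combn(nrow(d), 2)                 # each column indexes one peptide pair
    for (k in seq_len(ncol(idx))) {
      pep_x <- d$Peptide[idx[1, k]]
      pep_y <- d$Peptide[idx[2, k]]
      x <- m[idx[1, k], ]
      y <- m[idx[2, k], ]
      fit <- lm(y ~ x)                       # simple linear fit of one peptide on the other
      r2 <- summary(fit)$r.squared           # equals cor(x, y)^2 for a single predictor
      f <- file.path(outdir, sprintf("%s_%s_vs_%s.pdf", prot, pep_x, pep_y))
      pdf(f, width = 5, height = 5)
      plot(x, y, xlab = pep_x, ylab = pep_y,
           main = sprintf("Protein %s: R^2 = %.3f", prot, r2))
      abline(fit)                            # overlay the fitted line
      dev.off()
      files <- c(files, f)
    }
  }
  files
}

plot_files <- save_pair_plots(data, tempdir())
print(basename(plot_files))
```

Each file holds a single plot named after the protein and the peptide pair; protein B is skipped because it has only one peptide.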

Thanks

Priyabrata
