Network/Pathway Analysis from Mass Spec data
1
1
Entering edit mode
4.0 years ago
Mozart ▴ 320

Hello there, a colleague asked me to analyse some mass spec data for him. the idea is to do the analysis of pathways/networks to get some results. I have a really tiny experience on transcriptomics and thus I am not sure if the term "score" and "emPAI" can be used as a "p value" and a "fold-change", respectively, as reported below:

Family  - Member - Database - Accession - Score - Mass - Num. of matches - Num. of significant matches - Num. of sequences - Num. of significant sequences - emPAI - Description


we have samples from 2 different tissues at different time points of infection.

Any help?

Kind regards, Mozart

proteomics • 2.1k views
1
Entering edit mode

Hey Mozart, still playing good music with that violin?

What is the source of your mass spec data, do you know?

0
Entering edit mode

Hey Kevin, glad to see you there...yeah I am keep doing amazing stuff with this piece of wood...!!!

so I am not entirely sure I got it right but probably you mean some extra information about the machine and like...so:

Mass spectrometer is a Thermo Velos Pro

Sample time run was 1hr

reaction volume: 10µl (desalted ziptip) buffer used: 5% acetonitrile; 0.1% formic acid

does it help??

thanks!

1
Entering edit mode

Okay, thank you! Is it not proteomics data that you have? Can you paste a few rows of the data? You could also plot as histogram of the data to see how it looks. I do not know which fields / columns contain the expression values.

0
Entering edit mode

thanks to you, Kevin!

so what we got is:

Family (149) Member (1) Database (SwissProt) Accession (KDM4A_MOUSE) Score (18) Mass (3933362) Num.of matches (534) Num.of significant matches (1) Num of sequences (544) Num of significant sequences (1) emPAI (100) Description (Lysine-specific demethylase 4A OS=Mus musculus GN=Kdm4a PE=2 SV=6)

hope it helps...I have no idea of the information included in the Description section, I am afraid. Neither the guy here, who provides the data, seems to be very convincing to me.

1
Entering edit mode

Thanks. It is making more sense, now. The emPAI is a relative measure of protein expression. Please see here: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein.

I presume that you can use the emPAI values for analysis. Usually, it would be, for example, a Mann-Whitney test (non-parametric). Can you plot the distribution of the emPAI values?

0
Entering edit mode

thanks a lot for your (very much appreciated) help, Kevin. You are the best...

hope that make any sense

0
Entering edit mode

Is there only one column of values for emPAI? - you mentioned 2 tissue at different time-points (?)

0
Entering edit mode

Sorry, I didn't get it!

I actually have just one tissue at different time points at the moment...but, I mean, I just want to know the time effects in these two systems. so I guess it is the same identical thing.

so time 0 1 day 2 days, respectively

0
Entering edit mode

I posted an answer. Forse aiutará! / Maybe it will help!

0
Entering edit mode

I don't really know how big was your help...as usual you solved my problem in a matter of secs; you are incredible Kevin!!

Thanks a lot, I guess that's all I need..

I will dedicate one of my composition to you! ahahah

Thanks again.

0
Entering edit mode

Anything for the magnificent Mozart!

9
Entering edit mode
4.0 years ago

I see... aspetta... if you can get your data like this, then you can do different tests:

data
Day Protein1 Protein2 Protein3 Protein4 Protein5 Protein6
Day1     0.01     0.04     0.05     0.06     0.34     0.04
Day2     0.05     0.45     0.06     0.04     0.03     0.04
Day3     0.56     0.07     0.34     0.56     0.05     0.04


# melt the data

require(reshape2)
data.melt <- melt(data)
colnames(data.melt) <- c("Day", "Protein", "Expression")
data.melt$Day <- factor(data.melt$Day, levels = c("Day1","Day2","Day3"))


# stat test

Then, perform a non-parametric Kruskal-Wallis and Dunn's Test to compare the time-points. This test determines if there exists a global difference across the time-points.

require(FSA)
dunnTest(Expression ~ Day, data = data.melt, method = 'bh')

Dunn (1964) Kruskal-Wallis multiple comparison
p-values adjusted with the Benjamini-Hochberg method.

Day1 - Day2 -0.1371082 0.89094528 0.8909453
Day1 - Day3 -1.6727199 0.09438245 0.2831473
Day2 - Day3 -1.5356117 0.12463364 0.1869505


# plot data per protein

    require(ggplot2)

ggplot(data.melt, aes(x=Day, y=Expression)) +

stat_summary(geom = "bar", fill="forestgreen", fun.y = mean, position = "dodge") +

facet_wrap(~ Protein, ncol = 3) +

stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +

xlab("") +

ylab("emPAI") +

scale_y_continuous(breaks = round(seq(0, max(data.melt\$Expression), by = 0.1), 1)) +

#geom_text(size=4, y=-20) +

#Set the size of the plotting window
theme_bw(base_size=24) +

#Modify various aspects of the plot text and legend
theme(
legend.position='none',
legend.background=element_rect(),
plot.title=element_text(angle=0, size=18, face="bold", vjust=1),

axis.text.x=element_text(angle=90, size=18, face="bold", hjust=1.10),
axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
axis.title=element_text(size=18, face="bold"),

#Legend
legend.key=element_blank(),     #removes the border
legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
legend.text=element_text(size=18),  #Text size
title=element_text(size=18),

#facet wrap
strip.text.x = element_text(size=18),
strip.text.y = element_text(size=18)) +

#Change the size of the icons/symbols in the legend
guides(colour=guide_legend(override.aes=list(size=2.5)))


# calculate ratios across time-points

tdata <- t(data[,2:ncol(data)])
tdata
Day1 Day2 Day3
Protein1 0.01 0.05 0.56
Protein2 0.04 0.45 0.07
Protein3 0.05 0.06 0.34
Protein4 0.06 0.04 0.56
Protein5 0.34 0.03 0.05
Protein6 0.04 0.04 0.04

ratios <- t(apply(tdata, 1, function(x) c(
x[1] / x[2],
x[2] / x[3],
x[1] / x[3])))
colnames(ratios) <- c("Day1vs2","Day2vs3","Day1vs3")
ratios
Day1vs2    Day2vs3    Day1vs3
Protein1  0.20000000 0.08928571 0.01785714
Protein2  0.08888889 6.42857143 0.57142857
Protein3  0.83333333 0.17647059 0.14705882
Protein4  1.50000000 0.07142857 0.10714286
Protein5 11.33333333 0.60000000 6.80000000
Protein6  1.00000000 1.00000000 1.00000000


## ----------------------------------------

It would be difficult to get a single p-value for each protein because there are only 3 values. Ratios make more sense, in this case.

Kevin