Question: Network/Pathway Analysis from Mass Spec data
1
gravatar for Mozart
2.0 years ago by
Mozart200
Mozart200 wrote:

Hello there, a colleague asked me to analyse some mass spec data for him. the idea is to do the analysis of pathways/networks to get some results. I have a really tiny experience on transcriptomics and thus I am not sure if the term "score" and "emPAI" can be used as a "p value" and a "fold-change", respectively, as reported below:

Family  - Member - Database - Accession - Score - Mass - Num. of matches - Num. of significant matches - Num. of sequences - Num. of significant sequences - emPAI - Description

we have samples from 2 different tissues at different time points of infection.

Any help?

Kind regards, Mozart

proteomics • 1.1k views
ADD COMMENTlink modified 24 months ago • written 2.0 years ago by Mozart200
1

Hey Mozart, still playing good music with that violin?

What is the source of your mass spec data, do you know?

ADD REPLYlink written 2.0 years ago by Kevin Blighe66k

Hey Kevin, glad to see you there...yeah I am keep doing amazing stuff with this piece of wood...!!!

so I am not entirely sure I got it right but probably you mean some extra information about the machine and like...so:

Mass spectrometer is a Thermo Velos Pro

Sample time run was 1hr

reaction volume: 10µl (desalted ziptip) buffer used: 5% acetonitrile; 0.1% formic acid

does it help??

thanks!

ADD REPLYlink written 2.0 years ago by Mozart200
1

Okay, thank you! Is it not proteomics data that you have? Can you paste a few rows of the data? You could also plot as histogram of the data to see how it looks. I do not know which fields / columns contain the expression values.

ADD REPLYlink written 2.0 years ago by Kevin Blighe66k

thanks to you, Kevin!

so what we got is:

Family (149) Member (1) Database (SwissProt) Accession (KDM4A_MOUSE) Score (18) Mass (3933362) Num.of matches (534) Num.of significant matches (1) Num of sequences (544) Num of significant sequences (1) emPAI (100) Description (Lysine-specific demethylase 4A OS=Mus musculus GN=Kdm4a PE=2 SV=6)

hope it helps...I have no idea of the information included in the Description section, I am afraid. Neither the guy here, who provides the data, seems to be very convincing to me.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Mozart200
1

Thanks. It is making more sense, now. The emPAI is a relative measure of protein expression. Please see here: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein.

I presume that you can use the emPAI values for analysis. Usually, it would be, for example, a Mann-Whitney test (non-parametric). Can you plot the distribution of the emPAI values?

ADD REPLYlink written 2.0 years ago by Kevin Blighe66k

thanks a lot for your (very much appreciated) help, Kevin. You are the best...

enter image description here

hope that make any sense

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Mozart200

Is there only one column of values for emPAI? - you mentioned 2 tissue at different time-points (?)

ADD REPLYlink written 2.0 years ago by Kevin Blighe66k

Sorry, I didn't get it!

I actually have just one tissue at different time points at the moment...but, I mean, I just want to know the time effects in these two systems. so I guess it is the same identical thing. time 0

1 day

2 days

so time 0 1 day 2 days, respectively

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Mozart200

I posted an answer. Forse aiutará! / Maybe it will help!

ADD REPLYlink written 2.0 years ago by Kevin Blighe66k

I don't really know how big was your help...as usual you solved my problem in a matter of secs; you are incredible Kevin!!

Thanks a lot, I guess that's all I need..

I will dedicate one of my composition to you! ahahah

Thanks again.

ADD REPLYlink written 24 months ago by Mozart200

Anything for the magnificent Mozart!

ADD REPLYlink written 24 months ago by Kevin Blighe66k
4
gravatar for Kevin Blighe
2.0 years ago by
Kevin Blighe66k
Kevin Blighe66k wrote:

I see... aspetta... if you can get your data like this, then you can do different tests:

data
      Day Protein1 Protein2 Protein3 Protein4 Protein5 Protein6
     Day1     0.01     0.04     0.05     0.06     0.34     0.04
     Day2     0.05     0.45     0.06     0.04     0.03     0.04
     Day3     0.56     0.07     0.34     0.56     0.05     0.04

melt the data

require(reshape2)
data.melt <- melt(data)
colnames(data.melt) <- c("Day", "Protein", "Expression")
data.melt$Day <- factor(data.melt$Day, levels = c("Day1","Day2","Day3"))

stat test

Then, perform a non-parametric Kruskal-Wallis and Dunn's Test to compare the time-points. This test determines if there exists a global difference across the time-points.

require(FSA)
dunnTest(Expression ~ Day, data = data.melt, method = 'bh')

Dunn (1964) Kruskal-Wallis multiple comparison
  p-values adjusted with the Benjamini-Hochberg method.

   Comparison          Z    P.unadj     P.adj
  Day1 - Day2 -0.1371082 0.89094528 0.8909453
  Day1 - Day3 -1.6727199 0.09438245 0.2831473
  Day2 - Day3 -1.5356117 0.12463364 0.1869505

plot data per protein

    require(ggplot2)

ggplot(data.melt, aes(x=Day, y=Expression)) +

    stat_summary(geom = "bar", fill="forestgreen", fun.y = mean, position = "dodge") +

    facet_wrap(~ Protein, ncol = 3) +

    stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +

    xlab("") +

    ylab("emPAI") +

    scale_y_continuous(breaks = round(seq(0, max(data.melt$Expression), by = 0.1), 1)) +

    #geom_text(size=4, y=-20) +

    #Set the size of the plotting window
    theme_bw(base_size=24) +

    #Modify various aspects of the plot text and legend
    theme(
        legend.position='none',
        legend.background=element_rect(),
        plot.title=element_text(angle=0, size=18, face="bold", vjust=1),

        axis.text.x=element_text(angle=90, size=18, face="bold", hjust=1.10),
        axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
        axis.title=element_text(size=18, face="bold"),

        #Legend
        legend.key=element_blank(),     #removes the border
        legend.key.size=unit(1, "cm"),      #Sets overall area/size of the legend
        legend.text=element_text(size=18),  #Text size
        title=element_text(size=18),

        #facet wrap
        strip.text.x = element_text(size=18),
        strip.text.y = element_text(size=18)) +

        #Change the size of the icons/symbols in the legend
        guides(colour=guide_legend(override.aes=list(size=2.5)))

f

calculate ratios across time-points

tdata <- t(data[,2:ncol(data)])
tdata
         Day1 Day2 Day3
Protein1 0.01 0.05 0.56
Protein2 0.04 0.45 0.07
Protein3 0.05 0.06 0.34
Protein4 0.06 0.04 0.56
Protein5 0.34 0.03 0.05
Protein6 0.04 0.04 0.04

ratios <- t(apply(tdata, 1, function(x) c(
  x[1] / x[2],
  x[2] / x[3],
  x[1] / x[3])))
colnames(ratios) <- c("Day1vs2","Day2vs3","Day1vs3")
ratios
             Day1vs2    Day2vs3    Day1vs3
Protein1  0.20000000 0.08928571 0.01785714
Protein2  0.08888889 6.42857143 0.57142857
Protein3  0.83333333 0.17647059 0.14705882
Protein4  1.50000000 0.07142857 0.10714286
Protein5 11.33333333 0.60000000 6.80000000
Protein6  1.00000000 1.00000000 1.00000000

----------------------------------------

It would be difficult to get a single p-value for each protein because there are only 3 values. Ratios make more sense, in this case.

Kevin

ADD COMMENTlink modified 24 months ago • written 2.0 years ago by Kevin Blighe66k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1614 users visited in the last hour