Proteomics data analysis
5 weeks ago
Ariadna ▴ 20

I have a dataset of around 300 plasma samples taken from patients diagnosed with breast cancer (5 different subtypes). Eleven proteins were measured in these samples using label-free LC-MS. My goal is to find proteins which are distributed differently between any pair of breast cancer subtypes.

I performed 2 different procedures in parallel.

Procedure 1: log-transform the data, then apply median normalisation (from each protein value within a sample, the median of all protein values in that sample is subtracted). ComBat is then applied to remove batch effects (PCA shows clustering by plate). Finally, the Kruskal-Wallis test is applied. I get no groups which exhibit a difference in distribution.
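For readers following along, the median step in Procedure 1 can be sketched as below. This is a minimal illustration in Python with numpy, not the poster's actual code; the function name `median_normalise` and the toy matrix are made up for the example, and rows are assumed to be proteins and columns samples.

```python
import numpy as np

def median_normalise(intensities):
    """Log2-transform, then subtract each sample's median (rows = proteins, columns = samples)."""
    log_x = np.log2(intensities)
    sample_medians = np.nanmedian(log_x, axis=0)  # one median per sample (column)
    return log_x - sample_medians                 # broadcasting subtracts column-wise

# Toy matrix (hypothetical values): 3 proteins x 2 samples; sample 2 is sample 1 scaled by 4
raw = np.array([[2.0, 8.0],
                [4.0, 16.0],
                [8.0, 32.0]])
normalised = median_normalise(raw)
```

After this step every sample has a median of zero on the log scale, so a global loading difference between samples is removed. With only 11 proteins the per-sample median rests on very few values, which is worth keeping in mind when interpreting the result.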

Procedure 2: log-transform the data, apply quantile normalisation (using the normalize.quantiles function, where samples are columns and each row corresponds to a protein), then apply ComBat. The Kruskal-Wallis test is applied. I get 10 pairs of groups for which the result is significant.
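To make concrete what quantile normalisation does to a proteins-by-samples matrix, here is a small numpy sketch. It is an assumption-laden stand-in, not the normalize.quantiles implementation: it ignores ties and missing values, and the function name and toy data are hypothetical.

```python
import numpy as np

def quantile_normalise(x):
    """Force every column (sample) of x to share the same empirical distribution."""
    order = np.argsort(x, axis=0)                # row indices that sort each column
    ranks = np.argsort(order, axis=0)            # rank of each value within its column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values across samples
    return reference[ranks]                      # replace each value by the reference value of its rank

# Toy log-intensity matrix (hypothetical): 4 proteins x 3 samples, no ties
log_x = np.array([[5.0, 4.0, 3.0],
                  [2.0, 1.0, 4.0],
                  [3.0, 4.5, 6.0],
                  [4.0, 2.0, 8.0]])
qn = quantile_normalise(log_x)
```

After the call, every sample contains exactly the same set of values and only the within-sample ranking of the proteins survives. With 11 proteins this means each sample is reduced to an ordering of 11 items, which may help explain why the two procedures give such different results.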

How do I determine the correct normalisation method given such different results?

Many thanks in advance

data-analysis biostatistics statistics proteomics • 986 views

Can you tell me how you started from the raw input? I have raw files which I have converted into mzML files. I have seen a few R workflows, but I'm not clear which values should be extracted for the differential analysis, as I also have two groups.

Can you tell me which workflow I should follow to take the mzML files and get input data that I can run in R for various EDA and differential analysis?

> How do I determine the correct normalization method given such different results?

For this context I would say go ahead with edgeR


I work with files received from another team, which are in txt format. There are some packages in R which help to handle mzML files (e.g. https://lgatto.github.io/RforProteomics/articles/RforProteomics.html).

As for edgeR, I am not sure. Generally the workflow includes something like what is described here, for instance: https://www.embopress.org/doi/10.15252/msb.202110240


"I work with files received from another team, which are in txt format"

What are these values? Can you show me a sample input so that I have an idea of what to extract from the mzML files?


1769mkc You can use > (or the " icon in edit window) to quote parts of a post that you want to respond to with a comment.


"This workflow starts with a raw data matrix, for which initial steps such as peptide‐spectrum matching, quantification, and FDR control have been completed. Data are assumed to be log‐transformed unless the variance stabilizing transformation (Durbin et al, 2002) is used. In the latter case, the data transformation is included in the normalization procedure." So how do you get the raw matrix? Is it like a gene count file? If yes, then you can use DESeq2 or edgeR, which can take that input, and you can do the batch correction there itself by specifying th

5 weeks ago
ATpoint 87k

Are there any reference proteins to base normalization on? Median normalization of just 11 proteins is tricky in the sense that it implies that at least 50% of proteins do not change between conditions. You need controls for such a small number of proteins. For DE I would use limma, which is more powerful than the Kruskal-Wallis test. You can include the plate information in the design to correct for batch.
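The idea of putting plate into the design can be illustrated with an ordinary least-squares fit. The sketch below is not limma (it has no empirical Bayes moderation) and uses simulated, hypothetical data for a single protein; in limma the analogous design would typically come from something like `model.matrix(~ subtype + plate)`. The effect sizes, sample counts and variable names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 40 samples, 2 subtypes, 2 plates, subtype partly confounded with plate
subtype = np.repeat([0, 1], 20)
plate = np.concatenate([np.repeat([0, 1], [15, 5]), np.repeat([0, 1], [5, 15])])

# Simulated log-intensity of one protein: true subtype effect 1.0, plate shift 2.0
y = 10.0 + 1.0 * subtype + 2.0 * plate + rng.normal(0.0, 0.1, size=40)

# Naive comparison ignores the plate and absorbs part of the plate shift
naive_effect = y[subtype == 1].mean() - y[subtype == 0].mean()

# Linear model with intercept, subtype and plate as covariates
design = np.column_stack([np.ones(40), subtype, plate])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_effect = coef[1]  # subtype effect after accounting for plate
```

In this toy setup the naive difference comes out near 2.0 because subtype 1 sits mostly on the shifted plate, while the adjusted coefficient recovers a value close to the simulated 1.0. Modelling the batch this way keeps the batch uncertainty in the test instead of removing it in a separate preprocessing step.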


By reference proteins do you mean those which have similar distribution among cancer subtypes or those which might have been/might be calculated using internal standard (with synthetic peptide to get absolute quantification)?


I mean one or many proteins you know are not DE. Like a known constant baseline.


Only based on a literature review can I state that we expect Proteins X, Y and Z not to be DE. Do you recommend using these three, that is, taking the median of these three proteins as each sample's median and subtracting it from all protein values? Or..?


Yes, you could scale all data in a way that the median of these three proteins is the same across all samples. However, no evidence for DE does not mean evidence for no DE, so it is on you to decide whether you trust these proteins to be a good reference.
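A rough sketch of that reference-based scaling, working on log-transformed values, might look like the following. The function name `reference_normalise`, the row indices standing in for Proteins X, Y and Z, and the toy numbers are all hypothetical.

```python
import numpy as np

def reference_normalise(log_x, ref_rows):
    """Shift each sample (column) so the median of the reference proteins is identical."""
    ref_medians = np.median(log_x[ref_rows, :], axis=0)
    # Re-add the average reference median so values stay on the original scale
    return log_x - ref_medians + ref_medians.mean()

# Toy log2 matrix: rows 0-2 stand in for Proteins X, Y, Z; rows 3-4 are other proteins
base = np.array([[10.0, 10.0, 10.0],
                 [12.0, 12.0, 12.0],
                 [14.0, 14.0, 14.0],
                 [11.0, 11.0, 13.0],   # truly higher by 2 in sample 3
                 [13.0, 13.0, 13.0]])
sample_offsets = np.array([0.0, 1.0, -0.5])  # technical shift per sample
log_x = base + sample_offsets

ref_rows = [0, 1, 2]
normalised = reference_normalise(log_x, ref_rows)
```

Here the technical offsets are removed while the real difference in the fourth protein is preserved, because only the reference rows determine the shift. If one of the reference proteins is in fact DE, that error propagates to every other protein in the sample, which is the caveat raised above.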


Thank you a lot! I was using this tutorial at some point, which describes median normalisation as subtraction of the median: https://statomics.github.io/PDA/pda_quantification_preprocessing.html. So is subtraction an unfavourable way of handling this data? And what about quantile normalisation?


Sorry, I removed the sentence about subtraction; I misread your sentence. Normally you would calculate a scaling factor that you multiply or divide your data by so that after this operation the medians are the same. The thing with QN is that it also assumes that the overall distributions of proteins are the same across all samples. It's a strong assumption for so few proteins.
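The two descriptions (dividing by a scaling factor versus subtracting a median) agree once the data are log-transformed, since log(a / m) = log(a) - log(m). A small numerical check with made-up intensities:

```python
import numpy as np

# Hypothetical raw intensities: 3 proteins x 2 samples
raw = np.array([[100.0, 400.0],
                [200.0, 800.0],
                [400.0, 1600.0]])

# Raw scale: divide each sample by its median
scaled = raw / np.median(raw, axis=0)

# Log scale: subtract each sample's median of the log values
log_raw = np.log2(raw)
shifted = log_raw - np.median(log_raw, axis=0)

# Both routes give the same normalised values
same = np.allclose(np.log2(scaled), shifted)
```

The equality is exact here because each sample has an odd number of proteins, so the median is an actual data point and the log of the median equals the median of the logs. With an even number of values the median averages the two middle points and the two routes can differ slightly.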


Many thanks. That was very helpful!


To formally test whether quantile normalisation (QN) is applicable, I applied the Kolmogorov-Smirnov test (H0: the values of two datasets come from the same continuous distribution). H0 is not rejected for those samples which were not shifted by the batch effect (samples on the two plates whose values are shifted by the plate effect have a similar shape to those which were not affected). I am wondering whether this testing makes sense, or whether there are other methods used to test the assumptions of QN in proteomics?

Also, in papers I have mostly seen that normalisation is done first and then ComBat is applied. But in this case in particular, would it not be more appropriate to remove the batch effect first and then decide which transformation to use?
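As an illustration of the observation that shifted samples have a similar shape, the two-sample Kolmogorov-Smirnov statistic can be computed before and after centring each sample. This is a hand-rolled sketch with numpy and invented values, not the poster's analysis; it returns only the statistic, not a p-value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])                             # evaluate both ECDFs at every observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

# Hypothetical log-intensities of 11 proteins in two samples; sample_b carries a plate shift
sample_a = np.arange(11, dtype=float)
sample_b = sample_a + 20.0

d_raw = ks_statistic(sample_a, sample_b)                      # dominated by the location shift
d_centred = ks_statistic(sample_a - np.median(sample_a),
                         sample_b - np.median(sample_b))      # compares shape only
```

In this toy case the raw statistic is at its maximum because of the shift alone, and it drops to zero once each sample is centred, so the test mainly reflects location unless the shift is removed first. With only 11 values per sample the test also has little power, so failing to reject H0 is weak evidence that the QN assumption holds.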
