How to calculate the median peptide intensity of each protein in a mass-spec dataset in R
11 weeks ago
Sean • 0

Hello,

I did a quantitative proteomics experiment to measure the differential expression of proteins in cells between two conditions. The output is a list of peptides, the protein each one maps to, and their abundance in the experimental and control conditions. Each protein has several detected peptides, and I need to pull out the median peptide abundance per protein, per condition, into a new data frame. A simplified version is shown below:

| protein   | peptide  | condition 1 abundance | condition 2 abundance |
| --------- | -------- | --------------------- | --------------------- |
| protein 1 | A APGSR  | 1                     | 4                     |
| protein 1 | ASTGR    | 2                     | 5                     |
| protein 2 | ASTTGAR  | 3                     | 6                     |
| protein 2 | PAGPAPTR | 3.5                   | 7                     |
| protein 2 | VPSTR    |                       | 5                     |


Is there a way to write code for this in R? Note that I have about 6000 proteins, and about 60,000 detected peptides. Not all peptides were detected in both condition 1 and 2, but I would still need to take the median of all peptides per protein for each condition separately.

The goal is to do statistical analysis between the median peptide abundance for each protein so I can see if the values are significantly different.

bioconductor r transcriptomics proteomics • 153 views

> The goal is to do statistical analysis between the median peptide abundance for each protein so I can see if the values are significantly different.

Please be sure to use specialized software for this. In Bioconductor there are, for example, the DEqMS and DEP packages, which provide sound statistical frameworks. Don't start by putting these medians into custom tests such as t-tests. If you really need the medians, I suggest you look at dplyr tutorials on how to compute medians per group. Hint: it comes down to using the group_by function to group the data.frame by protein and then running median on each condition's abundance column. Please try something first; these kinds of coding skills are essential for a bioinformatician. Happy to help once you get stuck.
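
For orientation, here is a minimal sketch of the dplyr hint, using the toy table from the question. The column names (`protein`, `peptide`, `cond1`, `cond2`) and the object names are assumptions made for illustration, not the names in the real export. Grouping is by protein only, since the median is wanted per protein, and the undetected peptide is assumed to be stored as `NA` rather than 0.

```r
library(dplyr)

# Toy data mirroring the example table in the question.
# Column names are assumed; the undetected value is encoded as NA.
peptides <- data.frame(
  protein = c("protein 1", "protein 1", "protein 2", "protein 2", "protein 2"),
  peptide = c("A APGSR", "ASTGR", "ASTTGAR", "PAGPAPTR", "VPSTR"),
  cond1   = c(1, 2, 3, 3.5, NA),
  cond2   = c(4, 5, 6, 7, 5)
)

# One row per protein: the median is taken over that protein's peptides,
# separately for each condition, ignoring peptides with missing values.
protein_medians <- peptides %>%
  group_by(protein) %>%
  summarise(
    median_cond1 = median(cond1, na.rm = TRUE),
    median_cond2 = median(cond2, na.rm = TRUE),
    n_peptides   = n(),          # peptides listed for the protein, detected or not
    .groups      = "drop"
  )

print(protein_medians)
```

With `na.rm = TRUE`, a peptide that is missing in one condition is simply left out of that condition's median; a protein with no detected peptides at all in a condition comes back as `NA`. If the export writes undetected peptides as 0 instead, those zeros would need converting to `NA` first, otherwise they pull the medians down. This only produces the summary table; for the actual significance testing the packages mentioned above remain the better route.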