Question

Multiple protein ids issue- which protein IDs should be selected for downstream analysis?

4.6 years ago

harelarik ▴ 90

When inspecting the proteinGroups file (results from MaxQuant), there is a column named "Majority protein IDs". According to Tyanova et al. (Nature Protocols, 2016), this column lists the proteins that contain at least half of the peptides assigned to a protein group. As a result, it often contains multiple protein IDs per entry.

When a table cell contains multiple protein IDs, which protein ID should be selected for downstream analysis? This is mainly for assigning GO IDs and calculating GO enrichment.

1. Is it better to take the first protein ID in each table cell, which should be the best one, since the IDs are sorted by the total number of identified peptides?
2. Or is it better to take all protein IDs in each table cell? This way we account for simultaneous translation of paralogs (while over-representing those that were not translated).

Thank you,

Arik

Proteomics enrichment persus • 985 views


score 0 · Answer 1 · 2020-03-05

I was advised by an expert in the field that all protein IDs in the "Majority protein IDs" column should be used. If only one protein ID is selected, the concern is that we might pick a poorly annotated protein and miss many of the annotations.

If we want to assign annotations (e.g., GO IDs) to an entry in the proteinGroups file (i.e., one row in the table) that has multiple protein IDs, we should take all GO IDs associated with all of the protein IDs in the "Majority protein IDs" table cell. Then, each GO ID is counted only once per entry.
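
The approach above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the IDs within a cell are taken to be separated by semicolons, `go_ids_for_group` is a hypothetical helper name, and the protein accessions and the protein-to-GO mapping are made-up example data (in practice the mapping would come from whatever annotation source you use).

```python
def go_ids_for_group(majority_protein_ids, protein_to_go):
    """Return the union of GO IDs over all protein IDs in one table cell.

    majority_protein_ids: the cell content as a string, e.g. "P1;P2"
                          (semicolon separator is an assumption).
    protein_to_go:        dict mapping a protein ID to a set of GO IDs.
    """
    go_ids = set()
    for protein_id in majority_protein_ids.split(";"):
        protein_id = protein_id.strip()
        if protein_id:
            # Proteins without annotation contribute nothing.
            go_ids |= protein_to_go.get(protein_id, set())
    # A set keeps each GO ID only once, even if several proteins share it.
    return go_ids


# Hypothetical example data: PROT_A is unannotated, PROT_B and PROT_C share a term.
protein_to_go = {
    "PROT_A": set(),
    "PROT_B": {"GO:0005515", "GO:0005737"},
    "PROT_C": {"GO:0005515"},
}

cell = "PROT_A;PROT_B;PROT_C"
all_ids = go_ids_for_group(cell, protein_to_go)
first_only = go_ids_for_group(cell.split(";")[0], protein_to_go)
```

In this toy example the first protein ID happens to be unannotated, so `first_only` is empty, while `all_ids` contains both GO IDs, each counted once. That is the concern with selecting a single ID per entry.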