Resolving Protein Identification Conflicts
10.1 years ago
Graslevy ▴ 240

Hi all, I am trying to resolve protein identification conflicts. After a Mascot search to identify peptides, some peptides are often assigned as evidence for multiple proteins (some because of homology, others because of errors in the database); the task is to resolve such conflicts by excluding false-positive identifications.

In my case, I have proteomics data from C. difficile. To carry out a logistic regression analysis, I labelled peptides that map to proteins from this organism as TRUE and all others as FALSE. I am achieving ~94% prediction accuracy overall, but only half of the false positives are correctly excluded, which is unsatisfactory.
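
To make the gap between overall accuracy and false-positive exclusion concrete, here is a minimal sketch that computes the usual confusion-matrix metrics. The counts are hypothetical (they are not my real data) and were chosen only so that accuracy comes out at 94% while half of the FALSE peptides are still kept; with labels this imbalanced, accuracy alone says little about how well false positives are excluded.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Summarise a binary classifier from its confusion-matrix counts.

    tp: TRUE peptides kept          fn: TRUE peptides wrongly excluded
    tn: FALSE peptides excluded     fp: FALSE peptides wrongly kept
    Assumes both classes are present (no zero denominators).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # fraction of TRUE peptides kept
    specificity = tn / (tn + fp)   # fraction of FALSE peptides excluded
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: 900 TRUE and 100 FALSE peptides.
metrics = confusion_metrics(tp=890, fp=50, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity or balanced accuracy alongside accuracy makes it easier to compare this approach with any alternative.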

Please point out any problems with my approach, and suggest alternative approaches to this problem if you can. I am aware of the PeptideProphet solution (described mainly for SEQUEST data) but would like a different approach.

proteomics • 1.6k views
10.0 years ago
Graslevy ▴ 240

I acknowledge that there is no easy solution to this problem and that some conflicts will persist. I had to choose between a supervised learning solution (the one described in the question) and a soft classifier (as described previously by others, including PeptideProphet). I ended up implementing an expectation-maximization classifier; although it excludes more false positives, it also excludes ~20% of the true positives. Its prediction accuracy is lower than that of the supervised approach, but it is more dynamic and adaptive.

In the absence of any suggestions, I settled for the soft classifier.
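
For anyone who wants to try the soft-classifier route, here is a minimal sketch of an expectation-maximization fit. It is not my exact implementation: it assumes a single discriminant score per peptide (for example a Mascot ion score) and models the scores as a mixture of two Gaussians, one for incorrect and one for correct identifications. The function names, the quartile-based initialisation and the Gaussian assumption are illustrative choices, and the sketch assumes a reasonably large list of scores.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_high(score, mean, sd, weight):
    """Posterior probability that a score belongs to component 1."""
    p0 = weight[0] * normal_pdf(score, mean[0], sd[0])
    p1 = weight[1] * normal_pdf(score, mean[1], sd[1])
    return p1 / (p0 + p1 + 1e-300)  # tiny constant avoids division by zero


def fit_two_component_em(scores, n_iter=200, tol=1e-8, min_sd=1e-3):
    """Fit a two-Gaussian mixture to 1-D scores with EM.

    Returns (params, posteriors), where posteriors[i] is the probability
    that scores[i] comes from the higher-mean ("correct") component.
    """
    n = len(scores)
    ordered = sorted(scores)
    # Initialise the means at the lower and upper quartiles (an assumption).
    mean = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(scores) / n
    grand_sd = math.sqrt(sum((s - grand_mean) ** 2 for s in scores) / n)
    sd = [max(grand_sd, min_sd), max(grand_sd, min_sd)]
    weight = [0.5, 0.5]
    prev_ll = -math.inf

    for _ in range(n_iter):
        # E-step: responsibility of component 1 for every score.
        resp = [posterior_high(s, mean, sd, weight) for s in scores]
        log_lik = sum(
            math.log(
                weight[0] * normal_pdf(s, mean[0], sd[0])
                + weight[1] * normal_pdf(s, mean[1], sd[1])
                + 1e-300
            )
            for s in scores
        )
        # M-step: re-estimate weights, means and standard deviations.
        n1 = max(sum(resp), 1e-9)
        n0 = max(n - n1, 1e-9)
        mean[1] = sum(r * s for r, s in zip(resp, scores)) / n1
        mean[0] = sum((1 - r) * s for r, s in zip(resp, scores)) / n0
        var1 = sum(r * (s - mean[1]) ** 2 for r, s in zip(resp, scores)) / n1
        var0 = sum((1 - r) * (s - mean[0]) ** 2 for r, s in zip(resp, scores)) / n0
        sd = [max(math.sqrt(var0), min_sd), max(math.sqrt(var1), min_sd)]
        weight = [n0 / n, n1 / n]
        # Stop once the log-likelihood no longer improves.
        if abs(log_lik - prev_ll) < tol:
            break
        prev_ll = log_lik

    # Make sure component 1 is the higher-mean component.
    if mean[0] > mean[1]:
        mean, sd, weight = mean[::-1], sd[::-1], weight[::-1]

    posteriors = [posterior_high(s, mean, sd, weight) for s in scores]
    params = {
        "mean_low": mean[0], "sd_low": sd[0], "weight_low": weight[0],
        "mean_high": mean[1], "sd_high": sd[1], "weight_high": weight[1],
    }
    return params, posteriors
```

Peptides can then be kept or excluded by thresholding the posterior; the cut-off is a free choice that trades excluded false positives against lost true positives, which is the trade-off I ran into above.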
