Hey guys, I am trying to resolve protein identification conflicts. After a Mascot search to identify peptides, some peptides are often reported as evidence for multiple proteins (some due to homology, others due to errors in the database); the task is to resolve such conflicts by excluding false positive identifications.
In my case, I have proteomics data from C. difficile. To carry out a logistic regression analysis, I have labelled peptides mapped to proteins from this organism as TRUE and all others as FALSE. I am achieving ~94% prediction accuracy overall, but only half of the false positives are correctly excluded, which is unsatisfactory.
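To make those numbers concrete, here is a minimal Python sketch of how overall accuracy and the false-positive exclusion rate can diverge when the FALSE class is small. The counts (880 TRUE, 120 FALSE) are made up purely so that they reproduce the ~94% and one-half figures; they are not my actual data, and the helper names are hypothetical.

```python
# Sketch only: hypothetical counts chosen to reproduce ~94% accuracy
# with half of the FALSE peptides excluded. Not real Mascot output.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for boolean label and prediction lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity from the counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # fraction of TRUE peptides that are kept
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # fraction of FALSE peptides that are excluded
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Assumed class balance: 880 TRUE (C. difficile) and 120 FALSE peptides.
y_true = [True] * 880 + [False] * 120
# Assumed predictions: every TRUE kept, only 60 of the 120 FALSE excluded.
y_pred = [True] * 880 + [True] * 60 + [False] * 60

metrics = summarize(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a class balance like this, the accuracy is dominated by the TRUE peptides, so the specificity is the number I actually care about.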
Please highlight any problems with my approach. Can you also suggest alternative approaches to solving this problem? I am aware of the PeptideProphet solution (described mainly for SEQUEST data) but would like a different approach.