Suggestions for a workflow to predict the pathogenicity of SNVs and indels
3.5 years ago
luffy ▴ 110

I. Prediction using ML methods

Method to use: support vector machine (or random forest?)

  • Download variant data from ClinVar, other databases, and the literature
  • Remove unnecessary columns and rows with missing data; keep c.change, p.change, clinical significance, etc.
  • Use sklearn/pandas (OneHotEncoder/get_dummies) to convert categorical data into binary columns
  • For the Position Specific Method, take c.change/p.change, strip the c./p. prefix, split it into 3 columns (wild, loc, new), and convert all other characters to strings
  • Split the data into train and test sets and classify using SVC
  • Draw a confusion matrix and check it
  • Tune C and gamma by cross-validation using GridSearchCV
  • Draw a confusion matrix and check it
  • Classify again with the newly determined C and gamma
  • Classify the patients' variants
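A minimal sketch of the steps above, in case it helps make the question concrete. Everything here is an assumption for illustration: the toy `variants` table is synthetic (its labels follow an arbitrary position rule, not real ClinVar data), the `split_p_change` helper is hypothetical and only handles simple three-letter missense notation, and the grid of C and gamma values is arbitrary.

```python
import re

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a cleaned ClinVar export (NOT real data).
AMINO = ["Ala", "Arg", "Gly", "Leu", "Pro", "Ser"]
variants = pd.DataFrame({
    "p_change": [
        f"p.{AMINO[i % 6]}{10 + 5 * i}{AMINO[(i + 1 + (i // 6) % 5) % 6]}"
        for i in range(40)
    ],
    # Arbitrary toy rule: the first 20 positions are labelled pathogenic.
    "label": ["pathogenic" if i < 20 else "benign" for i in range(40)],
})

# Matches simple missense notation such as "p.Arg97Gly" only.
P_CHANGE = re.compile(r"^p\.([A-Za-z]{3})(\d+)([A-Za-z]{3})$")


def split_p_change(p_change):
    """Split 'p.Arg97Gly' into (wild, loc, new) = ('Arg', 97, 'Gly')."""
    match = P_CHANGE.match(p_change)
    if match is None:
        raise ValueError(f"unsupported p.change notation: {p_change!r}")
    wild, loc, new = match.groups()
    return wild, int(loc), new


features = pd.DataFrame(
    [split_p_change(p) for p in variants["p_change"]],
    columns=["wild", "loc", "new"],
)
labels = variants["label"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, stratify=labels, random_state=0
)

# One-hot encode the residues, scale the position, then fit an RBF SVC.
preprocess = ColumnTransformer([
    ("residues", OneHotEncoder(handle_unknown="ignore"), ["wild", "new"]),
    ("position", StandardScaler(), ["loc"]),
])
pipeline = Pipeline([("prep", preprocess), ("svc", SVC(kernel="rbf"))])

# Cross-validated grid search over C and gamma on the training set only.
search = GridSearchCV(
    pipeline,
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
    cv=3,
)
search.fit(X_train, y_train)

# Confusion matrix of the tuned model on the held-out test set.
predictions = search.predict(X_test)
cm = confusion_matrix(y_test, predictions, labels=["benign", "pathogenic"])
print(search.best_params_)
print(cm)
```

Putting the encoders inside the pipeline means they are refit within each cross-validation fold, so the test set never influences the preprocessing. The patients' variants would then go through `search.predict` after the same `split_p_change` step.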

II. Prediction of variant effects with known prediction tools

  • Compare with the results of the previous step (tools e.g. SIFT, PolyPhen, etc.)
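One possible way to quantify this comparison is a cross-tabulation plus Cohen's kappa. The sketch below assumes the tool output has already been mapped to the same two labels as the SVM calls (for example, a "deleterious" call mapped to "pathogenic"); that mapping and the toy calls in `svm_calls` and `tool_calls` are hypothetical.

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical calls for the same eight variants (toy values only).
svm_calls = ["pathogenic", "pathogenic", "benign", "benign",
             "pathogenic", "benign", "pathogenic", "benign"]
tool_calls = ["pathogenic", "benign", "benign", "benign",
              "pathogenic", "benign", "pathogenic", "pathogenic"]

# Contingency table: rows are SVM calls, columns are the tool's calls.
agreement = pd.crosstab(
    pd.Series(svm_calls, name="svm"),
    pd.Series(tool_calls, name="tool"),
)

# Cohen's kappa corrects the raw agreement rate for chance agreement.
kappa = cohen_kappa_score(svm_calls, tool_calls)

print(agreement)
print(f"kappa = {kappa:.2f}")
```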

III. Compare and correlate both results with disease severity: compare and correlate the results of the first two steps with the patient's disease severity, and draw inferences
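If severity can be expressed as an ordinal grade, a rank correlation is one option for this step. The sketch below is only an illustration: the 1 to 3 severity scale, the per-patient `pathogenicity_scores`, and all values are made up, and with a real cohort this small the p-value would carry little weight.

```python
from scipy.stats import spearmanr

# Hypothetical per-patient values (toy data, not real results).
pathogenicity_scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.70]
severity_grades = [1, 2, 1, 3, 3, 2]  # assumed scale: 1 = mild, 3 = severe

# Spearman's rho only uses ranks, so it suits ordinal severity grades.
rho, p_value = spearmanr(pathogenicity_scores, severity_grades)

print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```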

Questions

  • Does this workflow make sense?
  • Any suggestions/advice/opinions to improve the workflow?
  • For small datasets, are SVM and random forest (since it uses decision trees) better than other methods?
  • Can I train on only pathogenic and likely pathogenic variants, or should I use variants of uncertain significance too?

The objective is to predict the pathogenicity of variants and correlate it with the patient's phenotype/severity.

The reason I am building my own predictor rather than using known prediction tools is that I would be using an SVM with the Position Specific Method on a rare germline disorder, with specific genes that are known to cause the disorder.

Thanks in advance for your time

machine learning python pathogenicity SVM workflow • 842 views