Question

Machine learning with whole genome sequence data in cancer research?

0

5.7 years ago

dan ▴ 20

What are some use cases for applying machine learning techniques to whole genome sequence data in cancer research? I'm not interested in variant calling or in image analysis (e.g. deciding whether a tumour is malignant or not), only in the analysis of whole genome sequence data (tumour and normal) for cancer research.

cancer genome • 1.4k views

ADD COMMENT • link 5.7 years ago by dan ▴ 20

2

Well, you might not be explicitly interested in variant calling and such, but that's what any method you use will end up doing under the hood. The whole point of doing genomic sequencing in cancer is to find what's different, and whatever you end up predicting/classifying/etc. will depend on that.

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Yes, I agree that it will depend on that, but I'm looking for something further down the line, once the calling has been done.

ADD REPLY • link 5.7 years ago by dan ▴ 20

1

I would think that classifying subtypes would be useful, for example into "currently druggable" and "not currently druggable".
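
To make "once the calling has been done" concrete, here is a minimal sketch of one possible first step: turning per-sample somatic calls into a binary sample-by-gene matrix that a subtype classifier could take as input. The function name `mutation_matrix`, the sample IDs, and the calls themselves are made up for illustration; real input would come from annotated variant calls, and the "druggable" labels would have to come from a curated source that is not shown here.

```python
from collections import defaultdict


def mutation_matrix(calls, genes):
    """Build a binary sample-by-gene matrix from (sample, gene) somatic calls.

    Returns the sorted sample IDs and one row per sample, with 1 where the
    sample carries at least one called mutation in that gene and 0 otherwise.
    """
    mutated = defaultdict(set)
    for sample, gene in calls:
        mutated[sample].add(gene)  # repeated hits in one gene collapse to a single 1
    samples = sorted(mutated)
    rows = [[1 if gene in mutated[s] else 0 for gene in genes] for s in samples]
    return samples, rows


# Toy, invented calls purely for illustration (not real patient data).
calls = [
    ("S1", "TP53"),
    ("S1", "KRAS"),
    ("S2", "BRAF"),
    ("S3", "TP53"),
    ("S3", "TP53"),  # second hit in the same gene
]
genes = ["BRAF", "KRAS", "TP53"]

samples, X = mutation_matrix(calls, genes)
for sample, row in zip(samples, X):
    print(sample, row)
```

Each row of `X` is then a feature vector for one tumour. Whether gene-level presence/absence is informative enough for a given subtype question is an open assumption, not something this sketch establishes.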

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

0

Can you provide a link or two to some examples, please?

ADD REPLY • link 5.7 years ago by dan ▴ 20

1

I don't know if such examples even exist; that's a project idea (I'm doubtful that it'd go anywhere, but then I think much of the machine learning stuff in biology is going nowhere).

ADD REPLY • link 5.7 years ago by Devon Ryan 104k

2

but then I think much of the machine learning stuff in biology is going nowhere

...and I independently agree with Devon here. I wrote more about it here: A: What is the best way to combine machine learning algorithms for feature selectio

It feels as if the 'wave' and hype of machine learning have already passed, with only some remnants remaining. Maybe we can now get back to being serious about solving the issues that we face in health sciences, instead of jumping from one trend to another and always avoiding those issues.

ADD REPLY • link 5.7 years ago by Kevin Blighe 87k

1

You may want to look at some of the classification algorithms that have been developed as non-coding pathogenicity predictors. I put together a very long presentation on these algorithms, but cannot share it. Nevertheless, the work was interesting enough to list here for future reference:

CADD (germline variants)
DANN (germline variants)
FATHMM-MKL (germline variants)
GWAVA (germline variants | somatic mutations)
Funseq2 (somatic mutations)
SurfR (rare variants | complex disease variants | all other variants)

Most of these tools use 'machine learning' algorithms. Ironically, some of the papers show that standard logistic regression performs comparably to, or better than, the very tool that they are reporting, yet they were still published.
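
As a rough illustration of that kind of baseline check, the sketch below fits a plain logistic regression by gradient descent and scores it with AUC on held-out data. Everything in it is an assumption made for illustration: the data are synthetic, the two features stand in for whatever annotations a real predictor would use, and none of the function names come from the tools listed above. To compare against a published tool, one would compute the same AUC on that tool's scores for the same held-out variants.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_data(n, rng):
    # Synthetic stand-in for annotated variants: two features, noisy linear label.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if x0 + 0.5 * x1 + rng.gauss(0, 0.5) > 0 else 0 for x0, x1 in X]
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

w, b = fit_logistic(X_train, y_train)
baseline_auc = auc(predict(X_test, w, b), y_test)
print(f"logistic regression baseline AUC: {baseline_auc:.3f}")
```

If a more elaborate model does not clearly beat this number on the same held-out set, the extra complexity is hard to justify.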

ADD REPLY • link 5.7 years ago by Kevin Blighe 87k