Question

How can I identify whether recurrent SNPs in a given cancer are associated with downregulation of a given gene?

0

6.4 years ago

JJ ▴ 680

Hi all,

I am looking for tools to identify whether certain recurrent SNPs (not in the gene itself, but in other genes) in a given cancer are associated with downregulation of a given gene.

I have a cancer dataset comprising SNPs (MAF) and expression data (RSEM RNA-seq data) for each patient. I have a particular gene of interest and want to associate SNPs in other genes with its downregulation. Any ideas on how to associate the two? Can anyone point me in the right direction?

Any advice is very much appreciated.

RNA-Seq SNP • 1.5k views

ADD COMMENT • link updated 6.4 years ago by Asaf 10k • written 6.4 years ago by JJ ▴ 680

score 1 · Answer 1 · 2017-11-15

1

6.4 years ago

Asaf 10k

You have a lot of noise in the system, so take great care or you'll end up with nothing. A few questions you might want to ask yourself:

1) Is all the expression data generated in the same way? Is there a batch effect?
2) Do you have a reference tissue to compare expression to, or are you just looking at the expression level in the tumor? How would you normalize the expression in either case?
3) Are the SNPs cancer-specific? Does it matter to you? (Again, this depends on having a reference.)

Your main goal is to "align" the data between patients. Once you have a matrix of SNPs vs. patients and a table of genes (transcripts?) vs. patients with expression levels, most of the work will be behind you and you'll just have to do some relatively simple statistics.
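The "alignment" step can be sketched in a few lines. This is a minimal illustration rather than a definitive pipeline: the column names `Hugo_Symbol`, `Tumor_Sample_Barcode`, and `Variant_Classification` follow the MAF convention, while the toy records, the patient IDs, and the helper `build_mutation_matrix` are made up for the example. It assumes every patient with expression data was also sequenced, so a patient with no retained call in a gene counts as non-mutated.

```python
from collections import defaultdict

# Toy MAF-like records (hypothetical values; real MAF files have many more columns).
maf_rows = [
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "P1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "P3", "Variant_Classification": "Nonsense_Mutation"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "P2", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "P3", "Variant_Classification": "Silent"},
]

# Hypothetical normalised expression of the gene of interest, one value per patient.
expression = {"P1": 120.0, "P2": 35.5, "P3": 80.1, "P4": 15.2}


def build_mutation_matrix(rows, patients, keep_classes):
    """Return {gene: {patient: 0 or 1}} for the given patients."""
    mutated = defaultdict(set)
    for row in rows:
        # Skip variant classes we do not want to count (e.g. silent calls).
        if row["Variant_Classification"] in keep_classes:
            mutated[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return {
        gene: {p: int(p in carriers) for p in patients}
        for gene, carriers in mutated.items()
    }


patients = sorted(expression)  # same patient order for both tables
matrix = build_mutation_matrix(
    maf_rows, patients, keep_classes={"Missense_Mutation", "Nonsense_Mutation"}
)
for gene, row in sorted(matrix.items()):
    print(gene, [row[p] for p in patients])
```

Once both tables share the same patient keys, each gene's 0/1 row can be compared directly against the expression values.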

ADD COMMENT • link 6.4 years ago by Asaf 10k

0

Thank you very much for your reply!

Yes, I am also worried that I will end up with nothing...

1) Generally it's high-quality data, all generated the same way. I am using normalised data, and there is no batch effect.

2) I have no normals - just expression levels in the tumors (RSEM). I am planning on using the median to define up/down-regulation.

3) The SNPs are cancer-specific (somatic).

So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients. Then simply do a Fisher's exact test to see whether the association is significant?

Any other suggestions? Thanks!!
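A minimal sketch of the recurrence filter described here, assuming the mutation calls have already been reduced to (gene, patient) pairs. The calls, the helper name `recurrent_genes`, and the cutoff of two patients are all hypothetical choices for illustration. Genes are counted by distinct patients rather than by calls, so a patient with two hits in the same gene counts once.

```python
from collections import Counter

# Hypothetical (gene, patient) pairs, one per somatic mutation call.
calls = [
    ("TP53", "P1"), ("TP53", "P1"), ("TP53", "P2"), ("TP53", "P3"),
    ("KRAS", "P2"), ("KRAS", "P4"),
    ("TTN", "P3"),
]


def recurrent_genes(calls, min_patients):
    """Count distinct mutated patients per gene and keep genes at or above the cutoff."""
    # set() collapses repeated calls for the same gene in the same patient.
    patients_per_gene = Counter(gene for gene, _ in set(calls))
    return {g: n for g, n in patients_per_gene.items() if n >= min_patients}


print(recurrent_genes(calls, min_patients=2))
```

The genes that survive the filter are the candidates to test against expression; since many genes are tested, the resulting p-values would need a multiple-testing correction.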

ADD REPLY • link 6.4 years ago by JJ ▴ 680

1

I would imagine you will need to group somatic mutations together in a reasonable way. Otherwise, you will be restricted to a few highly prevalent driver genes that have highly recurrent hotspots, such as V600 in BRAF or G12 in KRAS. Or, based on this comment ("So the first step would be to identify "hotspots" - genes that are mutated multiple times in different patients."), are you just using all somatic mutations within driver genes? The latter will definitely include a mixture of passenger mutations, which would add substantial noise to any association.

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Thanks for your input. Originally I was thinking of using all nonsynonymous SNPs. But yes, you are right. Do you have any suggestions on how to do this? I read about MutSig, which appears to be a good option.

ADD REPLY • link 6.4 years ago by JJ ▴ 680

0

Sounds like your data is good. Why Fisher? Don't you want to use the actual expression levels for a t-test or Wilcoxon test?
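To make the Wilcoxon suggestion concrete, here is a rough sketch of a two-sided rank-sum (Mann-Whitney U) test on the expression values of mutated vs. non-mutated patients. The function names and the toy numbers are hypothetical, and the p-value uses the normal approximation without continuity correction, which is crude for small groups. In practice a library routine such as `scipy.stats.mannwhitneyu` would be the usual choice.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:  # every value is identical, so there is nothing to test
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression of the gene of interest in each group.
mutated_expr = [1.0, 2.0, 3.0]
wildtype_expr = [4.0, 5.0, 6.0]
u, p = rank_sum_test(mutated_expr, wildtype_expr)
print(f"U = {u}, approximate two-sided p = {p:.3f}")
```

Because the question is specifically about downregulation, a one-sided alternative may be more appropriate than the two-sided p-value computed here.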

ADD REPLY • link 6.4 years ago by Asaf 10k

0

Yes, you are right. After a voom transformation of the RSEM values, I could do a t-test, correct?

I was first thinking of generating a contingency table like this:

            low exp    high exp
mut            a         b
not mut        c         d

Do you think a t-test would be the better choice here?
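For comparison, the contingency-table route can be sketched as follows. This assumes a median split defines "low" vs. "high" expression (values equal to the median count as high) and uses a one-sided Fisher's exact test, since the hypothesis is that mutated patients are enriched in the low group. The helper names and the toy data are hypothetical; `scipy.stats.fisher_exact` is the usual library alternative.

```python
from math import comb
from statistics import median


def contingency_counts(expression, mutated):
    """Split patients at the median expression and count the cells a, b, c, d."""
    cutoff = median(expression.values())
    a = b = c = d = 0
    for patient, value in expression.items():
        low = value < cutoff
        if patient in mutated:
            if low:
                a += 1  # mutated, low expression
            else:
                b += 1  # mutated, high expression
        elif low:
            c += 1  # not mutated, low expression
        else:
            d += 1  # not mutated, high expression
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """P(at least `a` mutated patients fall in the low group) under independence."""
    n_mut, n_low, n_high = a + b, a + c, b + d
    total = comb(n_low + n_high, n_mut)
    upper = min(n_mut, n_low)
    tail = sum(comb(n_low, k) * comb(n_high, n_mut - k) for k in range(a, upper + 1))
    return tail / total


# Hypothetical data: eight patients, three of them carrying the mutation.
expression = {f"P{i}": float(i) for i in range(1, 9)}
mutated = {"P1", "P2", "P3"}
counts = contingency_counts(expression, mutated)
p_value = fisher_one_sided(*counts)
print(counts, p_value)
```

Note that the median split discards the magnitude of the expression differences, which is the trade-off against a t-test or rank-based test on the actual values.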

ADD REPLY • link 6.4 years ago by JJ ▴ 680