Question

How to estimate whether more than one copy of a gene is mutated?

2

7.7 years ago

neo.karoshi ▴ 20

I am hoping that someone can shed some light on this issue.

Suppose that, from sequencing a given cancer cell line, you detect a somatic mutation in a particular gene (e.g. V600E in the BRAF gene):

1) Is there any way to establish whether this mutation is present in both copies of the gene?

2) The above assumes that the chromosome containing the gene is still diploid. What if the gene is amplified?

3) What if, instead of a homogeneous cell line, we have a primary tumour culture: how would its heterogeneity affect this determination?

Many thanks in advance for any insights or pointers you can provide!

SNP genome sequencing • 1.8k views

ADD COMMENT • link updated 7.7 years ago by DG 7.3k • written 7.7 years ago by neo.karoshi ▴ 20

score 1 · Answer 1 · 2016-08-03

Hi,

I'm not a specialist on this issue, but maybe I can give you some hints. To get an idea of whether the mutation is present on both copies, you need the allelic frequencies of ref and alt (based on the number of reads supporting each allele at this position).

If you have only reads with the mutation, you can infer that the mutation is present on both copies (or that one copy is deleted).

If you have reads with both alt and ref, you can guess that both copies are present, one mutated and the other not, but it doesn't tell you anything about the number of copies.
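As a rough illustration of the reasoning above, here is a minimal Python sketch that turns ref/alt read counts into an allelic fraction and a very coarse interpretation. The function names and the `near_all` cutoff are made up for this example (they do not come from any particular tool), and a real analysis would also need to account for sequencing error, coverage and sample purity.

```python
def variant_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a position that support the alt (mutant) allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def rough_interpretation(vaf: float, near_all: float = 0.9) -> str:
    """Very coarse reading of a VAF; `near_all` is an arbitrary illustrative cutoff."""
    if vaf >= near_all:
        return "all copies mutated, or the unmutated copy is deleted"
    if vaf > 0:
        return "mutated and unmutated copies both present; copy number unknown"
    return "no reads support the mutation"


# Only alt reads: mutation on every copy that is present.
print(rough_interpretation(variant_allele_fraction(ref_reads=0, alt_reads=80)))
# Mixed ref and alt reads: at least one mutated and one unmutated copy.
print(rough_interpretation(variant_allele_fraction(ref_reads=45, alt_reads=55)))
```

Note that the second case says nothing about how many copies there are in total, which is why a separate copy number estimate is needed.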

To have information about copy number variation you have to use tools dedicated to this task:
https://omictools.com/cnv-detection2-category

https://omictools.com/cnv-detection3-category

With primary tumour cultures it's much more difficult to determine ploidy with NGS data. Indeed, the heterogeneity impacts the allelic ratios, and it has to be taken into account in the analysis. I think the best way to get an accurate idea of CNV is still to use CGH.
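To make the effect of heterogeneity and copy number on the allelic ratio concrete, here is a small Python sketch of the commonly used expected-allele-fraction relationship. It assumes a simple two-population mixture: tumour cells that all carry the same genotype (a clonal mutation) and normal cells with two unmutated copies. The function and parameter names are invented for this example.

```python
def expected_vaf(purity: float, tumour_copies: int, mutated_copies: int,
                 normal_copies: int = 2) -> float:
    """Expected alt-allele fraction for a clonal somatic mutation.

    purity         -- fraction of cells in the sample that are tumour cells
    tumour_copies  -- total copies of the locus in each tumour cell
    mutated_copies -- how many of those copies carry the mutation
    normal_copies  -- copies in contaminating normal cells (assumed unmutated)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= mutated_copies <= tumour_copies:
        raise ValueError("mutated_copies must be between 0 and tumour_copies")
    all_copies = purity * tumour_copies + (1.0 - purity) * normal_copies
    if all_copies == 0:
        raise ValueError("no copies of the locus in the sample")
    return purity * mutated_copies / all_copies


# Pure sample, diploid, one copy mutated (heterozygous).
print(expected_vaf(purity=1.0, tumour_copies=2, mutated_copies=1))
# 50% tumour cells, diploid, both copies mutated (homozygous).
print(expected_vaf(purity=0.5, tumour_copies=2, mutated_copies=2))
# Pure sample, locus amplified to 4 copies, one copy mutated.
print(expected_vaf(purity=1.0, tumour_copies=4, mutated_copies=1))
```

Under these assumptions the first two scenarios give the same expected fraction, so the allelic ratio alone cannot separate them without independent estimates of purity and copy number.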

Hope this helps,

score 1 · Answer 2 · 2016-08-03

1

7.7 years ago

DG 7.3k

It's a little hard to do it directly, since you have so many factors influencing your allelic ratios, as Guillaume points out. In the simplest case, if essentially 100% of the sequencing reads contain the mutation and you have a "pure" tumour sample, then you have a good idea that all copies of the gene present in the sample have the mutation. But this could be due to hemizygosity as easily as the mutation being present homozygously. If you are lucky enough that there is a nearby SNP that would be present in the same sequencing read, you can interrogate using that as well, but that situation isn't that common.
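The nearby-SNP idea can be sketched in a few lines of Python. This assumes you have already extracted, for every read spanning both positions, the allele at a heterozygous germline SNP and whether the read carries the mutation; how you extract that from your alignments is left out. The function name and the `min_reads` threshold are hypothetical and only serve the illustration.

```python
from collections import Counter


def mutated_haplotypes(read_pairs, min_reads: int = 3) -> set:
    """Return the SNP alleles (haplotypes) on which the mutation is observed.

    read_pairs -- iterable of (snp_allele, has_mutation) tuples, one per read
                  that covers both the SNP and the mutation site
    min_reads  -- illustrative minimum number of supporting reads, so that a
                  single sequencing error does not count as a haplotype
    """
    counts = Counter(snp for snp, has_mutation in read_pairs if has_mutation)
    return {snp for snp, n in counts.items() if n >= min_reads}


# Mutation only ever seen together with SNP allele "A": one haplotype mutated.
one_copy = [("A", True)] * 10 + [("G", False)] * 9
print(mutated_haplotypes(one_copy))

# Mutation seen together with both SNP alleles: both haplotypes mutated.
both_copies = [("A", True)] * 10 + [("G", True)] * 8
print(mutated_haplotypes(both_copies))
```

If every spanning read shows the same SNP allele even though the SNP is heterozygous in matched normal tissue, that would point towards loss of the other copy rather than a homozygous mutation, which is the hemizygous case described above.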

If you are more interested in whether there are functional "normal" transcripts in the tumour, then RNA-Seq is a much better way of evaluating that. This is one of those situations where knowing the biological question you are actually trying to answer really helps, because there may be other ways of addressing it.

ADD COMMENT • link 7.7 years ago by DG 7.3k

0

Thanks Dan. The biological question would be how many variants of the gene there are in a heterogeneous tumour sample. RNA-seq sounds suitable on paper, but it is not a mature technology and I don't know to what extent it does what it says on the tin. For instance, drug sensitivity models based on gene expression arrays are more predictive than those based on RNA-seq (http://www.nature.com/nbt/journal/v32/n12/fig_tab/nbt.2877_F4.html), although in theory it shouldn't be like this. Any recommended reading on the issue is welcome.

ADD REPLY • link 7.7 years ago by neo.karoshi ▴ 20

1

It's always a bit tricky. Perhaps because I do it quite frequently, I wouldn't describe the technology as immature, but genomics as a whole is a fast-moving target with constant development, particularly on the software side. After all, in that data, from a biological perspective the antibody-based RPPA data should be the best and most relevant versus mRNA levels.

I'd be curious if a similar study would have the same results today. Most of that work was done in 2012/2013 and the software side has changed quite a lot, typical depths of sequencing have changed, etc.

ADD REPLY • link 7.7 years ago by DG 7.3k

0

I am curious too.

Please excuse my choice of words. What I meant is that RNA-seq seems a less mature technology than DNA microarrays when applied to this problem given these results (my definition of "technology" includes the software that interprets the signal from the device).

ADD REPLY • link 7.7 years ago by neo.karoshi ▴ 20

0

Microarrays have certainly been around longer, so that always helps. But they are also known to have their own issues, particularly when it comes to dynamic range. A lot of the historical microarray data analyses were also done with bad statistical comparisons, but that doesn't pertain to this particular case. My guess from looking at this data is that the RNA-Seq was done at fairly low depth and likely without good replicate data, which is why it wasn't as accurate or predictive in that case. It is also confounded by the setup of the comparison. People wrote individual predictive algorithms, and what we are seeing are the results of the best algorithm. If they happened to write a predictor that modelled the gene expression array data better than the RNA-Seq data, it stands to reason it would have been more predictive. So most of those comparisons are essentially two-factor data where you are seeing the joint predictive effects of the prediction algorithm and the data views used.

ADD REPLY • link 7.7 years ago by DG 7.3k