Role of amino-acid change number
5
6.5 years ago
mangfu100 ▴ 780

Hi all.

I am wondering about the meaning of the number in an amino-acid change.

For example, I found a somatic mutation annotated as p.P123F.

So the amino-acid change notation is P123F, indicating that the residue changes from P to F.

But what does 123 mean? Does it refer to a particular isoform?

My final question is as follows:

Can I ignore the number and group all the mutations as same sets for analysis?

genome sequencing alignment • 4.5k views
0

The number refers to the position of this mutation in the sequence.

4
6.5 years ago
Ram 35k

P123F --> Amino Acid #123 in the sequence changes from Proline to Phenylalanine. 123 is the residue number and is a critical component of the mutation notation - if not for that, you would not know where the change happens.
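To make the three parts of the notation concrete, here is a minimal Python sketch that splits a simple substitution into reference residue, position and new residue. The function name `parse_substitution` and the regular expression are my own illustration, not part of any HGVS tool, and the sketch only handles one-letter single-residue substitutions (no frameshifts, stops, or three-letter codes).

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Matches simple substitutions only, e.g. "p.P123F" or "P123F".
SUBSTITUTION = re.compile(
    rf"^(?:p\.)?([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$"
)


def parse_substitution(change):
    """Split a change like 'p.P123F' into (ref, position, alt).

    Hypothetical helper for illustration; raises ValueError for
    anything that is not a plain one-letter substitution.
    """
    match = SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


ref, position, alt = parse_substitution("p.P123F")
print(f"residue {position}: {ref} -> {alt}")
```

The position is returned as an integer so that P123F and P124F compare as different changes rather than as similar-looking strings.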

No, you cannot ignore this number. There may be exceptions where you could, but your question only says "analysis", not a specific type where this could be possible. If you were to be more specific, we could tell you if your analysis does not depend on position specific differences in mutations.

0

I have a set of nonsynonymous mutations from 10 patients with bladder cancer.

Since my cancer type is a very specific subtype of bladder cancer, I focused on genes that are recurrently mutated in my cohort to identify driver mutations. The final set of recurrent mutations covers fewer than 50 genes across the 10 samples.

Next, I want to check whether my recurrent mutations have been previously reported in other cancer data. To do this, I collected public data from TCGA and tried to see whether any mutations there have the same position or the same effect as my recurrent mutations.

To group the mutations by effect, my idea was to group them by amino-acid change.

Of course, it would be better if the whole amino-acid change, including the number, were identical, but for some lesser-known genes it is hard to find mutations with an identical amino-acid change including the number. So I want to compare them without the number. This is why I asked on Biostar for your comments.

Can you suggest any advice for my analysis?  : )

1

This is a real issue with this kind of nomenclature - it's specific to the protein product of a transcript, so you need to specify which protein isoform the coordinates refer to. As there is no standardisation around this, you cannot rely on everyone talking about the same mutation. If a mutation is commonly clinically known, then people will tend to standardise the use. But automatically generating these identifiers is not a trivial problem. Of course it would be nice if everyone referred to a specific nucleotide change on a specific chromosome of a specific genome build, but they don't ;)

1

Just to get the most obvious point out of the way, the numbers are a critical differentiating factor. p.P123F is NOT the same as p.P124F or p.P122F. If the underlying reference sequence were to change, then yes, it is possible that the same amino acid has changed, but the mutations are still not identical. Identical mutations are when the reference sequence, the residue and the actual amino acid change (as well as the causative DNA/RNA change) all match up.

NP_xyz.2:p.P123F is globally unique - across time and space. There is no way any other mutation can be identical to it, except another of the same mutation seen in another sample.
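As a rough Python sketch of that idea, a variant can be treated as a tuple of accession, version, reference residue, position and new residue, and two variants are identical only if every field matches. `NP_000000` is a made-up placeholder accession, and `ProteinVariant` / `parse_variant` are hypothetical names; the pattern only covers RefSeq-style accessions and simple one-letter substitutions.

```python
import re
from typing import NamedTuple


class ProteinVariant(NamedTuple):
    accession: str  # sequence accession ("space")
    version: int    # accession version ("time")
    ref: str        # reference residue
    position: int   # 1-based residue number
    alt: str        # residue after mutation


# Assumed format: <accession>.<version>:p.<ref><position><alt>
PATTERN = re.compile(r"^([A-Z]{2}_\d+)\.(\d+):p\.([A-Z])(\d+)([A-Z])$")


def parse_variant(text):
    """Parse e.g. 'NP_000000.2:p.P123F' (placeholder accession)."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported variant string: {text!r}")
    accession, version, ref, position, alt = match.groups()
    return ProteinVariant(accession, int(version), ref, int(position), alt)


a = parse_variant("NP_000000.2:p.P123F")
same = parse_variant("NP_000000.2:p.P123F")
other_position = parse_variant("NP_000000.2:p.P124F")
other_version = parse_variant("NP_000000.1:p.P123F")

print(a == same)            # True: every field matches
print(a == other_position)  # False: residue number differs
print(a == other_version)   # False: reference version differs
```

Dropping the position or the accession from the tuple would make all three comparisons come out equal, which is exactly the loss of information being warned about.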

1

With regard to the "same effects" part, you should look at novel mutations that affect the same secondary structure element as the known mutations. These could then possibly result in similar effects, although that's a long shot. Maybe a structural biologist can help you better, but ignoring the residue number is not the solution. That's like getting into a city subway/metro with the assumption that getting off at any station will have the same effect on time taken to reach your destination.

2
6.5 years ago
PoGibas 5.0k
  original aa
  |position of mutation
  | |
p.P123F
|     |
|     aa after mutation
protein*
----
*
p - protein
g - genomic sequence
r - RNA

0

Let's not forget good old c.

0
6.5 years ago
User 59 13k

This is HGVS nomenclature, see the guidelines here: http://www.hgvs.org/mutnomen/recs-prot.html

0
6.2 years ago
Reece ▴ 310

It's also worth pointing out that a location in a sequence, including in the form above, is nearly useless when not associated with a sequence accession (e.g., NP_012345.6 or ENSP012345678). Not having an accession is like giving your address without a street name. Very many genes have multiple transcripts, which means they often have multiple isoforms, so the intended accession is rarely unique when written, and never guaranteed to be unique in the future.
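A toy Python illustration of why the bare position is ambiguous: the two sequences below are invented, with the second simply carrying three extra N-terminal residues, so the same proline sits at position 4 in one and position 7 in the other. `reference_matches` is a hypothetical helper, not a function from any library.

```python
def reference_matches(sequence, ref, position):
    """Return True if `sequence` has residue `ref` at 1-based `position`."""
    if not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == ref


# Invented toy isoforms; isoform_B has three extra residues at the start.
isoforms = {
    "isoform_A": "MKTPLLA",
    "isoform_B": "MSGMKTPLLA",
}

# A bare "P4L" only makes sense on the isoform where residue 4 is proline.
for name, sequence in isoforms.items():
    print(name, "P at 4:", reference_matches(sequence, "P", 4))

# On isoform_B the same proline is residue 7, so the change would be P7L.
print("isoform_B P at 7:", reference_matches(isoforms["isoform_B"], "P", 7))
```

Checking the reference residue against the sequence behind a given accession is a cheap sanity test before comparing mutation lists from different sources.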

0

I like to call it "uniquely identifiable across space (acc no) and time (version)" :)

0
6.2 years ago

As far as I can see you have a set of nonsynonymous recurrent mutations in a group of genes. The reasons why you should also include the amino acid number are explained by the colleagues here.

I wanted to help a little with the comparison you want to make. You mentioned earlier that you wanted to compare the mutations by amino acid type. One way I can think of is to look at the distribution of both amino acids along your protein, let's say in windows of 10 residues, and then look at their log2 ratio near the site of your mutation. If the ratio is high (let's say above 1.5 or below -1.5), you can then look at the BLOSUM62 score for those two amino acids (the choice of matrix can be changed), which gives you an idea (only an idea!) of their interchangeability. You can repeat this procedure for the other genes and look at your results. If the nearby log2 ratios are similar in all the genes harboring mutations involving those two amino acids, then you have a hypothesis to build on. But it is not clear to me whether this alone would be enough. I suggest you back up your work with additional data.
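The windowed ratio could be sketched in Python roughly as follows. This is only my reading of the procedure, not how i-PV computes it: the non-overlapping windows, the pseudocount of 1 (to avoid dividing by zero) and the function name `windowed_log2_ratio` are all assumptions, the sequence is a toy one, and the BLOSUM62 lookup is left out.

```python
import math


def windowed_log2_ratio(sequence, aa_a, aa_b, window=10, pseudocount=1.0):
    """log2((count of aa_a + pc) / (count of aa_b + pc)) per window.

    Assumptions for illustration: non-overlapping windows and a
    pseudocount so that windows lacking either residue stay finite.
    """
    ratios = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        count_a = chunk.count(aa_a)
        count_b = chunk.count(aa_b)
        ratios.append(
            math.log2((count_a + pseudocount) / (count_b + pseudocount))
        )
    return ratios


# Toy sequence: the first window is arginine-rich, the second lysine-rich.
toy_sequence = "RRRRKAAAAA" + "KKKRAAAAAA"
ratios = windowed_log2_ratio(toy_sequence, "R", "K")

# Pick the window containing a mutation at 1-based residue 12.
mutation_position = 12
window_index = (mutation_position - 1) // 10
print(ratios)
print("ratio near mutation:", ratios[window_index])
```

With only 10 residues per window the ratios are very noisy, so they are better read as a rough local composition signal than as a statistic.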

You can do the aforementioned work with software we recently published (http://i-pv.org/). I give an example below from FOXP2, showing the ratio of arginines versus lysines. The green graph is the BLOSUM62 agreement graph, the orange one is the log2 ratios. You can check the gif file below:

http://i-pv.org/gifs/AAratio.gif

To generate these graphs for your protein, you will need Perl and Circos installed locally with all dependent libraries. Then extract the ipv.rar file anywhere on your PC. If you are lost with the installation let me know, and I will generate the html file for your gene of interest.

I hope this helps,