Question: Role of amino-acid change number
4.8 years ago by
Korea, Republic Of
mangfu100 wrote:

Hi all.

I am wondering about the meaning of the number in an amino-acid change.

For example, I found a somatic mutation annotated as p.P123F.

So the amino-acid change notation is P123F, indicating that the protein changes from P to F.

But what does 123 mean? Does it refer to an isoform pattern?

My final question is below:

Can I ignore the number and group all such mutations into the same set for analysis?

sequencing alignment genome • 3.3k views
modified 2.3 years ago by Biostar • written 4.8 years ago by mangfu100

The number refers to the position of this mutation in the sequence.

modified 4.8 years ago • written 4.8 years ago by a.zielezinski
4.8 years ago by
Houston, TX
RamRS wrote:

P123F --> Amino Acid #123 in the sequence changes from Proline to Phenylalanine. 123 is the residue number and is a critical component of the mutation notation - if not for that, you would not know where the change happens.

No, you cannot ignore this number. There may be exceptions where you could, but your question only says "analysis", not a specific type where this could be possible. If you were to be more specific, we could tell you if your analysis does not depend on position specific differences in mutations.
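To make the three parts of the notation concrete, here is a minimal Python sketch that splits a simple change such as p.P123F into original residue, position, and new residue. The names `ProteinChange` and `parse_protein_change` are made up for illustration, and the pattern only covers one-letter missense changes, not the full HGVS syntax.

```python
import re
from typing import NamedTuple


class ProteinChange(NamedTuple):
    ref: str  # original amino acid (one-letter code)
    pos: int  # 1-based residue number in the protein sequence
    alt: str  # amino acid after the mutation


# Assumption: only simple one-letter missense changes ("p.P123F" or "P123F").
_PATTERN = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")


def parse_protein_change(text: str) -> ProteinChange:
    """Split a simple protein change into (ref, pos, alt)."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple missense change: {text!r}")
    ref, pos, alt = match.groups()
    return ProteinChange(ref, int(pos), alt)


change = parse_protein_change("p.P123F")
print(change)  # ProteinChange(ref='P', pos=123, alt='F')
```

Dropping `pos` from this record leaves only "P to F", which no longer says where in the protein the change happens.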

written 4.8 years ago by RamRS

Thanks for your reply.

I have a set of nonsynonymous mutations from 10 patients with bladder cancer.

Since my cancer type is a very specific subtype of bladder cancer, I focused on recurrently mutated genes in my cohort to identify driver mutations, and ended up with a set of recurrent mutations in fewer than 50 genes across the 10 samples.

Next, I would like to check whether my recurrent mutations have been previously reported in other cancer data. To do this, I collected public data from TCGA and tried to see whether those mutations have the same position or the same effect as my recurrent mutations.

To group the mutations by effect, I came up with the idea of grouping them by amino-acid change.

Of course, it would be best if the whole amino-acid change, including the number, were identical, but for some lesser-known genes it is hard to find mutations with an identical amino-acid change including the number. So I want to compare them without the number. This is why I asked on Biostar for your comments.

Can you suggest any advice for my analysis? :)

modified 4.8 years ago • written 4.8 years ago by mangfu100

This is a real issue with this kind of nomenclature - it's specific to the protein product of a transcript, so you need to specify which protein isoform the coordinates refer to. As there is no standardisation around this, you cannot rely on everyone talking about the same mutation. If a mutation is commonly clinically known, then people will tend to standardise the usage. But automatically generating these identifiers is not a trivial problem. Of course it would be nice if everyone referred to a specific nucleotide change on a specific chromosome of a specific genome build, but they don't ;)

written 4.8 years ago by Daniel Swan

Just to get the most obvious point out of the way, the numbers are a critical differentiating factor. p.P123F is NOT the same as p.P124F or p.P122F. If the underlying reference sequence were to change, then yes, there is a possibility that the same amino acid has changed, but the mutations are still not identical. Mutations are identical only when the reference sequence, the residue, and the actual amino acid change (as well as the causative DNA/RNA change) all match up.

NP_xyz.2:p.P123F is globally unique - across time and space. There is no way any other mutation can be identical to it, except another of the same mutation seen in another sample.
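As a rough illustration of why the number matters when comparing a cohort against public data, the sketch below matches variants once on the full key (accession, residue, position, change) and once with the position dropped. The `Variant` record, the position 450, and both example sets are hypothetical. `NP_xyz.2` is the placeholder accession from above, not a real identifier.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    accession: str  # versioned protein accession (placeholder value here)
    ref: str        # original amino acid
    pos: int        # residue number
    alt: str        # amino acid after the mutation


def without_position(variant: Variant) -> tuple:
    """Key that deliberately throws away the residue number."""
    return (variant.accession, variant.ref, variant.alt)


# Hypothetical data: one cohort variant, two variants from a public set.
cohort = {Variant("NP_xyz.2", "P", 123, "F")}
public = {
    Variant("NP_xyz.2", "P", 123, "F"),
    Variant("NP_xyz.2", "P", 450, "F"),
}

# Exact match: accession, residue, position and change must all agree.
exact = cohort & public

# Loose match: any P-to-F change in the same protein is treated as "the same".
cohort_keys = {without_position(v) for v in cohort}
loose = {v for v in public if without_position(v) in cohort_keys}

print(len(exact), len(loose))  # 1 2
```

The loose comparison also pulls in the change at residue 450, which is a different mutation that merely involves the same pair of amino acids.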

written 4.8 years ago by RamRS

With regards to the "same effects" part, you should look at novel mutations that affect the same secondary structure element as the known mutations. These could then possibly result in similar effects, although that's a long shot. Maybe a structural biologist can help you better, but ignoring the residue number is not the solution. That's like getting into a city subway/metro with the assumption that getting off at any station will have the same effect on time taken to reach your destination.

written 4.8 years ago by RamRS
4.8 years ago by
PoGibas wrote:
p.P123F
  P   - original aa
  123 - position of mutation
  F   - aa after mutation
p - protein
g - genomic sequence
r - RNA
modified 6 months ago by RamRS • written 4.8 years ago by PoGibas

Let's not forget good old c.

written 4.6 years ago by RamRS
4.8 years ago by
Aberdeen, UK
Daniel Swan wrote:

This is HGVS nomenclature, see the guidelines here:

written 4.8 years ago by Daniel Swan
4.6 years ago by
United States
Reece wrote:

It's also worth pointing out that a location in a sequence, including in the form above, is nearly useless when not associated with a sequence accession (e.g., NP_012345.6 or ENSP012345678). Not having an accession is like giving your address without a street name. Very many genes have multiple transcripts, which means they often have multiple isoforms, so the intended accession is rarely unique when written, and never guaranteed to be unique in the future.
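One way to enforce this in a pipeline is to refuse any protein change that does not carry a versioned accession. The sketch below is only an illustration: `split_qualified` is a made-up helper, and the pattern assumes the `ACCESSION.VERSION:p.CHANGE` shape used in this thread rather than the complete HGVS grammar.

```python
import re
from typing import Optional, Tuple

# Assumption: "<accession>.<version>:p.<change>", e.g. "NP_012345.6:p.P123F".
_QUALIFIED = re.compile(
    r"^(?P<acc>[A-Za-z0-9_]+)\.(?P<ver>\d+):(?P<change>p\..+)$"
)


def split_qualified(text: str) -> Optional[Tuple[str, int, str]]:
    """Return (accession, version, change), or None if either part is missing."""
    match = _QUALIFIED.match(text.strip())
    if match is None:
        return None
    return match.group("acc"), int(match.group("ver")), match.group("change")


print(split_qualified("NP_012345.6:p.P123F"))  # ('NP_012345', 6, 'p.P123F')
print(split_qualified("p.P123F"))              # None
```

Returning `None` for a bare `p.P123F` is a deliberate choice here: without the accession and version, the position cannot be tied to a particular sequence.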

written 4.6 years ago by Reece

I like to call it "uniquely identifiable across space (acc no) and time (version)" :)

written 4.6 years ago by RamRS
4.6 years ago by
Ibrahim Tanyalcin wrote:

As far as I can see you have a set of nonsynonymous recurrent mutations in a group of genes. The reasons why you should also include the amino acid number are explained by the colleagues here.

I wanted to help a little with the comparison you want to make. You mentioned earlier that you wanted to compare mutations by amino acid type. One way I can think of is to look at the distribution of both amino acids along your protein, let's say in windows of 10 residues. You can then look at their log2 ratio near the site of your mutation. If the ratio is high (let's say 1.5 or -1.5), you can look at the BLOSUM62 scores (the matrix choice can be changed) for those two amino acids, giving you an idea (only an idea!) of their interchangeability. You can repeat this procedure for the other genes and take a look at your results. If the nearby log2 ratios are similar in all the genes harboring mutations involving those two amino acids, then you have a hypothesis to build on. But it is not clear to me whether this alone would be enough. I suggest you back up your work with additional data.
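The windowed log2 ratio described above can be sketched in a few lines of Python. This is not the published software, only an illustration under stated assumptions: the window is taken as `flank` residues on each side of the mutated position (so the default covers up to 11 residues rather than exactly 10), and a pseudocount of 1 is added to both counts to avoid division by zero. The function name and both defaults are hypothetical, and the substitution-matrix lookup is left out.

```python
import math


def window_log2_ratio(sequence: str, aa_a: str, aa_b: str,
                      center: int, flank: int = 5,
                      pseudocount: float = 1.0) -> float:
    """log2 of (count of aa_a) / (count of aa_b) around a 1-based residue.

    The window spans `flank` residues on each side of `center` and is
    truncated at the ends of the sequence.
    """
    if not 1 <= center <= len(sequence):
        raise ValueError("center is outside the sequence")
    start = max(0, center - 1 - flank)        # 0-based, inclusive
    end = min(len(sequence), center + flank)  # 0-based, exclusive
    segment = sequence[start:end]
    count_a = segment.count(aa_a)
    count_b = segment.count(aa_b)
    return math.log2((count_a + pseudocount) / (count_b + pseudocount))


# Toy sequence, not a real protein: 5 arginines and 1 lysine in the window.
ratio = window_log2_ratio("RRRRRKAAAAA", "R", "K", center=6)
print(round(ratio, 3))  # 1.585
```

With the pseudocount, a window containing neither amino acid gives a ratio of 0, so it helps to inspect the raw counts as well before reading much into any single value.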

You can do the aforementioned work with software we have recently published. I give an example below from FOXP2, the ratio of arginines versus lysines. The green graph is the BLOSUM62 agreement graph, the orange one is the log2 ratios. You can check the gif file below:

To generate these graphs for your protein, you will need perl and circos installed locally with all dependent libraries. Then extract the ipv.rar file to anywhere on your PC. If you are lost with the installation let me know, I will generate the html file for your gene of interest.

I hope this helps,

good luck with your research,

written 4.6 years ago by Ibrahim Tanyalcin