Question: Confusion: why and how can a nucleotide in the reference genome differ from the nucleotide in the RefSeq mRNA?
gsr9999 (United States) wrote 2.9 years ago:

Dear BioStars Leaders,

I was under the impression that the nucleotide sequence of a RefSeq mRNA would be identical to the reference genome assembly. Please correct me if my understanding is wrong.

At chr1:877831 (GRCh37), the reference nucleotide is "T".

The transcript (gene) at this genomic location is NM_152486.2 (SAMD11).

The genomic coordinate chr1:877831 corresponds to the following positions in transcript NM_152486.2:

  1. Transcript position: 1107 (counted from the first base of the 5' UTR)
  2. Coding position (c.): 1027 (counted from the A of the start codon)
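The two positions in the list above differ only by the length of the 5' UTR. Here is a minimal sketch of that arithmetic, assuming the position lies inside the coding sequence (no UTR or intronic offsets). The CDS start of 81 is inferred purely from the two numbers quoted above (1107 and 1027), and the function names are hypothetical:

```python
def c_to_transcript(c_pos: int, cds_start: int) -> int:
    """Convert an HGVS c. position (1 = A of the start codon) to a
    1-based transcript position. Only handles positive c. positions."""
    if c_pos < 1:
        raise ValueError("this sketch handles coding positions >= 1 only")
    return cds_start + c_pos - 1


def transcript_to_c(tx_pos: int, cds_start: int) -> int:
    """Inverse of c_to_transcript, under the same restriction."""
    c_pos = tx_pos - cds_start + 1
    if c_pos < 1:
        raise ValueError("position lies in the 5' UTR")
    return c_pos


# Assumption: CDS start inferred from the positions in the question,
# i.e. an 80-nt 5' UTR followed by the start codon at position 81.
CDS_START = 1107 - 1027 + 1

print(c_to_transcript(1027, CDS_START))  # 1107
print(transcript_to_c(1107, CDS_START))  # 1027
```

This is why the Mutalyzer message further down refers to position 1107 even though the submitted description uses c.1027.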

I looked up the nucleotide at position 1107 in RefSeq (NM_152486.2), and it is "C".

I entered the mutation "chr1:g.877831T>C" in Mutalyzer (Position Converter tool), and it translated it to "NM_152486.2:c.1027T>C".

After that, I entered the converted description "NM_152486.2:c.1027T>C" in Mutalyzer (Name Checker tool), and it gives an error: "T not found at position 1107, found C instead".
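The error boils down to a reference-allele check: the base named before ">" in the description has to match the base actually present in the sequence being checked. A rough sketch of such a check follows. It is not Mutalyzer's actual code, and the toy sequence and function name are made up for illustration:

```python
import re


def check_substitution(seq: str, variant: str, offset: int = 0) -> str:
    """Check the reference base of a simple substitution such as 'c.5T>C'.

    offset is added to the described position to get the 1-based index
    into seq (for example the 5' UTR length for a c. description).
    """
    m = re.fullmatch(r"[a-z]\.(\d+)([ACGT])>([ACGT])", variant)
    if m is None:
        raise ValueError(f"unsupported variant description: {variant}")
    pos = int(m.group(1)) + offset
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence")
    ref = m.group(2)
    found = seq[pos - 1]
    if found != ref:
        return f"{ref} not found at position {pos}, found {found} instead"
    return "ok"


toy = "ATGGCCAAGT"  # hypothetical sequence, not SAMD11
print(check_substitution(toy, "c.5T>C"))  # reference base mismatch
print(check_substitution(toy, "c.5C>T"))  # ok
```

In other words, the converted description inherits "T" from the genome, and the check fails because the transcript record itself carries "C" at that position.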

I am really confused about how the reference genome can say the nucleotide is "T" while the mRNA says it is "C". It would be great if someone could explain this.

Addition to my original question: there is a SNP record for this mutation in dbSNP: rs6672356

Thanks, gsr

Petr Ponomarenko (United States / Los Angeles) wrote 2.9 years ago:

RefSeq can differ from the reference genome. The two have completely different origins: RefSeq is a curated database of transcripts, while a reference genome is the result of a genome assembly, so the underlying reads and methods are very different. RefSeq is stable and does not depend on reference genome updates, which is why the HGVS notation for mutation reporting says to use RefSeq instead of the reference genome.

In my practice, I have seen many differences between RefSeq and genomic references. The craziest and most unexpected one was a single-nucleotide deletion in a coding region!
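To see where a RefSeq mRNA disagrees with the sequence derived from the genome, you can compare the two directly. The sketch below uses Python's standard `difflib` as a stand-in for a proper alignment tool, which is fine for short, nearly identical sequences but not for real-scale work. The sequences and the function name are made up for illustration:

```python
from difflib import SequenceMatcher


def list_differences(genome_derived: str, refseq: str):
    """Return (kind, 1-based start in genome_derived, genome bases,
    RefSeq bases) for every stretch where the two sequences disagree."""
    matcher = SequenceMatcher(None, genome_derived, refseq, autojunk=False)
    diffs = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            diffs.append((kind, i1 + 1, genome_derived[i1:i2], refseq[j1:j2]))
    return diffs


# Toy sequences, made up for illustration (not real SAMD11 sequence).
genome_derived = "ATGGCTTACCGGAT"
refseq_mrna = "ATGGCCTACGGAT"
for diff in list_differences(genome_derived, refseq_mrna):
    print(diff)
```

Each reported tuple is tagged "replace", "delete" or "insert", so substitutions and single-base deletions like the one mentioned above show up as separate entries.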


Thank you for your answer.

written 2.9 years ago by gsr9999

I just wanted to add that RefSeq is a curated database not only of transcripts but also of genomes and proteins.

written 4 months ago by ensakz
Christian (Cambridge, US) wrote 2.9 years ago:

Most likely, the reference genome and RefSeq represent different alleles. Less likely, but possible, are sequencing errors and RNA editing.
