Question: Why does a nucleotide in the reference genome differ from the same nucleotide in the RefSeq mRNA?
23 months ago by
United States
Dear BioStars Leaders,

I was under the impression that the nucleotide sequence of a RefSeq mRNA would be identical to the corresponding sequence in the reference genome assembly. Please correct me if my understanding is wrong.

At this location, chr1:877831 (GRCh37), the reference nucleotide is "T".

The transcript (gene) at this genomic location is NM_152486.2 (SAMD11).

The genomic coordinate chr1:877831 corresponds to the following positions in transcript NM_152486.2:

  1. Transcript position: 1107 (counted from the first base of the 5' UTR)
  2. Coding position (c.): 1027 (counted from the first base of the start codon)
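
The two positions above differ by a fixed offset, which is the length of the 5' UTR. A minimal Python sketch of that relation follows. It assumes a 5' UTR length of 80, inferred only from the two numbers quoted in this post (1107 - 1027) and not read from the NM_152486.2 record; the function names are illustrative.

```python
# Length of the 5' UTR, inferred from the two positions quoted in this post
# (1107 - 1027). This is an assumption, not a value taken from the RefSeq record.
FIVE_PRIME_UTR_LEN = 1107 - 1027


def transcript_to_cds(n_pos, utr_len=FIVE_PRIME_UTR_LEN):
    """Convert a 1-based transcript position to a 1-based c. position.

    Only valid for positions downstream of the 5' UTR (n_pos > utr_len).
    """
    if n_pos <= utr_len:
        raise ValueError("position lies in the 5' UTR")
    return n_pos - utr_len


def cds_to_transcript(c_pos, utr_len=FIVE_PRIME_UTR_LEN):
    """Convert a 1-based c. position back to a 1-based transcript position."""
    if c_pos < 1:
        raise ValueError("c. position must be >= 1")
    return c_pos + utr_len


print(transcript_to_cds(1107))  # 1027
print(cds_to_transcript(1027))  # 1107
```

This is why the error message further down reports position 1107 even though the description that was submitted uses c.1027.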

I looked up the nucleotide at position 1107 in RefSeq (NM_152486.2), and it is "C".

I entered the variant "chr1:g.877831T>C" into Mutalyzer (Position Converter tool), and it translates it to "NM_152486.2:c.1027T>C".

After that, I entered the converted description "NM_152486.2:c.1027T>C" into Mutalyzer (Name Checker tool), and it gives the error "T not found at position 1107, found C instead".
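
The kind of check behind such an error can be sketched in a few lines of Python. This is not Mutalyzer's actual implementation; `check_reference_base` is a hypothetical helper, and the sequence below is a made-up toy string, not the real NM_152486.2 transcript.

```python
def check_reference_base(seq, pos, ref):
    """Return None if `seq` has base `ref` at 1-based position `pos`,
    otherwise return an error message describing the mismatch."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position outside the sequence")
    found = seq[pos - 1].upper()
    if found != ref.upper():
        return f"{ref} not found at position {pos}, found {found} instead"
    return None


# Toy transcript for illustration only; position 5 holds a C.
toy_transcript = "ATGGCC"

# The variant description claims T as the reference base, but the
# transcript carries C there, so the check reports a mismatch.
print(check_reference_base(toy_transcript, 5, "T"))
# T not found at position 5, found C instead

# With the base the transcript really has, the check passes.
print(check_reference_base(toy_transcript, 5, "C"))
# None
```

The description is checked against the transcript sequence, not the genome, so a genome-derived reference base fails whenever the two sequences disagree at that site.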

I am really confused about how the reference genome can say the nucleotide is "T" while the mRNA says it is "C". It would be great if someone could explain this.

Addition to my original question: there is a SNP record for this variant in dbSNP, rs6672356.

Thanks, gsr

23 months ago by
United States / Los Angeles /
RefSeq can differ from the reference genome. The two have completely different origins: RefSeq is a curated database of transcripts, while a reference genome is the result of a genome assembly. The underlying reads and methods are very different. A RefSeq record is stable and does not depend on reference genome updates, which is why the HGVS notation for reporting mutations recommends using RefSeq rather than the reference genome.

In my experience, there are many differences between RefSeq and genomic references. The craziest and most unexpected one I have seen was a single-nucleotide deletion in the coding region!

Thank you for your answer.

23 months ago by
Cambridge, US
Most likely the reference genome and RefSeq represent different alleles. Less likely, but possible, are sequencing errors and RNA editing.

