Question: Why does a nucleotide in the reference genome differ from the nucleotide in the RefSeq mRNA?
23 months ago, gsr9999 (United States) wrote:

Dear BioStars Leaders,

I was under the impression that the nucleotide sequence of a RefSeq mRNA would match the reference genome assembly exactly. Please correct me if my understanding is wrong.

At chr1:877831 (GRCh37), the reference nucleotide is "T":

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A877831%2D877831&hgsid=581652983_N7DHUPIeX4dCBoDQHXu8dfrKaa0U

The transcript (gene) at this genomic location is NM_152486.2 (SAMD11).

The genomic coordinate chr1:877831 corresponds to the following positions in transcript NM_152486.2:

  1. Transcript position: 1107 (counting from the first base of the 5' UTR)
  2. Coding position (c.): 1027 (counting from the A of the ATG start codon)
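The two numbers differ only by where counting starts: n. positions count from the first base of the transcript, c. positions from the A of the ATG. From the post's own numbers, the CDS start sits at transcript position 1107 - 1027 + 1 = 81. That value is derived here, not read from the GenBank record. A minimal sketch of the conversion, limited to positions inside the CDS; the helper names are hypothetical:

```python
# CDS_START is the transcript (n.) position of the A of the ATG start codon.
# 81 is an assumption derived from the post (1107 - 1027 + 1), not taken from GenBank.
CDS_START = 81

def c_to_n(c_pos, cds_start=CDS_START):
    """Map a positive c. position inside the CDS to its n. (transcript) position."""
    if c_pos < 1:
        # c.-N (5' UTR), c.*N (3' UTR) and intronic offsets are not handled here
        raise ValueError("only positive coding positions are handled in this sketch")
    return cds_start + c_pos - 1

def n_to_c(n_pos, cds_start=CDS_START):
    """Inverse mapping; positions before the ATG would be c.-N in HGVS."""
    c_pos = n_pos - cds_start + 1
    if c_pos < 1:
        raise ValueError("position lies in the 5' UTR")
    return c_pos

print(c_to_n(1027))  # 1107
print(n_to_c(1107))  # 1027
```

HGVS has no c.0, which is why the offset is `cds_start - 1` rather than `cds_start`.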

I looked up the nucleotide at position 1107 in the RefSeq record (NM_152486.2), and it is "C":

https://www.ncbi.nlm.nih.gov/nuccore/NM_152486.2?report=gbwithparts&log$=seqview
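The same lookup can be scripted by downloading the record as FASTA from NCBI's E-utilities `efetch` endpoint and indexing it 1-based, as GenBank coordinates are. This is a minimal standard-library sketch. The helper names are hypothetical, it assumes a single-record response, and NCBI's usage limits apply:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def parse_fasta(text):
    """Return the sequence of a single-record FASTA string, header line dropped."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like FASTA")
    return "".join(lines[1:]).upper()

def fetch_refseq(accession):
    """Download one nuccore record (e.g. a versioned RefSeq accession) as FASTA."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    with urlopen(f"{EFETCH}?{query}") as resp:
        return parse_fasta(resp.read().decode())

def base_at(seq, pos):
    """1-based lookup, matching GenBank and HGVS n. numbering."""
    return seq[pos - 1]
```

Usage (needs network access): `base_at(fetch_refseq("NM_152486.2"), 1107)` should give the same base as the GenBank view linked above. Pinning the version (`.2`) matters, because later versions of a RefSeq record can change.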

I entered the variant "chr1:g.877831T>C" into Mutalyzer's Position Converter, which translated it to "NM_152486.2:c.1027T>C": https://www.mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=chr1%3Ag.877831T%3EC

I then entered the converted description "NM_152486.2:c.1027T>C" into Mutalyzer's Name Checker, which returned the error "T not found at position 1107, found C instead": https://www.mutalyzer.nl/name-checker?description=NM_152486.2%3Ac.1027T%3EC
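The error comes from a reference check. The description claims T is the reference base at c.1027, but the transcript sequence already carries C there. The sketch below reproduces that check for simple substitutions. It is a hypothetical minimal parser, not Mutalyzer's implementation, and the error wording is copied from the message quoted in the post. `cds_start=81` is an assumption derived from the post's numbers (1107 - 1027 + 1):

```python
import re

# Simple substitutions only, e.g. "NM_152486.2:c.1027T>C" or "NM_152486.2:n.1107C>T";
# intronic offsets (c.100+5), UTR positions (c.-10, c.*20) and indels are not handled.
SUBST = re.compile(r"^(?P<acc>[^:]+):(?P<kind>[cn])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")

def check_substitution(description, tx_seq, cds_start):
    """Return None if the stated reference base matches tx_seq, else an error string."""
    m = SUBST.match(description)
    if m is None:
        raise ValueError(f"unsupported description: {description}")
    pos = int(m["pos"])
    # c. counts from the A of ATG; n. counts from the first transcript base
    n_pos = cds_start + pos - 1 if m["kind"] == "c" else pos
    found = tx_seq[n_pos - 1]
    if found != m["ref"]:
        return f"{m['ref']} not found at position {n_pos}, found {found} instead"
    return None

# Toy transcript (hypothetical) with C at position 1107, as the RefSeq record has.
toy = "A" * 1106 + "C" + "A" * 20
print(check_substitution("NM_152486.2:c.1027T>C", toy, cds_start=81))
```

Run against the toy sequence, the c.1027T>C description fails the check and c.1027C>T passes. That is the point of the error: relative to NM_152486.2, the genome's T is the alternative allele.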

I am confused about how the reference genome can have "T" at this position while the mRNA has "C". Could someone explain this?

Addition to my original question: there is a dbSNP record for this variant, rs6672356: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6672356

Thanks, gsr

23 months ago, Petr Ponomarenko (United States / Los Angeles / ALAPY.com) wrote:

RefSeq can differ from the reference genome because the two have completely different origins. RefSeq is a curated database of transcripts, whereas a reference genome is the result of a genome assembly, and the sets of reads and the methods behind them are very different. RefSeq is also stable: it does not change when the reference is updated. This is why HGVS nomenclature recommends reporting mutations against RefSeq rather than against the reference genome.
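One way to find such differences is to walk the transcript-to-genome alignment block by block and report every position where the bases disagree. The sketch below assumes ungapped, plus-strand alignment blocks. An indel would appear as a gap between blocks, which this sketch does not report. The `Block` type, the helper name and the toy sequences are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Block:
    tx_start: int  # 1-based transcript position of the block's first base
    g_start: int   # 1-based genomic position aligned to that base (plus strand)
    length: int    # number of aligned bases; blocks are assumed ungapped

def mismatches(tx_seq, chrom_seq, blocks):
    """List (tx_pos, g_pos, genome_base, transcript_base) wherever the sequences disagree."""
    out = []
    for b in blocks:
        for i in range(b.length):
            t = tx_seq[b.tx_start - 1 + i].upper()
            g = chrom_seq[b.g_start - 1 + i].upper()
            if t != g:
                out.append((b.tx_start + i, b.g_start + i, g, t))
    return out

# Toy data (hypothetical): exons at genomic 4-7 and 9-11; transcript base 6 differs.
chrom = "GGGACGTTCGTGG"
tx = "ACGTCAT"
print(mismatches(tx, chrom, [Block(1, 4, 4), Block(5, 9, 3)]))  # [(6, 10, 'G', 'A')]
```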

In my practice, I have seen many differences between RefSeq and genomic references. The most surprising one was a single-nucleotide deletion in the coding region!
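A single-nucleotide deletion in the coding region is striking because it shifts the reading frame: every codon downstream changes. The sketch below illustrates this with a hypothetical toy CDS, not the SAMD11 sequence. It also classifies a coding indel by whether its net length change is a multiple of three:

```python
def codons(seq):
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def frame_effect(ref_len, alt_len):
    """Classify a coding-region change by its net length change."""
    delta = alt_len - ref_len
    if delta == 0:
        return "no length change"
    return "in-frame" if delta % 3 == 0 else "frameshift"

# Toy CDS (hypothetical): delete the single base at c.5 (0-based index 4).
cds = "ATGGCATCTAAA"
mutated = cds[:4] + cds[5:]
print(codons(cds))      # ['ATG', 'GCA', 'TCT', 'AAA']
print(codons(mutated))  # ['ATG', 'GAT', 'CTA']
print(frame_effect(ref_len=1, alt_len=0))  # frameshift
```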


gsr9999 replied: Thank you for your answer.
23 months ago, Christian (Cambridge, US) wrote:

Most likely, the reference genome and RefSeq represent different alleles. Less likely, but possible, explanations are sequencing errors and RNA editing.
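The ranking above can be turned into a rough heuristic for triaging a single mismatch. The checks, in order, are:

1. Both bases are known alleles of a SNP.
2. The change matches an RNA-editing signature: A-to-I, read as G, or C-to-U, read as T.
3. Otherwise, the mismatch is unexplained.

This is a hypothetical sketch, not a diagnosis. It assumes the genome base and SNP alleles are given on the genomic plus strand. For this post, the transcript is on the plus strand, as the identical bases in g.877831T>C and c.1027T>C suggest. The SNP alleles in the example are assumed, not taken from the dbSNP record:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (genome base, RNA base) on the transcript's strand for the two best-known
# editing types: A-to-I (inosine is read as G) and C-to-U (read as T).
EDITING_SIGNATURES = {("A", "G"), ("C", "T")}

def classify_mismatch(genome_base, transcript_base, snp_alleles=(), strand="+"):
    """Heuristic label for one RefSeq-vs-genome mismatch (a sketch, not a rule).

    genome_base and snp_alleles are assumed to be on the genomic plus strand;
    transcript_base is the base as written in the RefSeq record.
    """
    def to_tx_strand(base):
        return base if strand == "+" else base.translate(COMPLEMENT)

    g = to_tx_strand(genome_base)
    alleles = {to_tx_strand(a) for a in snp_alleles}
    if {g, transcript_base} <= alleles:
        return "different alleles of a known SNP"
    if (g, transcript_base) in EDITING_SIGNATURES:
        return "consistent with RNA editing"
    return "unexplained: possibly a sequencing error in either source"

# Post's case: genome T, RefSeq C, plus strand; the SNP alleles are assumed here.
print(classify_mismatch("T", "C", snp_alleles=("T", "C")))  # different alleles of a known SNP
```

An A>G change can also be an ordinary SNP. The polymorphism check therefore runs first, so a known SNP is never mislabelled as editing.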
