Chromosomal location not matching the mRNA sequence
22 months ago

Hello everyone, I am trying to make sense of the rs6165 (ref allele C, alt alleles T/A/G) and rs6166 (ref allele C, alt allele T) variants.

When I look up rs6165 (c.919G>A) on the mRNA, position 919 shows A rather than the expected G.

Similarly, for rs6166 (c.2039G>A), position 2039 of the transcript mRNA (NM_000145.4) shows T as the reference.

Both variants are in the FSHR gene. Is this because the gene is located on the negative strand? Even so, I don't understand the purine-to-pyrimidine change. Also, all the FSHR literature I have surveyed reports 919A>G (instead of 919G>A) and 2039A>G (not 2039G>A).

The chromosomal position shows C for both.

chromosomal location of rs6165 (Shows C)

chromosomal location of rs6166 (Shows C)

mRNA location of c.919G>A (Shows A)

mRNA location of c.2039G>A (Shows T)

transcription NCBI mutation Variant ClinVar • 443 views
22 months ago

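The rs6166 (c.2039G>A) nomenclature refers to a coding sequence coordinate (note the c there). Such coordinates are counted from the A of the ATG start codon, not from the first base of the mRNA, so position 2039 of the full transcript is a different base.

That being said, HGVS nomenclature is difficult to read and verify without proper tooling, which sadly did not exist ... until the magical bio package came along ... :-)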
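To make the coordinate difference concrete, here is a minimal Python sketch. The sequence is a made-up toy transcript, not FSHR, and the helper name `cds_to_transcript_pos` is hypothetical. It only handles positions inside the CDS and assumes the first ATG in the toy string is the start codon.

```python
def cds_to_transcript_pos(c_pos: int, cds_start: int) -> int:
    """Convert a 1-based c. position to a 1-based transcript position.

    cds_start is the 1-based transcript position of the A of the ATG start codon.
    """
    if c_pos < 1:
        raise ValueError("only positions inside the CDS (c_pos >= 1) are handled")
    return cds_start + c_pos - 1


# Toy transcript (hypothetical, not FSHR): 5 bases of 5' UTR followed by the CDS.
transcript = "GGCCT" + "ATGGCACGTTAA"

# 1-based position of the start codon; valid here because the toy UTR has no ATG.
cds_start = transcript.index("ATG") + 1

c_pos = 5  # "c.5" in HGVS terms
n_pos = cds_to_transcript_pos(c_pos, cds_start)

base_at_c = transcript[n_pos - 1]      # base at c.5, counted from the ATG
base_at_naive = transcript[c_pos - 1]  # base 5 counted from the transcript start

print(n_pos, base_at_c, base_at_naive)
```

Counting from the transcript start instead of from the start codon lands on a different base, which is the kind of mismatch described in the question.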

Let's investigate with bio:

Get the data:

bio fetch NM_000145.4 > NM_000145.gb

See what the data contains as a FASTA file; it is the entire transcript:

cat NM_000145.gb | bio fasta | head -2

prints:

>NM_000145.4 {"title": "Homo sapiens follicle stimulating hormone receptor (FSHR), transcript variant 1, mRNA", "type": "source"}
AGATCTCTTCTCATAAGGGCACTGTGTGGAGCTTCTGAGATCTGTGGAGGTTTTTCTCTG

Note how it says mRNA there, so it is a transcript of some sort. Now let's print the CDS region only:

cat NM_000145.gb | bio fasta -type CDS | head -2

it now prints:

>NP_000136.2 {"type": "CDS", "gene": "FSHR", "product": "follicle-stimulating hormone receptor isoform 1 precursor", "locus": ""}
ATGGCCCTGCTCCTGGTCTCTTTGCTGGCATTCCTGAGCTTGGGCTCAGGATGTCATCAT

You can see that the transcript NM_000145 contains a coding sequence whose protein product has the accession NP_000136. Now let's check position 2039 (from c.2039G>A) on this coding sequence:

cat NM_000145.gb | bio fasta -type CDS -start 2039 -end 2039 

now it prints the G that we were looking for all along:

>NP_000136.2 {"type": "CDS", "gene": "FSHR", "product": "follicle-stimulating hormone receptor isoform 1 precursor", "locus": ""}
G
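The same check can be sketched in plain Python, together with the strand question raised above. This is an illustration under stated assumptions rather than a replacement for bio: the CDS string is a toy sequence, the function names are hypothetical, only simple `c.<pos><ref>><alt>` substitutions are parsed, and the gene is assumed to lie on the minus strand, as the question suggests.

```python
import re

# Complement table for translating a transcript base to the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def parse_substitution(hgvs: str) -> tuple[int, str, str]:
    """Parse a simple coding substitution such as 'c.2039G>A'."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", hgvs)
    if match is None:
        raise ValueError(f"not a simple c. substitution: {hgvs!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def ref_matches(cds: str, hgvs: str) -> bool:
    """Check that the reference base in the HGVS string is found in the CDS."""
    pos, ref, _alt = parse_substitution(hgvs)
    return cds[pos - 1] == ref  # HGVS positions are 1-based


def to_plus_strand(base: str) -> str:
    """Base seen on the genomic plus strand for a minus-strand transcript base."""
    return base.translate(COMPLEMENT)


cds = "ATGGCACGTTAA"  # toy CDS (hypothetical, not FSHR)

print(parse_substitution("c.2039G>A"))
print(ref_matches(cds, "c.4G>A"))
print(to_plus_strand("G"), to_plus_strand("A"))
```

Under the minus-strand assumption, a G>A change on the transcript reads as C>T on the plus strand, which is consistent with the chromosomal view showing C as the reference for rs6166.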