Question: Identifying Chromosomal Position from Gene ID, Mutations, and their Locations
5.0 years ago by
United States
alger_fredericks10 wrote:

Hi,

I have a large list of mutations that contains the gene name, the Entrez ID, the mutation (e.g. G>A), and the mutation location (e.g. c.230+1). I have even gone as far as converting the Entrez ID to an Ensembl ID and a RefSeq mRNA ID. Is there a way to convert any combination of this information, in batches, to the hg19 chromosomal position? I have tried several methods (Bioconductor, Mutalyzer, etc.) and none seems to have a straightforward way to obtain the chromosomal position for both exonic and intronic mutations in batches.

Thanks,

AF

sequence genome • 5.9k views
ADD COMMENTlink modified 4.8 years ago by P.Taschner10 • written 5.0 years ago by alger_fredericks10
2

If you don't have a transcript ID then you're pretty much screwed. It's often the case that a given gene will have multiple transcripts and since c.X coordinates are transcript-centric, it can be completely ambiguous which position is actually mutated. If all of the genes only have 1 transcript, then you could convert things.

ADD REPLYlink written 5.0 years ago by Devon Ryan90k

I don't have the specific transcript ID, but I do have all RefSeq transcript IDs for each gene. I was thinking along the lines of using all transcript IDs for each gene and then comparing the sequences to narrow it down. I know it wouldn't give me a definitive answer as to which transcript was used to name the mutation, but it would be a starting point. Given a transcript ID to accompany the mutation location, is there any way to convert that to a chromosomal position?
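The narrowing-down idea could look roughly like the sketch below. It assumes you have already fetched the coding sequence of every candidate transcript into a dict; the function name `candidate_transcripts` and the toy IDs and sequences are made up for illustration. It only handles exonic positions, since an intronic location such as c.230+1 would need genomic sequence instead.

```python
def candidate_transcripts(cds_by_transcript, c_pos, ref_base):
    """Return the IDs of transcripts whose coding sequence has ref_base
    at 1-based coding position c_pos.

    cds_by_transcript: dict mapping transcript ID -> coding sequence string
                       (hypothetical input; fetch it however you like).
    """
    matches = []
    for tx_id, cds in cds_by_transcript.items():
        # Skip transcripts that are too short to contain the position.
        if 1 <= c_pos <= len(cds) and cds[c_pos - 1].upper() == ref_base.upper():
            matches.append(tx_id)
    return matches


# Toy data only: these IDs and sequences are not real transcripts.
toy_cds = {"tx_a": "ATGGCC", "tx_b": "ATGACC", "tx_c": "ATG"}
print(candidate_transcripts(toy_cds, 4, "G"))
```

A transcript that survives this filter is only consistent with the reported reference base, not confirmed as the one used to name the mutation.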

ADD REPLYlink written 5.0 years ago by alger_fredericks10

That could work. There's surprisingly no obvious function to convert from transcript to genomic coordinates in R/Bioconductor (I found a discussion about writing one, but it looks like the person asking was able to just use a bioperl function). This wouldn't actually be difficult to write. The steps would be something like (this is for transcripts on the + strand):

  1. Make/load a TranscriptDb (see GenomicFeatures; newer releases call it a TxDb)
  2. Extract the transcript of interest from step 1
  3. Get a running sum of the exon widths as a vector
  4. Find the first exon whose running sum from step 3 is at least the position. That index is the exon number, and the position minus the running sum of the preceding exons is the offset into that exon.
  5. Add the offset from step 4, minus 1, to the start of the exon from step 4 and then you have your position.

As mentioned, that wouldn't work as-is for transcripts on the - strand, but gives you the idea. For the +1 or similar intronic coordinates, it'd just be a slight tweak to what I wrote. There, you'd find the end of the exon from step 4 and then add the offset into the intron to it.
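The steps above, including the - strand and the intronic tweak, can be sketched in plain Python as follows. This is a minimal sketch rather than a tested tool: the function name `c_to_genomic` and the toy coordinates are made up, and the segment list is assumed to come from whatever annotation you use. Because HGVS c. numbering starts at the A of the start codon, the sketch takes the coding portion of each exon rather than whole exons. It does not handle UTR positions (c.-N or c.*N).

```python
from bisect import bisect_left
from itertools import accumulate


def c_to_genomic(cds_segments, strand, c_pos, intron_offset=0):
    """Map a coding position (plus optional intronic offset) to a 1-based
    genomic coordinate.

    cds_segments:  list of (start, end) genomic intervals, 1-based inclusive,
                   coding part of each exon, sorted by ascending start.
    strand:        "+" or "-".
    c_pos:         1-based coding position (c.1 is the A of the start codon).
    intron_offset: +1 for c.230+1, -2 for c.231-2, 0 for exonic positions.
    """
    # Walk the segments in transcript order (reversed on the - strand).
    ordered = cds_segments if strand == "+" else cds_segments[::-1]
    widths = [end - start + 1 for start, end in ordered]
    running = list(accumulate(widths))  # running sum of segment widths
    if not 1 <= c_pos <= running[-1]:
        raise ValueError("c_pos lies outside the coding sequence")

    # First segment whose running sum reaches c_pos.
    idx = bisect_left(running, c_pos)
    # 0-based offset of c_pos inside that segment.
    offset = c_pos - (running[idx - 1] if idx else 0) - 1

    # An intronic offset only makes sense at a segment boundary.
    if intron_offset > 0 and offset != widths[idx] - 1:
        raise ValueError("positive offset needs the last base of a segment")
    if intron_offset < 0 and offset != 0:
        raise ValueError("negative offset needs the first base of a segment")

    start, end = ordered[idx]
    if strand == "+":
        return start + offset + intron_offset
    # On the - strand, transcript order runs from high to low coordinates.
    return end - offset - intron_offset


# Toy coordinates, not a real gene.
segments = [(1000, 1099), (2000, 2049), (3000, 3079)]
print(c_to_genomic(segments, "+", 100, +1))  # c.100+1 on the + strand -> 1100
print(c_to_genomic(segments, "-", 80, +1))   # c.80+1 on the - strand -> 2999
```

For - strand transcripts the reported alleles also need complementing when written against the genome (G>A becomes C>T), which the sketch leaves out.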

ADD REPLYlink modified 5.0 years ago • written 5.0 years ago by Devon Ryan90k
1

Is GenomicFeatures::transcriptLocs2RefLocs in the right direction?

ADD REPLYlink written 5.0 years ago by Martin Morgan1.6k
1

What specifically are the coordinates 'c.230+1' relative to?

ADD REPLYlink written 5.0 years ago by Martin Morgan1.6k

The nucleotide position at which the mutation occurs, relative to the pre-mRNA transcript.

ADD REPLYlink written 5.0 years ago by alger_fredericks10

Can you give the complete info you have for two or three of your mutations as an example, so I can try something out?

ADD REPLYlink written 5.0 years ago by Bert Overduin3.6k

Mutalyzer's Position Converter (https://mutalyzer.nl/batchPositionConverter) should do the job. The batch option accepts a list of NM_ numbers in combination with the variants.

Example:

NM_058195.3:c.193+1G>A will result in NC_000009.11:g.21994137C>T

The Position Converter will not check the description. The Name Checker cannot check intronic variants unless you replace the NM_ accession with the corresponding RefSeqGene NG_ accession, in combination with the gene symbol and the transcript variant number from the NG_ annotation:

NG_007485.1(CDKN2A_v001):c.193+1G>A
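If the transcript ID, location, and base change sit in separate columns, as in the original question, they first need joining into descriptions like the NM_ example above. A minimal sketch, assuming one variant per row; the helper name `build_batch_line` is made up, and the pattern only covers plain coding positions with an optional intronic offset (not c.-N or c.*N). The only format assumed for the batch tool is the one shown in the example.

```python
import re

# Matches locations such as "c.230" or "c.193+1" / "c.194-2".
HGVS_C_LOCATION = re.compile(r"^c\.(\d+)([+-]\d+)?$")


def build_batch_line(transcript_id, location, change):
    """Join a transcript ID, a c. location and a substitution into one
    description, e.g. NM_058195.3 + c.193+1 + G>A."""
    loc = location.strip()
    if not HGVS_C_LOCATION.match(loc):
        raise ValueError(f"unsupported location: {location!r}")
    return f"{transcript_id.strip()}:{loc}{change.strip()}"


def parse_location(location):
    """Split a c. location into (coding position, intronic offset)."""
    m = HGVS_C_LOCATION.match(location.strip())
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(m.group(1)), int(m.group(2) or 0)


print(build_batch_line("NM_058195.3", "c.193+1", "G>A"))
```

Writing one such line per variant to a text file gives a list in the same shape as the example.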

 

ADD REPLYlink written 4.8 years ago by P.Taschner10

Seems like a convenient tool, thanks!

ADD REPLYlink written 4.8 years ago by Devon Ryan90k

Ensembl's Variant Effect Predictor basically does the same.

ADD REPLYlink written 4.8 years ago by Bert Overduin3.6k