Question

Getting Genome Coordinates From Refseq Exon Mrna Position Data?

13.4 years ago

Krisr ▴ 470

I am using BioPerl to obtain exon coordinates for a variety of mRNAs. For example:

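#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Data::Dumper;
use Bio::SeqIO;

my @exons;

# Fetch the RefSeq mRNA record from GenBank by accession.
# ($gb rather than $a, which is reserved for sort; $seq is declared only once.)
my $gb  = Bio::DB::GenBank->new;
my $seq = $gb->get_Seq_by_acc('NM_005378');

# Collect the location (in mRNA coordinates) of every exon feature.
for my $feat ($seq->get_SeqFeatures) {
  if ($feat->primary_tag eq 'exon') {
    push(@exons, $feat->location);
  }
}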

I would now like to use BioPerl to obtain the corresponding genomic DNA positions from the reference assembly. I am ONLY interested in the corresponding gDNA positions for each reported exon. Does anyone know of a function that could provide this?

conversion refseq transcript coordinates bioperl • 10k views

Ram · Answer 1 · 2010-12-02

Not with BioPerl, just MySQL. UCSC has already mapped the RefSeq genes to the genome:

> mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from refGene where name="NM_005378"\G'
*************************** 1. row ***************************
         bin: 707
        name: NM_005378
       chrom: chr2
      strand: +
     txStart: 15998133
       txEnd: 16004580
    cdsStart: 15999637
      cdsEnd: 16003670
   exonCount: 3
  exonStarts: 15998133,15999520,16003065,
    exonEnds: 15998316,16000427,16004580,
          id: 0
       name2: MYCN
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: -1,0,1,
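
To turn that row into per-exon genomic intervals, something like the following might work. It is only a sketch: the values are copied from the row above, `exon_intervals` is a made-up helper name, and it relies on the UCSC table convention that starts are 0-based and ends are exclusive, so adding 1 to each start gives 1-based inclusive coordinates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values copied from the refGene row shown above (hg18, NM_005378).
my $chrom       = 'chr2';
my $exon_starts = '15998133,15999520,16003065,';
my $exon_ends   = '15998316,16000427,16004580,';

# Hypothetical helper: convert UCSC-style comma lists (0-based start,
# exclusive end) into 1-based inclusive [start, end] pairs.
sub exon_intervals {
    my ($starts, $ends) = @_;
    my @s = split /,/, $starts;    # split drops the trailing empty field
    my @e = split /,/, $ends;
    die "exonStarts/exonEnds length mismatch\n" unless @s == @e;
    return map { [ $s[$_] + 1, $e[$_] ] } 0 .. $#s;
}

my @exons = exon_intervals($exon_starts, $exon_ends);

# Print one line per exon: index, genomic interval, length in bp.
for my $i (0 .. $#exons) {
    my ($start, $end) = @{ $exons[$i] };
    printf "exon %d\t%s:%d-%d\t%d bp\n",
        $i + 1, $chrom, $start, $end, $end - $start + 1;
}
```

The exons come out in genomic order; for a minus-strand transcript the list would need reversing to get transcript order.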

The table is available for download at: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refGene.txt.gz
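
If you work from the downloaded table, the exon positions on the mRNA can be lined up with the genomic blocks by accumulating exon lengths. The sketch below makes several assumptions: the file is tab-separated with fields in the same order as the query output above, the mRNA aligns to the genome without gaps (so the ranges may not match the GenBank exon features exactly if the RefSeq sequence differs from the assembly), and `exon_map` is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One refGene record as a tab-separated line, built from the row above.
# Field order is assumed to follow the column order of the query output.
my $line = join "\t", 707, 'NM_005378', 'chr2', '+', 15998133, 16004580,
    15999637, 16003670, 3, '15998133,15999520,16003065,',
    '15998316,16000427,16004580,', 0, 'MYCN', 'cmpl', 'cmpl', '-1,0,1,';

# Hypothetical helper: for each exon, in transcript order, return its
# mRNA range and its 1-based inclusive genomic range.
sub exon_map {
    my ($record) = @_;
    chomp $record;
    my @f = split /\t/, $record;
    my ($chrom, $strand) = @f[2, 3];
    my @starts = split /,/, $f[9];
    my @ends   = split /,/, $f[10];
    my @blocks = map { [ $starts[$_] + 1, $ends[$_] ] } 0 .. $#starts;

    # On the minus strand the first exon of the mRNA is the last genomic block.
    @blocks = reverse @blocks if $strand eq '-';

    my $offset = 0;    # number of mRNA bases covered by earlier exons
    my @map;
    for my $blk (@blocks) {
        my $len = $blk->[1] - $blk->[0] + 1;
        push @map, {
            chrom    => $chrom,
            strand   => $strand,
            tx_start => $offset + 1,
            tx_end   => $offset + $len,
            g_start  => $blk->[0],
            g_end    => $blk->[1],
        };
        $offset += $len;
    }
    return @map;
}

my @map = exon_map($line);
for my $e (@map) {
    printf "mRNA %d-%d\t%s:%d-%d (%s)\n",
        @{$e}{qw(tx_start tx_end chrom g_start g_end strand)};
}
```

To process the real file you would read it line by line (for example through `gzip -dc`) and pass each line to the helper after filtering on the accession in the second field.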

Ram · Answer 2 · 2011-06-02

12.9 years ago

Reece ▴ 310

I needed something similar. The only way I found was to use NCBI E-utilities to search by NM accession for an id, and then use that id to fetch a "full" record from nuccore as XML. I had to reverse-engineer the XML format.

The code is here.

And it works something like this:

apt12j$ ~/projects/bio-hgvs-perl/sandbox/ncbi-tx-exons NM_023035.2
NCBI (NM_023035.2; 1 transcripts)
13616746 13617274 529
13565921 13566026 106
13563690 13563829 140
...

This script was just a sketch to see how to do it. Perhaps it'll help you get started.
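
For orientation, the two-step lookup described above (esearch for an id, then efetch by that id) could start roughly like this. This is not the linked script, just a minimal sketch: the helper names are made up, the sample response and its id are invented for illustration, and no `rettype` is set because the post does not say which record format was fetched. Parsing the exon coordinates out of the fetched XML is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base URL of the NCBI E-utilities.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: esearch URL that looks up a nuccore id by accession.
sub esearch_url {
    my ($acc) = @_;
    return "$EUTILS/esearch.fcgi?db=nuccore&term=$acc";
}

# Hypothetical helper: efetch URL that retrieves the record as XML.
sub efetch_url {
    my ($id) = @_;
    return "$EUTILS/efetch.fcgi?db=nuccore&id=$id&retmode=xml";
}

# Hypothetical helper: pull the first <Id> out of an esearch XML response.
sub first_id {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>} ? $1 : undef;
}

# Illustrative response fragment; the id is invented, not a real record.
my $sample = '<eSearchResult><Count>1</Count>'
           . '<IdList><Id>123456789</Id></IdList></eSearchResult>';

my $search = esearch_url('NM_023035.2');
my $id     = first_id($sample);
my $fetch  = efetch_url($id);

print "$search\n$fetch\n";
```

In a real run you would download the first URL with an HTTP client, pass the response body to `first_id`, and then download the second URL.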

Also see http://web.archiveorange.com/archive/v/Nz9aur19OzYnfTAGATKr for a discussion on this topic.

-Reece

Please ask a new question.

Comment by Pierre Lindenbaum, 12.9 years ago