Getting Genome Coordinates From RefSeq Exon mRNA Position Data?
10.9 years ago
Krisr ▴ 470

I am using BioPerl to obtain exon coordinates for a variety of mRNAs. For example:

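#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Fetch the RefSeq mRNA record from GenBank by accession.
# ($a is reserved for sort in Perl, so the handle is named $gb.)
my $gb  = Bio::DB::GenBank->new;
my $seq = $gb->get_Seq_by_acc('NM_005378');

# Collect the location of every exon feature.
# These locations are in mRNA (transcript) coordinates.
my @exons;
for my $feat ($seq->get_SeqFeatures) {
  if ($feat->primary_tag eq 'exon') {
    push @exons, $feat->location;
  }
}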

I would now like to use BioPerl to obtain the corresponding genomic DNA positions from the reference assembly. I am ONLY interested in the corresponding gDNA positions for each reported exon. Does anyone know of a function that could provide this?

conversion refseq transcript coordinates bioperl • 9.1k views
10.9 years ago

This does not use BioPerl, just MySQL. UCSC has already mapped the RefSeq genes to the genome:

> mysql  -h  genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from refGene where name="NM_005378"\G'
*************************** 1. row ***************************
         bin: 707
        name: NM_005378
       chrom: chr2
      strand: +
     txStart: 15998133
       txEnd: 16004580
    cdsStart: 15999637
      cdsEnd: 16003670
   exonCount: 3
  exonStarts: 15998133,15999520,16003065,
    exonEnds: 15998316,16000427,16004580,
          id: 0
       name2: MYCN
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: -1,0,1,

The table is available for download at: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refGene.txt.gz
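
To connect the row above back to the per-exon mRNA positions in the question, here is a minimal Python sketch. It relies on the UCSC table convention that `exonStarts` are 0-based and `exonEnds` are end-exclusive, and it assumes the mRNA aligns to the genome without gaps, so that exon lengths simply accumulate along the transcript. The function name `exon_map` is hypothetical, not part of any library.

```python
def exon_map(strand, exon_starts, exon_ends):
    """Pair each exon's mRNA span with its genomic span (both 1-based, inclusive).

    Returns tuples of (mrna_start, mrna_end, genome_start, genome_end).
    """
    # UCSC stores comma-terminated lists; starts are 0-based, ends are exclusive.
    starts = [int(s) for s in exon_starts.split(",") if s]
    ends = [int(e) for e in exon_ends.split(",") if e]
    blocks = list(zip(starts, ends))

    # refGene lists exons in ascending genomic order; on the minus strand the
    # transcript runs the other way, so reverse to get transcript order.
    if strand == "-":
        blocks.reverse()

    rows = []
    offset = 0  # number of mRNA bases consumed by earlier exons
    for g_start0, g_end in blocks:
        length = g_end - g_start0
        rows.append((offset + 1, offset + length, g_start0 + 1, g_end))
        offset += length
    return rows


# Values copied from the hg18 refGene row for NM_005378 shown above.
for row in exon_map("+", "15998133,15999520,16003065,",
                    "15998316,16000427,16004580,"):
    print(*row, sep="\t")
```

The genomic columns are hg18 positions on chr2. Exon boundaries in the GenBank record may not line up exactly with these blocks if the transcript and the assembly differ, so treat the mRNA columns as derived from the UCSC alignment rather than from the GenBank features.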

Thanks Pierre, this really helped me.

10.4 years ago
Reece ▴ 310

I needed something similar. The only way I found was to use NCBI E-utilities to search by NM accession for an id, and then use that id to fetch a "full" record from nuccore as XML. I had to reverse engineer the XML format.

The code is here.

And it works something like this:

apt12j$ ~/projects/bio-hgvs-perl/sandbox/ncbi-tx-exons NM_023035.2
NCBI (NM_023035.2; 1 transcripts)
13616746  13617274  529
13565921  13566026  106
13563690  13563829  140
...

This script was just a sketch to see how to do it. Perhaps it'll help you get started.

Also see http://web.archiveorange.com/archive/v/Nz9aur19OzYnfTAGATKr for a discussion on this topic.

-Reece
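
The two-step lookup described in this answer (search nuccore by accession for an id, then fetch that id as XML) can be sketched in Python as below. This only builds the two E-utilities request URLs; it is not the linked script. The answer does not say which `rettype` it used, so `rettype=gb` here is an assumption, and parsing the returned XML for exon coordinates is left out. The function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession):
    """URL that searches nuccore for an accession and returns matching ids."""
    query = urlencode({"db": "nuccore", "term": f"{accession}[Accession]"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_url(record_id):
    """URL that fetches one nuccore record as XML (rettype is an assumption)."""
    query = urlencode({
        "db": "nuccore",
        "id": record_id,
        "rettype": "gb",
        "retmode": "xml",
    })
    return f"{EUTILS}/efetch.fcgi?{query}"


# Step 1: request this URL and read the id out of the esearch response.
print(esearch_url("NM_023035.2"))
# Step 2: pass that id to efetch_url() and request the resulting URL.
```

Any HTTP client can issue the two requests; NCBI asks clients to keep the request rate low, so add a pause between calls when looping over many accessions.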

Please ask this as a new question.
