Question: Getting Genome Coordinates from RefSeq Exon mRNA Position Data?
Krisr (United States), 8.4 years ago, wrote:

I am using BioPerl to obtain exon coordinates for a variety of mRNAs. For example:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

my @exons;
# $gb rather than $a: $a and $b are reserved for sort blocks
my $gb  = Bio::DB::GenBank->new;
my $seq = $gb->get_Seq_by_acc('NM_005378');

# Collect every exon feature (coordinates are positions on the mRNA)
for my $feat ($seq->get_SeqFeatures) {
  if ($feat->primary_tag eq 'exon') {
    push(@exons, $feat->location);
    printf "exon %d-%d\n", $feat->start, $feat->end;
  }
}

I would now like to use BioPerl to obtain the corresponding genomic DNA (gDNA) positions on the reference assembly. I am only interested in the gDNA positions of each reported exon. Does anyone know of a function that could provide this?

(Question modified 8.2 years ago by Reece.)
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 8.4 years ago, wrote:

This doesn't use BioPerl, only plain MySQL. UCSC has already mapped the RefSeq genes to the genome (the refGene table):

> mysql  -h  genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select * from refGene where name="NM_005378"\G'
*************************** 1. row ***************************
         bin: 707
        name: NM_005378
       chrom: chr2
      strand: +
     txStart: 15998133
       txEnd: 16004580
    cdsStart: 15999637
      cdsEnd: 16003670
   exonCount: 3
  exonStarts: 15998133,15999520,16003065,
    exonEnds: 15998316,16000427,16004580,
          id: 0
       name2: MYCN
cdsStartStat: cmpl
  cdsEndStat: cmpl
  exonFrames: -1,0,1,

The table is also available for download at: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/refGene.txt.gz
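For the question's use case, the `exonStarts`/`exonEnds` strings can be converted to 1-based exon coordinates and then used to map an mRNA position onto the genome. Below is a minimal Perl sketch that uses the hg18 values pasted from the row above. The helper names (`parse_list`, `exons_1based`, `mrna_to_genome`) are hypothetical.

UCSC stores `exonStarts` 0-based and `exonEnds` 1-based (half-open intervals), so the sketch adds 1 to each start. The mapping also rests on two assumptions. First, the aligned transcript must be the same RefSeq version you fetched from GenBank; the row does not say which version of NM_005378 was aligned. Second, the transcript must align without gaps inside exons.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a UCSC list such as "15998133,15999520,16003065," (trailing comma) into integers.
sub parse_list {
    my ($s) = @_;
    return map { int($_) } grep { length } split /,/, $s;
}

# Convert refGene exons to 1-based, inclusive genomic intervals.
# exonStarts are 0-based; exonEnds are already 1-based inclusive.
sub exons_1based {
    my ($starts, $ends) = @_;
    my @s = parse_list($starts);
    my @e = parse_list($ends);
    die "exonStarts/exonEnds length mismatch\n" unless @s == @e;
    return map { [ $s[$_] + 1, $e[$_] ] } 0 .. $#s;
}

# Map a 1-based mRNA position to a 1-based genomic position.
# Exons are in genomic order; on the minus strand the transcript begins
# at the right-most exon and runs leftwards. Assumes ungapped exons.
sub mrna_to_genome {
    my ($pos, $strand, @exons) = @_;
    my @ordered = $strand eq '-' ? reverse @exons : @exons;
    my $offset = $pos;
    for my $ex (@ordered) {
        my ($start, $end) = @$ex;
        my $len = $end - $start + 1;
        if ($offset <= $len) {
            return $strand eq '-' ? $end - $offset + 1 : $start + $offset - 1;
        }
        $offset -= $len;
    }
    return undef;    # position lies beyond the end of the transcript
}

# Values copied from the hg18 refGene row for NM_005378 (MYCN, + strand) shown above.
my @exons = exons_1based('15998133,15999520,16003065,',
                         '15998316,16000427,16004580,');
printf "exon %d: chr2:%d-%d\n", $_ + 1, @{ $exons[$_] } for 0 .. $#exons;
print "mRNA position 200 -> chr2:", mrna_to_genome(200, '+', @exons), "\n";
```

For NM_005378 this prints the following exons:

- chr2:15998134-15998316
- chr2:15999521-16000427
- chr2:16003066-16004580

It then maps mRNA position 200, counted from the start of the aligned transcript, to chr2:15999537.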


Thanks Pierre, this really helped me. (A.L, 7.6 years ago)
Reece (United States), 7.9 years ago, wrote:

I needed something similar. The only approach I found was to use NCBI E-utilities to search nuccore by NM accession for its numeric id, and then use that id to fetch a "full" record from nuccore as XML. I had to reverse-engineer the XML format.
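Here is a minimal Perl sketch of those two E-utilities steps, using the core HTTP::Tiny module. The `esearch.fcgi` and `efetch.fcgi` endpoints and the `db`, `term`, `id` and `retmode` parameters are standard E-utilities.

- **Record format:** leaving `rettype` unset with `retmode=xml` is an assumption about which "full" record format Reece parsed.
- **Id extraction:** the id is pulled out with a regex rather than a real XML parser.
- **Helper names:** `esearch_url`, `efetch_url` and `first_id` are hypothetical.
- **HTTPS:** fetching over HTTPS also needs IO::Socket::SSL installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Step 1: search nuccore for the accession to obtain its numeric id.
sub esearch_url {
    my ($acc) = @_;
    return "$EUTILS/esearch.fcgi?db=nuccore&term=$acc";
}

# Step 2: fetch the record for that id as XML (rettype left unset: an assumption).
sub efetch_url {
    my ($id) = @_;
    return "$EUTILS/efetch.fcgi?db=nuccore&id=$id&retmode=xml";
}

# Extract the first <Id> from an esearch response (regex, not a full XML parse).
sub first_id {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>} ? $1 : undef;
}

# Only touches the network when an accession is given on the command line.
if (@ARGV) {
    my $http = HTTP::Tiny->new;
    my $res  = $http->get(esearch_url($ARGV[0]));
    die "esearch failed: $res->{status}\n" unless $res->{success};
    my $id = first_id($res->{content}) or die "no id found for $ARGV[0]\n";
    $res = $http->get(efetch_url($id));
    die "efetch failed: $res->{status}\n" unless $res->{success};
    print $res->{content};    # raw XML; the exon data still has to be located in it
}
```

Run it with an accession such as NM_023035.2 as its only argument. Without an argument it only defines the helpers. Finding the exon coordinates inside the returned XML is the reverse-engineering step described above, and the sketch does not attempt it.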

The code is here: https://bitbucket.org/reece/bio-hgvs-perl/src/84181f38d092/sandbox/ncbi-tx-exons

And it works something like this:

apt12j$ ~/projects/bio-hgvs-perl/sandbox/ncbi-tx-exons NM_023035.2
NCBI (NM_023035.2; 1 transcripts)
13616746  13617274  529
13565921  13566026  106
13563690  13563829  140
...

This script was just a sketch to see how to do it. Perhaps it'll help you get started.
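Each row of the output above can be read as genomic start, genomic end and exon length. The third number always equals end - start + 1; for example, 13617274 - 13616746 + 1 = 529. On that reading, the rows can be paired with spans on the mRNA, which is what the question's GenBank exon features describe.

Below is a minimal Perl sketch under two assumptions that the script itself does not confirm:

- the rows are in transcript (5' to 3') order;
- the exons align without gaps.

The falling coordinates suggest a minus-strand transcript. In that case mRNA position 1 sits at the higher genomic end of the first exon. The helper name `mrna_spans` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three rows shown above for NM_023035.2 (the original output is truncated).
# Assumed columns: genomic start, genomic end, exon length.
my @rows = (
    [ 13616746, 13617274, 529 ],
    [ 13565921, 13566026, 106 ],
    [ 13563690, 13563829, 140 ],
);

# Pair each genomic exon with its 1-based span on the mRNA,
# assuming transcript order and ungapped exons.
sub mrna_spans {
    my @exons = @_;
    my ($pos, @out) = (1);
    for my $ex (@exons) {
        my ($gstart, $gend, $len) = @$ex;
        die "length mismatch at $gstart-$gend\n" unless $gend - $gstart + 1 == $len;
        push @out, [ $pos, $pos + $len - 1, $gstart, $gend ];
        $pos += $len;
    }
    return @out;
}

printf "mRNA %d-%d  <->  genome %d-%d\n", @$_ for mrna_spans(@rows);
```

Under these assumptions, the shown exons cover mRNA positions 1-529, 530-635 and 636-775.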

Also see http://web.archiveorange.com/archive/v/Nz9aur19OzYnfTAGATKr for a discussion on this topic.

-Reece


Please ask a new question. (Pierre Lindenbaum, 7.9 years ago)