Question: From a list of gene symbols to a BED file with chromosome name and start/end position
11yj3312 wrote (2.6 years ago):

Hi y'all, I have a list of gene symbols. How can I convert them to a .bed file with the chromosome name and start/end position of each gene?


You should add more information, such as the genome build in which you want co-ordinates (hg19?; hg38?; mm9?; mm10?). Also, are these HGNC gene symbols? Are you only interested in the co-ordinates of the canonical isoform?

You could quite easily download the GENCODE GTF annotation file for your genome build from the GENCODE website and then extract the gene records for your symbols using grep.

There is most likely a more automated solution.
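The GTF route described above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes a GENCODE-style GTF in which gene-level records have "gene" in the third column and carry a gene_name attribute, and the function names and command-line arguments are made up for illustration. It also converts the 1-based, inclusive GTF start to the 0-based start that BED expects.

```python
import gzip
import re
import sys


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def gtf_genes_to_bed(gtf_lines, symbols):
    """Yield BED6 tuples for 'gene' records whose gene_name is in symbols."""
    wanted = set(symbols)
    # Assumption: the attribute column contains gene_name "SYMBOL";
    name_re = re.compile(r'gene_name "([^"]+)"')
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        match = name_re.search(fields[8])
        if match is None or match.group(1) not in wanted:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # GTF is 1-based inclusive; BED is 0-based half-open
        yield (chrom, start - 1, end, match.group(1), 0, strand)


# Hypothetical usage: python gtf_to_bed.py annotation.gtf.gz symbols.txt > genes.bed
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2]) as handle:
        symbols = [line.strip() for line in handle if line.strip()]
    with open_maybe_gzip(sys.argv[1]) as gtf:
        for row in gtf_genes_to_bed(gtf, symbols):
            print("\t".join(str(value) for value in row))
```

Matching on the parsed gene_name rather than grepping the raw line avoids partial hits, where one symbol is a prefix of another.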

Comment written 2.6 years ago by Kevin Blighe
Devon Ryan wrote (2.6 years ago):

A simple method would be to go to Ensembl BioMart, select the relevant organism, select the attributes you want (chromosome and start/end position) and then upload your list of gene symbols as a filter.
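If you would rather script the BioMart query than click through the web interface, something along these lines might work. Treat it as a sketch under assumptions: the martservice URL, the dataset name hsapiens_gene_ensembl, and the attribute and filter names are what I would expect for human in Ensembl BioMart, so check them against the web interface before relying on them. The helper names are made up for illustration, and the "chr" prefix is only a UCSC-style naming choice.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint of the Ensembl BioMart web service
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(symbols, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query asking for gene coordinates by symbol."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Assumed filter name for gene symbols; values are comma-separated
    ET.SubElement(ds, "Filter", {"name": "external_gene_name",
                                 "value": ",".join(symbols)})
    for attr in ("chromosome_name", "start_position", "end_position",
                 "external_gene_name", "strand"):
        ET.SubElement(ds, "Attribute", {"name": attr})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def tsv_to_bed(tsv_text):
    """Convert BioMart TSV rows (chrom, start, end, name, strand) to BED6 lines."""
    rows = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name, strand = line.split("\t")
        rows.append("\t".join([
            "chr" + chrom,          # UCSC-style chromosome name
            str(int(start) - 1),    # 1-based Ensembl start -> 0-based BED start
            end, name, "0",
            "+" if strand == "1" else "-",
        ]))
    return rows


def fetch_bed(symbols):
    """Send the query to BioMart and return BED6 lines (needs network access)."""
    url = MART_URL + "?" + urllib.parse.urlencode({"query": build_query(symbols)})
    with urllib.request.urlopen(url) as response:
        return tsv_to_bed(response.read().decode("utf-8"))
```

Calling fetch_bed(["TP53", "BRCA1"]) would then return one BED line per matching gene. For long symbol lists the query URL can get too long, so you may need to send the symbols in batches.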

0
gravatar for Alex Reynolds
2.6 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

If you want to do things in a more automated fashion, you could install the Ensembl Perl API and then run a Perl script (like the one posted below) to grab exons.

#!/usr/bin/env perl

use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Connection settings for the public Ensembl MySQL server
my $host    = 'ensembldb.ensembl.org';
my $user    = 'anonymous';
my $dbname  = 'homo_sapiens_core_89_38';
my $port    = '3306';
my $species = 'homo_sapiens';
my $group   = 'core';
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(-host    => $host,
                                             -user    => $user,
                                             -dbname  => $dbname,
                                             -port    => $port,
                                             -species => $species,
                                             -group   => $group);

my $slice_adaptor = $db->get_SliceAdaptor();

my $slices = $slice_adaptor->fetch_all('chromosome');
foreach my $slice (@{$slices}) {
    my $chr = "chr".$slice->seq_region_name();
    my $genes = $slice->get_all_Genes();
    foreach my $gene (@{$genes}) {
        my $exons = $gene->get_all_Exons();
        my $id = $gene->external_name();
        my $exon_index = 1;
        my $exon_number = $exon_index;
        my $exon_count = scalar(@{$exons});        
        foreach my $exon (@{$exons}) {
            my $start = $exon->start() - 1; # Ensembl starts are 1-based; BED starts are 0-based
            my $end = $exon->end();
            if ($start < $end) {
                my $stable_id = $exon->stable_id();
                my $strand = $exon->strand();
                if ($strand == 1) { 
                    $strand = "+";
                    $exon_number = $exon_index;
                } 
                elsif ($strand == -1) { 
                    $strand = "-";
                    $exon_number = $exon_count - $exon_index + 1;
                } 
                else { 
                    die "unknown value for strand\n"; 
                }
                print STDOUT join("\t", ($chr, $start, $end, $id, $exon_number, $strand))."\n";
                $exon_index++;
            }
        }
    }
}

Be sure to change the $dbname and $species variables to match your organism and genome build (the values above are for human, Ensembl release 89 on GRCh38).

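Once you have exons labelled with Ensembl gene names, you can use a Python script like the following to make a translation table that maps those names to HGNC symbols. It reads one name per line from standard input and writes each matched symbol together with its full gene name.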

#!/usr/bin/env python

import sys
from mygene import MyGeneInfo

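# Read one gene name per line from standard input, skipping blank lines
hgnc_names = [line.strip() for line in sys.stdin if line.strip()]

mg = MyGeneInfo()
results = mg.querymany(hgnc_names, scopes='symbol', fields='symbol,name', species='human', verbose=False)

for result in results:
    # Queries without a match are flagged 'notfound' and carry no symbol
    if result.get('notfound'):
        continue
    sys.stdout.write("%s\t%s\n" % (result['symbol'], result.get('name', '')))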

From here, if you're working with HGNC names, you can process the Perl script output to include HGNC symbols for all exons, and then use grep to find matches for your genes of interest. Match whole words (grep -w) so that a symbol such as TP53 does not also pull in TP53BP1.
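Since the original question asks for one start/end per gene rather than per exon, the final filtering step could also collapse exons into gene-level intervals. Here is a minimal sketch in Python, assuming the six-column, tab-separated layout printed by the Perl script (chromosome, start, end, name, exon number, strand). The function name and the command-line arguments are made up for illustration.

```python
import sys


def gene_spans(bed_lines, symbols):
    """Collapse exon rows into one BED6 tuple per wanted gene symbol."""
    wanted = set(symbols)
    spans = {}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or short lines
        chrom, start, end, name, _exon_number, strand = fields[:6]
        if name not in wanted:
            continue  # exact match on the name column, unlike a plain grep
        key = (chrom, name, strand)
        start, end = int(start), int(end)
        if key in spans:
            old_start, old_end = spans[key]
            spans[key] = (min(old_start, start), max(old_end, end))
        else:
            spans[key] = (start, end)
    # Coordinates are passed through unchanged; the column 5 score is set to 0
    return [(chrom, s, e, name, 0, strand)
            for (chrom, name, strand), (s, e) in sorted(spans.items())]


# Hypothetical usage: python gene_spans.py exons.bed symbols.txt > genes.bed
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2]) as handle:
        symbols = [line.strip() for line in handle if line.strip()]
    with open(sys.argv[1]) as bed:
        for row in gene_spans(bed, symbols):
            print("\t".join(str(value) for value in row))
```

Keying on chromosome and strand as well as the name keeps apart genes that share a symbol across different sequences.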
