Question: From a list of gene symbols to a BED file with chromosome name and start/end position
21 months ago, 11yj3312 wrote:

Hi y'all, I have a list of gene symbols. How can I convert the gene symbols to a .bed file with the chromosome name and start/end position of each gene?


You should add more information, such as the genome build in which you want co-ordinates (hg19, hg38, mm9 or mm10?). Also, are these HGNC gene symbols? Are you only interested in the co-ordinates of the canonical isoform?

You could quite easily download the GENCODE GTF annotation files from here and then extract the information from them using grep.

There is most likely a more automated solution.
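As a rough illustration of the GTF route, here is a minimal Python sketch. It assumes a GENCODE-style GTF in which each gene has a line with feature type "gene" and a gene_name attribute; the function name gtf_genes_to_bed is made up for this example.

```python
def gtf_genes_to_bed(gtf_lines, symbols):
    """Yield BED rows (chrom, start, end, name, score, strand) for wanted genes.

    Hypothetical helper: assumes GENCODE-style GTF lines with a
    'gene_name "SYMBOL";' entry in the attribute column.
    """
    wanted = set(symbols)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level records
        chrom, start, end, strand = fields[0], fields[3], fields[4], fields[6]
        name = None
        # Attributes look like: gene_id "ENSG..."; gene_name "ABC1"; ...
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_name":
                name = value.strip('"')
                break
        if name in wanted:
            # GTF is 1-based inclusive; BED is 0-based half-open,
            # so only the start needs shifting.
            yield (chrom, int(start) - 1, int(end), name, 0, strand)


def write_bed(rows, out):
    """Write BED rows as tab-separated text to an open file handle."""
    for row in rows:
        out.write("\t".join(str(x) for x in row) + "\n")
```

You would call it with an open (decompressed) GTF file and your list of symbols, e.g. write_bed(gtf_genes_to_bed(open(path), symbols), sys.stdout) after importing sys. Matching on the parsed gene_name avoids the partial hits that a plain grep can give.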

Reply written 21 months ago by Kevin Blighe
21 months ago, Devon Ryan (Freiburg, Germany) wrote:

A simple method would be to go to Ensembl BioMart, select the relevant organism, select the attributes you want (chromosome and start/end position) and then upload your list of gene symbols as a filter.
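If you would rather script the BioMart step than click through the web form, something like the following Python sketch may work. Everything specific here is an assumption to check against BioMart itself: the martservice URL, the hsapiens_gene_ensembl dataset name, the external_gene_name filter and the attribute names. The helper names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.sax.saxutils import quoteattr

# Assumed endpoint; confirm it against the Ensembl BioMart site you use.
BIOMART_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_query(symbols, dataset="hsapiens_gene_ensembl"):
    """Build a BioMart XML query for gene-level coordinates (assumed names)."""
    value = quoteattr(",".join(symbols))  # quotes and escapes the list
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" '
        'header="0" uniqueRows="1">'
        f'<Dataset name="{dataset}" interface="default">'
        f'<Filter name="external_gene_name" value={value}/>'
        '<Attribute name="chromosome_name"/>'
        '<Attribute name="start_position"/>'
        '<Attribute name="end_position"/>'
        '<Attribute name="external_gene_name"/>'
        '<Attribute name="strand"/>'
        "</Dataset></Query>"
    )


def biomart_row_to_bed(row):
    """Convert one TSV row (chrom, start, end, name, strand) to a BED tuple."""
    chrom, start, end, name, strand = row.rstrip("\n").split("\t")
    # Ensembl coordinates are 1-based inclusive; BED starts are 0-based.
    # Prefixing "chr" gives UCSC-style names for the main chromosomes only.
    return ("chr" + chrom, int(start) - 1, int(end), name, 0,
            "+" if strand == "1" else "-")


def fetch_bed_rows(symbols):
    """Send the query and return BED tuples (needs network access)."""
    url = BIOMART_URL + "?" + urlencode({"query": build_biomart_query(symbols)})
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return [biomart_row_to_bed(r) for r in text.splitlines() if r.strip()]
```

Note that this returns coordinates for the current Ensembl release only; for an older build such as hg19 you would need the matching archive site.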

21 months ago, Alex Reynolds (Seattle, WA, USA) wrote:

If you want to do things in a more automated fashion, you could install the Ensembl Perl API and then run a Perl script (like the one posted below) to grab exons.

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

my $host    = 'ensembldb.ensembl.org';
my $user    = 'anonymous';
my $dbname  = 'homo_sapiens_core_89_38';
my $port    = '3306';
my $species = 'homo_sapiens';
my $group   = 'core';
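# Pass species and group as well, so that changing the variables above takes effect
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(-host    => $host,
                                             -user    => $user,
                                             -dbname  => $dbname,
                                             -port    => $port,
                                             -species => $species,
                                             -group   => $group);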

my $slice_adaptor = $db->get_SliceAdaptor();

my $slices = $slice_adaptor->fetch_all('chromosome');
foreach my $slice (@{$slices}) {
    my $chr = "chr".$slice->seq_region_name();
    my $genes = $slice->get_all_Genes();
    foreach my $gene (@{$genes}) {
        my $exons = $gene->get_all_Exons();
        my $id = $gene->external_name();
        my $exon_index = 1;
        my $exon_number = $exon_index;
        my $exon_count = scalar(@{$exons});        
        foreach my $exon (@{$exons}) {
            my $start = $exon->start();
            my $end = $exon->end();
            if ($start < $end) {
                my $stable_id = $exon->stable_id();
                my $strand = $exon->strand();
                if ($strand == 1) { 
                    $strand = "+";
                    $exon_number = $exon_index;
                } 
                elsif ($strand == -1) { 
                    $strand = "-";
                    $exon_number = $exon_count - $exon_index + 1;
                } 
                else { 
                    die "unknown value for strand\n"; 
                }
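                # Ensembl coordinates are 1-based and inclusive; BED starts are 0-based, so subtract 1 from the start
                print STDOUT join("\t", ($chr, $start - 1, $end, $id, $exon_number, $strand))."\n";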
                $exon_index++;
            }
        }
    }
}

Be sure to change the dbname and species variables depending on your needs.

Once you have exons with Ensembl names, you can use a Python script like the following to build a translation table that maps Ensembl names to HGNC symbols.

#!/usr/bin/env python

import sys
from mygene import MyGeneInfo

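# Read one gene name per line from standard input, skipping blank lines
hgnc_names = [line.strip() for line in sys.stdin if line.strip()]

mg = MyGeneInfo()
results = mg.querymany(hgnc_names, scopes='symbol', species='human', verbose=False)

for result in results:
    # Queries with no match are flagged 'notfound' and lack the symbol/name keys
    if result.get('notfound'):
        continue
    sys.stdout.write("%s\t%s\n" % (result.get('symbol', ''), result.get('name', '')))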

From here, if you're working with HGNC names, you can process the Perl script output to include HGNC symbols for all exons, and then use grep to find matches for your genes of interest.
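One caveat with plain grep is that a symbol can match inside a longer one (TP53 also hits TP53BP1). A minimal Python sketch of an exact-match filter on the name column is below; the function name filter_bed_by_symbols is made up for this example, and it assumes the symbol sits in the fourth BED column, as in the Perl output above.

```python
def filter_bed_by_symbols(bed_lines, symbols, name_column=3):
    """Yield BED lines whose name column exactly matches a wanted symbol.

    Hypothetical helper: name_column is the 0-based index of the column
    holding the gene symbol (the fourth column by default).
    """
    wanted = {s.strip() for s in symbols if s.strip()}
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > name_column and fields[name_column] in wanted:
            yield line
```

For example, with the exons in exons.bed and one symbol per line in genes.txt, you could write sys.stdout.writelines(filter_bed_by_symbols(open("exons.bed"), open("genes.txt"))) after importing sys. Both file names are placeholders.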
