Question: Gene Coordinates For The Members Of A Gene Family In The Mouse Genome
Asked 7.6 years ago by Anima Mundi (Italy):

Hello,

How can I extract the coordinates (in BED format) of all the genes in a given gene family from the mouse genome? I would like to start from the mm9 assembly in Ensembl.

Tags: gene, bed, coordinates, ensembl, mouse
Answered 7.6 years ago by Javier Herrero:

If you are not afraid of a little Perl, you can use the Ensembl API for this. When you say you want all the mouse genes in a given family, do you mean the Ensembl families or the Ensembl GeneTrees? Families are clusters of Ensembl and UniProt proteins, whereas GeneTrees are phylogenetic trees built from all Ensembl genes. See http://www.ensembl.org/info/docs/compara/family.html and http://www.ensembl.org/info/docs/compara/homology_method.html for a description of both pipelines.

If you want the coordinates for the Ensembl families, these few lines of code will do the job:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $url = 'mysql://anonymous@ensembldb.ensembl.org';
my $gene_stable_id = "ENSMUSG00000056602";
my $species_name = "mus_musculus";

my $reg = "Bio::EnsEMBL::Registry";

# Connect to the public Ensembl server and get the Compara database adaptor
$reg->load_registry_from_url($url);
my $compara_dba = $reg->get_DBAdaptor("Multi", "compara");

my $genome_db_adaptor = $compara_dba->get_GenomeDBAdaptor();
my $member_adaptor    = $compara_dba->get_MemberAdaptor();
my $family_adaptor    = $compara_dba->get_FamilyAdaptor();

my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

# Find the Compara member for the query gene and all the families it belongs to
my $member   = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);
my $families = $family_adaptor->fetch_all_by_Member($member);

foreach my $family (@$families) {
  foreach my $member (@{$family->get_all_Members}) {
    # Keep only mouse Ensembl genes (families also contain UniProt entries and other species)
    next if ($member->source_name ne "ENSEMBLGENE");
    next if ($member->genome_db ne $genome_db);
    # BED6: chrom, 0-based start, end, name, score, strand
    print join("\t", 'chr'.$member->chr_name, ($member->chr_start-1), $member->chr_end,
               $member->stable_id, ".", $member->chr_strand==1?"+":"-"), "\n";
  }
}
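The BED conversion is the part that is easiest to get wrong, so here is a minimal offline sketch of the same formatting as a stand-alone Perl sub. `to_bed_line` is a hypothetical helper name, the coordinates in the example are made up, and mapping Ensembl's `MT` to `chrM` is an assumption about the target (UCSC-style) naming; the snippet above would print `chrMT`.

```perl
use strict;
use warnings;

# Hypothetical helper: format one gene as a BED6 line.
# Ensembl coordinates are 1-based with an inclusive end; BED uses a 0-based
# start and an exclusive end, so only the start is shifted by one.
# Assumes UCSC-style names: 'chr' prefix, and Ensembl "MT" becomes "chrM".
sub to_bed_line {
    my ($chr_name, $start, $end, $name, $strand) = @_;
    my $chrom = $chr_name eq 'MT' ? 'chrM' : "chr$chr_name";
    return join("\t", $chrom, $start - 1, $end, $name, '.',
                $strand == 1 ? '+' : '-');
}

# Made-up coordinates, for illustration only
print to_bed_line('11', 101, 200, 'GENE_A', -1), "\n";
```

A gene spanning Ensembl positions 101 to 200 thus becomes `chr11 100 200` in BED, covering the same 100 bases.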

If you want the genes from the Ensembl GeneTrees instead, use this bit of code:

[...]
# $protein_tree_adaptor, used below, is not created in the family snippet; it must be set up in the [...] part
my $genome_db = $genome_db_adaptor->fetch_by_registry_name($species_name);

my $member = $member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $gene_stable_id);

# Fetch the gene tree that contains the query gene
my $gene_tree = $protein_tree_adaptor->fetch_by_Member_root_id($member);
die "No gene tree found for $gene_stable_id\n" unless $gene_tree;

foreach my $leaf (@{$gene_tree->get_all_leaves}) {
  # Keep only the mouse leaves; the tree contains genes from all species
  next if (!$leaf->genome_db_id or $leaf->genome_db ne $genome_db);
  my $member = $leaf->gene_member;
  print join("\t", 'chr'.$member->chr_name, ($member->chr_start-1), $member->chr_end,
             $member->stable_id, ".", $member->chr_strand==1?"+":"-"), "\n";
}
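Leaves (and family members) are printed in tree or family order, not genomic order, while some BED tools (e.g. `bedtools merge`) expect sorted input. A minimal sketch of sorting the printed lines in Perl, assuming plain tab-separated BED6 lines; `sort_bed` is a hypothetical helper and the records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: sort BED lines by chromosome name (string order),
# then by start position (numeric order), in the spirit of `sort -k1,1 -k2,2n`.
sub sort_bed {
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
           map  { my @f = split /\t/; [ $_, $f[0], $f[1] ] } @_;
}

# Made-up records, for illustration only
my @bed = ("chr2\t500\t900\tGENE_C\t.\t+",
           "chr11\t300\t400\tGENE_A\t.\t-",
           "chr11\t100\t200\tGENE_B\t.\t+");
print "$_\n" for sort_bed(@bed);
```

Each line is split once (a Schwartzian transform), so the sort comparisons do not re-parse the strings.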

To install the Ensembl Perl API, follow the instructions at http://www.ensembl.org/info/docs/api/api_installation.html

You can also start from external identifiers or gene names. Note that the method for this (fetch_all_by_external_name) can return more than one gene, as there is no guarantee that a name is unique.



Hello, thanks for the valuable and detailed answer. What I do not understand (probably a silly point, as I am a beginner at scripting) is how to get the IDs of the families I am interested in. For example, where can I find the ID for the UBF/HMG family?

written 7.6 years ago by Anima Mundi

You can use external identifiers or names. Note that the method (fetch_all_by_external_name) can return more than one gene, as there is no guarantee that the name is unique:

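# Find mouse genes by name in the core database (this may return several genes)
my $gene_adaptor = $reg->get_adaptor("mouse", "core", "Gene");
my $genes = $gene_adaptor->fetch_all_by_external_name("Ubtf");

foreach my $this_gene (@$genes) {
  # Map each core gene to its Compara member, then continue as in the answer
  my $member = $member_adaptor->fetch_by_source_stable_id(
        "ENSEMBLGENE", $this_gene->stable_id);
  [...]
}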

written 7.6 years ago by Javier Herrero
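Because fetch_all_by_external_name may return several genes, and those genes may belong to the same family or tree, looping over them can print the same member more than once. A minimal sketch of removing such duplicates, keyed on the BED name column (the stable ID); `unique_by_name` is a hypothetical helper and the lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first BED line for each gene,
# using the 4th column (the name, here the stable ID) as the key.
sub unique_by_name {
    my %seen;
    return grep { my $name = (split /\t/)[3]; !$seen{$name}++ } @_;
}

# Made-up records, for illustration only
my @bed = ("chr11\t100\t200\tGENE_A\t.\t-",
           "chr5\t10\t20\tGENE_B\t.\t+",
           "chr11\t100\t200\tGENE_A\t.\t-");
print "$_\n" for unique_by_name(@bed);
```

The first occurrence of each name is kept and the input order is preserved.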

I have edited the answer to show how to get an Ensembl stable ID from an external name or identifier.

written 7.6 years ago by Javier Herrero

Perfect, thanks again.

written 7.6 years ago by Anima Mundi