Get Ensembl mapping between the chromosome and scaffold names
4.7 years ago
rubic ▴ 270


I'm trying to follow up on this post, for mapping between Ensembl's scaffold names and NCBI's assembly scaffold/chromosome names.

Like in the previous post, as an example, I'm trying to do that for the Marmoset genome.

I'm a newbie to Ensembl's API, so this is as much as I've tried so far:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

# The registry is used as a class, so no object needs to be constructed
my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(
    -host => '', # alternatively ''
    -user => 'anonymous',
    -species => 'callithrix_jacchus' );

my $slice_adaptor = $registry->get_adaptor( 'Marmoset', 'Core', 'Slice' );

# Fetch every slice in the 'scaffold' coordinate system
my @slices = @{ $slice_adaptor->fetch_all('scaffold') };

foreach my $slice (@slices) {
  my $coord_sys  = $slice->coord_system()->name();
  my $seq_region = $slice->seq_region_name();
  my $start      = $slice->start();
  my $end        = $slice->end();
  my $strand     = $slice->strand();
  print "Slice: $coord_sys $seq_region $start-$end ($strand)\n";
}

I thought this would lead the way to finding out how to print the scaffold synonyms, but @slices actually ends up being empty.

So my questions are:

  1. Can anyone provide an example of how to obtain the synonyms of a genome's scaffold names so that it maps between Ensembl and NCBI?
  2. Any idea why I'm not getting any Marmoset scaffolds?
ensembl Assembly scaffold chromosome • 1.4k views
4.7 years ago
Emily 23k

The highest level of sequence for the genome is the contig. There are no scaffolds.


Thanks @Emily_Ensembl. Do you mean specifically for the Marmoset genome? This Ensembl page says that the genome assembly is at the scaffold level.

4.7 years ago
crisime ▴ 290

Hi rubic,

1: Bio::EnsEMBL::Slice::get_all_synonyms() is where I would look for your mapping. But I have not tried it.

2: For Ensembl version 91 and earlier, your code gives me results:

Slice: scaffold ACFV01196627.1 1-2362 (1)
Slice: scaffold ACFV01194171.1 1-3646 (1)
Slice: scaffold GL286765.1 1-12076 (1)
Slice: scaffold ACFV01196229.1 1-2329 (1)
Slice: scaffold ACFV01200623.1 1-2661 (1)
Slice: scaffold ACFV01197136.1 1-3219 (1)
Slice: scaffold ACFV01185903.1 1-9016 (1)

Are these the scaffolds you're looking for? There are 16399 in my @slices array. In later API versions there are no results at the scaffold level.

I hope that helps.
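For point 1, here is a rough, untested-against-Ensembl sketch of how the synonyms could be collected. It only assumes that each slice offers seq_region_name() and get_all_synonyms(), and that each returned synonym offers name(). The helper name synonym_map is made up for this example, and whether a given release actually stores NCBI names as synonyms is a separate question.

```perl
use strict;
use warnings;

# Build { seq_region_name => [synonym names] } from a list of slice objects.
# Assumption: slices behave like Bio::EnsEMBL::Slice (seq_region_name,
# get_all_synonyms) and synonyms expose name(). synonym_map is a
# hypothetical helper, not part of the Ensembl API.
sub synonym_map {
    my ($slices) = @_;
    my %map;
    for my $slice (@$slices) {
        # A region without synonyms ends up with an empty list
        my @names = map { $_->name() } @{ $slice->get_all_synonyms() || [] };
        $map{ $slice->seq_region_name() } = \@names;
    }
    return \%map;
}

# Usage with the slice adaptor from the question (needs the Ensembl Perl API):
#   my $map = synonym_map( $slice_adaptor->fetch_all('toplevel') );
#   for my $region ( sort keys %$map ) {
#       print join( "\t", $region, @{ $map->{$region} } ), "\n";
#   }
```

Using 'toplevel' instead of a fixed coordinate system name avoids having to know whether a release calls its top-level regions scaffolds, contigs or chromosomes.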


No, the scaffolds I see in the GTF of the latest marmoset genome assembly (Callithrix_jacchus.ASM275486v1.95.gtf) are: NTIC01000001.1, NTIC01000002.1, NTIC01001061.1, ...

The assembly information page also mentions that this genome is in scaffolds. That page also refers to its NCBI assembly accession (GCA_002754865.1), and its NCBI genome page shows that the assembly is in chromosomes.

So basically I'm trying to map between Ensembl's NTIC01000001.1, NTIC01000002.1, NTIC01001061.1, ... scaffolds and NCBI's chromosomes.


Hi rubic, you are right. The assembly information page states that the genome is at the scaffold level. But if I change your line of code in Ensembl 96 to the contig level:

my @slices = @{ $slice_adaptor->fetch_all('contig') };

I get 39944 results in my @slices array. That is the same as the number of scaffolds according to the information page. It looks like something got mixed up here.

I think your mapping data might just not exist in Ensembl. I can't get it from the get_all_synonyms() function. For the ASM275486v1 assembly, which Ensembl uses, the full sequence report that you can download from the NCBI site lists the NTIC...-style IDs. So I guess that is just what Ensembl used for its genebuild.

The chromosomes you see on the site you linked in the other post are from another (older) Callithrix jacchus assembly, Callithrix jacchus-3.2. I think what you are looking for is not a mapping between Ensembl identifiers and NCBI identifiers, but a mapping between two different assemblies.
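To check this yourself, the downloaded sequence report can be parsed with plain Perl. This is only a sketch: it assumes the usual tab-separated NCBI assembly report layout (Sequence-Name, Sequence-Role, Assigned-Molecule, Assigned-Molecule-Location/Type, GenBank-Accn, Relationship, RefSeq-Accn, ...). The helper name parse_assembly_report is made up, and the two data rows are placeholders (only the NTIC accessions come from this thread, every other value is invented for illustration).

```perl
use strict;
use warnings;

# Parse assembly report text into a hash keyed by GenBank accession.
# Assumed column order: 0 Sequence-Name, 1 Sequence-Role,
# 2 Assigned-Molecule, 4 GenBank-Accn, 6 RefSeq-Accn.
# parse_assembly_report is a hypothetical helper for this example.
sub parse_assembly_report {
    my ($text) = @_;
    my %by_genbank;
    for my $line ( split /\r?\n/, $text ) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comment and blank lines
        my @f = split /\t/, $line;
        next if @f < 7;                    # skip malformed rows
        $by_genbank{ $f[4] } = {
            name     => $f[0],
            role     => $f[1],
            molecule => $f[2],
            refseq   => $f[6],
        };
    }
    return \%by_genbank;
}

# Placeholder input: names, roles and lengths below are NOT real report values.
my $header = '# ' . join "\t", qw(Sequence-Name Sequence-Role Assigned-Molecule
    Assigned-Molecule-Location/Type GenBank-Accn Relationship RefSeq-Accn
    Assembly-Unit Sequence-Length UCSC-style-name);
my $report = join "\n", $header,
    join( "\t", 'placeholder_1', 'unplaced-scaffold', 'na', 'na',
        'NTIC01000001.1', '=', 'na', 'Primary Assembly', '1000', 'na' ),
    join( "\t", 'placeholder_2', 'unplaced-scaffold', 'na', 'na',
        'NTIC01000002.1', '=', 'na', 'Primary Assembly', '2000', 'na' );

my $map = parse_assembly_report($report);
for my $acc ( sort keys %$map ) {
    print join( "\t", $acc, $map->{$acc}{name}, $map->{$acc}{molecule} ), "\n";
}
```

If the Assigned-Molecule column is "na" for every NTIC accession in the real report, that assembly has no chromosome assignments to map to, which would fit the idea that the chromosomes belong to the other assembly.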

