Question: Availability of Pre-Ensembl information in BioMart
longoka wrote (3.3 years ago):

Is anyone aware whether Pre-Ensembl information is available through a BioMart query? (I often use the biomaRt package in R, e.g. its getBM() function, to retrieve information.)

Specifically, I'm interested in using biomaRt to retrieve information on the crab-eating macaque, M. fascicularis. However, I haven't been able to find a "preensembl_" attribute in the long list of attributes, and I'm unsure whether this simply isn't possible or I'm looking in the wrong place.

This species is listed in Pre-Ensembl (Mfac5.0), and its data can be downloaded from there, but I'm trying to avoid the extra processing of those files.

Another point regarding the R package biomaRt: the code I use to access a particular BioMart dataset is shown in the snippet below; in this case the dataset is 'hsapiens_gene_ensembl'.

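library(biomaRt)
mart <- useMart("ensembl")                         # connect to the Ensembl BioMart
datasets <- listDatasets(mart)                     # table of all available datasets
mart <- useDataset("hsapiens_gene_ensembl", mart)  # select the human gene dataset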

To the best of my knowledge, Pre-Ensembl datasets are not available; I'm wondering whether they are, but I'm using the wrong nomenclature (e.g., should 'mfascicularis_gene_ensembl' be 'mfac_gene_preensembl'?).
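Rather than guessing dataset names, one option is to search the dataset table for the species epithet. The helper below is a hypothetical sketch (its name is not part of biomaRt); it only assumes the `dataset` and `description` columns of the table that listDatasets() returns, and matches either column case-insensitively.

```r
# Hypothetical helper (not part of biomaRt): return the rows of a
# listDatasets() table whose dataset ID or description mentions `species`.
find_species_datasets <- function(datasets, species) {
  hit <- grepl(species, datasets$dataset, ignore.case = TRUE) |
    grepl(species, datasets$description, ignore.case = TRUE)
  datasets[hit, , drop = FALSE]
}
```

Usage (requires network access): `find_species_datasets(listDatasets(useMart("ensembl")), "fascicularis")`. An empty result suggests the species is not in that mart under any naming scheme, rather than being hidden behind a different prefix.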

As an alternative, I may just download the GFF3 file from NCBI (Mfac5.0 GFF) and use the makeTxDbFromGFF() function in GenomicFeatures to build the TxDb object:

library(GenomicFeatures)
txdb <- makeTxDbFromGFF(file, format = "gff3")  # file: path to the Mfac5.0 GFF3

This works nicely, but unfortunately it deviates from the code I use for the several other genomes I'm looking at.

> genes(txdb)
GRanges object with 32733 ranges and 1 metadata column:
             seqnames                 ranges strand |     gene_id
                <Rle>              <IRanges>  <Rle> | <character>
     A1BG NC_022290.1 [ 58937416,  58951837]      - |        A1BG
     A1CF NC_022280.1 [ 85376919,  85454067]      + |        A1CF
    A2ML1 NC_022282.1 [  9098456,   9153857]      + |       A2ML1
  A3GALT2 NC_022272.1 [194630652, 194644767]      + |     A3GALT2
   A4GALT NC_022281.1 [  8344430,   8348390]      + |      A4GALT
      ...         ...                    ...    ... .         ...
   ZYG11A NC_022272.1 [174567376, 174637485]      - |      ZYG11A
   ZYG11B NC_022272.1 [174660062, 174713613]      - |      ZYG11B
      ZYX NC_022274.1 [176564635, 176574909]      + |         ZYX
    ZZEF1 NC_022287.1 [  3946682,   4101428]      - |       ZZEF1
     ZZZ3 NC_022272.1 [149532207, 149664372]      + |        ZZZ3
  seqinfo: 655 sequences from an unspecified genome; no seqlengths
(Modified 3.3 years ago by Emily_Ensembl; written 3.3 years ago by longoka.)

I looked at all available datasets under ensembl using listDatasets(useMart("ensembl")) and found Macaca mulatta, but that is not the exact species you need.

I have never heard of Pre-Ensembl datasets being made available through biomaRt. You may have to go down the manual route and build your own function in R that accepts M. fascicularis gene IDs and returns whatever you want it to return. I presume that you downloaded the GFF here: ?
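A minimal sketch of that manual route in base R, assuming the NCBI GFF3 stores the gene symbol in the Name= attribute of its "gene" rows (as the gene_id column in the genes(txdb) output above suggests) and that the file has no embedded ##FASTA section. All function names here are hypothetical. The output columns are deliberately named after the biomaRt attributes chromosome_name, start_position, end_position and strand (with strand coded as 1/-1), so the result can be dropped into code written for getBM() output; that naming is a design choice, not something either package produces.

```r
# Hypothetical helper: extract one key (e.g. "Name") from GFF3 column 9,
# which holds "key=value;key=value" pairs. Returns NA if the key is absent.
gff3_attr <- function(attrs, key) {
  prefix <- paste0(key, "=")
  vapply(strsplit(attrs, ";", fixed = TRUE), function(kv) {
    hit <- kv[startsWith(kv, prefix)]
    if (length(hit) == 0) NA_character_ else substring(hit[1], nchar(prefix) + 1)
  }, character(1))
}

# Hypothetical helper: read the "gene" rows of a GFF3 file into a
# data.frame with biomaRt-style column names.
read_gff3_genes <- function(path) {
  gff <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    stringsAsFactors = FALSE,
                    col.names = c("seqid", "source", "type", "start", "end",
                                  "score", "strand", "phase", "attributes"))
  gff <- gff[gff$type == "gene", ]
  data.frame(
    gene_name       = gff3_attr(gff$attributes, "Name"),
    chromosome_name = gff$seqid,
    start_position  = gff$start,
    end_position    = gff$end,
    strand          = unname(c("+" = 1L, "-" = -1L)[gff$strand]),  # "." -> NA
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: one row per requested gene name, in query order;
# names that are not found come back as all-NA rows.
lookup_genes <- function(genes, ids) {
  out <- genes[match(ids, genes$gene_name), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

Usage, with a hypothetical path: `genes <- read_gff3_genes("Mfac5.0.gff3")`, then `lookup_genes(genes, c("A1BG", "ZYX"))`.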

Comment written 3.3 years ago by Kevin Blighe.
Emily_Ensembl wrote (3.3 years ago):

We don't have BioMart for genomes in Pre-Ensembl. These are genomes that have not yet been fully processed and do not have the full file structure, which means we don't have the BioMart tables. Crab-eating macaque is due to appear in the next Ensembl release, due in December.
