Is anyone aware whether pre-ensembl information is available through a biomart query (I often use the biomaRt package in R to retrieve information)?
Specifically, I'm interested in using biomaRt to retrieve info on the crab-eating macaque, M. fascicularis. But I've been unable to find a "preensembl_" attribute in the long list of attributes, and I'm unsure whether it's simply not possible or I'm looking in the wrong place.
This genome is listed as a pre-ensembl assembly (Mfac5.0 pre-ensembl) and the info can be downloaded, etc., but I'm trying to avoid the extra processing of these files.
Another point regarding the biomaRt package in R: the code for selecting a particular Ensembl dataset is shown in the snippet below; note that the dataset specified in this case is 'hsapiens_gene_ensembl'.
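library(biomaRt)
# connect to the main Ensembl mart and list the datasets it offers
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
# select the human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", mart)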
To the best of my knowledge, pre-ensembl datasets are not available, but I'm wondering whether they are and I'm simply using the wrong nomenclature (e.g., 'mfascicularis_gene_ensembl' should be 'mfac_gene_preensembl')?
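One way to rule out a naming problem is to search the table returned by listDatasets() by species name instead of guessing the dataset ID. The following is only a minimal sketch: find_datasets is a hypothetical helper (not part of biomaRt), and it assumes the table has the dataset and description columns that listDatasets() returns.

```r
# Hypothetical helper (not part of biomaRt): keep the rows of a
# listDatasets() table whose dataset name or description matches a pattern.
find_datasets <- function(datasets, pattern) {
  hit <- grepl(pattern, datasets$dataset, ignore.case = TRUE) |
    grepl(pattern, datasets$description, ignore.case = TRUE)
  datasets[hit, , drop = FALSE]
}

# Usage against the live mart (requires biomaRt and network access):
# library(biomaRt)
# datasets <- listDatasets(useMart("ensembl"))
# find_datasets(datasets, "fascicularis")  # zero rows: no such dataset in this mart
# find_datasets(datasets, "macaca")        # related species that are available
```

If the search returns zero rows for every reasonable spelling, the species is most likely not in that mart rather than hidden under a different name.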
As an alternative, I may just download the gff3 file from NCBI (Mfac5.0 gff) and use the makeTxDbFromGFF function in GenomicFeatures to make the TxDb object:
library(GenomicFeatures)
# 'file' is the path to the downloaded Mfac5.0 GFF3 file
txdb <- makeTxDbFromGFF(file,
format="gff3",
dataSource=NA,
organism=NA,
taxonomyId=NA,
chrominfo=NULL,
miRBaseBuild=NA,
metadata=NULL)
This works nicely, but unfortunately deviates from all of my other code for several other genomes I'm looking at.
> genes(txdb)
GRanges object with 32733 ranges and 1 metadata column:
           seqnames                 ranges strand |     gene_id
              <Rle>              <IRanges>  <Rle> | <character>
   A1BG NC_022290.1 [ 58937416,  58951837]      - |        A1BG
   A1CF NC_022280.1 [ 85376919,  85454067]      + |        A1CF
  A2ML1 NC_022282.1 [  9098456,   9153857]      + |       A2ML1
A3GALT2 NC_022272.1 [194630652, 194644767]      + |     A3GALT2
 A4GALT NC_022281.1 [  8344430,   8348390]      + |      A4GALT
    ...         ...                    ...    ... .         ...
 ZYG11A NC_022272.1 [174567376, 174637485]      - |      ZYG11A
 ZYG11B NC_022272.1 [174660062, 174713613]      - |      ZYG11B
    ZYX NC_022274.1 [176564635, 176574909]      + |         ZYX
  ZZEF1 NC_022287.1 [  3946682,   4101428]      - |       ZZEF1
   ZZZ3 NC_022272.1 [149532207, 149664372]      + |        ZZZ3
-------
seqinfo: 655 sequences from an unspecified genome; no seqlengths
I took a look at all available datasets under ensembl using listDatasets(useMart("ensembl")) and found Macaca mulatta, but this is not the exact species you need. I have never heard of pre-ensembl datasets being made available through biomaRt. You may have to go down the manual route and build your own function in R that accepts M. fascicularis gene IDs and returns whatever you want it to return. I presume that you downloaded the GTF here: ftp://ftp.ensembl.org/pub/pre/gtf/macaca_fascicularis/ ?
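As a rough idea of what such a manual function could look like, here is a minimal sketch that looks up gene coordinates by ID. lookup_genes is a hypothetical name, and the sketch assumes the gene table is a plain data frame with a gene_id column, for example the result of as.data.frame(genes(txdb)) on a TxDb built from the GFF3/GTF file.

```r
# Hypothetical helper: return the rows of a gene table for the requested IDs,
# in the order requested, warning about any IDs that are not present.
lookup_genes <- function(gene_table, ids) {
  idx <- match(ids, gene_table$gene_id)
  missing_ids <- ids[is.na(idx)]
  if (length(missing_ids) > 0) {
    warning("IDs not found: ", paste(missing_ids, collapse = ", "))
  }
  gene_table[idx[!is.na(idx)], , drop = FALSE]
}

# Usage with a TxDb (requires GenomicFeatures):
# gene_table <- as.data.frame(genes(txdb))
# lookup_genes(gene_table, c("A1BG", "ZYX"))
```

Working on a data frame keeps the helper independent of where the annotation came from, so the same call could sit alongside biomaRt-based code for the other genomes.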