Is anyone aware whether pre-ensembl information is available through a BioMart query? (I often use the biomaRt package in R to retrieve information.)
Specifically, I'm interested in using biomaRt to retrieve information on the crab-eating macaque, M. fascicularis. But I've been unable to find a "preensembl_" attribute in the long list of attributes, and I'm unsure whether it's simply not possible or I'm looking in the wrong place.
This genome is listed as pre-ensembl (Mfac5.0 pre-ensembl), and the data can be downloaded, but I'm trying to avoid the extra processing of these files.
Another point regarding the biomaRt package in R: the code for accessing a particular mart dataset is shown in the snippet below; note that the dataset specified in this case is 'hsapiens_gene_ensembl'.
library(biomaRt)
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
mart <- useDataset("hsapiens_gene_ensembl", mart)
To the best of my knowledge, pre-ensembl datasets are not available this way, but I'm wondering whether they are in fact available and I'm just using the wrong nomenclature (e.g., 'mfascicularis_gene_ensembl' should be 'mfac_gene_preensembl')?
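Rather than guessing at dataset names, one option is to search the table of datasets for the species. The following is a minimal sketch, assuming the data frame returned by listDatasets() has `dataset` and `description` columns; `find_datasets` is a hypothetical helper written for illustration, not part of biomaRt:

```r
# Hypothetical helper (not a biomaRt function): keep the rows of a
# listDatasets()-style table whose dataset name or description
# matches a pattern, ignoring case.
find_datasets <- function(datasets, pattern) {
  hits <- grepl(pattern, datasets$dataset, ignore.case = TRUE) |
    grepl(pattern, datasets$description, ignore.case = TRUE)
  datasets[hits, , drop = FALSE]
}

# Usage against the live Ensembl mart (needs network access):
# library(biomaRt)
# mart <- useMart("ensembl")
# find_datasets(listDatasets(mart), "fascicularis")
```

If this returns zero rows, the species is not exposed as a dataset in that mart under any name, which would answer the nomenclature question.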
As an alternative, I may just download the gff3 file from NCBI (Mfac5.0 gff) and use the makeTxDbFromGFF function in GenomicFeatures to make the TxDb object:
library(GenomicFeatures)
# 'file' is the path to the downloaded Mfac5.0 GFF3 file
txdb <- makeTxDbFromGFF(file,
                        format="gff3",
                        dataSource=NA,
                        organism=NA,
                        taxonomyId=NA,
                        chrominfo=NULL,
                        miRBaseBuild=NA,
                        metadata=NULL)
This works nicely, but unfortunately deviates from all of my other code for several other genomes I'm looking at.
> genes(txdb)
GRanges object with 32733 ranges and 1 metadata column:
             seqnames                 ranges strand |     gene_id
                <Rle>              <IRanges>  <Rle> | <character>
     A1BG NC_022290.1 [ 58937416,  58951837]      - |        A1BG
     A1CF NC_022280.1 [ 85376919,  85454067]      + |        A1CF
    A2ML1 NC_022282.1 [  9098456,   9153857]      + |       A2ML1
  A3GALT2 NC_022272.1 [194630652, 194644767]      + |     A3GALT2
   A4GALT NC_022281.1 [  8344430,   8348390]      + |      A4GALT
      ...         ...                    ...    ... .         ...
   ZYG11A NC_022272.1 [174567376, 174637485]      - |      ZYG11A
   ZYG11B NC_022272.1 [174660062, 174713613]      - |      ZYG11B
      ZYX NC_022274.1 [176564635, 176574909]      + |         ZYX
    ZZEF1 NC_022287.1 [  3946682,   4101428]      - |       ZZEF1
     ZZZ3 NC_022272.1 [149532207, 149664372]      + |        ZZZ3
  -------
  seqinfo: 655 sequences from an unspecified genome; no seqlengths
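To bring this closer to the rest of a biomaRt-based workflow, one possibility is to reshape the gene table into columns named like the usual Ensembl gene attributes. This is only a sketch, assuming the input is the data frame produced by as.data.frame(genes(txdb)) with `seqnames`, `start`, `end`, `strand` and `gene_id` columns; `txdb_genes_to_biomart_like` is a hypothetical helper, and the choice of target column names is an assumption about what the downstream code expects:

```r
# Hypothetical helper: rename the columns of as.data.frame(genes(txdb))
# to mimic common Ensembl BioMart gene attributes. Strand is recoded
# from "+"/"-" to 1/-1; any other value (e.g. "*") becomes NA.
txdb_genes_to_biomart_like <- function(gene_df) {
  strand_chr <- as.character(gene_df$strand)
  data.frame(
    external_gene_name = as.character(gene_df$gene_id),
    chromosome_name    = as.character(gene_df$seqnames),
    start_position     = gene_df$start,
    end_position       = gene_df$end,
    strand             = ifelse(strand_chr == "-", -1L,
                                ifelse(strand_chr == "+", 1L, NA_integer_)),
    stringsAsFactors   = FALSE
  )
}

# Usage with the TxDb built above (needs GenomicFeatures):
# gene_df <- as.data.frame(genes(txdb))
# head(txdb_genes_to_biomart_like(gene_df))
```

Note that chromosome_name would still hold the RefSeq accessions (e.g. NC_022290.1) from the NCBI file rather than Ensembl-style chromosome names, so that part still differs from the other genomes.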