Question: Different filters for wheat in biomaRt package and website
13 months ago by
sandeep.amberkar18 wrote:


I'm looking to download the wheat TILLING and SNP data from Ensembl using the biomaRt package in Bioconductor. However, I've noticed that the BioMart website offers more filters than the Bioconductor package exposes. For example:

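library(biomaRt)

# Connect to the Ensembl Plants variation mart and list its datasets
plantsDatabase <- useMart(biomart = 'plants_variations', host = '')
plantsDatasets <- listDatasets(plantsDatabase)
# Look up the wheat dataset by its description (mainOrganism is 'Triticum aestivum')
mainOrgIndex <- GetDatasetIndex(organism = mainOrganism, plantsDatasets$description)
mainOrgDataset <- useDataset(mart = plantsDatabase, dataset = plantsDatasets$dataset[[mainOrgIndex]])
# List the filters and attributes that biomaRt reports for this dataset
mainOrgFilters <- listFilters(mainOrgDataset)
mainOrgAttributes <- listAttributes(mainOrgDataset)
values <- list('EMS-induced mutation', 'EMS (Cadenza)', '1A')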

GetDatasetIndex is a small helper function that returns the index of the organism you are querying BioMart for, in this case 'Triticum aestivum'.
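The post does not show the source of GetDatasetIndex, so the following is only a hypothetical reconstruction, assuming it does a plain substring match of the organism name against the dataset descriptions. The example descriptions vector is made up for illustration, apart from the wheat entry, which is copied from the searchDatasets() output later in this thread.

```r
# Hypothetical reconstruction: the original helper's definition is not shown in the post.
# Returns the index of the first description that contains the organism name.
GetDatasetIndex <- function(organism, descriptions) {
  hits <- grep(organism, descriptions, fixed = TRUE)
  if (length(hits) == 0) {
    stop("No dataset description matches '", organism, "'")
  }
  hits[1]
}

# Illustrative input; the first entry is invented for this example.
descriptions <- c(
  "Arabidopsis thaliana Short Variants (illustrative entry)",
  "Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)"
)

idx <- GetDatasetIndex(organism = "Triticum aestivum", descriptions)
print(idx)
```

With a real mart, the second argument would be plantsDatasets$description, as in the question.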

I wanted to filter the data by 'Variant consequence', a filter that is available on the BioMart web portal but not in the biomaRt package (listFilters() for this mart doesn't list it). Any pointers?

Best, Sandeep


Tagging: Mike Smith

written 13 months ago by genomax
13 months ago by
Emily_Ensembl wrote:

The filter is "so_mini_parent_name". No, it's not obvious.

A cheat you can use: on the web-based BioMart results page, click the XML button. The coded versions of the filter names will appear in the XML, e.g.:

<Filter name = "so_mini_parent_name" value = "feature_ablation"/>
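If the XML from the web interface contains several filters, the coded names can be pulled out in R rather than read by eye. This is a minimal sketch using base R only; the `<Filter>` line is the one quoted above, while the surrounding `<Query>`, `<Dataset>` and `<Attribute>` elements (and the attribute name `refsnp_id`) are illustrative assumptions about what the exported XML looks like.

```r
# XML as it might be copied from the web interface. Only the <Filter> line comes
# from this thread; the other elements are assumed for illustration.
query_xml <- '<Query>
  <Dataset name = "taestivum_eg_snp" interface = "default">
    <Filter name = "so_mini_parent_name" value = "feature_ablation"/>
    <Attribute name = "refsnp_id"/>
  </Dataset>
</Query>'

# Grab every <Filter .../> tag, then extract its name and value attributes.
filter_tags   <- regmatches(query_xml, gregexpr("<Filter[^>]*>", query_xml))[[1]]
filter_names  <- sub('.*name[[:space:]]*=[[:space:]]*"([^"]*)".*', "\\1", filter_tags)
filter_values <- sub('.*value[[:space:]]*=[[:space:]]*"([^"]*)".*', "\\1", filter_tags)

print(data.frame(filter = filter_names, value = filter_values))
```

The names recovered this way are the strings to pass as `filters` when querying from R.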

Perfect example of why you are vital to biostars. No chance anyone would have figured that one out.

Is functionality available via web BioMart completely equivalent to biomaRt?

written 13 months ago by genomax

Yes, it should be the same.
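To illustrate that equivalence, the XML filter from the answer above maps onto a getBM() call roughly as follows. This is a sketch rather than a tested query: the wrapper name `get_variants_by_consequence` is made up, and the default attribute names (`refsnp_id`, `chr_name`, `chrom_start`) are assumptions that should be checked against listAttributes() for the dataset.

```r
# Sketch only: requires the Bioconductor package biomaRt and a live BioMart connection.
# The default attribute names are assumptions; verify them with listAttributes().
get_variants_by_consequence <- function(mart, consequence,
                                        attributes = c("refsnp_id", "chr_name", "chrom_start")) {
  biomaRt::getBM(
    attributes = attributes,
    filters    = "so_mini_parent_name",  # coded name of the 'Variant consequence' filter
    values     = consequence,
    mart       = mart
  )
}

# Example call, reusing the mainOrgDataset object from the question:
# variants <- get_variants_by_consequence(mainOrgDataset, "feature_ablation")
```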

written 12 months ago by Emily_Ensembl

Thanks a lot Emily, I'll try it out!

written 12 months ago by sandeep.amberkar18
12 months ago by
Mike Smith (EMBL Heidelberg / de.NBI) wrote:

Emily's answer is exactly how I go about diagnosing problems with the biomaRt package. Checking the XML via the web interface is always my first port of call for something like this.

I thought I'd advertise the recently added searchDatasets(), searchFilters() and searchAttributes() functions, which try to make finding these a little easier. Rather than simply listing all the available properties for a mart, you can provide a search term and they will return the relevant results. For example, to find the name of the dataset you want, you could do something like:

> searchDatasets(mart = plantsDatabase, 'aestivum')
            dataset                                                                           description version
12 taestivum_eg_snp Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)   IWGSC

However, they're no help in this instance, since none of the behind-the-scenes information for this filter mentions 'Variant' or 'consequence', so you wouldn't know what to search for!

It's also worth pointing out that the filter you're using isn't a free text filter, but takes a specific set of values (they're provided in a list when using the web interface). You can see the list of possible search terms in R using the function filterOptions() e.g.

filterOptions('so_mini_parent_name', mart = mainOrgDataset)
[1] "[3_prime_UTR_variant,5_prime_UTR_variant,coding_sequence_variant,coding_transcript_variant,downstream_gene_variant,exon_variant,feature_ablation,feature_amplification,feature_elongation,feature_truncation,feature_variant,frameshift_variant,gene_variant,incomplete_terminal_codon_variant,inframe_deletion,inframe_indel,inframe_insertion,inframe_variant,intergenic_variant,internal_feature_elongation,intron_variant,mature_miRNA_variant,missense_variant,NMD_transcript_variant,nonsynonymous_variant,non_coding_transcript_exon_variant,non_coding_transcript_variant,protein_altering_variant,sequence_comparison,sequence_variant,splice_acceptor_variant,splice_donor_variant,splice_region_variant,splice_site_variant,splicing_variant,start_lost,stop_gained,stop_lost,stop_retained_variant,structural_variant,synonymous_variant,terminator_codon_variant,transcript_ablation,transcript_amplification,transcript_variant,upstream_gene_variant,UTR_variant]"

I might need to improve the formatting here!
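In the meantime, the single bracketed string can be split into a character vector by hand. A minimal sketch in base R, assuming the output always has the `[a,b,c]` shape shown above; the helper name `parse_filter_options` is made up, and the input here is a shortened subset of the real list to keep the example readable.

```r
# Turn the "[a,b,c]" string returned by filterOptions() into a character vector.
parse_filter_options <- function(x) {
  inner <- gsub("^\\[|\\]$", "", x)       # drop the leading "[" and trailing "]"
  strsplit(inner, ",", fixed = TRUE)[[1]] # split on commas
}

# Shortened subset of the output above, used for illustration.
raw_options <- "[3_prime_UTR_variant,5_prime_UTR_variant,feature_ablation,missense_variant,stop_gained]"
consequences <- parse_filter_options(raw_options)
print(consequences)

# Check a value before using it in a query.
requested <- "feature_ablation"
stopifnot(requested %in% consequences)
```

With a live connection, `raw_options` would be replaced by the result of the filterOptions() call shown above.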
