Question: Different filters for wheat in biomaRt package and website
3 months ago, sandeep.amberkar18 wrote:

Hello,

I'm trying to download the wheat TILLING and SNP data from Ensembl using the biomaRt package in Bioconductor. However, I've noticed that the BioMart website offers more filters than the Bioconductor package does. For example:

library(biomaRt)

mainOrganism <- 'Triticum aestivum'
plantsDatabase <- useMart(biomart = 'plants_variations', host = 'plants.ensembl.org')
plantsDatasets <- listDatasets(plantsDatabase)
# GetDatasetIndex is a helper function (not shown here) that returns the row index of the organism
mainOrgIndex <- GetDatasetIndex(organism = mainOrganism, plantsDatasets$description)
mainOrgDataset <- useDataset(mart = plantsDatabase, dataset = plantsDatasets$dataset[[mainOrgIndex]])
mainOrgFilters <- listFilters(mainOrgDataset)
mainOrgAttributes <- listAttributes(mainOrgDataset)
attributes <- mainOrgAttributes$name[c(1:2, 4:6, 20:21, 24, 34:35)]
filter <- c('variation_source', 'variation_set_name', 'chr_name')
values <- list('EMS-induced mutation', 'EMS (Cadenza)', '1A')

GetDatasetIndex is a nifty helper function that fetches the index of the organism you are querying BioMart for, in this case 'Triticum aestivum'.

I want to filter the data by 'Variant consequence', a filter that is available on the BioMart web portal but not in the biomaRt package (listFilters() for this mart doesn't list it). Any pointers?

Best, Sandeep


Tagging: Mike Smith

(comment by genomax, 3 months ago)

3 months ago, Emily_Ensembl (EMBL-EBI) wrote:

The filter is "so_mini_parent_name". No, it's not obvious.

A cheat you can use: on the web-based BioMart results page, click the XML button. The coded versions of the filter names will appear in the XML, e.g.:

<Filter name = "so_mini_parent_name" value = "feature_ablation"/>
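
Once the coded name is known, it can be passed to getBM() like any other filter. The following is only a sketch under stated assumptions: fetch_by_consequence is a hypothetical wrapper, the attribute names are guesses that should be checked against listAttributes(), and the other filters and values are copied from the question. It needs the biomaRt package and network access when called.

```r
# Hypothetical wrapper around biomaRt::getBM(); not part of the biomaRt package.
fetch_by_consequence <- function(mart, consequence, chromosome = "1A") {
  biomaRt::getBM(
    # Assumed attribute names; confirm them with listAttributes(mart).
    attributes = c("refsnp_id", "chr_name", "chrom_start", "allele"),
    # The first three filters come from the question; the last is the
    # coded name of the 'Variant consequence' filter.
    filters = c("variation_source", "variation_set_name",
                "chr_name", "so_mini_parent_name"),
    # Values are matched to filters by position.
    values = list("EMS-induced mutation", "EMS (Cadenza)",
                  chromosome, consequence),
    mart = mart
  )
}
```

With the mainOrgDataset object from the question, a call such as fetch_by_consequence(mainOrgDataset, "missense_variant") should return a data frame restricted to that consequence type.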

Perfect example of why you are vital to biostars. No chance anyone would have figured that one out.

Is functionality available via web BioMart completely equivalent to biomaRt?

(comment by genomax, 3 months ago)

Yes, it should be the same.

(reply by Emily_Ensembl, 3 months ago)

Thanks a lot Emily, I'll try it out!

(reply by sandeep.amberkar18, 11 weeks ago)

3 months ago, Mike Smith (EMBL Heidelberg / de.NBI) wrote:

Emily's answer is exactly how I go about diagnosing problems with the biomaRt package. Checking the XML via the web interface is always my first port of call for something like this.
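
If the XML copied from the web interface is long, the coded filter names can be pulled out with a couple of lines of base R. This is a minimal sketch: the Filter element is the one quoted in Emily's answer, while the surrounding Dataset and Attribute elements are assumed here purely for illustration.

```r
# XML as copied from the web interface. Only the <Filter .../> element is
# taken from this thread; the wrapper elements are illustrative assumptions.
xml <- paste0(
  '<Dataset name = "taestivum_eg_snp" interface = "default" >',
  '<Filter name = "so_mini_parent_name" value = "feature_ablation"/>',
  '<Attribute name = "refsnp_id" />',
  '</Dataset>'
)

# Find every <Filter name = "..."> fragment, then keep just the quoted name.
pattern <- '<Filter\\s+name\\s*=\\s*"([^"]+)"'
hits <- regmatches(xml, gregexpr(pattern, xml, perl = TRUE))[[1]]
filter_names <- sub(pattern, "\\1", hits, perl = TRUE)

print(filter_names)
```

The resulting names are the strings to use in the filters argument of getBM().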

I thought I'd advertise the recently added searchDatasets(), searchFilters() and searchAttributes() functions, which try to make finding these a little easier. Rather than simply listing all the available properties for a mart, you can provide a search term and they will return the relevant results. For example, to find the name of the dataset you want, you could do something like:

> searchDatasets(mart = plantsDatabase, 'aestivum')
            dataset                                                                           description version
12 taestivum_eg_snp Triticum aestivum Short Variants (SNPs and indels excluding flagged variants) (IWGSC)   IWGSC

However, they're useless in this instance, since none of the information behind the scenes for this filter mentions 'Variant' or 'consequence', so you wouldn't know what to search for!


It's also worth pointing out that the filter you're using isn't a free-text filter, but takes a specific set of values (they're provided in a list when using the web interface). You can see the list of possible values in R using the function filterOptions(), e.g.:

filterOptions('so_mini_parent_name', mart = mainOrgDataset)
[1] "[3_prime_UTR_variant,5_prime_UTR_variant,coding_sequence_variant,coding_transcript_variant,downstream_gene_variant,exon_variant,feature_ablation,feature_amplification,feature_elongation,feature_truncation,feature_variant,frameshift_variant,gene_variant,incomplete_terminal_codon_variant,inframe_deletion,inframe_indel,inframe_insertion,inframe_variant,intergenic_variant,internal_feature_elongation,intron_variant,mature_miRNA_variant,missense_variant,NMD_transcript_variant,nonsynonymous_variant,non_coding_transcript_exon_variant,non_coding_transcript_variant,protein_altering_variant,sequence_comparison,sequence_variant,splice_acceptor_variant,splice_donor_variant,splice_region_variant,splice_site_variant,splicing_variant,start_lost,stop_gained,stop_lost,stop_retained_variant,structural_variant,synonymous_variant,terminator_codon_variant,transcript_ablation,transcript_amplification,transcript_variant,upstream_gene_variant,UTR_variant]"

I might need to improve the formatting here!
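
Until the formatting improves, the single bracketed string can be split into a proper character vector with base R. This is a small sketch: parse_filter_options is a hypothetical helper, and raw is a shortened copy of the output above rather than the full list.

```r
# Shortened copy of the filterOptions() output shown above.
raw <- "[3_prime_UTR_variant,5_prime_UTR_variant,missense_variant,stop_gained,stop_lost]"

# Hypothetical helper: strip the enclosing brackets, then split on commas.
parse_filter_options <- function(x) {
  x <- gsub("^\\[|\\]$", "", x)
  strsplit(x, ",", fixed = TRUE)[[1]]
}

consequences <- parse_filter_options(raw)

# Example: list only the stop-related terms.
stop_terms <- grep("^stop", consequences, value = TRUE)
print(stop_terms)
```

Any element of the resulting vector can then be supplied as the value for the so_mini_parent_name filter.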
