biomaRt error "The number of columns in the result table does not equal the number of attributes in the query."
2.0 years ago
Katarina ▴ 10

Hi,

I am trying to fetch the chromosome positions of SNPs (by SNP ID) in the hg19 assembly with biomaRt, but I get an error that I can't figure out:

Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery, : 
The query to the BioMart webservice returned an invalid result.
The number of columns in the result table does not equal the number of attributes in the query.
Please report this on the support site at http://support.bioconductor.org

This is the query I am using on one SNP:

library(biomaRt)

ensembl <- useEnsembl(biomart = "snps", GRCh = 37, dataset = "hsapiens_snp",
                      host = "https://ensembl.org")

hg19_SNP <- getBM(mart = ensembl,
                  attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end"),
                  filters = "snp_filter",
                  values = "rs12081925")

Thanks

R biomaRt • 2.0k views

Tagging: Mike Smith

2.0 years ago

If you pass the verbose = TRUE argument to the getBM() function, it will print the XML query that it constructs from your request to the R console:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" uniqueRows="1" count="0" datasetConfigVersion="0.6" header="1" formatter="TSV" requestid="biomaRt">
   <Dataset name="hsapiens_snp">
      <Attribute name="refsnp_id" />
      <Attribute name="chr_name" />
      <Attribute name="chrom_start" />
      <Attribute name="chrom_end" />
      <Filter name="snp_filter" value="rs12081925" />
   </Dataset>
</Query>

You can also submit this query directly, e.g. with wget:

wget -O result.txt 'http://www.ensembl.org/biomart/martservice?query=<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query  virtualSchemaName = 'default' uniqueRows = '1' count='0' datasetConfigVersion='0.6' header='1' formatter='TSV' requestid='biomaRt'> <Dataset name = 'hsapiens_snp'><Attribute name = 'refsnp_id'/><Attribute name = 'chr_name'/><Attribute name = 'chrom_start'/><Attribute name = 'chrom_end'/><Filter name = "snp_filter" value = "rs12081925" /></Dataset></Query>'

If you check the response returned from the API, it reads:

Query ERROR: caught BioMart::Exception: non-BioMart die(): XML declaration not well-formed at line 1, column 14, byte 14 at /nfs/public/ro/ensweb-software/sharedsw/2022_01_17_ct7/linuxbrew/Cellar/perl/5.34.0/lib/perl5/site_perl/5.34.0/x86_64-linux-thread-multi/XML/Parser.pm line 187. XML::Simple called at /nfs/public/ro/ensweb/live/mart/www_107/biomart-perl/lib/BioMart/Query.pm line 1935.

This response evidently cannot be parsed into a table, hence the error message. It is a bit unfortunate that getBM() doesn't print the raw response to the console when it fails to parse the response into a table.
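To make the server's own message visible, the raw response can be checked before it is parsed. The following is only a minimal sketch in base R: parse_biomart_response() is a hypothetical helper (not part of biomaRt), and it assumes that error responses start with "Query ERROR", as in the message quoted above.

```r
# Hypothetical helper: parse BioMart TSV text, but surface the server's own
# message when the response is an error instead of a table.
parse_biomart_response <- function(response_text, columns, header = FALSE) {
  # Assumption: error responses start with "Query ERROR"
  if (startsWith(trimws(response_text), "Query ERROR")) {
    stop("BioMart returned an error instead of a table:\n", response_text,
         call. = FALSE)
  }
  # Otherwise treat the text as tab-separated values
  read.delim(text = response_text, header = header, col.names = columns,
             stringsAsFactors = FALSE)
}

# Usage with the file written by the wget call above (that query sets header='1'):
# response_text <- paste(readLines("result.txt", warn = FALSE), collapse = "\n")
# parse_biomart_response(response_text,
#                        c("refsnp_id", "chr_name", "chrom_start", "chrom_end"),
#                        header = TRUE)
```

With an error response, the stop() call reports the full server text rather than a message about mismatched column counts.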

A working XML query looks like this, for example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >

    <Dataset name = "hsapiens_snp" interface = "default" >
        <Filter name = "snp_filter" value = "rs12081925"/>
        <Attribute name = "refsnp_id" />
        <Attribute name = "chr_name" />
        <Attribute name = "chrom_start" />
        <Attribute name = "chrom_end" />
    </Dataset>
</Query> 

If you send this, you get the proper response from the API:

wget -O result2.txt 'http://www.ensembl.org/biomart/martservice?query=<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query><Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" ><Dataset name = "hsapiens_snp" interface = "default" ><Filter name = "snp_filter" value = "rs12081925"/><Attribute name = "refsnp_id" /><Attribute name = "chr_name" /><Attribute name = "chrom_start" /><Attribute name = "chrom_end" /></Dataset></Query>'

Response in result2.txt:

rs12081925      1       214298841       214298841

So it is a bug within BioMart itself: either the XML is not generated correctly or it is not parsed correctly. Until the bug is fixed, I would suggest retrieving the data with a manual XML query (which can also be generated with the BioMart web interface on the Ensembl website, for example) and then reading the .tsv file into R.
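Reading the downloaded file into R could look like the sketch below. It assumes the file has the four columns requested in the working query and no header row (header = "0"); read_biomart_tsv() and biomart_columns are illustrative names, not part of any package.

```r
# Column names follow the attribute order in the XML query;
# header = "0" in the query means the file has no header row.
biomart_columns <- c("refsnp_id", "chr_name", "chrom_start", "chrom_end")

# Hypothetical helper: read a four-column BioMart TSV export into a data frame.
read_biomart_tsv <- function(path, columns = biomart_columns) {
  read.delim(path, header = FALSE, col.names = columns,
             # chr_name stays character so that "X", "Y" or "MT" survive
             colClasses = c("character", "character", "integer", "integer"))
}

# Usage, assuming result2.txt was written by the wget call above:
# snp_positions <- read_biomart_tsv("result2.txt")
```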


Thanks for the answer, this works!
But this response gives the coordinates from the GRCh38 assembly. Which URL should I use for GRCh37?


Sorry, I overlooked that. Direct the requests to http://grch37.ensembl.org/biomart/martservice?query= instead to retrieve the coordinates for GRCh37.p13. The full website, including the BioMart web interface and query constructor, can be found via the Archives pages.
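Switching between the two endpoints can be sketched in base R as follows. martservice_endpoint() and martservice_url() are hypothetical helpers; the hosts are the ones used in this thread, and the network call is left commented out.

```r
# Hypothetical helper: pick the martservice endpoint for an assembly.
martservice_endpoint <- function(grch = 38) {
  host <- if (grch == 37) "grch37.ensembl.org" else "www.ensembl.org"
  paste0("http://", host, "/biomart/martservice")
}

# Build the full request URL; URLencode() escapes the XML,
# so no manual quoting of the query is needed.
martservice_url <- function(xml_query, grch = 38) {
  paste0(martservice_endpoint(grch), "?query=",
         utils::URLencode(xml_query, reserved = TRUE))
}

# The working query from above, as a single string
xml_query <- paste0(
  '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>',
  '<Query virtualSchemaName = "default" formatter = "TSV" header = "0" ',
  'uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >',
  '<Dataset name = "hsapiens_snp" interface = "default" >',
  '<Filter name = "snp_filter" value = "rs12081925"/>',
  '<Attribute name = "refsnp_id" /><Attribute name = "chr_name" />',
  '<Attribute name = "chrom_start" /><Attribute name = "chrom_end" />',
  '</Dataset></Query>'
)

# Network call, left commented out:
# snp <- read.delim(martservice_url(xml_query, grch = 37), header = FALSE,
#                   col.names = c("refsnp_id", "chr_name", "chrom_start", "chrom_end"))
```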

PS: Also note that Bioconductor itself contains hundreds of annotation packages, so SNPlocs.Hsapiens.dbSNP144.GRCh37 may already contain what you need without retrieving the data from Ensembl.


I think your failing wget query is actually a result of using single quotes around both the whole query and the XML attribute values inside it. This has the result of truncating the query and leads to the "XML declaration not well-formed at line 1, column 14" error: the first ' appears after 14 characters.
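The position of that first quote can be checked in R. In a POSIX shell, a single quote inside a single-quoted string ends the quoted segment, and the quote characters themselves are removed before wget receives the argument. The sketch below mimics that removal with gsub(); it is an approximation of what the shell does to this particular string, not a general shell parser, and the variable names are illustrative.

```r
# The XML declaration as written inside the single-quoted wget argument
declaration <- "<?xml version='1.0' encoding='UTF-8'?>"

# Position of the first single quote: 14 characters precede it
first_quote <- regexpr("'", declaration, fixed = TRUE)[1]
print(first_quote)
#> [1] 15

# Mimic the shell: the quote characters are dropped and the segments joined,
# so the attribute values arrive unquoted and the declaration is no longer valid XML
seen_by_wget <- gsub("'", "", declaration, fixed = TRUE)
print(seen_by_wget)
#> [1] "<?xml version=1.0 encoding=UTF-8?>"
```

Using double quotes for the XML attribute values inside the single-quoted shell argument, as in the working wget call, avoids the problem.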

24 months ago
Mike Smith ★ 2.1k

The query will also work if you omit the host argument to useEnsembl().

library(biomaRt)

ensembl <- useEnsembl(biomart = "snps", GRCh = 37, dataset = "hsapiens_snp")

hg19_SNP <- getBM(mart = ensembl,
                  attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end"), 
                  filters = "snp_filter",
                  values = "rs12081925")

hg19_SNP
#>    refsnp_id chr_name chrom_start chrom_end
#> 1 rs12081925        1   214472184 214472184

In your example the host argument overrides the GRCh argument (because they're incompatible), and you're actually querying the latest GRCh38 data in Ensembl.


As for why that breaks, it looks like Matthias Zepper has done some nice digging into the actual XML being created by the biomaRt package. A brief eyeball suggests that interface = "default" is the only difference. Since the query clearly works with some older versions of Ensembl, I wonder if something has recently changed with the Ensembl BioMart server that has mandated this change.

I shall try to investigate and try to fix the biomaRt behaviour, but it won't be for at least a few weeks as I'm on parental leave.


I did some more digging into this behaviour, and it was because the host argument was provided without a subdomain, i.e. ensembl.org rather than www.ensembl.org or grch37.ensembl.org. This causes the Ensembl site to return a "page not found" type response rather than what's expected.

I've updated {biomaRt} to warn about this occurrence, and in the original example it will use the GRCh = 37 argument by default, e.g.

ensembl <- useEnsembl(biomart = "snps", GRCh = 37, dataset = "hsapiens_snp",
                       host = "https://ensembl.org")
#> Warning message:
#> In useEnsembl(biomart = "snps", GRCh = 37, dataset = "hsapiens_snp",  :
#>   You cannot use the host 'ensembl.org'.
#> Please provide a subdomain e.g. www.ensembl.org or use one of the 'mirror', 'version', 'GRCh' arguments
martHost(ensembl)
#> [1] "https://grch37.ensembl.org:443/biomart/martservice"