Question: Mysterious genes in my Biomart results (genes that were not part of original query)
5 weeks ago by
adam.faranda10
adam.faranda10 wrote:

I'm using the R package 'biomaRt' to retrieve Ensembl IDs and descriptions for a list of gene symbols (e.g. "Sf1", "Rhox7a", etc.). My query consists of 41203 symbols; biomart returns a result set with 30774 records corresponding to the gene symbols recognized by Ensembl. The 30774 records returned include four genes that were not part of the original query.

My first thought was that the four 'mystery' genes were synonyms for something in my original query. I've since verified that none of the synonyms of these genes are in my query.

I am querying the mouse data set and using the attribute 'external_gene_name' as my filter column. Code used to query biomaRt:

library(biomaRt)

# 'gq': unique gene symbols from dg$gene, submitted as the biomart query
gq <- unique(dg$gene)

# attributes used for query
attr <- c("ensembl_gene_id", "external_gene_name", "description",
          "ensembl_gene_id_version", "chromosome_name",
          "gene_biotype")

# Query submission
mart <- useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
result <- getBM(mart=mart,
                attributes=attr,
                filters='external_gene_name',
                values=gq)

The mystery genes are:

setdiff(result$external_gene_name, gq)
[1] "Trdd2"   "Trdv4"   "Trdd1"   "SPATA24"

Here "gq" is the list of genes submitted to Ensembl. None of the above genes, nor any of their synonyms (at least those recognized by Ensembl), are in my original query. If anyone is willing to help me troubleshoot, I would be happy to send them the gene list I'm querying with.


That's very strange. Could you please send the list to helpdesk [at] ensembl.org? My colleagues and I will take a look at it.

written 5 weeks ago by Emily_Ensembl
4 weeks ago by
Mike Smith1.2k
EMBL Heidelberg / de.NBI
Mike Smith1.2k wrote:

"SPATA24" doesn't look like a normal MGI symbol since it's all in caps, so I wouldn't be surprised if your query contains "Spata24" and the all-caps version is retrieved too. Is it possible your gene list includes capitalised versions of 'Trdd1' etc.? I don't think BioMart is case sensitive, so it will still retrieve results for them, e.g.

getBM(mart=mart, 
      attributes=attr, 
      filters='external_gene_name',
      values="TRDV4"
)
     ensembl_gene_id external_gene_name
1 ENSMUSG00000076867              Trdv4
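
One way to check this locally is to compare the query and the returned symbols with case ignored. The following is only a sketch: the two vectors are made-up stand-ins for 'gq' and 'result$external_gene_name' from the question, not the real data, and the names 'returned', 'mystery' and 'culprits' are just illustrative.

```r
# Hypothetical stand-ins for the submitted symbols and the returned symbols
gq       <- c("Sf1", "Rhox7a", "Spata24", "TRDV4", "TRDD1", "TRDD2")
returned <- c("Sf1", "Rhox7a", "Spata24", "SPATA24", "Trdv4", "Trdd1", "Trdd2")

# Returned symbols with no exact (case-sensitive) match in the query
mystery <- setdiff(returned, gq)

# Query entries that match a mystery symbol once case is ignored
culprits <- gq[toupper(gq) %in% toupper(mystery)]
print(culprits)
```

If 'culprits' is non-empty for the real data, those query entries are the likely source of the unexpected symbols.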

If that's not it, my advice would be to break your query down into smaller chunks and submit these independently, to try and narrow down where the unexpected entries are being introduced. I'm happy to help work out whether it's a problem in biomaRt; my email address is on the biomaRt landing page (https://bioconductor.org/packages/biomaRt/)
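
A rough sketch of the chunking idea is below. The helper names ('chunk_symbols', 'find_unexpected') and the chunk size are assumptions for illustration and are not part of biomaRt. To keep the example runnable without a network connection, it uses a tiny fake lookup ('fake_query') that imitates case-insensitive matching; with real data you would pass a function that wraps getBM() instead, as hinted in the comment.

```r
# Split a vector of symbols into chunks of at most 'chunk_size' elements
chunk_symbols <- function(symbols, chunk_size = 500) {
  split(symbols, ceiling(seq_along(symbols) / chunk_size))
}

# Run 'query_fun' on each chunk and keep the returned symbols
# that were not submitted in that chunk
find_unexpected <- function(symbols, query_fun, chunk_size = 500) {
  chunks <- chunk_symbols(symbols, chunk_size)
  lapply(chunks, function(chunk) setdiff(query_fun(chunk), chunk))
}

# With biomaRt, 'query_fun' could look something like this (untested here):
# query_fun <- function(chunk) {
#   getBM(mart = mart, attributes = "external_gene_name",
#         filters = "external_gene_name", values = chunk)$external_gene_name
# }

# Offline stand-in: a made-up "database" matched case-insensitively
db <- c("Sf1", "Rhox7a", "Trdv4", "Spata24", "SPATA24")
fake_query <- function(chunk) db[toupper(db) %in% toupper(chunk)]

unexpected <- find_unexpected(c("Sf1", "TRDV4", "Rhox7a", "Spata24"),
                              fake_query, chunk_size = 2)
print(unexpected)
```

Any chunk with a non-empty entry in 'unexpected' contains a symbol that pulls in something you did not ask for by exact name.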

4 weeks ago by
adam.faranda10
adam.faranda10 wrote:

Thank you both for your prompt responses. Mike's answer was correct -- this appears to have been an issue with capitalization.

