Question: How do I use biomaRt's getBM on a list of known and unknown Agilent probes to get a list of gene IDs, with NA for missing values?
Asked 6.4 years ago by goldexperience:

I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but I'm having problems. Using the getBM function, I can see that only 22k of those have corresponding gene symbols, but the output is a vector of length 22k, and I can't see how it corresponds to the initial probe ID list. Additionally, I think that some of these probe IDs are not Agilent probes known to biomaRt (requesting other attributes such as "chromosome_name" returns nothing for some of them). Using getBMlist, I can get output with NA values for the probes that don't match, but the function warns that getBMlist isn't meant for large lists, and the whole process takes too long. How do I get an output of 90k gene symbols, with NA for the unmatched probes?

The probes are mostly from the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F. An example probe name is "A_23_P100001". An example probe ID that doesn't return any attribute in biomaRt is "A_23_P116864".

The query I'm using is as follows:

library(biomaRt)
affyids = read.csv([data goes here]);
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"));
getBM(uniqueRows = FALSE, filters = "efg_agilent_wholegenome_4x44k_v1",
      attributes = c("chromosome_name", "start_position", "external_gene_id"),
      values = affyids, mart = mart);

where affyids is of type "list" (read.csv returns a data frame, which R stores as a list).


1) Why don't you download the probeId-to-gene symbol mappings from the chip manufacturer?

2) If you want people to help you, you have to be more specific. What kind of probes are you talking about, what commands/data objects did you use to do the biomaRt query?

(reply by Irsan, 6.4 years ago)

Agreed. Maybe you can add your R code here.

(reply by Obi Griffith, 6.4 years ago)

Dear goldexperience, please don't delete questions that have answers; you are taking away other people's ability to learn from them.

(reply by Istvan Albert, 6.4 years ago)

Apologies, I just reposted the question with a different format and direction.

(reply by goldexperience, 6.4 years ago)

It would be much better if you edited your original question and made it more specific. Most likely, the answer can be adapted with a few edits as well.

(reply by Michael Dondrup, 6.4 years ago)

OK, I see that is acceptable.

(reply by Istvan Albert, 6.4 years ago)
Answer by burghard.christina (United States), 4.3 years ago:

This is an old question, but I ran into this problem recently. If you have a list of IDs where some values are not recognized, getBM returns fewer rows than there were query IDs. I wanted a table that included ALL of the original IDs in the query order, not just the ones that mapped.

This means we want a left join between the original IDs and the query results. The merge() function can do this for us.

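library(biomaRt)

ids <- c("NR_000001", "NR_000002")  # ... (the rest of your RefSeq IDs)
refSeqIds <- data.frame(refseq_mrna = ids, stringsAsFactors = FALSE)  # column name = attribute to join on

mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))
results <- getBM(filters = "refseq_mrna", attributes = c("refseq_mrna", "external_gene_name"),
                 values = ids, mart = mart)
# Left join: keep every query ID; IDs with no hit get NA. Note that merge() sorts
# the rows by refseq_mrna and repeats an ID that maps to more than one gene.
idmap <- merge(x = refSeqIds, y = results, by = "refseq_mrna", all.x = TRUE)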

Output:

     refseq_mrna         external_gene_name
1    NR_000001           GeneA
2    NR_000002           <NA>
3    NR_000003           GeneB
4    NR_000004           GeneC

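A minimal offline sketch of the same left join, with a mock data frame standing in for the getBM() result (the IDs follow the example above; the gene names, including the extra "GeneB2" hit, are made up). It shows two things merge() does that may be surprising: rows come back sorted by the key, and an ID that maps to two genes appears twice. The last step puts the rows back in query order with match().

```r
# Hypothetical query IDs, deliberately not in sorted order
query <- c("NR_000004", "NR_000001", "NR_000002", "NR_000003")

# Mock of a getBM() result: NR_000002 is missing, NR_000003 maps to two genes
results <- data.frame(
  refseq_mrna        = c("NR_000001", "NR_000003", "NR_000003", "NR_000004"),
  external_gene_name = c("GeneA", "GeneB", "GeneB2", "GeneC"),
  stringsAsFactors   = FALSE
)

ids <- data.frame(refseq_mrna = query, stringsAsFactors = FALSE)

# Left join: every query ID is kept; IDs with no hit get NA
idmap <- merge(x = ids, y = results, by = "refseq_mrna", all.x = TRUE)

# merge() sorted the rows by refseq_mrna; reorder them to follow `query`
idmap <- idmap[order(match(idmap$refseq_mrna, query)), , drop = FALSE]
rownames(idmap) <- NULL
print(idmap)
```

The result has five rows for four query IDs because of the one-to-many mapping; if you need exactly one row per ID, the duplicates have to be collapsed or dropped afterwards.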

This is the correct answer, thank you!

Here is a generic version:

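getAllBM <- function(attributes, filters, values, mart, ...) {
    # single filter only; the filter must also be an attribute so there is a column to join on
    if (!filters %in% attributes) attributes <- c(filters, attributes)
    spotty <- getBM(attributes = attributes, filters = filters, values = values, mart = mart, ...)
    x <- data.frame(values, stringsAsFactors = FALSE)
    colnames(x) <- filters
    merged <- merge(x = x, y = spotty, by = filters, all.x = TRUE)
    # merge() sorts by the key, so restore the order of `values`
    # (an ID that maps to several genes still gets several rows)
    merged[order(match(merged[[filters]], values)), , drop = FALSE]
}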
(reply by flying-sheep, 4.0 years ago)
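If you need exactly one row per query ID (90k IDs in, 90k rows out), one option, which is an assumption here rather than something from the thread, is to collapse multiple gene symbols per ID into a single string and then line the collapsed values up with the query. The column names `probe` and `symbol` and all IDs below are hypothetical placeholders; `res` mocks a getBM() result.

```r
# Hypothetical query and mock getBM()-style result:
# probe1 maps to two genes, probe2 is absent from the result
probes <- c("probe1", "probe2", "probe3")
res <- data.frame(
  probe  = c("probe1", "probe1", "probe3"),
  symbol = c("GENE1", "GENE1B", "GENE3"),
  stringsAsFactors = FALSE
)

# One string per probe, e.g. "GENE1;GENE1B" (named by probe ID)
collapsed <- vapply(split(res$symbol, res$probe),
                    function(s) paste(unique(s), collapse = ";"),
                    character(1))

# Index by the query order: one entry per probe, NA where nothing mapped
symbols <- unname(collapsed[match(probes, names(collapsed))])
out <- data.frame(probe = probes, symbol = symbols, stringsAsFactors = FALSE)
print(out)
```

Collapsing keeps the output the same length as the input at the cost of a multi-valued field; splitting on ";" later recovers the individual symbols.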
Answer by Irsan (Amsterdam), 6.4 years ago:

In R:

library(biomaRt)
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
getBM(mart = mart, filters = "efg_agilent_wholegenome_4x44k_v1",
      attributes = c("efg_agilent_wholegenome_4x44k_v1", "external_gene_id", "ensembl_gene_id", "description"),
      values = "A_32_P196615")

gives you:

efg_agilent_wholegenome_4x44k_v1 external_gene_id ensembl_gene_id description
1                     A_32_P196615     RP11-449J1.1 ENSG00000225334          NA

This retrieves gene information for one probe ID. Replace values="A_32_P196615" with values=vectorWithIds to do it for multiple probes. The values argument only needs to be a list when you use multiple filters; for a single filter, a character vector is enough.

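Because the query above also requests the filter attribute (the probe ID), every returned row says which probe it belongs to, and match() can line the results up with the original probe vector. A minimal offline sketch: `res` is a mock of the getBM() output built from the two probes mentioned in this thread (A_32_P196615 mapped to RP11-449J1.1; A_23_P116864 returned nothing for the asker).

```r
# Query probes, in the order you want the output
probes <- c("A_23_P116864", "A_32_P196615")

# Mock getBM() result that includes the probe-ID attribute
res <- data.frame(
  efg_agilent_wholegenome_4x44k_v1 = "A_32_P196615",
  external_gene_id                 = "RP11-449J1.1",
  stringsAsFactors = FALSE
)

# Row of `res` for each query probe (NA if biomaRt returned nothing);
# if a probe maps to several genes, match() picks the first row only
hit <- match(probes, res$efg_agilent_wholegenome_4x44k_v1)
symbols <- setNames(res$external_gene_id[hit], probes)
print(symbols)
```

The result always has the same length and order as `probes`, with NA for probes biomaRt did not return.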

Thank you for your answer. I'm fairly certain that's identical to the command I posted in my last comment; unfortunately, it doesn't give any indication of missing probes. If I give it a list of 100 probes, 80 of which are identified, it gives me a list of 80 gene symbols with no indication of how they correspond to the original 100-element list.

(reply by goldexperience, 6.4 years ago)

That is because your command asked for the gene symbol only; my command asked for the gene symbol, description, Ensembl gene ID and the corresponding probe ID, so each result row tells you which probe it belongs to. In your example (100 probes, 80 can be mapped), some probes cannot be mapped to genes, so BioMart/Ensembl will not find them either. But are you interested in the genes or in the genomic positions of the probes?

(reply by Irsan, 6.4 years ago)

I'm still having issues and have updated the question accordingly. The problem seems to be that some of the IDs aren't recognized by biomaRt at all.

(reply by goldexperience, 6.4 years ago)
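To separate IDs that biomaRt did not return at all from IDs it returned with an empty gene symbol, you can compare the query against the probe column of the result. A sketch with hypothetical probe IDs and a mock result; whether a missing symbol comes back as an empty string or as NA may depend on the biomaRt version, so both are checked.

```r
# Hypothetical query and mock getBM() result (probe-ID attribute included)
probes <- c("probe1", "probe2", "probe3", "probe4")
res <- data.frame(
  efg_agilent_wholegenome_4x44k_v1 = c("probe1", "probe3", "probe4"),
  external_gene_id                 = c("GENE1", "", NA),
  stringsAsFactors = FALSE
)

key <- res$efg_agilent_wholegenome_4x44k_v1
sym <- res$external_gene_id

# Not returned at all: unknown to this filter, or not mapped to any gene
not_returned <- setdiff(probes, key)

# Returned, but with a blank or missing gene symbol
no_symbol <- unique(key[is.na(sym) | sym == ""])

cat("not returned:", not_returned, "\n")
cat("returned without a symbol:", no_symbol, "\n")
```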