I'm trying to use biomaRt to convert a list of more than 90k probe IDs to gene symbols, but am having problems. Using the getBM function, I can see that only 22k of the probes have corresponding gene symbols, but the output is a vector of length 22k, so I can't tell which result corresponds to which probe ID in the initial list. Additionally, I think that some of these probe IDs don't correspond to Agilent probes known to biomaRt (using other attributes such as "chromosome_name" gives me nothing for some of the probe IDs).
Using getBMlist, I can get an output with NA values for the probes that don't match, but the function gives a warning that getBMlist isn't meant for large lists, and the entire process takes too long. How do I get an output of 90k entries, with a gene symbol where one exists and NA otherwise?
The probes are mostly from the Agilent-014850 Whole Human Genome Microarray 4x44K G4112F. An example probe ID is "A23P100001". An example probe ID that doesn't give me any attribute in biomaRt is "A23P116864".
The query I'm using is as follows:
affyids <- read.csv([data goes here])
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
getBM(uniqueRows = FALSE,
      filters = "efgagilentwholegenome4x44kv1",
      attributes = c("chromosome_name", "start_position", "external_gene_id"),
      values = affyids,
      mart = mart)
where affyids is of type "list" (it is the data frame returned by read.csv).
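To make the output I'm after concrete, here is a minimal sketch in base R using toy data only (no biomaRt call). It assumes the query result could somehow carry the probe ID alongside the gene symbol, which is exactly the part I don't know how to get. The `hits` data frame, the `probe_id` column name, and the symbol "GENE_A" are made-up placeholders, not real annotations.

```r
# Toy stand-in for the full probe list; the two IDs are the examples quoted above.
probes <- data.frame(probe_id = c("A23P100001", "A23P116864"),
                     stringsAsFactors = FALSE)

# Hypothetical stand-in for a query result that also returned the probe ID.
# "GENE_A" is a placeholder value, not a real gene symbol for this probe.
hits <- data.frame(probe_id = "A23P100001",
                   external_gene_id = "GENE_A",
                   stringsAsFactors = FALSE)

# match() looks up each original probe in the result and yields NA where the
# probe is absent, so the output keeps the length and order of the input.
idx <- match(probes$probe_id, hits$probe_id)
probes$external_gene_id <- hits$external_gene_id[idx]

print(probes)
```

In other words, one row per input probe, in the original order, with NA in the gene symbol column for any probe that has no match.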