I have a vector (in R) of probe IDs from an Affymetrix microarray. For each probe I would like to find the Ensembl gene ID, the gene name (HGNC symbol), the gene length and the GC content, using the biomaRt package in R. To do this, I run:
    library(biomaRt)

    # Finding Ensembl IDs
    data <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
    ensemblids <- getBM(attributes=c("ensembl_gene_id"),
                        filters=c("affy_hg_u133a"),
                        values=probes, mart=data)

    # Finding gene name (hgnc), gene length and GC-content
    # getBM() returns a data frame, so pass the ID column as the filter values
    dframe <- getBM(attributes=c("hgnc_symbol", "percentage_gc_content"),
                    filters=c("ensembl_gene_id"),
                    values=ensemblids$ensembl_gene_id, mart=data)
However, as you can see, I only obtain the gene name and the GC content, because I cannot find any attribute that gives the gene length. Do you know how to solve this?
One more thing: my vector contains about 22,000 probes, but ensemblids contains only about 16,000 Ensembl IDs. Why is that?
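To make the mismatch easier to pin down, I think I could also request the probe ID in the first query (I am assuming "affy_hg_u133a" can be used as an attribute as well as a filter) and then tabulate the result. Below is a minimal sketch of that tabulation in base R. The `mapping` data frame is made-up toy data standing in for the real getBM() output, and all probe and gene IDs in it are hypothetical:

```r
# Toy stand-in for a getBM() result that returns both probe ID and gene ID.
# All IDs below are invented for illustration only.
probes <- c("p1", "p2", "p3", "p4", "p5")
mapping <- data.frame(
  affy_hg_u133a   = c("p1", "p2", "p2", "p3", "p4"),
  ensembl_gene_id = c("G1", "G2", "G3", "G1", "G4"),
  stringsAsFactors = FALSE
)

# Probes that did not map to any Ensembl gene
unmapped <- setdiff(probes, mapping$affy_hg_u133a)

# Probes that map to more than one gene
genes_per_probe <- tapply(mapping$ensembl_gene_id, mapping$affy_hg_u133a,
                          function(x) length(unique(x)))
multi_gene_probes <- names(genes_per_probe)[genes_per_probe > 1]

# Genes that are targeted by more than one probe
probes_per_gene <- tapply(mapping$affy_hg_u133a, mapping$ensembl_gene_id,
                          function(x) length(unique(x)))
multi_probe_genes <- names(probes_per_gene)[probes_per_gene > 1]

# Number of distinct genes, which can be smaller than the number of probes
n_unique_genes <- length(unique(mapping$ensembl_gene_id))
```

In this toy case five probes give only four distinct genes: one probe is unmapped and two probes share a gene. I would like to know whether the same kind of effect explains the difference in my real data.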
Thanks in advance.