20 months ago
imaparna27 • 0
I am trying to assign biotypes to the genes I obtained from a differential expression analysis. I used biomaRt for this, but it returns only 54000 genes, whereas my expression data contains 57000 genes.
library(biomaRt)
ensembl = useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", mirror = "useast")
genes <- df_RNA$gene_id
G_list <- getBM(filters = "ensembl_gene_id",
                attributes = c("ensembl_gene_id", "entrezgene_id", "hgnc_symbol", "gene_biotype"),
                values = substr(genes, 1, 15),
                mart = ensembl)
What could be causing this discrepancy? Also, what alternatives to biomaRt could I use?
Did you use the same Ensembl version for both BioMart and to analyse differential expression?
Yes, GRCh38.p13 for expression analysis as well as for bio-type assignment.
The genome assembly has been GRCh38.p13 since September 2019, but the gene annotation has been updated several times since then. What was the Ensembl version?
Version 17 was from about 2002
Sorry about the previous details. I re-checked: I obtained the FPKM data from the TCGA database, and they used GENCODE v22 for alignment, which I think corresponds to GRCh38.p2. I used the same versions for all annotations. But I am not sure why some of the genes are still unannotated in my results, even though information for a few of them is present on the Ensembl website.
You'll need to use Ensembl 80 to get GENCODE 22. Just include
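One way to pin biomaRt to an archived release is the `version` argument of `useEnsembl()`. The sketch below assumes the Ensembl 80 archive is still being served (`listEnsemblArchives()` shows what is currently available). The helper name `annotate_biotypes` is made up for illustration. `entrezgene_id` is left out because attribute names differ between releases and older ones may not recognise it; `listAttributes()` on the archived mart shows what can be requested.

```r
# Hypothetical helper: annotate Ensembl gene IDs against a fixed Ensembl release.
# Calls are namespace-qualified, so biomaRt only needs to be installed when the
# function is actually run (it queries the Ensembl archive over the network).
annotate_biotypes <- function(gene_ids, ensembl_version = 80) {
  # Connect to the archived release; no mirror is given, since mirrors
  # apply to the current release rather than to archives.
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "hsapiens_gene_ensembl",
                              version = ensembl_version)

  # Drop the GENCODE version suffix (e.g. ".12") before querying.
  stable_ids <- sub("\\.[0-9]+$", "", gene_ids)

  biomaRt::getBM(filters = "ensembl_gene_id",
                 attributes = c("ensembl_gene_id", "hgnc_symbol", "gene_biotype"),
                 values = stable_ids,
                 mart = mart)
}
```

Usage would then be something like `G_list <- annotate_biotypes(df_RNA$gene_id)`.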
Thanks Emily_Ensembl, this really helped.
Can you give some examples of the Gene IDs that are not being annotated?
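A quick way to list the IDs that did not come back from BioMart is to compare the queried IDs with the returned table. This is a minimal sketch with toy data standing in for `df_RNA$gene_id` and the `getBM()` result; the IDs and biotypes are illustrative placeholders, not real annotations.

```r
# Toy stand-ins for df_RNA$gene_id and G_list (illustrative values only).
genes <- c("ENSG00000000001.1", "ENSG00000000002.12", "ENSG00000000003.3")
G_list <- data.frame(
  ensembl_gene_id = c("ENSG00000000001", "ENSG00000000003"),
  gene_biotype = c("protein_coding", "lincRNA"),
  stringsAsFactors = FALSE
)

# Strip the GENCODE version suffix (".1", ".12", ...) to get stable Ensembl IDs.
stable_ids <- sub("\\.[0-9]+$", "", genes)

# IDs that were queried but are absent from the BioMart result.
missing_ids <- setdiff(stable_ids, G_list$ensembl_gene_id)
print(missing_ids)
```

Looking up a handful of the `missing_ids` on the Ensembl website should show whether they were retired or renamed between releases.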