Question

Error in getBM: accessing Ensembl annotation with biomaRt

3.2 years ago

celia.escher ▴ 20

Hello!

I am trying to get the gene names and additional features after running DESeq2 on human RNA-seq data, where I contrast 2 diseases with healthy controls. However, I am stuck at getBM (I am following a tutorial from some years ago and do not know whether it is out of date). This is my code:

dds <- DESeq(dds)
res <- results(dds)
res <- results(dds, contrast = c("disease", "LC", "Hc"))

res$ensembl <- sapply(strsplit(rownames(res), split="\\+" ), "[", 1 )
ensembl <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
genemap <- getBM( attributes = c("ensembl_gene_id", "entrezgene_id", "hgnc_symbol", "chromosome_name"),
                  #filters = "ensembl_gene_id",   # with filters it does not work
                  values = res$ensembl,
                  mart = ensembl)
idx <- match(res$ensembl, genemap$ensembl_gene_id)
res$entrez <- genemap$entrezgene_id[idx]
res$gene_name <- genemap$hgnc_symbol[idx]
res$chr <- genemap$chromosome_name[idx]

write.csv( as.data.frame(res), file="results.csv" )

The tutorial recommends this part:

First, we split up the rownames of the results object, which contain ENSEMBL gene ids, separated by the plus sign, +. The following code then takes the first id for each gene by invoking the open square bracket function "[" and the argument, 1.

res$ensembl <- sapply( strsplit( rownames(res), split="\\+" ), "[", 1 )

But I see that the ENSEMBL names in my data look like ENSG00000281764.1, ENSG00000281299.1, and so on. I have also tried replacing that line with res$ensembl <- rownames(res), but it made no difference.

Thank you so much for your comments!!!

RNA-Seq genome R gene Assembly • 712 views

score 1 · Answer 1 · 2021-03-01

3.2 years ago

loughrae ▴ 90

The .1 etc. are version numbers for the Ensembl IDs. Assuming your problem is that the versioned IDs in res cannot be matched against the plain, unversioned IDs that biomaRt returns in genemap$ensembl_gene_id, you can remove the versions from res$ensembl using gsub() and then they should match. You can find a regex to do that here:

score 1 · Answer 2 · 2021-03-01

3.2 years ago

celia.escher ▴ 20

  res$ensembl <- gsub("\\..*","", res$ensembl)

This worked perfectly, many thanks! I used it in place of the sapply line.
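
For anyone landing here later, this is a minimal base-R sketch of why stripping the version suffix makes match() work. The IDs are toy values standing in for rownames(res), the small genemap data frame is a hypothetical stand-in for what getBM returns, and PLACEHOLDER_A is a made-up symbol used only for illustration.

```r
# Toy versioned Ensembl gene IDs, standing in for rownames(res)
versioned_ids <- c("ENSG00000281764.1", "ENSG00000281299.1", "ENSG00000141510.18")

# Remove the first dot and everything after it (the version suffix)
plain_ids <- sub("\\..*$", "", versioned_ids)

# Hypothetical annotation table standing in for the getBM() result;
# biomaRt's ensembl_gene_id attribute carries no version suffix
genemap <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000281764"),
  hgnc_symbol     = c("TP53", "PLACEHOLDER_A"),  # PLACEHOLDER_A is made up
  stringsAsFactors = FALSE
)

# Matching on versioned IDs finds nothing; matching on plain IDs works
idx_versioned <- match(versioned_ids, genemap$ensembl_gene_id)
idx_plain     <- match(plain_ids, genemap$ensembl_gene_id)

# IDs missing from the annotation table come back as NA
gene_name <- genemap$hgnc_symbol[idx_plain]
print(data.frame(versioned_ids, plain_ids, gene_name))
```

With the versions removed, the stripped IDs can also be passed to getBM as values together with filters = "ensembl_gene_id", so that only the genes in res are queried.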
