Question: R org.Hs.eg.db matching ensembl gene ids with gene symbol
User6891 (Europe) wrote:

Hi,

I want to add a column to a data frame in R containing the gene symbol that corresponds to each Ensembl gene ID:

resOrdered$symbol <- mapIds(org.Hs.eg.db,
                     keys=row.names(resOrdered),
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")

I'm using org.Hs.eg.db from Bioconductor for this.

I get the following error:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

I think this is because the row names of my data frame resOrdered look like this:

[9997] "ENSG00000100601.5"  "ENSG00000178826.6"  "ENSG00000243663.1"  "ENSG00000138231.8"

I think the problem is the version suffix (the period and the number after it) that follows the actual ENSG ID. Is there a way to still find a match with the ENSEMBL key from org.Hs.eg.db?

Tags: bioconductor, R
modified 7 months ago by RamRS • written 3.2 years ago by User6891

Otherwise, you can always remove the period and everything after it.

tmp <- gsub("\\..*", "", row.names(resOrdered))
modified 7 months ago by RamRS • written 3.2 years ago by Sukhdeep Singh
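
To connect this back to the original question, here is a minimal sketch of stripping the version suffix and then passing the cleaned IDs to mapIds. The toy resOrdered table and its padj values are made up for illustration, and the lookup only runs if org.Hs.eg.db and AnnotationDbi happen to be installed:

```r
# Toy stand-in for the asker's results table (the padj values are made up).
resOrdered <- data.frame(
  padj = c(0.001, 0.02),
  row.names = c("ENSG00000100601.5", "ENSG00000178826.6")
)

# Drop the trailing ".<version>" so the keys can match the ENSEMBL keytype.
ensembl_ids <- sub("\\.[0-9]+$", "", row.names(resOrdered))

# Only attempt the lookup when the annotation packages are installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  resOrdered$symbol <- unname(AnnotationDbi::mapIds(
    org.Hs.eg.db::org.Hs.eg.db,
    keys = ensembl_ids,
    column = "SYMBOL",
    keytype = "ENSEMBL",
    multiVals = "first"
  ))
}
```

The versioned row names are left untouched here; only the keys handed to mapIds are cleaned.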

Hello Sukhdeep,

I have exactly the same question as User6891, and when I try to remove the version suffix I get an error.

Error: unexpected input in "tmp=gsub("\\..*","",row.names(res)�"

Could you please help me with this?

modified 7 months ago by RamRS • written 3.0 years ago by saamar.rajput

The command should work. I see an unidentified symbol in the command you pasted.

Try typing it out by hand instead of pasting it, and see if it works!

modified 7 months ago by RamRS • written 3.0 years ago by Sukhdeep Singh
tmp=gsub("\\..*","",row.names(res)​)

This is my command, and the error shows a question mark.

Error: unexpected input in "tmp=gsub("\\..*","",row.names(res)�"
modified 7 months ago by RamRS • written 3.0 years ago by saamar.rajput

As I said, the above command should work unless you have a copy-paste error or the object res has some issue. Check what row.names(res) outputs.

written 3.0 years ago by Sukhdeep Singh
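
An "unexpected input" error like the one above is typical of an invisible non-ASCII character picked up while copying from a web page. Here is a minimal base R sketch for locating and removing such characters. It assumes the culprit is a zero-width space (U+200B), which is a guess rather than something the thread confirms, and the variable names are illustrative:

```r
# The tail of the pasted command, with a zero-width space (U+200B, assumed)
# hiding between the two closing parentheses.
pasted <- "row.names(res)\u200b)"

# Convert to Unicode code points; anything above 127 is non-ASCII.
codes <- utf8ToInt(pasted)
bad_pos <- which(codes > 127)

# Rebuild the string from the ASCII code points only.
cleaned <- intToUtf8(codes[codes <= 127])

print(bad_pos)
print(cleaned)
```

Typing the command by hand, as suggested above, avoids the problem altogether.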

It's working, thanks a lot :) and thanks for your patience.

But one more question: how do I put the edited Ensembl IDs from tmp back into my res column?

I know it is a very basic question, but I am new to R.

modified 7 months ago by RamRS • written 3.0 years ago by saamar.rajput

Thanks a lot, Sukhdeep. It all worked fine :)

written 3.0 years ago by saamar.rajput

Great, good luck then!

written 3.0 years ago by Sukhdeep Singh

How did you eventually add tmp back to the res row.names? The answer is not in this thread and I can't figure it out.

Also, is it possible to edit the gene ids in-place instead of creating 'tmp'?

modified 19 months ago • written 19 months ago by Mthabisi Moyo
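
Since the thread never shows this step, here is a minimal sketch of two ways to do it, assuming res is an ordinary data frame whose row names are the versioned Ensembl IDs. The toy table, its log2FoldChange values and the ensembl_id column name are made up for illustration:

```r
# Hypothetical results table; the log2FoldChange values are made up.
res <- data.frame(
  log2FoldChange = c(1.5, -0.7),
  row.names = c("ENSG00000100601.5", "ENSG00000178826.6")
)

# Option 1: keep the versioned row names and store the stripped IDs
# in a new column.
res$ensembl_id <- gsub("\\..*", "", row.names(res))

# Option 2: overwrite the row names in place, with no intermediate 'tmp'.
# Row names must be unique, so this errors if two versioned IDs collapse
# to the same unversioned ID.
row.names(res) <- gsub("\\..*", "", row.names(res))

print(res)
```

Option 1 is the safer choice when the original versioned IDs are still needed later.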

Can you explain what "\\..*", "" does?

written 24 months ago by krushnach80480

It removes the period and everything after it, i.e. it deletes (technically, substitutes an empty string for) the first period and everything that follows. See this.

modified 24 months ago • written 24 months ago by genomax
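
A small sketch that breaks the pattern down on one of the IDs from the question; the variable names are illustrative:

```r
id <- "ENSG00000100601.5"

# "\\." is a literal period (the backslash is doubled inside an R string);
# ".*" then matches every character that follows it.
stripped <- gsub("\\..*", "", id)

# Without the escape, "." matches any character, so the whole ID is deleted.
unescaped <- gsub("..*", "", id)

# A stricter variant, as used elsewhere in this thread: only a period
# followed by digits at the very end of the string is removed.
strict <- gsub("\\.[0-9]*$", "", id)

print(c(stripped = stripped, unescaped = unescaped, strict = strict))
```

The stricter pattern leaves an ID untouched if anything other than digits follows the period.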
Kevin Blighe (Guy's Hospital, London) wrote:

Just use biomaRt - it is a lot easier:

library(biomaRt)
# Connect to the Ensembl BioMart and select the human gene dataset.
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

ens <- c("ENSG00000100601.5", "ENSG00000178826.6", "ENSG00000243663.1", "ENSG00000138231.8")
ensLookup <- gsub("\\.[0-9]*$", "", ens)
ensLookup
[1] "ENSG00000100601" "ENSG00000178826" "ENSG00000243663" "ENSG00000138231"

annotLookup <- getBM(
  mart=mart,
  attributes=c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype", "external_gene_name"),
  filters="ensembl_gene_id",
  values=ensLookup,
  uniqueRows=TRUE)

annotLookup <- data.frame(
  ens[match(annotLookup$ensembl_gene_id, ensLookup)],
  annotLookup)

colnames(annotLookup) <- c(
  "original_id",
  c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype", "external_gene_name"))

annotLookup

         original_id ensembl_transcript_id ensembl_gene_id         gene_biotype external_gene_name
1  ENSG00000100601.5       ENST00000216489 ENSG00000100601       protein_coding             ALKBH1
2  ENSG00000100601.5       ENST00000557057 ENSG00000100601       protein_coding             ALKBH1
3  ENSG00000100601.5       ENST00000555100 ENSG00000100601       protein_coding             ALKBH1
4  ENSG00000100601.5       ENST00000554097 ENSG00000100601       protein_coding             ALKBH1
5  ENSG00000138231.8       ENST00000260803 ENSG00000138231       protein_coding               DBR1
6  ENSG00000138231.8       ENST00000460271 ENSG00000138231       protein_coding               DBR1
7  ENSG00000138231.8       ENST00000477557 ENSG00000138231       protein_coding               DBR1
8  ENSG00000138231.8       ENST00000463982 ENSG00000138231       protein_coding               DBR1
9  ENSG00000178826.6       ENST00000409102 ENSG00000178826       protein_coding            TMEM139
10 ENSG00000178826.6       ENST00000487419 ENSG00000178826       protein_coding            TMEM139
11 ENSG00000178826.6       ENST00000359333 ENSG00000178826       protein_coding            TMEM139
12 ENSG00000178826.6       ENST00000480421 ENSG00000178826       protein_coding            TMEM139
13 ENSG00000178826.6       ENST00000409244 ENSG00000178826       protein_coding            TMEM139
14 ENSG00000178826.6       ENST00000409541 ENSG00000178826       protein_coding            TMEM139
15 ENSG00000178826.6       ENST00000410004 ENSG00000178826       protein_coding            TMEM139
16 ENSG00000178826.6       ENST00000482420 ENSG00000178826       protein_coding            TMEM139
17 ENSG00000178826.6       ENST00000471161 ENSG00000178826       protein_coding            TMEM139
18 ENSG00000243663.1       ENST00000493072 ENSG00000243663 processed_pseudogene           RPS4XP14

...or without ensembl_transcript_id:

annotLookup <- getBM(
  mart=mart,
  attributes=c("ensembl_gene_id", "gene_biotype", "external_gene_name"),
  filters="ensembl_gene_id",
  values=ensLookup,
  uniqueRows=TRUE)

annotLookup <- data.frame(
  ens[match(annotLookup$ensembl_gene_id, ensLookup)],
  annotLookup)

colnames(annotLookup) <- c(
  "original_id",
  c("ensembl_gene_id", "gene_biotype", "external_gene_name"))

annotLookup
        original_id ensembl_gene_id         gene_biotype external_gene_name
1 ENSG00000100601.5 ENSG00000100601       protein_coding             ALKBH1
2 ENSG00000138231.8 ENSG00000138231       protein_coding               DBR1
3 ENSG00000178826.6 ENSG00000178826       protein_coding            TMEM139
4 ENSG00000243663.1 ENSG00000243663 processed_pseudogene           RPS4XP14
modified 24 days ago • written 8 months ago by Kevin Blighe
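
The lookup rows come back in a different order from the input IDs (compare ens with the table above), so the symbols cannot simply be pasted onto a results table row by row. Here is a minimal sketch of aligning them with match(). The res table and its padj values are made up, and annotLookup is a hand-typed stand-in for the gene-level getBM() output shown above:

```r
# Versioned IDs in the order they appear in the results table.
ens <- c("ENSG00000100601.5", "ENSG00000178826.6",
         "ENSG00000243663.1", "ENSG00000138231.8")

# Hypothetical results table; the padj values are made up.
res <- data.frame(padj = c(0.01, 0.02, 0.03, 0.04), row.names = ens)

# Stand-in for the gene-level lookup table (note the different row order).
annotLookup <- data.frame(
  original_id = c("ENSG00000100601.5", "ENSG00000138231.8",
                  "ENSG00000178826.6", "ENSG00000243663.1"),
  external_gene_name = c("ALKBH1", "DBR1", "TMEM139", "RPS4XP14"),
  stringsAsFactors = FALSE
)

# For each row of res, find the position of its ID in annotLookup
# (NA when an ID has no annotation), then pull the symbol from that row.
idx <- match(row.names(res), annotLookup$original_id)
res$symbol <- annotLookup$external_gene_name[idx]

print(res)
```

This only makes sense with the gene-level table; the transcript-level table has several rows per gene, and match() would pick the first one.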

Hi,

Thanks for your comment. I need some guidance. I have an "original_id" column and also a "gene_name" column (e.g. ENSG00000100601.5 and ALKBH1), and I need the corresponding Entrez IDs. Could you please show me how to get the Entrez IDs from "original_id" with biomaRt or another package? I would appreciate your help. Best regards

written 6 months ago by modarzi70

Take a look at this example, which will obtain Entrez IDs for you:

library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)

ens <- c("ENSG00000100601.5", "ENSG00000178826.6", "ENSG00000243663.1", "ENSG00000138231.8")
ensLookup <- gsub("\\.[0-9]*$", "", ens)


annotLookup <- getBM(
  mart=mart,
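  # Recent Ensembl releases name this attribute "entrezgene_id" (older ones used
  # "entrezgene"); run listAttributes(mart) to check what your release offers.
  attributes=c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype", "external_gene_name", "entrezgene_id"),
  filters="ensembl_gene_id",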
  values=ensLookup,
  uniqueRows=TRUE)

annotLookup <- data.frame(
  ens[match(annotLookup$ensembl_gene_id, ensLookup)],
  annotLookup)

colnames(annotLookup) <- c(
  "original_id",
  c("ensembl_transcript_id", "ensembl_gene_id", "gene_biotype", "external_gene_name", "EntrezID"))

annotLookup
modified 24 days ago • written 6 months ago by Kevin Blighe