Question: R org.Hs.eg.db matching ensembl gene ids with gene symbol
User6891 (Europe) wrote, 2.2 years ago:

Hi,

I want to add a column to a data frame in R containing the gene symbol that corresponds to each Ensembl gene ID:

resOrdered$symbol <- mapIds(org.Hs.eg.db,
                     keys=row.names(resOrdered),
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")

I'm using 'org.Hs.eg.db' from Bioconductor for this.

I get the following error:

Error in .testForValidKeys(x, keys, keytype, fks) : 
  None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

I think this is because my row.names from my dataframe resOrdered look like this:

[9997] "ENSG00000100601.5"  "ENSG00000178826.6"  "ENSG00000243663.1"  "ENSG00000138231.8" 

I think the problem is the '.' followed by the version number after the actual ENSG ID. Is there a way to still find a match with the 'ENSEMBL' keytype in 'org.Hs.eg.db'?
bioconductor R • 3.5k views

Otherwise, you can always remove the string after the period.

tmp = gsub("\\..*", "", row.names(resOrdered))

written 2.2 years ago by Sukhdeep Singh
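Putting the suggestion together with the original call, a minimal sketch might look like the following. The small data frame is a hypothetical stand-in for resOrdered (its column name and values are made up for illustration); the two IDs are taken from the question. The lookup is guarded so the snippet still runs when org.Hs.eg.db or AnnotationDbi is not installed.

```r
# Hypothetical stand-in for resOrdered: row names are versioned Ensembl gene IDs
resOrdered <- data.frame(
  log2FoldChange = c(1.2, -0.8),  # made-up values, for illustration only
  row.names = c("ENSG00000100601.5", "ENSG00000178826.6")
)

# Drop the period and the version number that follows it
ens_ids <- sub("\\..*", "", row.names(resOrdered))

# Run the lookup only if the annotation packages are available
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  resOrdered$symbol <- AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db,
                                             keys = ens_ids,
                                             column = "SYMBOL",
                                             keytype = "ENSEMBL",
                                             multiVals = "first")
}
```

Passing the stripped IDs as keys leaves the versioned row names untouched, so nothing else in the data frame has to change.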

Hello Sukhdeep,

I have exactly the same question as User6891, and when I try to remove the version suffix I get an error:

Error: unexpected input in "tmp=gsub("\..*","",row.names(res)�"

Could you please help me with this?

written 2.0 years ago by saamar.rajput

The command should work. I see an unrecognized character in the command you pasted. Try typing it by hand instead of copy-pasting and see if it works!

written 2.0 years ago by Sukhdeep Singh
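One likely culprit for an "unexpected input" error after copy-pasting from a web page is an invisible character such as a zero-width space (U+200B). As a rough sketch, assuming that is what happened here, the pasted text can be inspected as a string. The variable names below are hypothetical.

```r
# Hypothetical pasted text with a zero-width space (U+200B) before the last ")"
pasted <- "row.names(res)\u200b)"

# Convert to Unicode code points and flag anything outside plain ASCII
codes <- utf8ToInt(pasted)
bad_positions <- which(codes > 127)

# Rebuild the string from the ASCII characters only
cleaned <- intToUtf8(codes[codes <= 127])
```

If bad_positions is non-empty, retyping the command by hand, or pasting the cleaned text, should avoid the parse error.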

tmp=gsub("\..*","",row.names(res)​)

This is my command, and the error shows a question mark:

Error: unexpected input in "tmp=gsub("\..*","",row.names(res)�"

written 2.0 years ago by saamar.rajput

As I said, the above command should work unless you have a copy-paste error or the object res has some issue. Check what row.names(res) outputs.

written 2.0 years ago by Sukhdeep Singh

It's working, thanks a lot :) and thanks for your patience. One more question: how do I put the edited Ensembl IDs from tmp back into my res object? I know it is a very basic question, but I am new to R.

written 2.0 years ago by saamar.rajput

Thanks a lot, Sukhdeep, it all worked fine :)

written 2.0 years ago by saamar.rajput

Great, good luck then!

written 2.0 years ago by Sukhdeep Singh

How did you eventually add tmp back to the res row.names? The answer is not in this thread and I can't figure it out.

Also, is it possible to edit the gene ids in-place instead of creating 'tmp'?

written 8 months ago by Mthabisi Moyo
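The thread does not record what was actually done, but one possible approach, sketched here on a hypothetical data frame standing in for res, is to assign the stripped IDs straight back with row.names<-. No intermediate tmp is strictly needed. Row names of a data frame must be unique, so the sketch falls back to an ordinary column (named ensembl here, an arbitrary choice) if stripping the versions would create duplicates.

```r
# Hypothetical stand-in for res, with versioned Ensembl IDs as row names
res <- data.frame(padj = c(0.01, 0.20),  # made-up values, for illustration only
                  row.names = c("ENSG00000100601.5", "ENSG00000178826.6"))

# Version-free IDs
stripped <- sub("\\..*", "", row.names(res))

if (anyDuplicated(stripped) == 0) {
  # Safe to overwrite the row names
  row.names(res) <- stripped
} else {
  # Duplicates would make the assignment fail, so keep the IDs as a column
  res$ensembl <- stripped
}

# The same edit in place, without an intermediate variable
res2 <- data.frame(padj = 0.05, row.names = "ENSG00000243663.1")
row.names(res2) <- sub("\\..*", "", row.names(res2))
```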

Can you explain what "\\..*","" does?

written 13 months ago by krushnach80

It removes the period and everything after it: gsub() substitutes whatever the pattern matches with the empty string given as the second argument. See this.

written 13 months ago by genomax
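To make the pattern concrete, here is a small sketch using two IDs from the question. In an R string, "\\." becomes the regex \. (a literal period), and ".*" matches any characters up to the end of the string. The last call shows why the escape matters: without it, the leading "." matches any character and the whole ID is deleted.

```r
ids <- c("ENSG00000100601.5", "ENSG00000178826.6")

# Literal period plus everything after it is replaced by an empty string
stripped <- gsub("\\..*", "", ids)

# Without the escape, "." matches any character, so the entire ID is matched and removed
unescaped <- gsub("..*", "", ids)
```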