Question

Convert gene id to gene name with org.Hs.eg

3.6 years ago

yakumo • 0

Hi everyone,

I am trying to convert gene IDs to gene names for RNA-seq data from six samples. I have already completed alignment and mapping in Galaxy and calculated counts with featureCounts, also in Galaxy.

Now I am trying to convert the gene IDs to names with the org.Hs.eg package in Bioconductor.

I have merged the data from the six samples into one file ('count file'), where one column holds the gene_id and each of the other columns corresponds to one of the 6 samples (i.e. 7 columns in total).

        SRR1554534 SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
NA.            211        416        381       2460       2721       2213
NA..1           22          6          6          2          2          6
NA..2          733        325       1240       1687       2235       1741
NA..3          333        104        723       1390       1807       1165
NA..4           40         17        112        757       1005        872
NA..5          453        219        198        110        143         82

Now I try to convert to gene names with this:

for (i in 1:nrow(count_file)){count_file[i,1] = lookUp(toString(count_file[i,1]), 'org.Hs.eg', 'SYMBOL')}
rownames(count_file) = make.names(count_file[,1], unique=TRUE)
count_file[,1] = NULL
count_file

I receive the following result:

              SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
ALAS1            416        381       2460       2721       2213
ABCB7              6          6          2          2          6
C8G              325       1240       1687       2235       1741
APLP1            104        723       1390       1807       1165
ASIC2             17        112        757       1005        872
ASS1P8           219        198        110        143         82
...

My question: why is my first sample always missing? There are only 5 samples instead of 6.

Do you have any idea where the mistake is?

Thanks a lot!

RNA-Seq Bioconductor annotation R • 1.3k views

ADD COMMENT • link updated 3.6 years ago by zx8754 11k • written 3.6 years ago by yakumo • 0

You say you have 7 columns, but your initial data frame has only 6. Where is the gene_id column? Your current gene symbols are obtained by treating SRR1554534's counts as Entrez IDs.
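
To illustrate the point, here is a minimal base R sketch using a toy table shaped like the one printed in the question (only two sample columns and three rows are reproduced, and the object is built by hand rather than read from the real file). When the identifiers exist only as row names, `count_file[, 1]` is the first sample, so the loop overwrites that sample's counts and the later `count_file[,1] = NULL` deletes it.

```r
# Toy stand-in for the table in the question: IDs are row names, not a column
count_file <- data.frame(
  SRR1554534 = c(211, 22, 733),
  SRR1554536 = c(416, 6, 325),
  row.names = c("NA.", "NA..1", "NA..2")
)

# Row names are not counted as a column
n_cols <- ncol(count_file)

# This is the column that count_file[, 1] actually refers to
first_col <- colnames(count_file)[1]

# TRUE here means column 1 holds sample counts, not gene IDs
first_col_is_sample <- grepl("^SRR", first_col)
print(first_col)
```

Running `str(count_file)` on the real object before the loop should show the same thing at a glance.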

ADD REPLY • link 3.6 years ago by Ram 43k

Thank you for your help!

I have corrected the tabular list now to the following:

      Geneid SRR1554534 SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
1   ENSG00000000003.10        211        416        381       2460       2721       2213
2    ENSG00000000005.5         22          6          6          2          2          6
3    ENSG00000000419.8        733        325       1240       1687       2235       1741
4    ENSG00000000457.9        333        104        723       1390       1807       1165
5   ENSG00000000460.12         40         17        112        757       1005        872
6    ENSG00000000938.8        453        219        198        110        143         82

Using the same code as above...

for (i in 1:nrow(count_file)){count_file[i,1] = lookUp(toString(count_file[i,1]), 'org.Hs.eg', 'SYMBOL')}
rownames(count_file) = make.names(count_file[,1], unique=TRUE)
count_file[,1] = NULL
count_file

... I receive the following result when running 'org.Hs.eg'. There are now 6 sample columns, but no proper gene names:

            SRR1554534 SRR1554536 SRR1554539 SRR1554537 SRR1554566 SRR1554568
NA.            211        416        381       2460       2721       2213
NA..1           22          6          6          2          2          6
NA..2          733        325       1240       1687       2235       1741
NA..3          333        104        723       1390       1807       1165

Can you spot an error in my code? Or is org.Hs.eg maybe not suitable for this?

Many thanks!

ADD REPLY • link 3.6 years ago by yakumo • 0

What is lookUp doing? The fact that the code worked when you were using Entrez IDs but it stopped now that you're using ENSEMBL Gene IDs should tell you where the problem lies.
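
One way to check this yourself, as a rough sketch that assumes the org.Hs.eg.db package is installed: list the key types the annotation database supports and compare a few of its keys with the IDs in your table. The variable names below are only illustrative.

```r
library(org.Hs.eg.db)  # loads AnnotationDbi as a dependency

# Identifier types the database can be queried with
kt <- keytypes(org.Hs.eg.db)
print(kt)

# A few example keys of each kind, to compare with the Geneid column
entrez_keys  <- head(keys(org.Hs.eg.db, keytype = "ENTREZID"))
ensembl_keys <- head(keys(org.Hs.eg.db, keytype = "ENSEMBL"))
print(entrez_keys)
print(ensembl_keys)
```

Note that the Ensembl keys stored in the package carry no version suffix, unlike IDs such as ENSG00000000003.10 in the table above.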

ADD REPLY • link 3.6 years ago by Ram 43k

Thanks!

I was able to get around this problem by using an annotation file with gene names.

I haven't changed the initial code. It seems to me that 'org.Hs.eg' expects an Entrez ID in the first column rather than an ENSEMBL Gene ID. Is it possible that I need to use different code for ENSEMBL IDs?

ADD REPLY • link 3.6 years ago by yakumo • 0

You can use select or mapIds to get from one ID type to another. Use the appropriate keytype argument.
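
A minimal sketch of what this could look like with mapIds, assuming org.Hs.eg.db is installed and using a hand-built toy copy of the first three rows of the table from this thread. The version suffix (e.g. ".10") is stripped first, because the ENSEMBL keys in org.Hs.eg.db are unversioned. The fallback to the Ensembl ID for unmapped genes and the use of make.unique are choices made for this example, not something the thread prescribes.

```r
library(org.Hs.eg.db)  # loads AnnotationDbi, which provides mapIds()

# Toy stand-in for the featureCounts table (first three rows, two samples)
count_file <- data.frame(
  Geneid     = c("ENSG00000000003.10", "ENSG00000000005.5", "ENSG00000000419.8"),
  SRR1554534 = c(211, 22, 733),
  SRR1554536 = c(416, 6, 325),
  stringsAsFactors = FALSE
)

# Remove the trailing version number so the IDs match the database keys
ensembl_ids <- sub("[.][0-9]+$", "", count_file$Geneid)

# Vectorised lookup: one symbol per Ensembl ID, NA where nothing maps
symbols <- mapIds(org.Hs.eg.db,
                  keys      = ensembl_ids,
                  column    = "SYMBOL",
                  keytype   = "ENSEMBL",
                  multiVals = "first")

# Keep the Ensembl ID as the label when no symbol was found
labels <- ifelse(is.na(symbols), ensembl_ids, symbols)

# Use the labels as row names and drop the ID column by name, not by position
rownames(count_file) <- make.unique(as.character(labels))
count_file$Geneid <- NULL
print(count_file)
```

Dropping the column by name avoids the earlier problem where `count_file[,1] = NULL` removed a sample. Be aware that mapIds stops with an error if none of the supplied keys are valid for the chosen keytype, which is what unstripped versioned IDs would trigger.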

ADD REPLY • link 3.6 years ago by Ram 43k