Converting mouse Ensembl IDs to gene names in a data frame
10 months ago
lmck0705 • 0

I have recently produced an unsupervised hierarchical clustering heat map from 30 different RNA-seq samples. The x-axis is labelled with the name of each sample, and the y-axis displays the 100 most variable genes, shown as mouse Ensembl IDs (e.g. ENSMUSG00000020573).

Is there a way for me to replace the Ensembl IDs with gene names (e.g. Pik3cg) before I pass the table to the pheatmap() function?

My input table is:

                        WT_Animal1      WT_Animal2     WT_Animal3
ENSMUSG00000094652      0.03463869       0.7333992     -0.29986091
ENSMUSG00000006356     -0.64264559      -0.5609578     -0.06037522
ENSMUSG00000019897      0.09159506      -0.1133322     -0.12974861
ENSMUSG00000027790     -0.25124228       1.2871582     -0.92491260
ENSMUSG00000054999     -0.58618795       1.2079283     -0.89929279
ENSMUSG00000072573      0.16812802       0.1058453     -0.16593449

I changed the column names manually using colnames(mat) <- c(), but I want to know how to change the row names (Ensembl IDs) with a function, so that I can reproduce it in further plots.

I have tried to read up on using biomaRt and other packages but can't seem to work out a way to do it.

Any help would be much appreciated!

RNA-Seq R biomart pheatmap clustering • 716 views
10 months ago

Hi,

You can use the org.Mm.eg.db package for this. Be aware, however, that it is likely that some of your Ensembl gene IDs will map to the same gene symbol, and that some will not map to any symbol at all (two of the six IDs below come back as NA). Here is a quick example:

# mapIds() and select() come from AnnotationDbi, which is attached along with org.Mm.eg.db
library(org.Mm.eg.db)

ens <- c('ENSMUSG00000094652','ENSMUSG00000006356','ENSMUSG00000019897',
  'ENSMUSG00000027790','ENSMUSG00000054999','ENSMUSG00000072573')

mapIds(org.Mm.eg.db, keys = ens,
  column = c('SYMBOL'), keytype = 'ENSEMBL')
'select()' returned 1:1 mapping between keys and columns
ENSMUSG00000094652 ENSMUSG00000006356 ENSMUSG00000019897 ENSMUSG00000027790 
                NA            "Crip2"           "Ccdc59"              "Sis" 
ENSMUSG00000054999 ENSMUSG00000072573 
        "Naaladl1"                 NA

select(org.Mm.eg.db, keys = ens, keytype = 'ENSEMBL',
  columns = c('SYMBOL', 'GENENAME', 'ENSEMBL'))
'select()' returned 1:1 mapping between keys and columns
             ENSEMBL   SYMBOL                                            GENENAME
1 ENSMUSG00000094652     <NA>                                                <NA>
2 ENSMUSG00000006356    Crip2                             cysteine rich protein 2
3 ENSMUSG00000019897   Ccdc59                    coiled-coil domain containing 59
4 ENSMUSG00000027790      Sis              sucrase isomaltase (alpha-glucosidase)
5 ENSMUSG00000054999 Naaladl1 N-acetylated alpha-linked acidic dipeptidase-like 1
6 ENSMUSG00000072573     <NA>                                                <NA>

I am sure that you can take it from here.
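
For completeness, here is a minimal sketch of the remaining step: putting the symbols onto the row names of the matrix before plotting. The helper name relabel_rows() is made up for this example and is not part of any package. It assumes you want to keep the Ensembl ID wherever no symbol is found, and to make repeated symbols unique with make.unique(); other choices (dropping unmapped rows, for instance) are equally reasonable. The toy matrix reuses the first values from the question, and the symbols vector is typed in by hand so the snippet runs without the annotation package.

```r
# Hypothetical helper (not from any package): replace Ensembl row names with
# gene symbols, keeping the Ensembl ID wherever no symbol is available.
relabel_rows <- function(mat, symbols) {
  # symbols: named character vector (names = Ensembl IDs, values = symbol or NA)
  labels <- unname(symbols[rownames(mat)])
  unmapped <- is.na(labels)
  labels[unmapped] <- rownames(mat)[unmapped]
  # Two Ensembl IDs can share a symbol; make.unique() appends .1, .2, ...
  rownames(mat) <- make.unique(labels)
  mat
}

# Toy matrix built from the first rows and columns of the table in the question.
mat <- matrix(
  c(0.03463869, -0.64264559, 0.09159506,
    0.7333992, -0.5609578, -0.1133322),
  nrow = 3,
  dimnames = list(
    c("ENSMUSG00000094652", "ENSMUSG00000006356", "ENSMUSG00000019897"),
    c("WT_Animal1", "WT_Animal2")
  )
)

# Typed in by hand from the mapIds() output above. With org.Mm.eg.db loaded,
# the equivalent would be:
#   symbols <- mapIds(org.Mm.eg.db, keys = rownames(mat),
#                     column = "SYMBOL", keytype = "ENSEMBL")
symbols <- c(ENSMUSG00000094652 = NA,
             ENSMUSG00000006356 = "Crip2",
             ENSMUSG00000019897 = "Ccdc59")

mat_named <- relabel_rows(mat, symbols)
rownames(mat_named)
# "ENSMUSG00000094652" "Crip2" "Ccdc59"
```

The relabelled matrix can then be passed to pheatmap() in place of the original; since pheatmap() takes its row labels from the row names, the same helper can be reused for further plots.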

Kevin
