Question: Problem with Ensembl version identifiers after running DESeq2
17 months ago by
Lila M 460
UK
Lila M 460 wrote:

Hi everybody, I have a problem with my Ensembl IDs after running DESeq2 (I'm using the hg38 genome):

dds <- DESeq(ds_matrix)
res <- results(dds)

                    baseMean log2FoldChange     lfcSE      stat       pvalue         padj
ENSG00000176124.11 168.67880       4.991104 0.2797296 17.842601 3.299728e-71 6.057971e-67

As you can see, the identifiers carry a version suffix (ENSG00000176124.11, for example), so when I try to annotate the genes using

library("AnnotationDbi")
library("org.Hs.eg.db")
res$symbol <- mapIds(org.Hs.eg.db,
                     keys = row.names(res),
                     column = "SYMBOL",
                     keytype = "ENSEMBL",
                     multiVals = "first")

or using gage, the IDs with the dot and the number after it are not recognized and cannot be matched. Does anyone know how to deal with this problem?

Thanks

modified 17 months ago • written 17 months ago by Lila M 460
17 months ago by
London
andrew.j.skelton73 5.5k wrote:

As far as I'm aware, the . at the end of an Ensembl gene ID denotes the version. If you omit the version and try to search for ENSG00000176124, that should fix the issue. The bigger question is why you have the version in the gene IDs in the first place...

written 17 months ago by andrew.j.skelton73 5.5k

Because the counts were generated with salmon, and the original files included the identifiers with the dot. I don't know whether I have to remove it from the original files or whether there is another way to handle it, because the IDs with the dot are not recognized.

written 17 months ago by Lila M 460

It depends on how the salmon index was generated. Generally, you can just strip the .xx version suffix from your IDs to make it work, e.g. by passing keys = gsub("\\..*$", "", row.names(res)) to mapIds.

written 17 months ago by andrew.j.skelton73 5.5k
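A minimal base R sketch of the stripping step suggested above, for anyone who wants to try it in isolation. The first ID is the one from the question; the second is only an illustrative placeholder, and the variable names are made up for this example. The duplicate check is an extra precaution that is not part of the original suggestion: with GENCODE-style IDs, removing everything after the dot may in some cases collapse two entries onto the same stable ID.

```r
# Example versioned IDs: the first comes from the question,
# the second is an illustrative placeholder.
ids <- c("ENSG00000176124.11", "ENSG00000000003.15")

# Drop the first dot and everything after it, keeping only the stable ID.
stable_ids <- sub("\\..*$", "", ids)
print(stable_ids)

# Stripping versions can, in principle, produce duplicate IDs,
# so check before using them as keys or row names.
stopifnot(anyDuplicated(stable_ids) == 0)
```

The resulting vector can then be used as the keys argument of mapIds, as in the reply above.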

Hey, I ran into the same issue after also using salmon to quantify against a GENCODE index. This seemed to work for me, but many of the IDs map to NA instead of a gene symbol. Is there a way to limit the analysis to well-annotated genes? I'm not sure what to make of these results: I'm seeing huge fold changes, but mostly for genes I can't identify.

> row.names(YCNT.05_Subset) = gsub("\\..*", "",row.names(YCNT.05_Subset)) 

> YCNT.05_Subset$genename <- mapIds(org.Mm.eg.db,keys = row.names(YCNT.05_Subset), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")

'select()' returned 1:1 mapping between keys and columns

> YCNT.05_Subset

log2 fold change (MLE): condition YB6CNT vs YBJCNT 
Wald test p-value: condition YB6CNT vs YBJCNT 
DataFrame with 24 rows and 7 columns
                    baseMean log2FoldChange     lfcSE      stat       pvalue         padj    genename
                   <numeric>      <numeric> <numeric> <numeric>    <numeric>    <numeric> <character>
ENSMUSG00000110704 22.424026      -45.31601  5.115848 -8.662495 4.615481e-18 8.555948e-14          NA
ENSMUSG00000082016 21.264101      -30.86202  6.149671 -4.855873 1.198576e-06 2.613953e-03          NA
ENSMUSG00000094568  9.902831      -30.47642  6.151707 -4.791584 1.654700e-06 3.228843e-03          NA
ENSMUSG00000103651 52.698629      -29.23280  6.149360 -4.591178 4.407518e-06 7.781368e-03          NA
ENSMUSG00000059195 57.152734      -27.95013  5.706919 -4.722360 2.331231e-06 4.321520e-03          NA
...                      ...            ...       ...       ...          ...          ...         ...
ENSMUSG00000005800  21.84068       22.60947  4.501899  4.800077 1.586044e-06 3.228843e-03        Mmp8
ENSMUSG00000022026  14.05982       26.72935  6.134014  4.194537 2.734298e-05 4.223921e-02       Olfm4
ENSMUSG00000084936  55.66428       28.41237  3.742702  7.324221 2.402904e-13 1.781754e-09          NA
ENSMUSG00000093752  19.27306       33.77075  4.913983  6.668876 2.577693e-11 1.194600e-07          NA
ENSMUSG00000074555  11.50912       36.31998  5.086493  6.943876 3.814853e-12 2.357261e-08     Gm10714
modified 12 months ago • written 12 months ago by pvd21070
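One possible way to look only at rows that received a symbol is to filter on the annotation column after mapping. The sketch below uses a plain data.frame built from three rows of the table above, so it runs without Bioconductor. The names res_df, annotated and n_unmapped are made up for this example. The same logical indexing should also work on a DESeqResults object, but that is an assumption, not something tested here.

```r
# Toy table with three rows copied from the output above.
res_df <- data.frame(
  log2FoldChange = c(-45.31601, 22.60947, 36.31998),
  padj           = c(8.555948e-14, 3.228843e-03, 2.357261e-08),
  genename       = c(NA, "Mmp8", "Gm10714"),
  row.names      = c("ENSMUSG00000110704",
                     "ENSMUSG00000005800",
                     "ENSMUSG00000074555"),
  stringsAsFactors = FALSE
)

# Count how many IDs did not map to a symbol.
n_unmapped <- sum(is.na(res_df$genename))

# Keep only rows with a non-missing symbol.
annotated <- res_df[!is.na(res_df$genename), , drop = FALSE]
print(annotated)
```

An NA here only means that the annotation package returned no symbol for that Ensembl ID. Filtering this way changes what is displayed, not the statistics already computed.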