Question

What code do I use to generate gene cluster vectors of Entrez gene IDs for clusterProfiler?

4.0 years ago

tpm ▴ 30

Hi guys, I am very new to R.

I have 4 csv files with gene sets X1, X2, X3, X4 and their corresponding logFC for the E. coli K-12 strain under different conditions. X1, X2, X3 and X4 are gene IDs.

Could you please advise on, or provide, code I can use to generate a concatenated vector of Entrez gene IDs for these gene sets, along with their corresponding logFC, using a cutoff of +/-0.5 (for upregulated and downregulated genes)? The example I am following from the manual uses this code: lapply(gcSample, head). It is at the link below.

https://bioconductor.statistik.tu-dortmund.de/packages/3.6/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html#biological-theme-comparison:

I need help.

Thank you in advance

clusterProfiler enrichGO DEP R BIOCONDUCTOR • 1.8k views

ADD COMMENT • link updated 3.0 years ago by Kevin Blighe 87k • written 4.0 years ago by tpm ▴ 30

Please provide some sample input data.

ADD REPLY • link 3.9 years ago by Kevin Blighe 87k

Each set (X1 to X4) contains the gene and its corresponding logFC. For example, here are a few rows from each dataset:

X1

galF    0.039335412
adhE    -0.407887182
ribE    0.039104767
mlaC    -0.183376255
cspE    0.020263717
insC1   1.275285226
dps     -0.229873398
hha     -0.676726393
sufS    1.311929312
csdA    -0.524188066
bamA    0.180521981
uspA    -0.093163475

X2

galF    -0.380046456
adhE    -0.325012556
fliC    0
ribE    0.336109582
mlaC    -0.199362884
cspE    -0.019096892
insC1   0.595535487
dps     -0.169085093
hha     -0.003482051
insH21  0.182836948
sufS    1.059447578
csdA    1.032406516

X3

galF    -0.063032692
adhE    -0.064043689
fliC    6.663534385
ribE    0.268164765
mlaC    -0.288447139
cspE    -0.039093428
insC1   0
dps     -0.241911025
hha     0.044170924
insH21  0.36099054
sufS    0
csdA    1.11192159
bamA    0.061329771
uspA    0.505757213
ompR    -0.220925793
dksA    -0.342553774
chrR    -0.095235864

X4

glnD    5.237895088
eutC    4.64146896
mscM    3.947759653
tatA    3.920706166
elaB    3.785411053
fliI    3.548765407
sapD    3.388648775
ppnP    3.345254427
ybhA    3.190667532
ilvE    2.900593133
tatE    2.865852846
oppC    2.853396894
glsA1   2.672349905
rnpA    2.563564924
ratB    2.473440589
ftsI    2.418999465
galP    2.410974203
fpr     2.379620962
mntR    2.364759177
ygiS    2.337045006
speC    2.252678046
srlA    2.227433038

ADD REPLY • link 3.9 years ago by tpm ▴ 30

score 2 · Answer 1 · 2020-05-24

Great. You can use the Escherichia coli K-12 OrgDb annotation package (org.EcK12.eg.db) from Bioconductor:

library(org.EcK12.eg.db)

genes <- c('glnD','eutC','mscM','tatA','elaB','fliI','sapD','ppnP',
  'ybhA','ilvE','tatE','oppC')

mapIds(org.EcK12.eg.db, keys = genes,
  column = 'ENTREZID', keytype = 'SYMBOL')
    glnD     eutC     mscM     tatA     elaB     fliI     sapD     ppnP 
"944863" "946925" "948676" "948321" "946751" "946457" "946203" "945048" 
    ybhA     ilvE     tatE     oppC 
"945372" "948278" "945228" "945810"

With the Entrez IDs mapped, we can look up other annotation (and you can pass the IDs to clusterProfiler):

genes_entrez <- mapIds(org.EcK12.eg.db, keys = genes,
  column = 'ENTREZID', keytype = 'SYMBOL')

keytypes(org.EcK12.eg.db)
 [1] "ACCNUM"      "ALIAS"       "ENTREZID"    "ENZYME"      "EVIDENCE"   
 [6] "EVIDENCEALL" "GENENAME"    "GO"          "GOALL"       "ONTOLOGY"   
[11] "ONTOLOGYALL" "PATH"        "PMID"        "REFSEQ"      "SYMBOL"     

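# Call select() via its namespace so it is not masked by other packages
annotTable <- AnnotationDbi::select(org.EcK12.eg.db, keys = genes_entrez,
  columns = c('ENTREZID', 'ALIAS', 'ENZYME', 'SYMBOL', 'GENENAME', 'PATH'),
  keytype = 'ENTREZID')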
head(annotTable)
  ENTREZID   ALIAS   ENZYME SYMBOL
1   944863 ECK0165 2.7.7.59   glnD
2   944863   glnD5 2.7.7.59   glnD
3   944863    glnD 2.7.7.59   glnD
4   946925 ECK2435  4.3.1.7   eutC
5   946925    eutC  4.3.1.7   eutC
6   948676 ECK4155     <NA>   mscM
                                          GENENAME  PATH
1 PII uridylyltransferase/uridylyl removing enzyme 02020
2 PII uridylyltransferase/uridylyl removing enzyme 02020
3 PII uridylyltransferase/uridylyl removing enzyme 02020
4          ethanolamine ammonia-lyase subunit beta 00564
5          ethanolamine ammonia-lyase subunit beta 00564
6    miniconductance mechanosensitive channel MscM  <NA>
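
For the +/-0.5 logFC cutoff asked about in the question, a minimal base-R sketch could look like the following. It is only an illustration under some assumptions: split_by_logfc() is a hypothetical helper (not part of clusterProfiler), the data frames hold a few rows copied from the X1 and X4 samples above, and entrez_lookup is a small stand-in built from the mapIds() output shown earlier. In a real run the lookup would come from mapIds() on all of your symbols.

```r
# Hypothetical helper (not a clusterProfiler function): split one table of
# gene symbols and logFC values into up- and down-regulated symbol vectors.
split_by_logfc <- function(df, cutoff = 0.5) {
  list(
    up   = df$symbol[df$logFC >=  cutoff],
    down = df$symbol[df$logFC <= -cutoff]
  )
}

# A few rows copied from the X1 and X4 samples posted above.
x1 <- data.frame(
  symbol = c("galF", "adhE", "insC1", "hha", "sufS", "csdA"),
  logFC  = c(0.039335412, -0.407887182, 1.275285226,
             -0.676726393, 1.311929312, -0.524188066),
  stringsAsFactors = FALSE
)
x4 <- data.frame(
  symbol = c("glnD", "eutC", "mscM", "tatA"),
  logFC  = c(5.237895088, 4.64146896, 3.947759653, 3.920706166),
  stringsAsFactors = FALSE
)

# Build one named list of symbol vectors, e.g. X1_up, X1_down, X4_up, X4_down.
sets <- list(X1 = x1, X4 = x4)
symbol_clusters <- list()
for (nm in names(sets)) {
  parts <- split_by_logfc(sets[[nm]], cutoff = 0.5)
  symbol_clusters[[paste0(nm, "_up")]]   <- parts$up
  symbol_clusters[[paste0(nm, "_down")]] <- parts$down
}

# Stand-in symbol -> Entrez lookup, copied from the mapIds() output above.
# Assumption: in practice this would be the full result of mapIds().
entrez_lookup <- c(glnD = "944863", eutC = "946925",
                   mscM = "948676", tatA = "948321")

# Map each symbol vector to Entrez IDs and drop symbols with no match.
# The toy lookup only covers four X4 genes, so the X1 entries end up empty here.
entrez_clusters <- lapply(symbol_clusters, function(symbols) {
  ids <- unname(entrez_lookup[symbols])
  ids[!is.na(ids)]
})

lapply(entrez_clusters, head)
```

The result is a named list of character vectors, which is the same shape as gcSample in the vignette, so it should be usable wherever the vignette passes gcSample. Whether you use >= or > at the cutoff, and whether you keep empty clusters, is a choice to adapt to your own analysis.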

Kevin