Zahra, 4.0 years ago
        
    
    Hi all,
As mentioned earlier in this post, I tried to convert the Ensembl gene IDs to gene symbols. The code below runs without any error, but ens_to_symbol_biomart has 55605 rows while ens has length 55602, and I do not understand the reason for this difference. Would you mind helping me, please?
Here is my code:
library(TCGAbiolinks)
# Query, download and prepare the TCGA-COAD HTSeq count data
query <- GDCquery(project = "TCGA-COAD",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts",
                  experimental.strategy = "RNA-Seq")
GDCdownload(query)
query.counts.colon <- GDCprepare(query)
Colon.Matrix <- as.data.frame(SummarizedExperiment::assay(query.counts.colon))
ens <- Colon.Matrix$ENS.ID
head(ens)
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"
library(org.Hs.eg.db)
ens_to_symbol <- mapIds(
  org.Hs.eg.db,
  keys = ens,
  column = 'SYMBOL',
  keytype = 'ENSEMBL')
head(ens_to_symbol)
ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457 ENSG00000000460 
       "TSPAN6"          "TNMD"          "DPM1"         "SCYL3"      "C1orf112" 
ENSG00000000938 
          "FGR"
library(biomaRt)
mart <- useDataset('hsapiens_gene_ensembl', useMart('ensembl'))
ens_to_symbol_biomart <- getBM(
  filters = 'ensembl_gene_id',
  attributes = c('ensembl_gene_id', 'hgnc_symbol'),
  values = ens,
  mart = mart)
ens_to_symbol_biomart <- merge(
  x = as.data.frame(ens),
  y =  ens_to_symbol_biomart ,
  by.y = 'ensembl_gene_id',
  all.x = TRUE,
  by.x = 'ens')
head(ens_to_symbol_biomart)
              ens hgnc_symbol
1 ENSG00000000003      TSPAN6
2 ENSG00000000005        TNMD
3 ENSG00000000419        DPM1
4 ENSG00000000457       SCYL3
5 ENSG00000000460    C1orf112
6 ENSG00000000938         FGR
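One possible reason for the extra rows, offered as a sketch and not a confirmed diagnosis: merge() returns one row per matching pair, so if the getBM() result contains more than one hgnc_symbol row for some Ensembl ID, the merged table ends up with more rows than ens. The toy example below uses only base R; the IDs and symbols in toy_map are made up for illustration and do not come from the real biomaRt output.

```r
# Toy input: three Ensembl-style IDs (hypothetical, for illustration only)
ens <- c("ENSG_A", "ENSG_B", "ENSG_C")

# Toy mapping table shaped like a getBM() result.
# ENSG_B maps to two symbols; ENSG_C has no mapping at all.
toy_map <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_B"),
  hgnc_symbol     = c("SYM1", "SYM2", "SYM2B"),
  stringsAsFactors = FALSE
)

# Same merge as in the post: keep every ID from ens (all.x = TRUE)
merged <- merge(
  x = data.frame(ens = ens, stringsAsFactors = FALSE),
  y = toy_map,
  by.x = "ens",
  by.y = "ensembl_gene_id",
  all.x = TRUE
)

# One more row than length(ens), because ENSG_B matched twice
nrow(merged)

# IDs that occur more than once in the merged table
dup_ids <- unique(merged$ens[duplicated(merged$ens)])
dup_ids

# One way to get back to one row per ID: keep the first match only
merged_unique <- merged[!duplicated(merged$ens), ]
nrow(merged_unique) == length(ens)
```

Running the same duplicated() check on the real ens_to_symbol_biomart$ens should show whether three repeated IDs account for the 55605 versus 55602 difference. Keeping only the first match is just one choice; collapsing the symbols per ID is another.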
Dear Hamid, thanks a lot.