I have added gene names and Entrez IDs to my DESeq2 results. The commands I used are:
    library(org.Hs.eg.db)
    # convertIDs() is a helper defined beforehand; it is not part of org.Hs.eg.db
    res$hgnc_symbol <- convertIDs(gsub("\\..*", "", row.names(res)), "ENSEMBL", "SYMBOL", org.Hs.eg.db)
    res$entrezgene <- convertIDs(gsub("\\..*", "", row.names(res)), "ENSEMBL", "ENTREZID", org.Hs.eg.db)
    resOrdered <- res[order(res$pvalue), ]
After that, when I check the object resOrdered, I get this:
    gene_id            stat              pvalue               padj
                       <numeric>         <numeric>            <numeric>
    ENSG00000280228.1  -5.9096792673878  3.42774467723773e-09 1.53944643600176e-05
    ENSG00000225555.1  -5.88657749721615 3.94274922781857e-09 1.53944643600176e-05
    ENSG00000234616.7  -5.77542188212913 7.67605073778235e-09 1.99807600704475e-05
    ENSG00000058866.13 -4.88480635581578 1.03530552769589e-06 0.00163524543935493
    ENSG00000180152.3  -4.8645382294294  1.14724332367248e-06 0.00163524543935493
    ENSG00000244968.5  -4.84652440600431 1.25643137868224e-06 0.00163524543935493
                       hgnc_symbol entrezgene
                       <character> <character>
    ENSG00000280228.1  NA          NA
    ENSG00000225555.1  NA          NA
    ENSG00000234616.7  JRK         8629
    ENSG00000058866.13 DGKG        1608
    ENSG00000180152.3  NA          NA
    ENSG00000244968.5  LIFR-AS1    100506495
For some genes the gene name and Entrez ID are missing, and both columns show NA.
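A quick way to see which rows are affected is to list the version-stripped IDs whose symbol came back NA. In the minimal sketch below, a small data frame with values copied from the output above stands in for resOrdered (an assumption so it runs without DESeq2); with the real results object, the last three statements should work the same way.

```r
# Toy stand-in for resOrdered, with values copied from the printed output
resOrdered <- data.frame(
  hgnc_symbol = c(NA, NA, "JRK", "DGKG", NA, "LIFR-AS1"),
  entrezgene  = c(NA, NA, "8629", "1608", NA, "100506495"),
  row.names   = c("ENSG00000280228.1", "ENSG00000225555.1", "ENSG00000234616.7",
                  "ENSG00000058866.13", "ENSG00000180152.3", "ENSG00000244968.5"),
  stringsAsFactors = FALSE
)

# Same version stripping as in the lookup commands: drop everything after the first "."
stripped <- gsub("\\..*", "", rownames(resOrdered))

# Stripped IDs for which no symbol was found
unmapped <- stripped[is.na(resOrdered$hgnc_symbol)]
print(unmapped)
cat(length(unmapped), "of", nrow(resOrdered), "genes have no symbol\n")
```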
I aligned my data to GRCh38 with STAR (using the GTF from the same assembly), and the count files were created with htseq-count.
What could be the reason for this? Any suggestions on how I should proceed would be much appreciated.
Did you check your GTF file? Instead of using gene_id as the identifier when you do the counting, use gene_name to get the desired output (gene names) directly. No mapping would be needed in that case. See the example annotations below.
gene_id "ENSG00000223972.5" = gene_name "DDX11L1"
gene_id "ENSG00000280228.1" = gene_name "AC079753.1"
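If re-counting is inconvenient, the same gene_id-to-gene_name table can be read from the GTF and joined onto the existing results. The following is a minimal base-R sketch, assuming simplified attribute strings that contain only gene_id and gene_name (real GENCODE lines carry more attributes, which the regex skips). The helper gtf_attr() and the third ID are hypothetical. Because htseq-count took its row names from the same GTF, the versioned IDs can be matched as-is, without stripping.

```r
# Extract one attribute value (e.g. gene_name) from GTF column-9 strings;
# returns NA where the attribute is absent. gtf_attr() is a hypothetical helper.
gtf_attr <- function(attrs, key) {
  pattern <- paste0('(?:^|;)\\s*', key, '\\s+"([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs, perl = TRUE))
  vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_, character(1))
}

# Simplified attribute strings based on the two example annotations
attrs <- c(
  'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";',
  'gene_id "ENSG00000280228.1"; gene_name "AC079753.1";'
)
id2name <- setNames(gtf_attr(attrs, "gene_name"), gtf_attr(attrs, "gene_id"))

# Look up versioned IDs as they appear in the count/result row names;
# the last ID is a hypothetical one that is absent from the table, giving NA
ids <- c("ENSG00000280228.1", "ENSG00000223972.5", "ENSG00000000000.1")
gene_names <- unname(id2name[ids])
print(gene_names)
```

With a real file, something like `gtf <- read.delim("genes.gtf", header = FALSE, comment.char = "#", quote = "")`, keeping the rows where `gtf$V3 == "gene"` and applying gtf_attr() to column V9, should give the full table (the file name is a placeholder). `res$gene_name <- unname(id2name[rownames(res)])` would then add the names to the results. rtracklayer::import() is a Bioconductor alternative that parses the attributes for you.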
Thank you so much. If I'm not mistaken, the org.Hs.eg.db package has no record for these IDs, but the gene names do exist in the GTF file when I checked it manually. Yes, that's a great idea: I should use gene_name instead of gene_id when counting with htseq-count.
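Counting with `htseq-count -i gene_name` works, with one caveat: htseq-count sums all features that share the same attribute value, so distinct Ensembl genes that happen to have the same gene_name end up in a single row, and Entrez IDs would still need a separate lookup. An alternative is to keep gene_id for counting and fill only the NA symbols from the GTF names. The sketch below shows that fill step with toy vectors (values assumed for illustration; fill_missing() is a hypothetical helper).

```r
# Keep the org.Hs.eg.db symbol where one exists; otherwise fall back to the GTF gene_name.
fill_missing <- function(primary, fallback) {
  ifelse(is.na(primary), fallback, primary)
}

# Toy values for three of the genes above (assumed for illustration):
# symbols from the org.Hs.eg.db lookup, and gene_name taken from the GTF
symbol_db  <- c(NA,           "JRK", "DGKG")
symbol_gtf <- c("AC079753.1", "JRK", "DGKG")
names(symbol_db) <- c("ENSG00000280228.1", "ENSG00000234616.7", "ENSG00000058866.13")

gene_label <- fill_missing(symbol_db, symbol_gtf)
print(gene_label)
```

This keeps one row per Ensembl gene and leaves the org.Hs.eg.db symbols and Entrez IDs untouched where they exist.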