Question

Annotate output file from DESeq2

8.6 years ago

BM ▴ 70

I am trying to annotate the results output file from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl transcript IDs:

ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003

etc.

I have tried various methods to annotate the results.

1. Downloaded the annotation from BioMart and merged it with the results:

library(DESeq2)

counts <- read.delim("3mTA2.txt", header = TRUE, row.names = 1)
sample <- read.delim("~/sample.txt")
count.data.set <- DESeqDataSetFromMatrix(countData = counts, colData = sample, design = ~ genotype)
dds <- DESeq(count.data.set)
res <- results(dds)
annotation <- read.delim("mouse.annt.txt") # load annotation file from BioMart
res$EnsemblID <- row.names(res)
res <- merge(res, annotation, by = 'EnsemblID', all.x = TRUE)

This adds the annotation columns to the output file, but their values are blank.

2. Also tried AnnotationDbi:

library("AnnotationDbi")
library("org.Mmu.eg.db")
res$symbol <- mapIds(org.Mmu.eg.db,
                     keys=row.names(res),
                     column="SYMBOL",
                     keytype="ENSEMBL",
                     multiVals="first")
Error in .testForValidKeys(x, keys, keytype) :
None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

Any suggestions?

Deseq2 Biomart map IDs RNA-Seq Annotate • 5.9k views


Ram · Answer 1 · 2015-09-04

8.6 years ago

James Ashmore ★ 3.4k

The IDs you have listed are Ensembl gene IDs (ENSMUSG...), not transcript IDs, with an extra ':001'-style suffix appended, and I'm not sure why that suffix is there. If you remove the suffix and run the ID through the BioMart ID conversion tool, you get a gene name, e.g. ENSMUSG00000000001 = Gnai3. This Ensembl tutorial may help you distinguish between the different ID types.
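A minimal sketch of that suggestion in R, assuming the row names look like the ones in the question. The `ids` vector is made-up example input, and the symbol lookup is only attempted if the annotation packages happen to be installed. Note that the mouse annotation package is org.Mm.eg.db; org.Mmu.eg.db is the rhesus macaque package, so it would not recognise ENSMUSG keys even with the suffix removed.

```r
# Hypothetical example row names, in the same form as in the question
ids <- c("ENSMUSG00000000001:001",
         "ENSMUSG00000000001:002",
         "ENSMUSG00000000003:001")

# Drop everything from the first colon onward, leaving the bare gene ID
gene_ids <- sub(":.*$", "", ids)
print(gene_ids)

# Optional: look up gene symbols, only if the mouse packages are available
if (requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  symbols <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db,
                                   keys = unique(gene_ids),
                                   column = "SYMBOL",
                                   keytype = "ENSEMBL",
                                   multiVals = "first")
  print(symbols)
}
```

The same `sub()` call can be applied to `row.names(res)` before the merge or the `mapIds()` call in the question.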

ENSMUSG00000000001:001 and ENSMUSG00000000001:002 refer to the different exons of the gene.

So I suppose the question is how to combine the exon counts for the same gene into a single count for that gene.

Can this be done in DEXSeq or DESeq2?

Reply • 8.6 years ago by BM ▴ 70

You don't want to do that, since a read that overlaps more than one exon bin is counted once per bin, so summing the bins will double-count it. Just run either htseq-count or featureCounts (the latter is much faster) on the alignments and get gene-level counts directly.

Reply • 8.6 years ago by Devon Ryan 104k

The initial analysis was performed elsewhere, so I only have the DEXSeq count file with the Ensembl IDs of all the different exons of each gene. How can I use this file to proceed: by collapsing the exon IDs into genes first, or by running the file through DESeq2 as it is and annotating afterwards?

Reply • 8.6 years ago by BM ▴ 70

That's unfortunate, particularly if you don't have the BAM or FASTQ files. In that case, the best you can do is remove the :E??? suffix from the names, sum the counts for each gene, and use the result. Note that the resulting gene counts will only be approximate. You could do that with awk.
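awk would work; the same idea in base R, to stay with the language used in the question, might look like the rough sketch below. The toy `counts` matrix and its values are hypothetical. DEXSeq count files usually end with summary rows whose names start with an underscore (for example `_ambiguous`), so the sketch assumes those should be dropped before summing.

```r
# Hypothetical exon-bin counts: rows are "geneID:bin", columns are samples
counts <- matrix(
  c(10, 5, 3, 7, 2,    # sampleA
    20, 1, 4, 9, 6),   # sampleB
  ncol = 2,
  dimnames = list(
    c("ENSMUSG00000000001:001", "ENSMUSG00000000001:002",
      "ENSMUSG00000000003:001", "ENSMUSG00000000003:002",
      "_ambiguous"),
    c("sampleA", "sampleB")))

# Assumption: rows starting with "_" are counting summaries, not exon bins
counts <- counts[!grepl("^_", rownames(counts)), , drop = FALSE]

# Strip the ":001" / ":E001" style suffix to recover the gene ID
gene_ids <- sub(":.*$", "", rownames(counts))

# Sum all bins belonging to the same gene, separately for each sample
gene_counts <- rowsum(counts, group = gene_ids)
print(gene_counts)
```

The resulting `gene_counts` matrix has one row per gene and could be passed as `countData` to `DESeqDataSetFromMatrix()`, with the caveat above that the summed values are only approximate.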

Reply • 8.6 years ago by Devon Ryan 104k