Annotate output file from DESeq2
1
0
Entering edit mode
6.1 years ago
BM ▴ 70

I am trying to annotate the results output file from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl transcript IDs:

ENSMUSG00000000001:001

ENSMUSG00000000001:002

ENSMUSG00000000001:003

etc.

I have tried various methods to annotate the results.

1. Downloaded the annotation from BioMart.

> library(DESeq2)
> counts <- read.delim("3mTA2.txt", header=TRUE, row.names=1)
> sample <- read.delim("~/sample.txt")
> count.data.set <- DESeqDataSetFromMatrix(countData=counts, colData=sample, design= ~ genotype)
> dds <- DESeq(count.data.set)
> res <- results(dds)
> annotation <- read.delim("mouse.annt.txt") # load annotation file from BioMart
> res$EnsemblID <- row.names(res)
> res <- merge(res, annotation, by = 'EnsemblID', all.x = TRUE)

 

It adds the annotation columns to the output file, but their values are blank.

2. Also used AnnotationDbi.

 

> library("AnnotationDbi")
> library("org.Mmu.eg.db")
> res$symbol <- mapIds(org.Mmu.eg.db,
+                      keys=row.names(res),
+                      column="SYMBOL",
+                      keytype="ENSEMBL",
+                      multiVals="first")

 

Error in .testForValidKeys(x, keys, keytype) :

None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

 

Any suggestions?
RNA-Seq Deseq2 Annotate Biomart map IDs • 4.5k views
6.1 years ago
James Ashmore ★ 3.2k

Those IDs you have listed are Ensembl gene IDs, not transcript IDs. I'm also not sure why they have the ':001' string after them. If you try the BioMart ID conversion tool, you can see that if you remove this last part and convert the ID to a gene name, you get a result, e.g. ENSMUSG00000000001 = GNAI3. This Ensembl tutorial may help you distinguish between the different IDs.
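As a rough sketch of the suggestion above, the ':001' part can be stripped with `sub()` before any lookup. The IDs are the ones from the question; the variable names `exon_ids` and `gene_id` are made up for illustration. Note also that `org.Mmu.eg.db` is the rhesus macaque annotation package, while the mouse one is `org.Mm.eg.db`, so the optional lookup below assumes that package is installed and is skipped otherwise.

```r
# Exon-level IDs as they appear in the count file from the question
exon_ids <- c("ENSMUSG00000000001:001",
              "ENSMUSG00000000001:002",
              "ENSMUSG00000000001:003")

# Drop everything from the colon onwards to recover the Ensembl gene ID
gene_id <- sub(":.*$", "", exon_ids)

# Optional symbol lookup; runs only if the mouse annotation packages
# (assumed here, not part of the original post) are available
if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  symbol <- AnnotationDbi::mapIds(org.Mm.eg.db::org.Mm.eg.db,
                                  keys = unique(gene_id),
                                  column = "SYMBOL",
                                  keytype = "ENSEMBL",
                                  multiVals = "first")
  print(symbol)
}
```

The same `sub()` call applied to `row.names(res)` should give keys that match a BioMart table keyed on gene IDs, which would explain why the earlier merge produced blank columns.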

ENSMUSG00000000001:001 and ENSMUSG00000000001:002 refer to the different exons of the gene.

So the question, I suppose, is how to combine or merge the different exon counts for the same gene into one count for the gene.

Can this be done in DEXSeq or DESeq2?

You don't want to do that, since doing so will double count a number of things. Just run either htseq-count or featureCounts (the latter is much faster) and directly get gene-level counts.

The initial analysis was performed elsewhere, so I only have the DEXSeq count file with the Ensembl IDs of all the different exons of each gene. How can I use this file to proceed: either by collapsing the exon IDs into genes first, or by using the file in DESeq2 as it is and annotating afterwards?

That's unfortunate, particularly if you don't have the BAM or fastq files. Indeed, the best you can do is just remove the :E??? from the names, sum over the results and use that. Note that the results will then be approximate. You could do that with awk.
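If you would rather stay in R than use awk, the same idea can be sketched with base R's `rowsum()`. The count matrix below is toy data with made-up values and sample names, standing in for the real file; the result is still only approximate, as noted above.

```r
# Toy exon-level count matrix (made-up values) standing in for the real file
counts <- matrix(c(10, 5, 0, 7,
                   12, 4, 1, 9),
                 ncol = 2,
                 dimnames = list(c("ENSMUSG00000000001:001",
                                   "ENSMUSG00000000001:002",
                                   "ENSMUSG00000000003:001",
                                   "ENSMUSG00000000003:002"),
                                 c("sampleA", "sampleB")))

# Strip the exon suffix so every row carries only its gene ID
gene_id <- sub(":.*$", "", rownames(counts))

# Sum the exon rows that belong to the same gene, column by column
gene_counts <- rowsum(counts, group = gene_id)
print(gene_counts)
```

The resulting matrix has one row per gene, with plain Ensembl gene IDs as row names, so it could be passed to `DESeqDataSetFromMatrix()` and annotated afterwards.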
