Question: Annotate output file from DESeq2
3.7 years ago by
BM40
United Kingdom
BM40 wrote:

I am trying to annotate the results output from DESeq2 so that it contains gene names and symbols. The RNA-seq count file I used comes from DEXSeq and contains Ensembl transcript IDs:

ENSMUSG00000000001:001

ENSMUSG00000000001:002

ENSMUSG00000000001:003

etc.

I have tried various methods to annotate the results.

1. Downloaded an annotation file from BioMart.

> library(DESeq2)
> counts <- read.delim("3mTA2.txt", header=TRUE, row.names=1)
> sample <- read.delim("~/sample.txt")
> count.data.set <- DESeqDataSetFromMatrix(countData=counts, colData=sample, design= ~ genotype)
> dds <- DESeq(count.data.set)
> res <- results(dds)
> annotation <- read.delim("mouse.annt.txt") # load annotation file from BioMart
> res$EnsemblID <- row.names(res)
> res <- merge(res, annotation, by = 'EnsemblID', all.x = TRUE)

 

This adds the annotation columns to the output, but their values are blank.

 

2. Also tried AnnotationDbi.

 

> library("AnnotationDbi")
> library("org.Mmu.eg.db")
> res$symbol <- mapIds(org.Mmu.eg.db,
+                      keys=row.names(res),
+                      column="SYMBOL",
+                      keytype="ENSEMBL",
+                      multiVals="first")

 

Error in .testForValidKeys(x, keys, keytype) :
  None of the keys entered are valid keys for 'ENSEMBL'. Please use the keys method to see a listing of valid arguments.

 

Any suggestions?
modified 3.7 years ago by James Ashmore2.6k • written 3.7 years ago by BM40
3.7 years ago by
James Ashmore2.6k
UK/Edinburgh/MRC Centre for Regenerative Medicine
James Ashmore2.6k wrote:

The IDs you have listed are Ensembl gene IDs, not transcript IDs. I'm also not sure why they have the ':001' string after them. If you try the BioMart ID conversion tool, you can see that removing this suffix and converting the ID to a gene name gives a result, e.g. ENSMUSG00000000001 = GNAI3. This Ensembl tutorial may help you distinguish between the different IDs.

modified 3.7 years ago • written 3.7 years ago by James Ashmore2.6k
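A minimal sketch of the suggestion above, in base R: strip the ':001' suffix so the keys become plain Ensembl gene IDs, then merge with the annotation table. The two data frames are toy stand-ins (hypothetical values) for the DESeq2 results and the BioMart export, and the column names EnsemblID and symbol are assumptions that would need to match the real annotation file.

```r
# Toy stand-in for the DESeq2 results (hypothetical values).
res_df <- data.frame(
  row_id = c("ENSMUSG00000000001:001", "ENSMUSG00000000001:002"),
  log2FoldChange = c(0.5, -0.2),
  stringsAsFactors = FALSE
)

# Toy stand-in for the BioMart export (column names are assumptions).
annotation <- data.frame(
  EnsemblID = "ENSMUSG00000000001",
  symbol = "Gnai3",
  stringsAsFactors = FALSE
)

# Drop everything from the colon onwards so the key is a plain gene ID.
res_df$EnsemblID <- sub(":.*$", "", res_df$row_id)

# Left join: every exon row is kept and picks up the symbol of its gene.
annotated <- merge(res_df, annotation, by = "EnsemblID", all.x = TRUE)
print(annotated)
```

With real data, res_df would presumably come from as.data.frame(res), with row_id set to rownames(res). The same stripped IDs could also be passed as keys to mapIds(). Note as well that org.Mmu.eg.db is the rhesus macaque annotation package, whereas the mouse package is org.Mm.eg.db, which may be a second reason why none of the keys were valid.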

ENSMUSG00000000001:001; ENSMUSG00000000001:002 - these refer to the different exons of the gene.

So the question, I suppose, is how to combine or merge the different exon counts for the same gene into one count for the gene.

Can this be done in DEXSeq or DESeq2?

written 3.7 years ago by BM40

You don't want to do that, since doing so will double count a number of things. Just run either htseq-count or featureCounts (the latter is much faster) and get gene-level metrics directly.

written 3.7 years ago by Devon Ryan90k

The initial analysis was performed elsewhere, so I only have the DEXSeq count file with Ensembl IDs for all the different exons of a gene. How can I use this file to proceed: either by annotating the exon IDs to a gene, or by using the file in DESeq2 and then annotating?

written 3.7 years ago by BM40

That's unfortunate, particularly if you don't have the BAM or fastq files. Indeed, the best you can do is remove the :E??? suffix from the names, sum the counts over each gene, and use that. Note that the results will then be approximate. You could do that with awk.

written 3.7 years ago by Devon Ryan90k
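A rough sketch of the "strip the suffix and sum" approach described above, written in base R rather than awk. The count matrix is made-up toy data, and the regular expression assumes the exon part always follows the first colon (':001' or ':E001'). As noted in the thread, the summed values are only approximate, since a read overlapping several exons is counted more than once.

```r
# Toy exon-level count matrix (hypothetical values), one row per exon bin.
counts <- matrix(
  c(10, 5, 3, 7,    # sampleA
    12, 4, 2, 9),   # sampleB
  ncol = 2,
  dimnames = list(
    c("ENSMUSG00000000001:001", "ENSMUSG00000000001:002",
      "ENSMUSG00000000003:001", "ENSMUSG00000000003:002"),
    c("sampleA", "sampleB")
  )
)

# Strip the exon suffix to recover the gene ID for every row.
gene_id <- sub(":.*$", "", rownames(counts))

# Sum the exon rows that belong to the same gene, per sample.
gene_counts <- rowsum(counts, group = gene_id)
print(gene_counts)
```

The resulting gene_counts matrix could then be passed as countData to DESeqDataSetFromMatrix(), and its row names used directly as Ensembl gene IDs for annotation.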