I am using cummeRbund in R (version 3.2.4) to visualize my Cuffdiff (version 2.2.1) output.
I was hoping to get short names such as MMP9 for my genes. However, I can only get NCBI Reference Sequence accessions such as NM_001002930, NM_001002932, NM_001002938, etc.
Here is my R script:
cuff_data <- readCufflinks(dir=refCuffdiff, rebuild=TRUE, gtfFile=gtfFilePath, genome=genomePath)
diffGeneIDs <- getSig(cuff_data, level="genes", alpha=0.01)
diffGenes <- getGenes(cuff_data, diffGeneIDs)
featureNames(diffGenes)
Here is part of my output:
840 XLOC_024368 <NA>
841 XLOC_024378 <NA>
842 XLOC_024418 <NA>
843 XLOC_024432 <NA>
844 XLOC_024434 <NA>
845 XLOC_024442 <NA>
846 XLOC_024444 NM_001003241
847 XLOC_024451 NM_001003219
848 XLOC_024474 NM_001197143
849 XLOC_024482 <NA>
850 XLOC_024503 NR_128749
(1) I was wondering why most of them are <NA>. Is it caused by the lack of annotation for the dog genome (CanFam3.1.gtf)? Or does it have something to do with the
(2) For the genes that do have an ID, I was hoping to get short names instead of NCBI Reference Sequence accessions. How can I achieve this?
I have checked this post on SEQanswers, but it didn't solve my problem.
Thanks a lot!!!
The problem comes from the annotation file that you provided to Cuffdiff. Check whether it contains gene names. For a quick test, run
head gene_exp.diff
inside the folder where you store the Cuffdiff results. I reckon there are no gene names there.
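If head only shows the first few rows, a rough count over the whole file can say more. The sketch below is only an illustration under a few assumptions: that gene_exp.diff is tab-separated with a header row, that its third column is the gene name, and that Cuffdiff writes "-" there when it has no name. It also assumes that gene names in the GTF, if present, are stored in a gene_name attribute. The function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; the names are invented for this illustration.

# Count genes with and without a name in Cuffdiff's gene_exp.diff.
# Assumption: tab-separated, header in row 1, gene name in column 3,
# and "-" (or an empty field) meaning that no name is available.
count_gene_names() {
    awk -F '\t' '
        NR == 1 { next }                          # skip the header row
        $3 == "-" || $3 == "" { unnamed++; next } # no gene name recorded
        { named++ }
        END { printf "named=%d unnamed=%d\n", named, unnamed }
    ' "$1"
}

# Count GTF records that carry a gene_name attribute.
# Assumption: names, if any, appear as gene_name "..." in column 9.
count_gtf_gene_names() {
    awk '/gene_name "/ { n++ } END { print n + 0 }' "$1"
}
```

After loading these functions, count_gene_names gene_exp.diff prints the two counts for the Cuffdiff output, and count_gtf_gene_names run on the GTF file prints how many records have a gene_name attribute. A result of 0 for the GTF would suggest that the annotation file itself carries no short names for cummeRbund to display.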
Yes, you are right. I have been working with the dog genome, and I downloaded the GTF file from UCSC. The output of
head gene_exp.diff
looks like this:
Does that mean most of the genes have not been annotated?