CummeRbund: unable to use getGenes to retrieve short names in R
7.1 years ago
CandiceChuDVM ★ 2.4k

Hi all,

I am now using CummeRbund in R (version 3.2.4) to visualize my Cuffdiff (version 2.2.1) outputs.

I was hoping to get short gene names such as MMP9 for my genes. However, I can only get NCBI Reference Sequence accessions such as NM_001002930, NM_001002932, NM_001002938, etc.

Here is my R script:

cuff_data <- readCufflinks(dir = refCuffdiff, rebuild = TRUE, gtfFile = gtfFilePath, genome = genomePath)
diffGeneIDs <- getSig(cuff_data, level = "genes", alpha = 0.01)

Here is part of my output:

840  XLOC_024368                                   <NA>  
841  XLOC_024378                                   <NA>  
842  XLOC_024418                                   <NA>  
843  XLOC_024432                                   <NA>  
844  XLOC_024434                                   <NA>  
845  XLOC_024442                                   <NA>  
846  XLOC_024444                           NM_001003241  
847  XLOC_024451                           NM_001003219  
848  XLOC_024474                           NM_001197143  
849  XLOC_024482                                   <NA>  
850  XLOC_024503                              NR_128749

(1) I was wondering why most of them are <NA>. Is it caused by the lack of annotation in the dog genome (CanFam3.1.gtf)? Or does it have something to do with the addFeatures function?

(2) For the genes that do have an ID, I was hoping to get short names instead of NCBI Reference Sequence accessions. How can I achieve that?

I have checked this post on SEQanswers, but it didn't solve my problem.

Thanks a lot!!!

RNA-Seq cummeRbund R • 2.2k views

The problem comes from the annotation file you provided to Cuffdiff. Check whether it actually contains gene names. For a quick test, run head gene_exp.diff inside the folder where you stored the Cuffdiff results; I reckon there are no gene names in it.
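
If you would rather check the whole file than just the first few lines, something like the following R sketch could count how many rows have no gene name. It assumes that Cuffdiff writes "-" in the gene column when the GTF supplied no name; the function name count_unnamed_genes and the toy table are made up for illustration, so point the function at your real gene_exp.diff instead.

```r
# Count how many rows of a Cuffdiff gene_exp.diff file lack a gene name.
# Assumption: an unnamed gene is reported as "-" in the `gene` column.
count_unnamed_genes <- function(path) {
  diff_table <- read.delim(path, stringsAsFactors = FALSE)
  unnamed <- is.na(diff_table$gene) | diff_table$gene == "-"
  c(total = nrow(diff_table), unnamed = sum(unnamed))
}

# Toy stand-in for gene_exp.diff (first four columns only), written to a
# temporary file so that the sketch runs on its own.
demo_path <- tempfile(fileext = ".diff")
writeLines(c(
  "test_id\tgene_id\tgene\tlocus",
  "XLOC_000001\tXLOC_000001\t-\tchr1:722268-735387",
  "XLOC_000002\tXLOC_000002\t-\tchr1:758468-805972",
  "XLOC_000008\tXLOC_000008\tNM_001193298\tchr1:5070802-5112269"
), demo_path)

counts <- count_unnamed_genes(demo_path)
print(counts)
```

If most rows come back unnamed, the names are missing from the annotation itself, and nothing on the cummeRbund side will recover them.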


Yes, you are right. I have been working with the dog genome, and I downloaded the GTF file from UCSC. The output of head gene_exp.diff looks like this:

 test_id    gene_id gene    locus   sample_1    sample_2    status  value_1 value_2 log2(fold_change)   test_stat   p_value q_value significant
 XLOC_000001    XLOC_000001 -   chr1:722268-735387  rapid   slow    OK  8.01499 9.6448  0.267051    0.375532    0.52065 0.998058    no
 XLOC_000002    XLOC_000002 -   chr1:758468-805972  rapid   slow    OK  8.42177 10.402  0.304661    0.388298    0.49805 0.998058    no
 XLOC_000003    XLOC_000003 -   chr1:2710923-2742056    rapid   slow    OK  3.31617 2.22304 -0.576982   -0.664651   0.25485 0.998058    no
 XLOC_000004    XLOC_000004 -   chr1:2837400-2921732    rapid   slow    OK  7.15783 7.10424 -0.0108414  -0.0171568  0.9757  0.998058    no
 XLOC_000005    XLOC_000005 -   chr1:3053815-3066829    rapid   slow    OK  5.6806  4.76451 -0.253717   -0.370415   0.5147  0.998058    no
 XLOC_000006    XLOC_000006 -   chr1:3275469-3391717    rapid   slow    OK  7.30296 6.84825 -0.0927457  -0.104054   0.8612  0.998058    no
 XLOC_000007    XLOC_000007 -   chr1:4304020-4312108    rapid   slow    OK  5.7179  5.45483 -0.067951   -0.0962702  0.8682  0.998058    no
 XLOC_000008    XLOC_000008 NM_001193298    chr1:5070802-5112269    rapid   slow    OK  11.1779 16.6043 0.57091 0.452823    0.4478  0.998058    no
 XLOC_000009    XLOC_000009 -   chr1:8181111-8305073    rapid   slow    OK  7.05367 5.78846 -0.285193   -0.407552   0.5128  0.998058    no

Does that mean most of the genes have not been annotated?

7.1 years ago

The GTF file you used with Cuffdiff lacks this information. Use this annotation file from Ensembl and redo the Cuffdiff step; that will solve your problem.

More information about Dog genome:

