There is no gene information in RSEM output
6.2 years ago
John ▴ 270

Hello scientists,

I ran RSEM to calculate gene- and isoform-level expression.

Code to prepare the reference:

rsem-prepare-reference --gtf mm9.gtf  --transcript-to-gene-map knownIsoforms.txt  --bowtie2 mm9.fa musmus

  1. FASTA files downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/mm9/chromosomes/

  2. knownIsoforms.txt from: http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/

  3. GTF file from the UCSC Table Browser.

Code to calculate expression:

rsem-calculate-expression  --paired-end --bowtie2  forward_1.fastq reverse_2.fastq ref/musmus  cellnumber1

Results:

This is the output I got in the cellnumber1.genes.results file:

gene_id  transcript_id(s)                                        length   effective_length  expected_count  TPM   FPKM
1        uc007aet.1,uc007aeu.1                                   3621.00  3338.70           0.00            0.00  0.00
10       uc011whv.1                                              26.00    0.00              0.00            0.00  0.00
100      uc007amd.1,uc007ame.1                                   4355.00  4072.70           1.80            0.32  0.17
1000     uc007dac.1                                              1403.00  1120.70           0.00            0.00  0.00
10000    uc008ajp.1,uc012ajs.1                                   1415.50  1133.20           0.00            0.00  0.00
10001    uc008ajq.1                                              2046.00  1763.70           0.00            0.00  0.00
10002    uc008ajr.1,uc008ajs.1,uc008ajt.1,uc008aju.1,uc012ajt.1  6290.60  6008.64           0.00            0.00  0.00

I don't see any gene names in the gene_id column; it shows only numbers, and I don't know why. Is this output correct? How do I get the gene information? (In some tutorials the output looks different from this.)

Thanks in advance! Please help!

rsem RNA-Seq rna-seq • 2.8k views

But you do have transcript IDs, right? For example, uc007aet.1 and uc008ajq.1. Those are knownGene identifiers, corresponding to the knownGene transcriptome you downloaded.
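As a rough sketch of how those knownGene transcript IDs could be turned into gene symbols: the Python below builds a lookup from a tab-delimited cross-reference table. It assumes a file laid out like UCSC's kgXref table, with the transcript ID in the first column and the gene symbol in the fifth; that layout, the function name `load_kgxref_symbols`, and the example rows (GeneA, GeneB, the accessions) are assumptions for illustration, so check the columns of whatever table you actually download.

```python
import csv
import io


def load_kgxref_symbols(handle, id_col=0, symbol_col=4):
    """Return {transcript_id: gene_symbol} from a tab-delimited table.

    The default column positions are an assumption about the kgXref
    layout; adjust them if your file differs.
    """
    symbols = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(id_col, symbol_col):
            continue  # skip blank or truncated lines
        transcript_id, symbol = row[id_col], row[symbol_col]
        if symbol:
            symbols[transcript_id] = symbol
    return symbols


# Made-up rows in the assumed layout (not real annotation data).
example = io.StringIO(
    "uc007aet.1\tAK000001\t\t\tGeneA\t\t\tmade-up description\n"
    "uc007aeu.1\tAK000002\t\t\tGeneA\t\t\tmade-up description\n"
    "uc008ajq.1\tAK000003\t\t\tGeneB\t\t\tmade-up description\n"
)
symbols = load_kgxref_symbols(example)
print(symbols["uc008ajq.1"])
```

With a real file you would pass `open("kgXref.txt")` (or whatever your table is called) instead of the in-memory example.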


Yes, WouterDeCoster, but I want to do differential expression analysis, so I want gene names. I could map the transcript IDs to gene names (using some tool or the UCSC Table Browser), but a single line contains multiple transcript IDs separated by commas, and I don't know how to handle that.

P.S. I have 70 sequences. If this doesn't work, I should redo it with an Ensembl reference. Please help me.

Thanks for your response.
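One possible way to deal with the comma-separated transcript column is to split it and look up each ID, then keep the unique gene names. The sketch below assumes you already have some transcript-to-symbol dictionary; the function name `symbols_for_gene_row` and the symbols GeneA and GeneB are hypothetical placeholders, not real annotations.

```python
def symbols_for_gene_row(transcript_field, transcript_to_symbol):
    """Resolve a comma-separated transcript list to sorted unique symbols."""
    ids = [t.strip() for t in transcript_field.split(",") if t.strip()]
    found = {transcript_to_symbol[t] for t in ids if t in transcript_to_symbol}
    return sorted(found)


# Hypothetical lookup table; in practice it would come from an annotation file.
transcript_to_symbol = {
    "uc007aet.1": "GeneA",
    "uc007aeu.1": "GeneA",
    "uc007amd.1": "GeneB",
}

row_symbols = symbols_for_gene_row("uc007aet.1,uc007aeu.1", transcript_to_symbol)
print(row_symbols)
```

If all transcripts of a row belong to the same gene, the result has a single entry; more than one entry would suggest the grouping and the annotation disagree and the row deserves a closer look.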

6.2 years ago

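I'm not sure what you are aiming for here, but you have a file with gene_id in the first column. That gene_id looks like it could be an Entrez Gene ID, though since it comes from the first column of knownIsoforms.txt it may instead be a UCSC cluster ID, so check before mapping. You can perform your differential expression analysis and, at whatever point is convenient, map that ID to the official gene symbol (for mouse, the MGI symbol; HGNC symbols are human-specific). There are many resources for doing so.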

Your column with comma-separated transcripts comes about because genes often have multiple transcripts. For the purposes of differential gene expression analysis, you can probably just ignore that detail and focus on the gene_id.
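To illustrate working at the gene_id level, here is a minimal Python sketch that reads the expected_count column from several genes.results tables and lines them up by gene_id, ignoring the transcript column. The function names and the second sample (cellnumber2, with invented counts) are assumptions for the example; the column names are the ones shown in the output above.

```python
import csv
import io


def read_expected_counts(handle):
    """Return {gene_id: expected_count} from a tab-delimited genes.results table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["expected_count"]) for row in reader}


def build_count_matrix(samples):
    """Turn {sample: {gene_id: count}} into (sample_names, {gene_id: [counts]})."""
    names = sorted(samples)
    gene_ids = sorted(set().union(*(samples[n].keys() for n in names)))
    # Genes missing from a sample get 0.0 so every row has one value per sample.
    matrix = {g: [samples[n].get(g, 0.0) for n in names] for g in gene_ids}
    return names, matrix


header = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
sample1 = io.StringIO(
    header
    + "1\tuc007aet.1,uc007aeu.1\t3621.00\t3338.70\t0.00\t0.00\t0.00\n"
    + "100\tuc007amd.1,uc007ame.1\t4355.00\t4072.70\t1.80\t0.32\t0.17\n"
)
# Invented values for a hypothetical second sample.
sample2 = io.StringIO(
    header
    + "1\tuc007aet.1,uc007aeu.1\t3621.00\t3338.70\t2.00\t0.10\t0.05\n"
    + "100\tuc007amd.1,uc007ame.1\t4355.00\t4072.70\t4.00\t0.70\t0.40\n"
)

samples = {
    "cellnumber1": read_expected_counts(sample1),
    "cellnumber2": read_expected_counts(sample2),
}
names, matrix = build_count_matrix(samples)
print(names)
print(matrix["100"])
```

Note that expected_count values can be fractional, so depending on the differential expression tool you may need to round them first. Gene symbols can be attached to the gene_id keys afterwards without touching the counts.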
