How to use RSEM to get the expression level for each gene?
7.8 years ago
moxu ▴ 510

I used rsem-calculate-expression as follows:

rsem-calculate-expression -p 8 --paired-end PE1.fastq PE2.fastq hg19_ref my_test_sample

But it only generated a single summed row for each chromosome, like this:

gene_id  transcript_id(s)  length        effective_length  expected_count  TPM   FPKM
chr1     chr1              249250621.00  249250465.90      1323595.26      3.82  0.12
chr10    chr10             135534747.00  135534591.90      869923.40       4.62  0.14
chr11    chr11             135006516.00  135006360.90      1454485.33      7.76  0.24
chr12    chr12             133851895.00  133851739.90      728502.18       3.92  0.12

What I want is the expression level for each gene (or each isoform).

Should I use another RSEM program or different parameters for rsem-calculate-expression?

Thank you!

-- m.x.

RNA-Seq next-gen gene • 3.1k views
7.8 years ago
igor 13k

RSEM should give you gene values, but it depends on how you generated the reference with rsem-prepare-reference. Can you share that command as well?


rsem-prepare-reference --bowtie reference_hg19.fa hg19_ref

Thanks!


When you ran rsem-prepare-reference, you did not set the --gtf option:

--gtf <file>

If this option is on, RSEM assumes that 'reference_fasta_file(s)' contains the sequence of a genome, and will extract transcript reference sequences using the gene annotations specified in <file>, which should be in GTF format.

If this option is off, RSEM will assume 'reference_fasta_file(s)' contains the reference transcripts. In this case, RSEM assumes that name of each sequence in the Multi-FASTA files is its transcript_id.

If you don't specify the GTF, RSEM does not know what your genes are. The hg19 FASTA file is just the sequence of the entire human genome.
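To make that concrete, here is a rough sketch of rebuilding the reference with an annotation and then rerunning the quantification. The FASTA, reference, and sample names are taken from the commands posted above; `hg19_genes.gtf` is a hypothetical file name, so substitute whatever GTF matches your hg19 FASTA (the chromosome names in the two files have to agree). The script only runs RSEM when it is installed and the inputs exist; otherwise it just prints the two commands.

```shell
#!/bin/sh
# Sketch: rebuild the RSEM reference with a gene annotation, then quantify.
# ASSUMPTION: hg19_genes.gtf is a hypothetical name for your GTF annotation.
# File names are assumed to contain no spaces (the commands are word-split).
GENOME_FASTA="reference_hg19.fa"
GTF_FILE="hg19_genes.gtf"
REF_NAME="hg19_ref"
SAMPLE="my_test_sample"

# Step 1: extract transcript sequences from the genome using the GTF.
PREPARE_CMD="rsem-prepare-reference --gtf $GTF_FILE --bowtie $GENOME_FASTA $REF_NAME"

# Step 2: same quantification command as in the question.
QUANT_CMD="rsem-calculate-expression -p 8 --paired-end PE1.fastq PE2.fastq $REF_NAME $SAMPLE"

# Run only if RSEM is available and the inputs exist; otherwise print the commands.
if command -v rsem-prepare-reference >/dev/null 2>&1 \
   && [ -f "$GENOME_FASTA" ] && [ -f "$GTF_FILE" ]; then
    $PREPARE_CMD && $QUANT_CMD
else
    echo "$PREPARE_CMD"
    echo "$QUANT_CMD"
fi
```

With a GTF-based reference, the per-gene values should end up in my_test_sample.genes.results and the per-isoform values in my_test_sample.isoforms.results.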


Thanks a lot. I have tried your suggestion and am waiting for the expression results. It now seems to be taking much longer to process. Is this expected?


Thanks for the --gtf suggestion. It works now.

7.8 years ago

If you extract the isoform sequences first and run Bowtie separately before RSEM quantification, you can probably reduce the runtime (to a varying extent, depending on the Bowtie parameters). However, if you run Bowtie with its default parameters, you might notice that the quantification is very different.
