Question

using rsem sample.genome.STAR.genome.bam for calculating expression

3

Entering edit mode

2.5 years ago

Jorge ▴ 30

Hello, I have run rsem-calculate-expression on pair end sequencing. From the resulting BAM file mapped to the genome (sample.STAR.genome.bam), I would like to use that as in input to calculate expression of genes in the sample. When I try to use it as input in rsem-calculate-expression, it fails because the chromosome entries dont appear as a reference sequence( expected). Any help is appreciated!

rsem rna-seq • 3.7k views

ADD COMMENT • link 2.5 years ago by Jorge ▴ 30

3

Entering edit mode

If you could post what exact command you used and error you are getting will help more. One possibility may be reference fasta file used in generating bam files and using with rsem-calculate-expression are different in chromosome name.

ADD REPLY • link 2.5 years ago by kashiff007 ★ 1.9k

0

Entering edit mode

Sorry for not being clear ! There is no "error", the sample.STAR.genome.bam is part of the output rsem-calculate-expression --star-output-genome-bam. I use this file to filter reads and want to reuse this file for gene quantification, so when I run rsem-calculate-expression -p 32 --paired-end --alignments my_bam.genome.bam GRCh38_gencode dedup_sample

I get the error (which makes sense, one is in genomic coordinates, and this requires it to be in transcriptome coordinates):

RSEM can not recognize reference sequence name chr1!

From the documentation, it says STAR uses the genomic coordinate one, to establish concordance, this is the step im missing. I looked into running STAR with --quantMode TranscriptomeSAM, the problem being that the input cant be a bam file

ADD REPLY • link 2.5 years ago by Jorge ▴ 30

0

Entering edit mode

You cannot use the STAR genome bam to generate counts. You need to FASTQ files (which you can get by extracting reads from the STAR aligned BAM file) as input to rsem-calculate-expression.

ADD REPLY • link 2.5 years ago by Ram 43k

score 0 · Answer 1 · 2021-11-11

0

Entering edit mode

2.5 years ago

swbarnes2 14k

You need to post your command lines. Didn't running calculate-expression already give you the gene count files? And are you sure that genome alignments, and not transcriptome alignments are what's wanted as input?

See the examples here, for instance

https://ycl6.gitbook.io/rna-seq-data-analysis/rna-seq_analysis_workflow/quantification_using_rsem1

ADD COMMENT • link 2.5 years ago by swbarnes2 14k

0

Entering edit mode

yes, rsem already gives out the sample.genes.results the exact command is in the second comment. I know that if I add the transcript alignments it work. But what I actually want is to use the genome coordinates, since I use this alignment to filter reads. In the documentation of STAR it says it first uses the genomic alignment to to then match it to the transcriptome, its this step thats needed. Seems the solution is to just split them into fastq files and remap though.

ADD REPLY • link 2.5 years ago by Jorge ▴ 30