I have run rsem-calculate-expression on paired-end sequencing data. I would like to use the resulting genome-aligned BAM file (sample.STAR.genome.bam) as input to calculate gene expression in the sample. When I try to use it as input to rsem-calculate-expression, it fails because the chromosome entries don't appear as reference sequences (which is expected). Any help is appreciated!
It would help if you could post the exact command you used and the error you are getting. One possibility is that the reference FASTA file used to generate the BAM file and the one used with rsem-calculate-expression differ in their chromosome names.
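To check the chromosome-name possibility, one option is to compare the sequence names in the BAM header against the names in the reference FASTA. The sketch below is only an illustration: the function name missing_refs and the file names in the usage note are hypothetical, and it assumes the header has already been saved as plain text (for example with samtools view -H).

```shell
# Print every @SQ sequence name from a SAM header that is absent from a FASTA file.
# Usage: missing_refs header.sam reference.fa
# (missing_refs is a hypothetical helper name, not part of RSEM or samtools.)
missing_refs() {
    header_file=$1
    fasta_file=$2
    awk '
        # First file (FASTA): record each sequence name, i.e. the text
        # after ">" up to the first whitespace.
        NR == FNR {
            if (/^>/) { seen[substr($1, 2)] = 1 }
            next
        }
        # Second file (SAM header): for each @SQ line, read the SN: field
        # and report it if the FASTA did not contain that name.
        /^@SQ/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) {
                    sn = substr($i, 4)
                    if (!(sn in seen)) print sn
                }
            }
        }
    ' "$fasta_file" "$header_file"
}
```

For example, after saving the header with samtools view -H aligned.bam > header.sam, running missing_refs header.sam reference.fa lists the names that appear only in the BAM. An empty result means every sequence named in the BAM header also exists in the FASTA.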
Sorry for not being clear! There is no "error"; sample.STAR.genome.bam is part of the output of rsem-calculate-expression --star-output-genome-bam. I use this file to filter reads and want to reuse it for gene quantification, so when I run
rsem-calculate-expression -p 32 --paired-end --alignments my_bam.genome.bam GRCh38_gencode dedup_sample
I get the following error (which makes sense: the BAM file is in genomic coordinates, whereas this command requires transcriptome coordinates):
RSEM can not recognize reference sequence name chr1!
From the documentation, STAR uses the genomic-coordinate alignments; establishing concordance with the transcriptome coordinates is the step I'm missing. I looked into running --quantMode TranscriptomeSAM, but the problem is that the input can't be a BAM file.
You cannot use the STAR genome BAM to generate counts. You need FASTQ files (which you can get by extracting reads from the STAR-aligned BAM file) as input to