Question

Consistency in the GTF files used for the index building of STAR alignment and RSEM

16 months ago

Xunzhi.Zhang • 0

Hello, I want to use RSEM to get the transcript counts from STAR alignment results. I have built the STAR indices with the GTF from UCSC and got the BAM files after alignment. I wonder if I could still use these BAM files if I want to use the Gencode GTF in rsem-prepare-reference.

Furthermore, when are the GTF files used in STAR index building and further feature quantification (like RSEM and featureCounts) interchangeable?

Thank you very much!

RSEM STAR alignment rna-seq gtf • 827 views


score 0 · Answer 1 · 2022-12-30

I don't think you can use different GTF files for STAR --runMode genomeGenerate and rsem-calculate-expression. When run with the --star option, rsem-prepare-reference internally calls STAR --runMode genomeGenerate with the --sjdbGTFfile parameter set to your GTF input. It also creates a *.transcripts.fa file whose sequence names are the transcript identifiers from the GTF file. RSEM quantifies reads aligned to this transcriptome, so the transcriptome BAM contains those transcripts as its contigs. A BAM produced with one annotation therefore will not match a reference prepared from another, which renders the BAM useless unless you extract the reads and realign them to the other transcriptome. It'd be easier to just run rsem-calculate-expression against the new transcriptome.
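
To make the contig mismatch concrete, here is a minimal sketch (not part of RSEM or STAR) that compares the contig names in a transcriptome BAM header with the transcript IDs in a GTF. The function names and the toy IDs are hypothetical, chosen only for illustration. It assumes the header text was obtained separately, for example with `samtools view -H` on the transcriptome BAM, and that the GTF stores IDs in the usual `transcript_id "..."` attribute.

```python
import re

# Matches the transcript_id attribute in the 9th GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect every transcript_id found in the given GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def sam_header_contigs(header_lines):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def missing_contigs(header_lines, gtf_lines):
    """Return BAM contigs that have no matching transcript_id in the GTF."""
    return sam_header_contigs(header_lines) - gtf_transcript_ids(gtf_lines)


if __name__ == "__main__":
    # Toy data with made-up IDs: one contig matches the GTF, one does not.
    header = [
        "@HD\tVN:1.4\n",
        "@SQ\tSN:ucsc_tx_1\tLN:1000\n",
        "@SQ\tSN:ENST_TOY_1\tLN:2000\n",
    ]
    gtf = [
        "# toy annotation\n",
        'chr1\ttoy\ttranscript\t1\t2000\t.\t+\t.\t'
        'gene_id "ENSG_TOY_1"; transcript_id "ENST_TOY_1";\n',
    ]
    print(sorted(missing_contigs(header, gtf)))  # ['ucsc_tx_1']
```

If this reports any missing contigs for your BAM and the GTF you plan to give to rsem-prepare-reference, the BAM was aligned against a different transcript set and should be regenerated rather than reused.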

Furthermore, when are the GTF files used in STAR index building and further feature quantification (like RSEM and featureCounts) interchangeable?

What do you mean by "when"? I don't see the time component to this question. Even if you're asking for specific conditions, the question does not make sense. Pipelines need to consistently use the same set of reference files such as FASTA and GTF.