Should I use a bad/mediocre gene model as input for STAR RNA-seq alignment?
In the STAR RNA-seq aligner manual I read that a gene model should be used when indexing a reference genome before alignment. https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
--sjdbGTFfile specifies the path to the file with annotated transcripts in the standard GTF format. STAR will extract splice junctions from this file and use them to greatly improve accuracy of the mapping. While this is optional, and STAR can be run without annotations, using annotations is highly recommended whenever they are available. Starting from 2.4.1a, the annotations can also be included on the fly at the mapping step.
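To make the two options concrete, here is a minimal Python sketch that only builds (and does not run) the genome-generation command with and without the gene model. The file names (genome.fa, draft_annotation.gtf), the index directories, and the helper name star_genome_generate_cmd are placeholders I made up; --sjdbOverhang 100 is an assumed value and should be matched to the read length.

```python
from typing import List, Optional


def star_genome_generate_cmd(
    genome_dir: str,
    fasta: str,
    gtf: Optional[str] = None,
    sjdb_overhang: int = 100,  # assumed value; the manual suggests read length - 1
    threads: int = 4,
) -> List[str]:
    """Build a STAR genome-generation command as an argument list (not executed)."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
    ]
    if gtf is not None:
        # With a GTF, STAR extracts annotated splice junctions while indexing.
        cmd += ["--sjdbGTFfile", gtf, "--sjdbOverhang", str(sjdb_overhang)]
    return cmd


# Hypothetical paths, for illustration only.
with_gtf = star_genome_generate_cmd(
    "star_index_gtf", "genome.fa", gtf="draft_annotation.gtf"
)
without_gtf = star_genome_generate_cmd("star_index_plain", "genome.fa")

print(" ".join(with_gtf))
print(" ".join(without_gtf))
```

Either list could be passed to subprocess.run once the paths point at real files. Building both indexes side by side is one way to compare alignments with and without the draft annotation.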
I would like to know whether this is still recommended when working with a non-model organism that has a bad or mediocre gene model.
Is there a risk that RNA-seq reads will be aligned to the wrong location because of mistakes in the gene model? How big is this risk?
One goal of the RNA-seq alignment is to improve/curate the gene model.
Do I need to rerun all my RNA-seq alignments (i.e. re-create the RNA-seq BAM files) every time I have slightly or substantially improved my gene model?
Or does 2-pass mapping mode already reduce the need for re-running after having upgraded the gene model?
For the most sensitive novel junction discovery, I would recommend running STAR in the 2-pass mode. It does not increase the number of detected novel junctions, but allows to detect more spliced reads mapping to novel junctions. The basic idea is to run 1st pass of STAR mapping with the usual parameters, then collect the junctions detected in the first pass, and use them as "annotated" junctions for the 2nd pass mapping.
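As far as I understand, for a single sample this corresponds to STAR's --twopassMode Basic option. Below is a minimal Python sketch that only assembles such a mapping command. The helper name star_map_cmd, the index directory, the read files, and the output prefix are placeholders of mine, and the BAM output setting is just one common choice.

```python
from typing import List, Sequence


def star_map_cmd(
    genome_dir: str,
    reads: Sequence[str],
    out_prefix: str,
    two_pass: bool = True,
    threads: int = 4,
) -> List[str]:
    """Build a STAR mapping command as an argument list (not executed)."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
    ]
    if two_pass:
        # Per-sample 2-pass: junctions found in pass 1 are inserted into
        # the index on the fly and used as "annotated" junctions in pass 2.
        cmd += ["--twopassMode", "Basic"]
    return cmd


# Hypothetical inputs, for illustration only.
single_sample = star_map_cmd(
    "star_index_gtf", ["sampleA_R1.fastq", "sampleA_R2.fastq"], "sampleA_"
)
print(" ".join(single_sample))
```

With two_pass=False the same helper gives a plain 1-pass run, which is what the multi-sample procedure uses for its first pass.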
Should I always do multi-sample 2-pass mapping, even if the RNA-seq samples are from multiple different projects/experiments?
For a study with multiple samples, it is recommended to collect 1st pass junctions from all samples.
1. Run 1st mapping pass for all samples with "usual" parameters. Using annotations is recommended either at the genome generation step, or mapping step.
2. Run 2nd mapping pass for all samples, listing SJ.out.tab files from all samples in --sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab ....
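The two steps quoted above could be scripted roughly as follows. This is a minimal Python sketch that only builds the command lists. The sample names, FASTQ paths, and the pass1/ and pass2/ output prefixes are made up, and the helper names are my own rather than part of STAR.

```python
from typing import Dict, List, Sequence


def first_pass_cmd(genome_dir: str, sample: str, reads: Sequence[str]) -> List[str]:
    """Pass 1: ordinary mapping; STAR writes <prefix>SJ.out.tab per sample."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", f"pass1/{sample}_",
    ]


def second_pass_cmd(
    genome_dir: str, sample: str, reads: Sequence[str], sj_files: Sequence[str]
) -> List[str]:
    """Pass 2: same mapping, plus the pass-1 junction files of ALL samples."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", f"pass2/{sample}_",
        "--sjdbFileChrStartEnd", *sj_files,
    ]


# Hypothetical samples, for illustration only.
samples: Dict[str, List[str]] = {
    "sampleA": ["sampleA_R1.fastq", "sampleA_R2.fastq"],
    "sampleB": ["sampleB_R1.fastq", "sampleB_R2.fastq"],
}

pass1 = [first_pass_cmd("star_index", name, reads) for name, reads in samples.items()]
# Junction files produced by pass 1, one per sample.
sj_files = [f"pass1/{name}_SJ.out.tab" for name in samples]
pass2 = [
    second_pass_cmd("star_index", name, reads, sj_files)
    for name, reads in samples.items()
]

for cmd in pass1 + pass2:
    print(" ".join(cmd))
```

Because every pass-2 command lists the junction files of all samples, adding a sample changes the inputs of every pass-2 run, which is what prompts my next question.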
Does this mean that I should always re-align all RNA-seq samples from FASTQ after having received new RNA-seq samples? (because the new RNA-seq samples might cause a new splice junction to be considered for alignment of the existing RNA-seq samples?)
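One way I could imagine gauging this is to check whether a new sample contributes any junctions that are not already in the pooled set. Below is a minimal Python sketch, assuming the standard SJ.out.tab layout (column 1 chromosome, 2 intron start, 3 intron end, 4 strand, 7 number of uniquely mapping reads). The example lines and the threshold of 3 unique reads are invented for illustration, not real data or a recommended cutoff.

```python
from typing import Iterable, Set, Tuple

Junction = Tuple[str, int, int, str]  # (chromosome, intron start, intron end, strand)


def read_junctions(lines: Iterable[str], min_unique_reads: int = 1) -> Set[Junction]:
    """Collect junctions from SJ.out.tab lines with enough unique-read support."""
    junctions: Set[Junction] = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed or empty lines
        unique_reads = int(fields[6])  # column 7: uniquely mapping reads
        if unique_reads >= min_unique_reads:
            junctions.add((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return junctions


# Invented example lines in SJ.out.tab layout (not real data).
pooled_lines = [
    "chr1\t1000\t2000\t1\t1\t1\t25\t0\t40",
    "chr1\t3000\t4000\t1\t1\t0\t8\t1\t35",
]
new_sample_lines = [
    "chr1\t1000\t2000\t1\t1\t1\t12\t0\t38",
    "chr2\t500\t900\t2\t1\t0\t5\t0\t30",
    "chr2\t1500\t1900\t2\t0\t0\t1\t0\t12",
]

pooled = read_junctions(pooled_lines)
# Assumed threshold: ignore junctions with fewer than 3 unique reads.
new = read_junctions(new_sample_lines, min_unique_reads=3)

unseen = new - pooled  # junctions only the new sample would add
print(sorted(unseen))
```

If unseen is empty, the new sample would not change the junction set used in the second pass. Whether a non-empty set is large enough to justify re-aligning everything is exactly what I am unsure about.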