Hi, I have tumour RNA-seq data that I aligned to a human reference genome with STAR, using the default options. However, I am confused by the variety of options (chimeric junctions, 2-pass mapping, readCounts, etc.) described in the STAR manual, and I would like to know which of them are most suitable for an analysis aimed at identifying somatic mutations.
In general, I use STAR with the default settings, and recommend that others do the same, unless there is a specific reason to change them. In my experience, many options are great for fine-tuning an analysis to the particulars of your sequencing runs (low depth, short reads, etc.), but I rarely start out with them; I add them as I run into problems. Many other options are simply a matter of preference for how you want your output files to look (sorted, BAM, unmapped reads written out, etc.).
2-pass mode is useful if you are doing a splicing analysis and want to identify unannotated junctions, but I don't think it is generally recommended for other analyses. If you are looking for chromosomal rearrangements (which seems likely, since you are using RNA-seq to detect mutations), --chimSegmentMin is absolutely something you would want to use, and 2-pass mode would likely improve detection of these rearrangements, as they are effectively unannotated junctions.
I personally prefer other tools for read counts (kallisto/HTSeq), but if you want them quantified by STAR, use its gene-counting option (--quantMode GeneCounts).
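If you do let STAR count reads, the relevant setting is, as far as I know, --quantMode GeneCounts, which writes a ReadsPerGene.out.tab file: four tab-separated columns (gene ID, unstranded counts, counts for read 1 on the RNA strand, counts for read 2 on the RNA strand), with four summary rows whose names start with N_ at the top. Below is a minimal Python sketch for reading that file, assuming this layout. The function name and the example counts are made up for illustration.

```python
import csv
import io


def read_star_gene_counts(handle, column=1):
    """Return (summary, counts) from a STAR ReadsPerGene.out.tab stream.

    column: 1 = unstranded, 2 = read 1 on the RNA strand,
            3 = read 2 on the RNA strand.
    """
    summary, counts = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, value = row[0], int(row[column])
        # Rows such as N_unmapped are summary lines, not genes.
        if name.startswith("N_"):
            summary[name] = value
        else:
            counts[name] = value
    return summary, counts


# Made-up content, only to show the assumed layout.
example = (
    "N_unmapped\t100\t100\t100\n"
    "N_multimapping\t50\t50\t50\n"
    "N_noFeature\t20\t300\t25\n"
    "N_ambiguous\t5\t1\t4\n"
    "ENSG00000141510\t120\t2\t118\n"
    "ENSG00000133703\t80\t1\t79\n"
)
summary, counts = read_star_gene_counts(io.StringIO(example), column=1)
print(counts)
```

With a real file you would pass open("ReadsPerGene.out.tab") instead of the StringIO object, and choose the column that matches your library's strandedness.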
I have to admit that I mostly use what has been recommended, rather than settings I have tested across several projects.
I think my answer is similar to the one already provided. However, to put it slightly differently:
I would usually run STAR with
--twopassMode Basic --outSAMstrandField intronMotif. I think the second parameter is meant to help certain downstream programs, like Cufflinks, since you don't specify the strandedness of the library to STAR: it adds a strand attribute to spliced alignments based on the intron motif.
If I want to output files for gene fusion analysis, then I would add
--chimSegmentMin 12 --chimJunctionOverhangMin 12 --chimOutType Junctions.
For specific projects, I have probably made other changes. For example, last time I checked, PacBio had recommendations for aligning IsoSeq CCS data; I think the most important (extra) parameters there are
--outSAMattributes All --readNameSeparator space --seedPerReadNmax 10000.
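To make the three flag sets above easier to reuse, here is a rough Python sketch that assembles the STAR command line. It is only an illustration: the function name, index directory, FASTQ names and output prefix are hypothetical, and stacking the long-read flags on top of the two usual ones is my assumption, not something PacBio's recommendations are confirmed to say.

```python
import shlex


def build_star_command(genome_dir, fastqs, out_prefix, threads=4,
                       fusion=False, long_reads=False):
    """Assemble a STAR argument list from the flags discussed in this thread."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *fastqs,
        "--outFileNamePrefix", out_prefix,
        # The two flags I usually add to the defaults.
        "--twopassMode", "Basic",
        "--outSAMstrandField", "intronMotif",
    ]
    if fusion:
        # Extra output for gene fusion analysis.
        cmd += ["--chimSegmentMin", "12",
                "--chimJunctionOverhangMin", "12",
                "--chimOutType", "Junctions"]
    if long_reads:
        # Extra flags for long-read (IsoSeq CCS) data; assumed to be additive.
        cmd += ["--outSAMattributes", "All",
                "--readNameSeparator", "space",
                "--seedPerReadNmax", "10000"]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_star_command("star_index",
                         ["tumour_R1.fastq", "tumour_R2.fastq"],
                         "tumour_", fusion=True)
print(shlex.join(cmd))
```

This only prints the command so you can check it first. To actually run it, you could pass the list to subprocess.run(cmd, check=True).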