I have 2x75 bp TruSeq stranded RNA-seq data from rat samples, collected on an Illumina NextSeq machine. I have removed adapters from the FASTQ files and quality-trimmed them using Trimmomatic. I'd like to align them using STAR and generate count matrices for downstream differential expression analysis. I am confused about which options to use during the STAR alignment.
Here is what I have:
    STAR --genomeDir $STARINDICES/ \
         --readFilesIn sample1_read1.fq.gz sample1_read2.fq.gz \
         --outFileNamePrefix out_ \
         --runThreadN 4 \
         --outSAMattrRGline ID:"sample1" SM:"sample1" LB:"sample1" PL:"ILLUMINA" \
         --outBAMsortingThreadN 4 \
         --outSAMtype BAM SortedByCoordinate \
         --outSAMunmapped Within \
         --outSAMstrandField intronMotif \
         --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
         --readFilesCommand zcat \
         --chimSegmentMin 20 \
         --genomeLoad NoSharedMemory
Specifically, am I correct to select these three options?
    --outSAMunmapped Within
        # Outputs unmapped reads within the main SAM file.

    --outSAMstrandField intronMotif
        # Strand derived from the intron motif. Reads with inconsistent
        # and/or non-canonical introns are filtered out.

    --outFilterIntronMotifs RemoveNoncanonicalUnannotated
        # Filters out alignments that contain non-canonical unannotated
        # junctions when using an annotated splice junctions database.
        # Annotated non-canonical junctions are kept.
I will be using htseq-count or featureCounts (but may use Cufflinks as well) to generate expression counts.
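To make the counting step concrete, here is a rough sketch of what I have in mind. It is a dry run that only prints the commands. It assumes the TruSeq stranded (dUTP) library is reverse-stranded, hence `-s 2` for featureCounts and `-s reverse` for htseq-count. `annotation.gtf` and `counts.txt` are placeholder names, and the BAM name is what STAR should write given `--outFileNamePrefix out_`. Depending on the Subread version, featureCounts may also need `--countReadPairs` alongside `-p` to count fragments rather than reads.

```shell
# Dry-run sketch: builds and prints the counting commands without running them.
bam="out_Aligned.sortedByCoord.out.bam"  # STAR output for --outFileNamePrefix out_
gtf="annotation.gtf"                     # placeholder: rat gene annotation (assumption)

# featureCounts: -p paired-end input, -s 2 reverse-stranded, -T 4 threads
fc_cmd="featureCounts -p -s 2 -T 4 -a $gtf -o counts.txt $bam"

# htseq-count: -f bam input format, -r pos coordinate-sorted, -s reverse strandedness
htseq_cmd="htseq-count -f bam -r pos -s reverse $bam $gtf"

echo "$fc_cmd"
echo "$htseq_cmd"
```

If the strandedness assumption is wrong, most reads would land in the no-feature category, so comparing a run with the opposite setting on one sample would be a quick sanity check.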
Have I missed anything? And do I need to modify the resulting BAM file in any way before using it as input for htseq-count / featureCounts?