Hello everyone! I am new to this tool and have some questions. I am working with paired-end RNA-seq reads. My goal is to find SNPs in 2 strains of trout so that I can compare them. From each strain I have sampled 8 fish, each with 6 different tissues. I have already gone through the STAR manual, but some questions came up. My steps are the following:
STAR --runMode genomeGenerate --genomeDir GenomeDir --genomeFastaFiles ReferenceGenome/genome.fna --sjdbGTFfile ReferenceGenome/Annotations.gff --limitGenomeGenerateRAM 140000000000 --runThreadN 8
Since my annotation file is a .gff, should I be using --sjdbGTFtagExonParentTranscript instead of --sjdbGTFfile?
STAR --runMode alignReads --genomeDir GenomeDir --readFilesCommand zcat --readFilesIn Forelle/${R1} Forelle/${R2} --outFileNamePrefix ${sample}_ --outFilterMismatchNmax 3 --runThreadN 8
I understand that I have to run the previous step for every tissue sample independently, so I would be generating 96 SJ.out.tab files (2 strains × 8 fish × 6 tissues).
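To make the per-sample step concrete, this is roughly the loop I have in mind. It is only a sketch: the `samples.txt` list and the `_R1.fastq.gz` / `_R2.fastq.gz` file naming are assumptions about my own layout, and the function only prints each command so it can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Print the first-pass STAR command for one sample (dry run, nothing is executed).
# Assumed (hypothetical) layout: Forelle/<sample>_R1.fastq.gz and Forelle/<sample>_R2.fastq.gz
first_pass_cmd() {
    local sample="$1"
    local R1="${sample}_R1.fastq.gz"
    local R2="${sample}_R2.fastq.gz"
    echo "STAR --runMode alignReads --genomeDir GenomeDir --readFilesCommand zcat --readFilesIn Forelle/${R1} Forelle/${R2} --outFileNamePrefix ${sample}_ --outFilterMismatchNmax 3 --runThreadN 8"
}

# samples.txt (hypothetical) holds one sample name per line, 96 lines in total.
if [ -f samples.txt ]; then
    while read -r sample; do
        [ -n "$sample" ] || continue   # skip blank lines
        first_pass_cmd "$sample"
    done < samples.txt
fi
```

Piping the printed lines to `bash` (or pasting them into a job script) would then run the alignments one sample at a time.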
To re-generate the genome, I have to merge all the SJ.out.tab files and filter the junctions. However, I am really confused about which filters to use for this step. Any recommendations?
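As a starting point for discussion, here is a sketch of the kind of filter I was considering. The criteria are my own guesses, not STAR recommendations: keep only junctions that are novel (the annotated ones already come from the annotation file), have a canonical motif, and are supported by at least a minimum number of uniquely mapped reads in some sample. The function name, the threshold of 3, and the output file name are all hypothetical.

```shell
#!/usr/bin/env bash
# Merge SJ.out.tab files and keep only novel, canonical, well-supported junctions.
# SJ.out.tab columns (per the STAR manual): 1 chr, 2 start, 3 end,
# 4 strand (0 undefined, 1 +, 2 -), 5 intron motif (0 = non-canonical),
# 6 annotated (0 = no, 1 = yes), 7 uniquely mapped reads,
# 8 multi-mapped reads, 9 maximum overhang.
filter_junctions() {
    local min_unique="$1"   # assumed threshold (e.g. 3); not a STAR default
    shift
    cat "$@" \
      | awk -v min="$min_unique" 'BEGIN { OFS = "\t" }
          $6 == 0 && $5 > 0 && $7 >= min {
              # write the strand as + / - / . (chr, start, end, strand format)
              s = ($4 == 1 ? "+" : ($4 == 2 ? "-" : "."))
              print $1, $2, $3, s
          }' \
      | sort -u              # the same junction can appear in several samples
}

# Example call (hypothetical output name):
# filter_junctions 3 *SJ.out.tab > SJ.filtered.tab
```

The threshold is applied per sample here, so a junction seen with only 1 or 2 reads in each of many samples would be dropped; summing read counts across samples before filtering would be an alternative design.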
STAR --runMode genomeGenerate --genomeDir GenomeDir2 --genomeFastaFiles ReferenceGenome/genome.fna --sjdbGTFfile ReferenceGenome/Annotations.gff --sjdbFileChrStartEnd *SJ.out.tab --outSJFilterReads All --sjdbOverhang 64 --limitGenomeGenerateRAM 140000000000 --runThreadN 8
STAR --genomeDir GenomeDir2 --readFilesCommand zcat --readFilesIn Forelle/$R1 Forelle/$R2 --outFileNamePrefix ${sample}_2pass_ --sjdbOverhang 64 --outFilterMismatchNmax 3 --runThreadN 8
For the latest STAR release, the recommended 2-pass method is to pass all the junction files separately rather than concatenating them: