Question

2-pass mapping with STAR

6.2 years ago

Lidia • 0

Hello everyone! I am new to this tool and have some questions. I am working with paired-end RNA-seq reads. My goal is to find SNPs in 2 strains of trout so that I can compare them. From each strain I sampled 8 fish, and from each fish 6 different tissues. I have already gone through the STAR manual, but some questions came up. My steps are the following:

STAR --runMode genomeGenerate --genomeDir GenomeDir --genomeFastaFiles ReferenceGenome/genome.fna --sjdbGTFfile ReferenceGenome/Annotations.gff --limitGenomeGenerateRAM 140000000000 --runThreadN 8

Since my annotations file is .gff, should I be using --sjdbGTFtagExonParentTranscript instead of --sjdbGTFfile?

STAR --runMode alignReads --genomeDir GenomeDir --readFilesCommand zcat --readFilesIn Forelle/${R1} Forelle/${R2} --outFileNamePrefix ${sample}_ --outFilterMismatchNmax 3 --runThreadN 8

I understand that I have to run the previous step independently for every sample (each tissue of each fish), so I would be generating 96 SJ.out.tab files.

For the re-generation of the genome I have to merge all the SJ.out.tab files and filter the junctions. However I am really confused about which filters to use for this step. Any recommendations?

STAR --runMode genomeGenerate --genomeDir GenomeDir2 --genomeFastaFiles ReferenceGenome/genome.fna --sjdbGTFfile ReferenceGenome/Annotations.gff --sjdbFileChrStartEnd *SJ.out.tab --outSJFilterReads All --sjdbOverhang 64 --limitGenomeGenerateRAM 140000000000 --runThreadN 8 

STAR --genomeDir GenomeDir2 --readFilesCommand zcat --readFilesIn Forelle/$R1 Forelle/$R2 --outFileNamePrefix ${sample}_2pass_ --sjdbOverhang 64 --outFilterMismatchNmax 3 --runThreadN 8

RNA-Seq STAR spliced junctions • 7.2k views

updated 6.2 years ago by Santosh Anand 5.7k • written 6.2 years ago by Lidia • 0

"For the re-generation of the genome I have to merge all the SJ.out.tab files and filter the junctions."

For the latest STAR release, the recommended 2-pass method is to pass all junction files separately, not to concatenate them:

Run 2nd mapping pass for all samples, listing SJ.out.tab files from all samples in

--sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab ...

6.2 years ago by h.mon 35k

score 2 · Answer 1 · 2018-01-26

"Since my annotations file is .gff, should I be using --sjdbGTFtagExonParentTranscript instead of --sjdbGTFfile?"

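Yes, you need it for a GFF3 file, but in addition to --sjdbGTFfile rather than instead of it: --sjdbGTFfile still points to the annotation file, and the STAR manual says to add --sjdbGTFtagExonParentTranscript Parent for GFF3-formatted annotations. If that creates a problem, try converting the GFF to GTF. See https://groups.google.com/forum/#!searchin/rna-star/grant%7Csort:date/rna-star/yl6JRltAuG4/bbuKHQM4AgAJ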
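As a rough sketch of the GFF3 case, the indexing command from the question could look like the one below. The paths and resource settings are copied from the question; the `run` dry-run helper and the `star_index_gff3` function name are made up for illustration, and the script only prints the command unless DRY_RUN=0 is set.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# Build the genome index from a GFF3 annotation.
# --sjdbGTFfile still names the annotation file; the extra option tells STAR
# that exons are linked to their transcripts through the GFF3 "Parent" attribute.
star_index_gff3() {
  run STAR --runMode genomeGenerate \
    --genomeDir GenomeDir \
    --genomeFastaFiles ReferenceGenome/genome.fna \
    --sjdbGTFfile ReferenceGenome/Annotations.gff \
    --sjdbGTFtagExonParentTranscript Parent \
    --limitGenomeGenerateRAM 140000000000 \
    --runThreadN 8
}

star_index_gff3
```

Run it once as a dry run to check the printed command, then rerun with DRY_RUN=0 to build the index.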

"However I am really confused about which filters to use for this step. Any recommendations?"

The filters were used in the old protocol, when all the SJ.out.tab files were merged into a single file. From the old STAR manual (v2.4.0.1):

Collect all junctions detected in the 1st pass by merging SJ.out.tab files from all runs. Filter the junctions by removing likely false positives, e.g. junctions in the mitochondrion genome, or non-canonical junctions supported by a few reads. If you are using annotations, only novel junctions need to be considered here, since annotated junctions will be re-used in the 2nd pass anyway.
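If you do want to follow the old merge-and-filter protocol, the filters in that quote could be sketched with awk as below. The function name `filter_junctions`, the mitochondrial contig name and the read threshold are all assumptions you would have to set for your own reference; the column meanings follow the SJ.out.tab description in the STAR manual, and the output uses the four-column Chr/Start/End/Strand layout that --sjdbFileChrStartEnd accepts.

```shell
# SJ.out.tab columns: 1 chrom, 2 intron start, 3 intron end,
# 4 strand (0 undefined, 1 +, 2 -), 5 motif (0 = non-canonical),
# 6 annotated (0 = novel, 1 = in the annotation), 7 uniquely mapped reads,
# 8 multi-mapped reads, 9 maximum overhang.
#
# Usage: filter_junctions MITO_NAME MIN_READS file1_SJ.out.tab [file2 ...]
filter_junctions() {
  mito=$1
  min_reads=$2
  shift 2
  awk -v mito="$mito" -v min="$min_reads" 'BEGIN { OFS = "\t" }
    $1 == mito           { next }  # drop mitochondrial junctions
    $5 == 0 && $7 < min  { next }  # drop weakly supported non-canonical junctions
    $6 == 1              { next }  # annotated junctions come from the annotation anyway
    { print $1, $2, $3, ($4 == 1 ? "+" : ($4 == 2 ? "-" : ".")) }
  ' "$@" | sort -u                 # a junction seen in several samples is kept once
}

# Example (hypothetical contig name and threshold):
# filter_junctions MT 3 *_SJ.out.tab > merged_filtered.sjdb
```

Note that the threshold is applied per sample here, not to read counts summed across samples; which of the two is more appropriate is a judgment call.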

The newer version suggests listing SJ.out.tab files from all samples in --sjdbFileChrStartEnd /path/to/sj1.tab /path/to/sj2.tab

(see section 8 of the latest STAR manual and @h.mon comment above)
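A minimal sketch of that second pass for one sample is shown below. It reuses the mapping options from the question and maps against the original index (GenomeDir), on the assumption that your STAR release inserts the listed junctions on the fly at the mapping stage, so no second genomeGenerate run is needed. The `run` helper, the `star_second_pass` function and the file names in the usage comment are hypothetical.

```shell
# Dry-run helper (hypothetical): prints the command unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

# Usage: star_second_pass SAMPLE R1.fastq.gz R2.fastq.gz sj1.tab [sj2.tab ...]
# Every first-pass SJ.out.tab file is passed as its own argument,
# not concatenated into one file.
star_second_pass() {
  sample=$1
  r1=$2
  r2=$3
  shift 3
  run STAR --genomeDir GenomeDir \
    --readFilesCommand zcat \
    --readFilesIn "$r1" "$r2" \
    --outFileNamePrefix "${sample}_2pass_" \
    --outFilterMismatchNmax 3 \
    --runThreadN 8 \
    --sjdbFileChrStartEnd "$@"
}

# Example (hypothetical file names): all 96 first-pass junction files at once.
# star_second_pass fish01_liver Forelle/fish01_liver_R1.fastq.gz \
#   Forelle/fish01_liver_R2.fastq.gz *_SJ.out.tab
```

Calling the function once per sample with the same list of junction files gives every sample the junctions discovered in all of them.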