STAR_2PASS for SNP calling from RNA-seq data
9.5 years ago
thjnant ▴ 160

Hello,

I am working through the STAR_2PASS step of the GATK pipeline to call SNPs from RNA-seq data.

I have run the first round of alignment for my 6 samples. For the second round, I need to run this command:

genomeDir=/path/to/hg19_2pass
mkdir $genomeDir
STAR --runMode genomeGenerate --genomeDir $genomeDir --genomeFastaFiles hg19.fa \
    --sjdbFileChrStartEnd /path/to/1pass/SJ.out.tab --sjdbOverhang 75 --runThreadN <n>

For this option:

--sjdbFileChrStartEnd /path/to/1pass/SJ.out.tab

Should I take the SJ.out.tab file from just one of my samples and use it for all the others, or should I use each sample's own SJ.out.tab file?

Thanks in advance


I would think that you'd get the best results from merging the tab files and then using the result.
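
As a rough sketch of what that merge could look like: the helper below (`merge_sj` is a made-up name, not part of STAR) keeps the first four columns of each SJ.out.tab (chromosome, intron start, intron end, strand) and drops duplicate junctions. It assumes the result is then passed to --sjdbFileChrStartEnd; whether the 0/1/2 strand codes from SJ.out.tab are accepted as-is is worth checking against the manual for your STAR version.

```shell
# merge_sj: hypothetical helper, not a STAR command.
# Prints the unique junctions (chr, start, end, strand) seen in any input file.
merge_sj() {
    # Columns 1-4 of SJ.out.tab are chromosome, intron start, intron end, strand.
    # Later columns (motif, read counts, overhang) differ per sample, so drop them
    # before de-duplicating.
    cut -f1-4 "$@" | LC_ALL=C sort -u -k1,1 -k2,2n -k3,3n -k4,4
}

# Usage (the per-sample directory layout is an assumption):
#   merge_sj /path/to/1pass/sample*/SJ.out.tab > /path/to/1pass/merged.SJ.tab
```

No filtering on read support is done here; that would be a separate decision.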


Or by running the STAR first pass on a large subset of your entire dataset (FASTQ files from several representative samples, or from all of them).
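
A minimal sketch of a pooled first pass, assuming single-end reads: STAR's --readFilesIn takes a comma-separated list of files, so a small helper (`join_by_comma` is a made-up name) can build that list. The FASTQ names and the genome directory in the usage comment are placeholders.

```shell
# join_by_comma: hypothetical helper that joins its arguments with commas.
# The function body runs in a subshell so the IFS change does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Usage (placeholder paths; for paired-end data, build one list per mate):
#   STAR --genomeDir /path/to/hg19 \
#        --readFilesIn "$(join_by_comma sample1.fq sample2.fq sample3.fq)" \
#        --runThreadN <n>
```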


Yup and that'd probably be a bit faster since you don't need all of the instances to run to completion. Do you happen to know if anyone's looked for an optimal subset percentage? While the real value will vary, I expect there's a decent ball-park starting place to be found (perhaps as a function of total number of reads).


If you believe the old RUM paper, perhaps 40-100M reads will get you the vast, vast majority of splice junctions that are available in a dataset. One can always test by simply staging the analysis. Run 5%, 10%, 15%, etc. to see where the return plateaus, but that is probably overkill.
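
If first-pass SJ.out.tab files already exist, a cheaper stand-in for that staged test is to watch how the number of distinct junctions grows as each file is added. Note this steps through samples rather than read percentages, so it is only a proxy for the approach described above; `sj_cumulative` is a made-up helper name.

```shell
# sj_cumulative: hypothetical helper, not a STAR command.
# For each input file, in the order given, prints the file name and the
# running total of distinct junctions (chr, start, end, strand) seen so far.
sj_cumulative() {
    awk -F'\t' '
        FNR == 1 { if (NR > 1) print prev "\t" n; prev = FILENAME }
        !(($1, $2, $3, $4) in seen) { seen[$1, $2, $3, $4] = 1; n++ }
        END { if (NR > 0) print prev "\t" n }
    ' "$@"
}

# Usage (the per-sample directory layout is an assumption):
#   sj_cumulative /path/to/1pass/sample*/SJ.out.tab
```

When the running total stops growing much from one file to the next, adding more samples to the first pass is probably not buying many new junctions.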


The rarefaction curve route would end up taking as long as just processing everything at once (well, unless you really had a LOT of samples). 40-100M reads seems reasonable.
