Question: STAR 2-pass raw read mapping for SNP calling using GATK
2.5 years ago, Bioinfonext (Korea) wrote:

I want to use STAR 2-pass alignment for SNP detection in RNA-seq data, but I am getting confused. I am using STAR version 2.5.3a.

As I understand it, STAR 2-pass mapping takes four steps:

1) First genome generation.

2) First-pass alignment. I don't understand whether to run the first-pass aligner on all samples together or on each sample separately.

3) Genome generation again.

4) Second-pass alignment. After the first pass, how do I specify all the SJ.out.tab files for the second pass? What parameter should I use to filter which junctions in the SJ.out.tab files are considered? And how do I give each SJ.out.tab file a different name prefix?

These are the command lines I am using for the four steps:

1) First genome generation

/STAR --runThreadN 6 --runMode genomeGenerate --genomeDir /data/SNU_work/genome --sjdbGTFfile CDS_123.cds.gtf --genomeFastaFiles Rs.R1_R9.fasta

2) First-pass read mapping

 /home/yog/software/STAR-2.5.3a/source/STAR --genomeDir /data/SNU_work/genome --readFilesIn 216_R1.fq 216_R2.fq --runThreadN 6

3) Second genome generation

/STAR --runThreadN 6 --runMode genomeGenerate --genomeDir /data/SNU_work/genome --sjdbOverhang 124 --sjdbFileChrStartEnd /data/SNU_work/SJ.out.tab --genomeFastaFiles Rs.R1_R9.fasta

Here I am confused: should I generate the genome from all the SJ.out.tab files together or one by one, and how do I name each file according to its RNA-seq library?

4) Second-pass alignment with STAR

Please look at the command lines and tell me whether they are correct.

I also want to ask: if I have many samples, how can I create one common set of novel junctions for all samples by merging them? As I understand it, you then generate a new genome using the annotated junctions plus the common set of novel junctions and re-run all the samples against this new genome; that would be the second pass.

2.5 years ago, Devon Ryan (Freiburg, Germany) wrote:

Have a look at section 8.1 of the STAR documentation; it covers exactly this.

In short, you don't need to run genomeGenerate again or merge the splice junction files. Just run the first pass on each sample, then map each sample again with --sjdbFileChrStartEnd sample1.tab sample2.tab ..., where sampleX.tab is the splice junction file produced for each sample.
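The per-sample workflow described above could be sketched roughly as follows. This is a dry-run sketch, not a tested pipeline: it only prints the STAR commands unless DRY_RUN=0 is set. The sample names are taken from the file names later in this thread; the read file naming (<sample>_R1.fq, <sample>_R2.fq), the pass1/ and pass2/ output directories, the run helper, and STAR being on PATH are all assumptions made for illustration. --outFileNamePrefix is what gives each sample's SJ.out.tab its own name.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a multi-sample STAR 2-pass run.
# Assumptions (not from the thread): STAR is on PATH, reads are named
# <sample>_R1.fq / <sample>_R2.fq, and outputs go to pass1/ and pass2/.

GENOME_DIR=/data/SNU_work/genome
SAMPLES=(216_5W_Ca 216_7W_Ca 218_5W_Ca)
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

star_two_pass() {
  local s sj_files=()
  run mkdir -p pass1 pass2 || return 1

  # Pass 1: map each sample separately; the prefix names its SJ.out.tab.
  for s in "${SAMPLES[@]}"; do
    run STAR --runThreadN 6 --genomeDir "$GENOME_DIR" \
      --readFilesIn "${s}_R1.fq" "${s}_R2.fq" \
      --outFileNamePrefix "pass1/${s}_" || return 1
    sj_files+=("pass1/${s}_SJ.out.tab")
  done

  # Pass 2: map each sample again, passing every first-pass junction file.
  for s in "${SAMPLES[@]}"; do
    run STAR --runThreadN 6 --genomeDir "$GENOME_DIR" \
      --readFilesIn "${s}_R1.fq" "${s}_R2.fq" \
      --sjdbFileChrStartEnd "${sj_files[@]}" \
      --outFileNamePrefix "pass2/${s}_" || return 1
  done
}

star_two_pass
```

Reading the printed commands before setting DRY_RUN=0 is a cheap way to confirm that every sample's junction file appears in each second-pass command.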


Note for OP: you don't have to name each .tab file explicitly in your command; you can use a wildcard such as --sjdbFileChrStartEnd sample*.tab
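The answer above passes the raw SJ.out.tab files straight to STAR. If some filtering is still wanted, as asked in the question, one possible approach is sketched below. It relies on the SJ.out.tab column layout from the STAR manual (column 4 strand code, column 6 annotated flag, column 7 uniquely mapped read count) and writes the four-column chr/start/end/strand format that --sjdbFileChrStartEnd also accepts. The function name filter_sj and the threshold are hypothetical choices, not something recommended in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge SJ.out.tab files, keeping only novel junctions
# (column 6 == 0) supported by at least MIN_UNIQ uniquely mapped reads
# (column 7) in at least one of the given files.
# Usage: filter_sj MIN_UNIQ file1.SJ.out.tab [file2.SJ.out.tab ...]
filter_sj() {
  local min_uniq=$1
  shift
  awk -v min="$min_uniq" 'BEGIN { OFS = "\t" }
    $6 == 0 && $7 >= min {
      # STAR encodes strand as 0 (undefined), 1 (+), 2 (-).
      strand = ($4 == 1 ? "+" : ($4 == 2 ? "-" : "."))
      print $1, $2, $3, strand
    }' "$@" | LC_ALL=C sort -u   # drop junctions repeated across samples
}
```

A call such as filter_sj 3 *SJ.out.tab > merged.novel.SJ.tab (the output name is arbitrary) would give a single file to pass to --sjdbFileChrStartEnd. Note that the threshold is applied per file, so a junction with weak support in every sample is dropped even if its combined support is high.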

written 2.5 years ago by WouterDeCoster

Hi,

My SJ.out.tab file names for the multiple samples look like this:

216_5W_CaSJ.out.tab

216_7W_CaSJ.out.tab

218_5W_CaSJ.out.tab

216_7W_CaSJ.out.tab

216_7W_CoSJ.out.tab

216_7W_CoSJ.out.tab

Will it work if I use --sjdbFileChrStartEnd *.out.tab?

written 2.5 years ago by Bioinfonext

That looks fine to me; I expect it to work (unless other files in the same directory also match *.out.tab). To make sure, you can run:

ls *.out.tab

to check that the files you want are matched (and none that you don't want).
You can also check that the number of files is correct with:

ls *.out.tab | wc -l
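The same check can be wrapped in a small guard that runs before the second pass. This is only a sketch: the function name check_sj_glob and the expected count are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that a glob matched exactly the expected
# number of existing files before handing it to STAR.
# Usage: check_sj_glob EXPECTED_COUNT *.out.tab
check_sj_glob() {
  local expected=$1 found=0 f
  shift
  for f in "$@"; do
    # An unmatched glob stays a literal pattern, which fails this test.
    if [ -f "$f" ]; then
      found=$((found + 1))
    fi
  done
  echo "matched ${found} file(s), expected ${expected}"
  [ "$found" -eq "$expected" ]
}
```

For example, check_sj_glob 6 *.out.tab && echo "ok to start the second pass" only prints the confirmation when exactly six junction files are present.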
written 2.5 years ago by WouterDeCoster

Thanks a lot, WouterDeCoster! I will try it once the mapping is complete for all files.

written 2.5 years ago by Bioinfonext