Dear Biostars,
I have RNA-seq data from a fish (3 cond1 and 3 cond2 samples as biological replicates), and I have already done a Trinity de novo assembly and DEG analysis on these data. Now the draft genome of that species has been released, so I want to run a genome-guided DEG analysis as well and compare the results.
As a first step, I indexed the genome:
./hisat2-build -p 6 '/home/salmon-genome-2018/GCF_SSa_v1.0_genomic.fna' ht2_base_salmon_genome
However, it seems there are several options/switches I could add to the HISAT2 mapping command.
My first command for one of the replicates (C1) was:
./hisat2 -p 6 -x ht2_base_salmon_genome -1 '/RNA_Seq_Data/C1_clean_left.fq' -2 '/RNA_Seq_Data/C1_clean_right.fq' -S '/RNA_Seq_Data/C1.sam' &> C1.sam.info
This produced 6 SAM files. But then I found this in the StringTie manual:
"be sure to run HISAT2 with the --dta option for alignment, or your results will suffer."
I asked here, and @Vijay Lakhujani also thought that using --dta is the better idea.
So I re-ran all 6 mappings with this command:
./hisat2 -p 6 --dta -x ht2_base_salmon_genome -1 '/RNA_Seq_Data/C1_clean_left.fq' -2 '/RNA_Seq_Data/C1_clean_right.fq' -S '/RNA_Seq_Data/C1.sam' &> C1.sam.info
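To avoid typing the command six times, the re-run could be scripted roughly as below. This is only a sketch: C1 is the only sample name given in this post, so the other five names are hypothetical placeholders, and the script prints the commands for checking rather than executing them.

```shell
#!/bin/sh
# Sketch: print one HISAT2 --dta mapping command per sample for review.
# Only C1 is a real sample name here; the others are hypothetical placeholders.
SAMPLES="C1 C2 C3 T1 T2 T3"
DATA_DIR='/RNA_Seq_Data'
INDEX='ht2_base_salmon_genome'

# Emit the mapping command for one sample.
# -x is followed directly by the index basename, since it takes that as its argument.
hisat2_cmd() {
  sample="$1"
  printf '%s\n' "./hisat2 -p 6 --dta -x $INDEX -1 $DATA_DIR/${sample}_clean_left.fq -2 $DATA_DIR/${sample}_clean_right.fq -S $DATA_DIR/${sample}.sam > ${sample}.sam.info 2>&1"
}

for name in $SAMPLES; do
  hisat2_cmd "$name"
done
```

Once the printed commands look right, piping the script's output to `sh` would run them one after another.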
Now, there is another hint in the StringTie manual:
It is highly recommended to use the reference annotation information when mapping the reads, which can be either embedded in the genome index (built with the --ss and --exon options, see HISAT2 manual), or provided separately at run time (using the --known-splicesite-infile option of HISAT2).
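If I read the HISAT2 manual correctly, the two routes in that passage would look roughly like the sketch below. It assumes a GTF annotation matching this genome release is available (`annotation.gtf` is a hypothetical placeholder, as is the index name `ht2_base_salmon_genome_tran`) and that the helper scripts `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py`, which ship with HISAT2, sit next to the `hisat2` binary. The script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: prints the commands for review instead of running them.
GENOME='/home/salmon-genome-2018/GCF_SSa_v1.0_genomic.fna'
GTF='annotation.gtf'                      # hypothetical placeholder annotation file
INDEX_TRAN='ht2_base_salmon_genome_tran'  # hypothetical name for an annotation-aware index

# Step 1: extract splice sites and exons from the GTF
# (both helper scripts are part of the HISAT2 distribution).
extract_cmds() {
  printf '%s\n' "./hisat2_extract_splice_sites.py $GTF > splicesites.txt"
  printf '%s\n' "./hisat2_extract_exons.py $GTF > exons.txt"
}

# Route A: embed the annotation in the index at build time (--ss / --exon).
build_cmd() {
  printf '%s\n' "./hisat2-build -p 6 --ss splicesites.txt --exon exons.txt $GENOME $INDEX_TRAN"
}

# Route B: keep the existing index and pass the splice sites at mapping time.
map_cmd() {
  printf '%s\n' "./hisat2 -p 6 --dta --known-splicesite-infile splicesites.txt -x ht2_base_salmon_genome -1 /RNA_Seq_Data/C1_clean_left.fq -2 /RNA_Seq_Data/C1_clean_right.fq -S /RNA_Seq_Data/C1.sam"
}

extract_cmds
build_cmd
map_cmd
```

Note that --ss and --exon belong to hisat2-build, not to the hisat2 mapping command, so route A means rebuilding the index, and the manual warns that building such an index needs considerably more memory than a plain one. Route B reuses the index I already have.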
Q: What is the standard/preferred HISAT2 command for mapping? What should I do now: re-run all 6 mappings, adding --ss and --exon to my previous command? And how can I find the splice site information for this newly released genome?